POLY-OMIC PROFILES OF SCHIZOPHRENIA AND BIPOLAR DISORDER

Item Type Dissertation

Authors Hess, Jonathan

Rights Attribution-NonCommercial-NoDerivatives 4.0 International

Download date 06/10/2021 13:48:56

Item License http://creativecommons.org/licenses/by-nc-nd/4.0/

Link to Item http://hdl.handle.net/20.500.12648/2034

POLY-OMIC PROFILES OF SCHIZOPHRENIA AND BIPOLAR DISORDER

By: Jonathan Hess

Department of Neuroscience & Physiology

May 2016

Dissertation Prepared for the Degree of Doctor of Philosophy

Approved by: Stephen J. Glatt, Ph.D. (5/25/2017)

TABLE OF CONTENTS

Common Abbreviations

Copyrighted Works in this Dissertation

References to Copyrighted Works

Abstract

Bibliography

Genetics of schizophrenia: Historical insights and prevailing evidence

Abstract

Introduction

From Linkage Studies to Genome-Wide Association Studies

Into Genome-Wide Association Studies of Common Variants

Insights from Rare and De Novo Mutation Events

Discussion

Figure 1. GWAS Associations in Linkage Peaks

Figure 2. GWAS Associations in CNV Loci

Bibliography

Genetics of bipolar disorder: Clues from genome-wide studies of common and rare variants

Abstract

Introduction

Candidate Genes for BD Related to Neurotransmission, Plasticity, and Cellular Resilience

Evidence from GWAS and Functional Analyses

Cav1.2 L-Type Calcium Channel

Zinc Finger 804A

Ankyrin G

Tetratricopeptide Repeat and Ankyrin Repeat Containing 1

Teneurin Transmembrane 4

Spectrin Repeat Containing Nuclear Envelope Protein 1

Interferon Induced Protein 44 Like

MAD1 Mitotic Arrest Deficient Like 1

Rare and De novo Mutations in BD Candidate Genes

Discussion

Table 1. Promising BD risk genes that emerged from large family-based candidate studies or genome-wide association studies

Figure 1. CACNA1C regional association plot

Bibliography

Enrichment of common risk variants for schizophrenia and bipolar disorder in pathways and regulatory elements

Abstract

Introduction

Methods

Pre-processing GWASs of SZ and BD

Reference Data for Regulatory Elements and Population-level Genotypes

Enrichment Tests

Control and other Psychiatric GWASs

Results

Discussion

Overview of FLEET

Biological Relevance of Findings

Table 1. Summary statistics for top annotations that were associated with SZ and BD (FDRp < 0.05, total of 329)

Figure 2. Enrichment of individual histone modifications across cells and tissues with risk variants for SZ and BD

Bibliography

Transcriptome-wide mega-analyses reveal joint dysregulation of immunologic genes and transcription regulators in brain and blood in schizophrenia

Abstract

Introduction

Methods

Literature Search and Study Selection

Data Import, Normalization, Quality Control and Probe Matching

Mixed-Effect Linear Modeling and Gene Set Analysis

Expression Quantitative Trait Loci and GWAS Enrichment Analysis

Constructing Networks of Co-Expressed Genes in Brain and Blood

Evaluating Preservation of Co-Expression Modules Across Diagnostic Groups and Tissues

Enrichment Analysis of Biological Annotations and Cell-Type Signatures

Gene Set and Network Module Heterogeneity Analyses

Machine-Learning Classification using Blood Transcriptomic Data

Results

Dysregulated Genes and Gene Sets in Brain

Dysregulated Genes and Gene Sets in Blood

Cross-Tissue Comparison of Dysregulated Genes and Gene Sets

Enrichment of eQTL and GWAS Association Signals among Dysregulated Genes

Network Co-expression Analysis identifies SZ-Associated Modules in Brain and Blood

Enrichment of GWAS Association Signal in SZ-Associated Modules

Cross-Tissue Overlap of Genes and Functional Annotations in SZ-Associated Modules

Network Co-Expression Analysis Identifies Cross-Group and Cross-Tissue Preservation

Machine-Learning Classification using Blood Transcriptome Data

Discussion

Limitations of Gene Network Preservation Analysis

Conclusions and Remarks

Table 1. Schizophrenia Microarray Studies Included in the Mega-Analyses

Table 2. Schizophrenia Studies Excluded from Mega-Analyses

Table 3. Genes Significantly Dysregulated (FDRp < 0.05) in Schizophrenia across Studies of Postmortem Brain Tissue

Table 4. Gene Sets Significantly Dysregulated at a Bonferroni p < 0.05 Based on Permutations of Single-Gene Test Statistics from the Brain Mega-Analysis

Table 5. Relative proportion of 17 circulating immune cells in SZ cases and unaffected comparison subjects estimated from expression levels of cell-specific genes

Table 6. Genes Significantly Dysregulated (Bonferroni p < 0.05) in Schizophrenia across Studies of Blood Tissue (k = 220)

Table 7. Gene Sets Significantly Dysregulated at a Bonferroni p < 0.05 Based on Permutations of Single-Gene Test Statistics from the Blood Mega-Analysis

Table 8. Permutation Tests of Enrichment for eQTLs among Differentially Expressed Genes Defined at a Relatively Conservative Threshold from SZ Brain and Blood Mega-Analyses

Table 9. Dysregulated Genes from Mega-Analyses (FDRp < 0.1) Overlapping with Genes Proximal to the 108 Genome-Wide Significant and Linkage Disequilibrium-Independent Loci

Table 10. SZ-Associated Blood Network Modules Over-Representation of Immune Cell-Type Signatures

Table 11. Overlap of Genes Among SZ-Associated Co-Expression Network Modules Identified in Brain and Blood

Table 12. Performance of Machine-Learning Classifiers Using Blood-Based Gene Expression Data

Figure 1. Effect of Normalization on Brain Tissue Gene Expression Values

Figure 2. Cross-Tissue Comparison of Significantly Dysregulated Genes (FDR p < 0.05)

Figure 3. Cross-Tissue Comparison of Significantly Dysregulated Gene Sets (Bonferroni p < 0.05)

Figure 4. Comparison of gene-level differential expression statistics across brain and blood

Figure 5. Co-expression Modules Nominally Associated with SZ in Brain (p < 0.05)

Figure 6. A Co-expression Module that was Significantly Associated with SZ in Blood (FDRp < 0.05)

Figure 7. Enrichment of risk genes from GWAS in modules that are differentially expressed in SZ cases

Figure 8. Preservation of gene co-expression networks across tissues and diagnostic groups

Bibliography

Transcriptomic abnormalities in bipolar disorder and discrimination of the major psychoses

Abstract

Introduction

Methods and Materials

Study Design

Description of Included Microarray Studies, Data Import, and Quality Control

Linear Mixed Models using Clinical Covariates (Stage 1)

Systematic Comparison of Differential Expression Statistics across Regression Approaches

Permutation-Based Gene Set Enrichment

Gene Co-expression Networks and Preservation across BD Cases and Unaffected Comparisons

Testing for Enrichment of GWAS Signals across Differentially Expressed Genes in BD

Cross-Referencing Results with Previous Mega-Analyses of Schizophrenia

Classifier of BD and unaffected comparison subjects

Classifier of SZ and BD

Results

Linear Mixed Models Identify Genes Associated with Bipolar Disorder

Gene Set Enrichment Analysis

Gene Co-expression Networks

Differentially Expressed Genes in BD with GWAS Associations

Genes Concordantly and Discordantly Dysregulated across SZ and BD

Blood-based Microarray Classifiers of BD, SZ, and Unaffected Comparison Subjects

Discussion

Table 1. Demographics of the six microarray studies that were included in the mega-analysis

Table 2. Estimated proportion of peripherally circulating immune cells in bipolar disorder cases and unaffected subjects

Table 3. Top most differentially expressed genes (n = 41) in bipolar disorder based on Stage 1 meta-analysis (coverage = 14,942 genes; FDRp < 0.1)

Table 4. Comparison of differential expression estimates across statistical models

Table 5. Permutation-based gene set enrichment analysis implicated 230 biologically-annotated genes in bipolar disorder

Table 6. Gene Sets and Biological Annotations Significantly Enriched in “Grey60” Co-expression Module Associated with Bipolar Disorder

Table 7. Overlap between transcriptome- and genome-wide association evidence linking immune, histone, and synaptic genes with SZ and BD

Table 8. Potential of blood-based RNA expression profiles for disorder-specific classifiers of the major psychoses

Figure 1. Box-and-whisker plots of gene expression distribution across 158 samples included in the mega-analysis

Figure 2. Normalization of microarray intensity files per study to adjust for technical sources of within- and between-study variation

Figure 3. Forest plots comparing the estimates of differential expression between bipolar disorder cases and unaffected comparison subjects

Figure 4. Concordance of differentially expressed genes in BD and SZ from brain transcriptome-wide mega-analyses (present study and Seifuddin et al. 2013)

Figure 5. Discordance of differentially expressed genes in BD and SZ from blood-based transcriptome-wide mega-analyses (present study and Hess et al. 2016)

Figure 6. Gene co-expression network associated with BD

Figure 7. Machine learning classifiers of the major psychoses

Bibliography

Discussion and final remarks

Highlights and Main Findings

Towards Poly-Omic Predictors of SZ and BD

Person-centered Approaches in the Future of Psychiatry

Final Remarks

Bibliography

COMMON ABBREVIATIONS

Abbreviation    Meaning
AUC             Area under the curve
BH              Benjamini-Hochberg procedure
BD              Bipolar disorder
CNS             Central nervous system
CNV             Copy number variant
eQTL            Expression quantitative trait locus
FDR             False discovery rate
GSEA            Gene set enrichment analysis
GWAS            Genome-wide association study
GWLS            Genome-wide linkage study
MHC             Major histocompatibility complex
MAF             Minor allele frequency
NB              Naïve Bayes
PFC             Prefrontal cortex
PGC             Psychiatric Genomics Consortium
RF              Random Forest
ROC             Receiver operating characteristic
SZ              Schizophrenia
SNP             Single nucleotide polymorphism
SVM             Support Vector Machine
TWAS            Transcriptome-wide association study
WGCNA           Weighted gene co-expression network analysis

COPYRIGHTED WORKS IN THIS DISSERTATION

I obtained permission to reuse copyrighted materials from Elsevier Inc. for two of my peer-reviewed publications in this dissertation. The first work is a book chapter that I co-first authored with Dr. Joyce van de Leemput titled “Genetics of Schizophrenia: Historical Insights and Prevailing Evidence”, published in Advances in Genetics. I reuse only the material from this chapter that I primarily wrote and that is germane to this dissertation; the version presented here features a newly written Abstract and a revised Discussion. The second work is a first-author original research article titled “Transcriptome-wide mega-analyses reveal joint dysregulation of immunologic genes and transcription regulators in brain and blood in schizophrenia”, published in Schizophrenia Research.

REFERENCES TO COPYRIGHTED WORKS

Hess, J. L., Tylee, D. S., Barve, R., de Jong, S., Ophoff, R. A., Kumarasinghe, N., … Glatt, S. J. (2016). Transcriptome-wide mega-analyses reveal joint dysregulation of immunologic genes and transcription regulators in brain and blood in schizophrenia. Schizophrenia Research, 176(2–3), 114–124. http://doi.org/10.1016/j.schres.2016.07.006

van de Leemput, J., Hess, J. L., Glatt, S. J., & Tsuang, M. T. (2016). Genetics of Schizophrenia: Historical Insights and Prevailing Evidence. Advances in Genetics, 96, 99–141. http://doi.org/10.1016/bs.adgen.2016.08.001

ABSTRACT

In 1899, psychiatrist Emil Kraepelin introduced a separation between schizophrenia (SZ) and bipolar disorder (BD), formerly “dementia praecox” and “manic-depressive disorder”, which came to be known as the Kraepelinian dichotomy and has prevailed over the past century (Kraepelin, 1904). Although Kraepelin postulated that these are distinct entities, multiple converging lines of evidence suggest that SZ and BD have a shared etiology: (1) first-degree relatives of a SZ-affected individual are at higher risk for BD than the general population, and vice versa (Lichtenstein et al., 2009); (2) recent genome-wide association studies (GWAS) and rare variant studies revealed that SZ and BD share common risk genes, suggesting that these disorders share a set of molecular substrates; and (3) second-generation antipsychotics are effective in ameliorating both psychosis and mania (Buckley, 2008).

SZ and BD are highly heritable mental illnesses with a lifetime prevalence near 1%. Onset typically occurs in late adolescence to early adulthood. Their etiology is complex and multi-factorial. SZ and BD are among the leading causes of disability around the globe (Global Burden of Disease Study 2013 Collaborators, 2015). SZ presents with a constellation of symptoms, including hallucinations (e.g., auditory, olfactory, visual), delusions (e.g., persecution, grandiosity), thought disturbances, affective flattening, and anhedonia. SZ and BD share clinical features such as psychosis, though psychosis is more widely recognized as a hallmark of SZ. The core feature of BD is extreme changes in mood, with periods of mania followed by severe depression, a pattern also referred to as “switching”. Drugs for treating SZ and BD have changed very little over the past 50 years, and those that are used today are not always effective and can elicit severe side effects.

SZ and BD research is evolving rapidly, but our understanding of these disorders is still in its infancy. One of the major advances in the field has been the “big data” revolution. Technological advances have been a critical driving force of this revolution, including the emergence of DNA microarray chips for high-throughput genome-wide genotyping and gene expression profiling. These technologies became widely adopted in psychiatry and led to a proliferation of genome- and transcriptome-wide studies aimed at discovering novel genes and pathways related to mental illness.

Despite SZ and BD having a strong genetic basis, identifying susceptibility genes was a significant challenge. Combining data across laboratories became a fundamental strategy to overcome inherent weaknesses in statistical power and methodological biases, and it has proven to be a fruitful strategy for GWAS (Cross Disorder Group of the Psychiatric Genomics Consortium, 2013; Ripke et al., 2014; Ruderfer et al., 2013; Sklar et al., 2011). Yet a robust methodological and statistical framework for analyzing combined collections of gene expression data has been lacking in psychiatry. Microarray studies of SZ and BD suffered from low statistical power and drawbacks that affected their reproducibility (Draghici, Khatri, Eklund, & Szallasi, 2006; Evans, Watson, & Akil, 2003). Combining gene expression data from numerous sources and addressing methodological issues may help to uncover reliable molecular associations. Even though the relevance of gene expression to physiology is not always clear, gene expression abnormalities in mental illness can provide fundamental insight into gene regulatory networks in brain and peripheral tissues, and provide a framework for interpreting genomics data. Integrating findings between GWAS and gene expression studies has the potential to elucidate the etiological overlap of SZ and BD. Moreover, gene expression signatures of mental illnesses may have biomarker utility and lay a foundation for identifying better drug targets. Data sharing is now commonplace. Although microarrays are gradually being replaced by more sensitive and precise technologies such as next-generation sequencing, data harmonization will remain a pervasive issue unless dealt with now.

In this dissertation, I present two review papers describing the current state of SZ and BD genetics research, followed by three primary research studies that I performed to answer these prevailing questions: (1) What are the genes, pathways, and regulatory elements that relate to risk for SZ and BD, and are these similar or different across disorders? (2) What genes and pathways are abnormally expressed in SZ and BD, and might these differences converge with genomic evidence? (3) Can differences between SZ and BD reflected in gene expression profiles offer biomarker utility and a basis for developing disorder-specific classifiers? My primary hypothesis is that SZ and BD exhibit overlapping abnormalities across pathways related to neurodevelopment, neurotransmission, and immunity/cellular response to stressors, and that these abnormalities are relevant for pathophysiology. My dissertation work encompasses the development of methodologies and computational tools to analyze large “poly-omics” data sets, i.e., jointly analyzing genomic, epigenomic, and transcriptomic data to identify abnormalities in gene expression regulation and molecular substrates that are common between and unique to SZ and BD. My work uncovered convergent evidence of dysregulation among genes, pathways, and regulatory molecules associated with SZ and BD. Major outcomes of this thesis may help to lay the groundwork for causal inference of the effect of genetic variants on cellular phenotypes, biological sub-typing of mental illness through gene expression profiling, and rational drug design.

BIBLIOGRAPHY

Buckley, P. F. (2008). Update on the treatment and management of schizophrenia and bipolar disorder. CNS Spectrums, 13(2 Suppl 1), 1-10–2. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/18227747

Cross Disorder Group of the Psychiatric Genomics Consortium. (2013). Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet, 381(9875), 1371–9. http://doi.org/10.1016/S0140-6736(12)62129-1

Draghici, S., Khatri, P., Eklund, A. C., & Szallasi, Z. (2006). Reliability and reproducibility issues in DNA microarray measurements. Trends in Genetics, 22(2), 101–9. http://doi.org/10.1016/j.tig.2005.12.005

Evans, S. J., Watson, S. J., & Akil, H. (2003). Evaluation of Sensitivity, Performance and Reproducibility of Microarray Technology in Neuronal Tissue. Integrative and Comparative Biology, 43(6), 780–785. http://doi.org/10.1093/icb/43.6.780

Global Burden of Disease Study 2013 Collaborators. (2015). Global, regional, and national incidence, prevalence, and years lived with disability for 301 acute and chronic diseases and injuries in 188 countries, 1990-2013: a systematic analysis for the Global Burden of Disease Study 2013. Lancet, 386(9995), 743–800. http://doi.org/10.1016/S0140-6736(15)60692-4

Kraepelin, E. (1904). Psychiatrie: ein Lehrbuch für Studirende und Aerzte. In Psychiatrie: ein Lehrbuch für Studirende und Aerzte (pp. 815–841).

Lichtenstein, P., Yip, B. H., Björk, C., Pawitan, Y., Cannon, T. D., Sullivan, P. F., & Hultman, C. M. (2009). Common genetic determinants of schizophrenia and bipolar disorder in Swedish families: a population-based study. Lancet, 373(9659), 234–239. http://doi.org/10.1016/S0140-6736(09)60072-6

Ripke, S., Neale, B. M., Corvin, A., Walters, J. T. R., Farh, K.-H., Holmans, P. A., … O’Donovan, M. C. (2014). Biological insights from 108 schizophrenia-associated genetic loci. Nature, 511(7510), 421–427. http://doi.org/10.1038/nature13595

Ruderfer, D. M., Fanous, A. H., Ripke, S., McQuillin, A., Amdur, R. L., Gejman, P. V, … Kendler, K. S. (2013). Polygenic dissection of diagnosis and clinical dimensions of bipolar disorder and schizophrenia. Molecular Psychiatry. http://doi.org/10.1038/mp.2013.138

Sklar, P., Ripke, S., Scott, L. J., Andreassen, O. A., Cichon, S., Craddock, N., … Psychiatric GWAS Consortium Bipolar Disorder Working Group. (2011). Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nature Genetics, 43(10), 977–983. http://doi.org/10.1038/ng.943

GENETICS OF SCHIZOPHRENIA: HISTORICAL INSIGHTS AND PREVAILING EVIDENCE

Authors: Joyce van de Leemput (1)*, Jonathan L. Hess (2)*, Stephen J. Glatt (2), and Ming T. Tsuang (1)

(1) University of California, San Diego, La Jolla, CA, United States

(2) SUNY Upstate Medical University, Syracuse, NY, United States

* These authors contributed equally to this work.

Notes: Authors are listed in the same order as they appeared in the book chapter (Advances in Genetics, Volume 96).

ABSTRACT

There is a long history of epidemiological and genetic studies of schizophrenia (SZ) that has provided key insights into the etiology of this disorder. SZ is a debilitating mental illness with a lifetime risk of ~1% in the general population and a well-documented heritability of ~60–80%; it can arise through familial transmission (i.e., affected individuals arising in families with a history of SZ) or in sporadic cases (i.e., individuals diagnosed without any known family history of SZ). The past two decades of genomics research have fundamentally reshaped our knowledge of SZ etiology, as many unexpected risk genes have emerged from unbiased genome-wide association studies (GWAS). Early conceptualizations viewed SZ as a manifestation of discrete neurochemical imbalances, but this view has since evolved into a perspective that SZ is a multifaceted disorder involving interactions among hundreds to thousands of genes, molecules, and brain circuits, with interplay between neurodevelopmental processes, plasticity, neurotransmission, and the immune system. The collection and interpretation of GWAS signals is providing fertile ground for future research into pathophysiological mechanisms underlying SZ. Taken together, this work has clinical relevance in that it may allow SZ to be classified by biological and genetic substrates rather than symptom checklists, and it is paving the way toward identification of rationally designed drug treatments.

INTRODUCTION

The etiology of SZ was subject to conjecture during the pre-genomic era. Evidence from early family and twin studies provided critical insight into the heritability and familial transmission of SZ (Shields & Gottesman, 1972; Tsuang, 2000). However, molecular genetics was limited at the time, as was the understanding of SZ etiology. For almost 50 years, D2 receptor sub-types have been intensely studied for their contributions to SZ risk, following the discovery of antipsychotics and the tracing of their pharmacological effects in the brain. These drugs were found to alter levels of dopamine in the synapses, thus ameliorating positive symptoms of SZ (i.e., delusions, hallucinations, thought disturbances), which helped to popularize the “dopamine hypothesis” of SZ.

Attention then shifted to other candidate genes, some of which were found to have emerging relationships with dopamine synthesis, release, and signaling, namely: Monoamine Oxidase A (MAOA), Catechol-O-Methyltransferase (COMT), Solute Carrier Family 6 Member 3 (SLC6A3), Protein Phosphatase 1, Regulatory Subunit 1B (PPP1R1B or DARPP32), Dystrobrevin Binding Protein 1 (DTNBP1), and Neuregulin 1 (NRG1), among many others.

Genes involved in neurodevelopmental processes, glutamate regulation, and inflammation underpinned other prevailing hypotheses in SZ and the rationale for many candidate genes. Unraveling the etiology of SZ one gene at a time proved to be an arduous and expensive task. Meta-analyses of candidate gene studies were a step toward identifying reliable risk factors for SZ, but remained a laborious process. Decades ago, wagering on the success of genetic hits that came from candidate gene studies was problematic due to low rates of replication. The root of the problem was that earlier studies were heavily underpowered, limiting their ability to separate true signals from noise. Studies in later years showed that sample sizes in the hundreds of thousands are required to uncover the small effects of common variants associated with SZ, and to have sufficient statistical power for those associations to reach a level of significance that investigators would accept as genuine.

Technological advancements, including integrative maps of the human genome and its variation, allowed for cost-effective means of studying complex traits through genome-wide assays. With this revolution came a major breakthrough in SZ genetics and a new set of clues helping to decipher SZ etiology. Genome-wide linkage studies tapered off in the post-genomic era, giving way to genome-wide association studies screening millions of common variants and high-throughput sequencing studies hunting for highly pathogenic rare variants. Heritable risk for SZ is now known to be shaped by genetic variation, often indexed by single nucleotide polymorphisms (SNPs), that spans the minor allele frequency (MAF) spectrum: common (MAF > 5%), low-frequency (MAF between 0.5% and 5%), and rare variants (MAF < 0.5%), according to their rates in the general population (Panoutsopoulou, Tachmazidou, & Zeggini, 2013). An increased burden of chromosomal duplications or deletions called copy number variants (CNVs) has also been demonstrated in SZ, which has helped expand our understanding of pathogenic factors in SZ. Exome sequencing and whole-genome SNP genotyping have proved highly effective means of identifying susceptibility genes or genomic regions associated with SZ. The near future may bring about cost-effective routes for whole-genome sequencing, which would promote a deeper annotation of genes and mutations along with a clearer portrait of the SZ genetic architecture.
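The frequency bins described above can be written down as a small helper. The following is a minimal Python sketch: the function names `minor_allele_frequency` and `classify_maf` are hypothetical and introduced only for illustration, the allele counts in the example are made up, and the handling of values that fall exactly on a boundary (5% and 0.5%) is an assumption, since the text does not specify it.

```python
def minor_allele_frequency(alt_allele_count: int, total_alleles: int) -> float:
    """Return the frequency of the less common allele at a biallelic site."""
    if total_alleles <= 0 or not 0 <= alt_allele_count <= total_alleles:
        raise ValueError("allele counts are inconsistent")
    alt_freq = alt_allele_count / total_alleles
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1.0 - alt_freq)


def classify_maf(maf: float) -> str:
    """Assign a variant to a frequency bin using the thresholds in the text.

    common:        MAF > 5%
    low-frequency: MAF between 0.5% and 5% (boundaries included here by assumption)
    rare:          MAF < 0.5%
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie between 0 and 0.5")
    if maf > 0.05:
        return "common"
    if maf >= 0.005:
        return "low-frequency"
    return "rare"


# Illustrative (made-up) counts: 30 copies of the alternate allele observed
# among 2,000 chromosomes, i.e. 1,000 diploid individuals.
example_maf = minor_allele_frequency(30, 2000)
print(example_maf, classify_maf(example_maf))
```

Keeping the frequency calculation separate from the binning makes it easy to change the cut-offs, which differ somewhat between studies.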

FROM LINKAGE STUDIES TO GENOME-WIDE ASSOCIATION STUDIES

Elucidating the genetic architecture of SZ has been an arduous task that was met with limited success. SZ genetics saw a boom during the 1990s with a rapid expansion of genome-wide linkage studies (GWLSs). GWLSs are based on the interrogation of pedigrees and creation of linkage maps spanning the genome, with the goal of identifying significant co-segregation relationships between alleles and phenotypes. Typically, several hundred microsatellite markers spaced at ~10 centimorgan intervals are evaluated in GWLSs. Microsatellites constitute short repeating DNA motifs, which vary between individuals and populations, and are valuable to study due to their historically high rate of mutation and heterozygosity.

Genome-wide scans of SZ proceeded at a steady pace throughout the 1990s and into the early 2000s; however, their viability was brought into question as studies yielded conflicting results. Reliable genome-wide significant findings were scant and rarely replicated. How were these shortcomings explained? One possibility was that linkage studies were not sensitive enough to detect small effects of common variants spread diffusely through the genome. Issues of statistical power were also prevalent in the literature. Meta-analysis, or the combining of published results and statistical analysis of pooled estimates, was not a new concept in molecular genetics, but techniques were presented that made it possible to combine genome-wide linkage scans without the need for raw genotyping data and combined linkage maps (Badner & Gershon, 2002b; Badner & Gershon, 2002; Levinson, Levinson, Segurado, & Lewis, 2003; Lewis et al., 2003; Segurado, Detera-Wadleigh, Levinson, et al., 2003). Meta-analysis also enabled the combination of association signals across multiple disorders to identify cross-disorder genetic risk factors.
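To illustrate how published results can be pooled without raw genotype data, the sketch below applies Fisher's method for combining independent p-values. This is a generic, textbook technique chosen for illustration; it is not claimed to be the exact procedure used in the meta-analyses cited above. The function name `fisher_combined_p` and the example p-values are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis, where k is the
    number of studies. For even degrees of freedom the upper-tail
    probability has a closed form, so only the standard library is needed.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2

    # P(chi2 with 2k df > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    tail_sum = 1.0
    for i in range(1, k):
        term *= half_x / i
        tail_sum += term
    return math.exp(-half_x) * tail_sum


# Made-up p-values for the same chromosomal region in three linkage scans.
region_p = [0.04, 0.10, 0.03]
combined = fisher_combined_p(region_p)
print(f"combined p = {combined:.4g}")
```

The method assumes the scans are independent; overlapping pedigrees across studies would violate that assumption and inflate the combined evidence.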

A meta-analysis of 18 genome-wide linkage scans, predominantly of European ancestry (681 pedigrees and approximately n = 1,930 cases), found the strongest evidence of linkage with SZ on chromosomes 8p, 13q, and 22q (Badner & Gershon, 2002a). Linkage data were combined across pedigrees to test for susceptibility regions jointly associated with SZ and bipolar disorder. Chromosome 22q ranked highest in association with SZ at a significance level of P < 9.0E-4 after replication, and showed a prominent association in a combined analysis of SZ and bipolar disorder studies (P < 2.0E-5 from replication), followed by chromosome 13q (P < 4.0E-4 from replication) and 7q (P < 0.02 from replication). This study provided supportive evidence of common susceptibility loci for SZ and bipolar disorder, which suggests that these disorders have shared etiological roots. The investigation of cross-disorder genetic risk factors remains of central interest in psychiatric genetics, and has grown well beyond the boundaries of SZ and bipolar disorder (Bulik-Sullivan et al., 2015; Cross Disorder Group of the Psychiatric Genomics Consortium, 2013). A separate two-stage meta-analysis combined results from 32 genomic scans (3,255 pedigrees and n = 7,413 cases), which validated the associations located in 8p and 13q observed by Badner and Gershon, and provided supportive evidence for additional chromosomes: 1 (multiple regions), 2q, 3q, 4q, 5q, and 10q (Ng et al., 2009). However, chromosome 22 showed a relatively weak set of associations, none of which approached genome-wide significance. A companion paper to Ng et al., with overlapping authors, reported a high-density linkage analysis of approximately 6,000 SNP markers genotyped in 8 studies with a mixture of European, African American, and other ancestries (total 807 pedigrees, n = 1,900 cases) (Holmans et al., 2009). This separate study reported suggestive evidence of linkage with SZ at chromosomes 8q and 10q, but did not find strong evidence of linkage at 22q even after adjusting model parameters for study-wise heterogeneity. In fact, the authors identified significant heterogeneity in the absence of linkage in close proximity to the region associated with 22q11.2 deletion syndrome.

INTO GENOME-WIDE ASSOCIATION STUDIES OF COMMON VARIANTS

Meta-analysis helped to address critical shortcomings of genome-wide scans, but it did not mend all of their major pitfalls. Linkage evidence remained difficult to interpret without the support of fine-mapping or functional studies to locate the true risk-conferring gene(s) and variant(s). This pitfall also applies to GWAS; however, association signals from GWAS are markedly easier to interpret because they resolve to discrete regions of genes. The explosive rise of GWAS paralleled efforts toward deeper genome annotation, including the mapping of non-coding functional elements of the human genome (Kavanagh, Dwyer, O’Donovan, & Owen, 2013; Kent et al., 2002; Lonsdale et al., 2013; Stamatoyannopoulos, 2012). GWAS has been woven into the fabric of complex trait genetics and medicine. GWAS technologies enable genotyping of hundreds of thousands of common variants, with some coverage of copy number variants. Imputation of genotypes using reference panels (e.g., HapMap, 1000 Genomes) enables the combination of data across multiple studies in spite of disparate genotyping platforms. Advancements in GWAS data pre-processing methods thereby boost the power of GWASs by enabling combined-sample analyses and unlocking information for millions of un-typed variants.

GWASs of SZ have flourished in recent years, vastly outperforming GWLS in terms of statistical power and achievements. However, GWAS did not garner success overnight. Several attempts were made to identify susceptibility genes for SZ via genome-wide analyses (Kirov et al., 2009; Lavedan et al., 2009; Lencz et al., 2007; Mah et al., 2006; Moskvina et al., 2009; O’Donovan et al., 2008; Potkin et al., 2009; Potkin et al., 2009; Shi et al., 2009; Shifman et al., 2008; Stefansson et al., 2009; Sullivan et al.,

2008). Victories started to trickle in from these efforts, including the identification of novel candidate genes such as ZNF804A, variants of which increase risk jointly for SZ and bipolar disorder. ZNF804A is located on chromosome 2q32, which was shown to harbor a peak linkage signal in association with SZ (Ng et al., 2009). ZNF804A is expressed throughout the brain, though it has been shown to preferentially localize to pyramidal cells (Tao et al.,

2014), and may play a role in neurodevelopment (Chang et al., 2015; Schultz et al.,

2013). We provided an in-depth review of ZNF804A genetics and neurobiology in other articles (Hess & Glatt, 2014; Hess, Quinn, Akbarian, & Glatt, 2015). The identification of novel candidates emphasizes the potential of GWAS technology and aids in the emergence of new theories, and the resurrection of old suspicions, in relation to SZ pathophysiology. For example, the identification of a strong association peak in the MHC region of chromosome 6p21-6p22 rekindled curiosity in the immune system and its potential involvement in the pathophysiology of SZ (Ganguli, Brar, & Rabin, 1994;

Jones, Mowry, Pender, & Greer, 2005). Focus was also drawn onto neurodevelopmental,

calcium channel, and glutamatergic signaling pathway genes and their involvement in SZ pathophysiology vis-à-vis association signals residing in NRGN, TCF4, CACNA1C, and

ANK3 (Ripke et al., 2011; Stefansson et al., 2009). Excitement was building around pathways involved in the regulation of gene expression by small non-coding RNA, based on association of MIR137 and interactions of its gene product (miR-137) and major susceptibility genes for SZ, including ZNF804A and CACNA1C (Collins et al., 2014;

Guan et al., 2013; Kim et al., 2012; Ripke et al., 2011).

The establishment of multi-site collaborative GWAS projects was a momentous shift for psychiatry, and can be credited to efforts spearheaded by the International SZ

Consortium (S M Purcell et al., 2009), which served as a precursor for larger collaborations. In 2009, the Psychiatric Genomics Consortium (PGC, formerly

Psychiatric GWAS Consortium) presented a practical framework for conducting powerful GWAS by combining labor and resources across laboratories worldwide to identify additional susceptibility genes for SZ, depression, bipolar disorder, autism, and

ADHD (Psychiatric GWAS Consortium Steering Committee, 2009). The Consortium has now taken firm hold and rapidly evolved to meet the demands of exponentially larger

GWAS, enabling the group to expand its focus into other psychiatric disorders: anorexia nervosa, obsessive-compulsive disorder, Tourette’s syndrome, substance use disorders, and post-traumatic stress disorder. Collaborative GWAS of SZ has yielded definitive susceptibility loci, though the search continues for causal risk variants. Though genomic linkage studies have all but been replaced by GWAS and high-throughput resequencing, we can look back to the past to reevaluate candidate genes and see which (if any) gene(s) stood the test of time. Toward this end, we cross-referenced chromosomal loci associated

with SZ from the GWLS by Holmans et al. (high resolution of linkage scan) with evidence from the largest GWAS meta-analysis of SZ to date (34,241 cases, 45,604 controls, and 1,235 parent-offspring trios) (Ripke et al., 2014) in an effort to show how the two sets of findings compare. We assigned GWAS scores to genes

(maximum association peak) based on a 20 kb window of SNPs around genes. Gene-wide associations within the loci 8q21, 8q24.1, and 10q12 are presented in Figure 1. Two genes, CACNB2 (calcium voltage-gated channel auxiliary subunit beta 2) and MMP16

(matrix metallopeptidase 16), reached genome-wide significance (p < 5.0E-8) in association with SZ. Evidence for NSUN6 (NOP2/Sun RNA methyltransferase family member 6) was virtually genome-wide significant (p < 5.09E-8). None of these genes was a longstanding candidate for SZ or previously thought to drive signals from linkage scans, nor was any a major focus of functional studies. This was the case for practically all new genome-wide significant regions associated with SZ uncovered by Ripke et al. An exception is DRD2, one of the earliest candidate genes for SZ, which returned to the spotlight after many failed attempts at finding a genome-wide significant peak in this risk gene. A follow-up meta-analysis from our lab also provided statistical evidence of DRD2 association with SZ in Han Chinese samples (Cohen et al., 2015).
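The gene-level scoring used here (each gene receives the strongest SNP association found within a 20 kb window) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to produce Figure 1: the `Gene` and `Snp` records, the function name `gene_level_scores`, and the decision to extend the window 20 kb on each side of the gene boundaries are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # gene start position in base pairs
    end: int    # gene end position in base pairs


@dataclass(frozen=True)
class Snp:
    rsid: str
    chrom: str
    pos: int    # base-pair position
    p: float    # association p-value from GWAS summary statistics


# Assumption: the 20 kb window is added to each flank of the gene.
WINDOW_BP = 20_000


def gene_level_scores(genes, snps, window_bp=WINDOW_BP):
    """Map each gene to (top rsid, top p-value, -log10 p).

    Genes with no SNP inside their window are omitted from the result.
    """
    snps = list(snps)
    scores = {}
    for gene in genes:
        lo, hi = gene.start - window_bp, gene.end + window_bp
        nearby = [s for s in snps if s.chrom == gene.chrom and lo <= s.pos <= hi]
        if not nearby:
            continue
        # The most significant SNP in the window defines the gene-level score.
        top = min(nearby, key=lambda s: s.p)
        scores[gene.name] = (top.rsid, top.p, -math.log10(top.p))
    return scores
```

A gene would then be called genome-wide significant when its top p-value falls below 5.0E-8. Because the score is a maximum over SNPs, longer genes with more SNPs have more chances to reach a given threshold, which is one reason such scores are best read as descriptive summaries.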

Thus far, 108 independent loci are associated with SZ (Ripke et al., 2014), 83 of which were reported as novel associations compared to the PGC’s report from 2013

(Ripke et al., 2013). Uncovering novel candidate genes was one of many achievements of that paper. A compelling finding from this paper was an enrichment of GWAS risk variants in enhancers active in tissues intimately related to immunity (CD19 and CD20 cells from the B-lymphocyte lineage). These enrichment signals were independent of association signals observed in the extended MHC region, which could not be localized to a single gene due to extensive linkage disequilibrium. This suggests that genetic perturbation of immune pathways contributes to the etiology of SZ, at least in some cases.

Studies of immune markers in the periphery and brain have provided additional support to the inflammatory hypothesis of SZ (Bergon et al., 2015; Fillman, Sinclair, Fung,

Webster, & Shannon Weickert, 2014; Gardiner et al., 2013; Mistry, Gillis, & Pavlidis,

2013; J. Xu et al., 2012). Neurodevelopmental and immune-related perturbations are perhaps closely related in SZ etiology, which is a viewpoint embedded in several theoretical models that describe the role of inflammatory signaling for the regulation of neurotransmission and synaptic remodeling and evidenced by mechanistic studies (Jones et al., 2005; Müller & Schwarz, 2007; Sekar et al., 2016).

Another major achievement coming out of recent SZ GWAS is the application of genome-wide risk scoring. This approach capitalizes on the additive effects of independent variants associated with SZ. This method has already shown great promise and helped to optimize the construction of models to predict complex traits. Although genetic risk scores explain a significant proportion of variability in SZ, their predictive power is insufficient for use as diagnostic classifiers due to low sensitivity and specificity. Genetic risk scoring is seen as a valuable research technology and is being widely used for studying the relationship of cumulative genetic risk with many traits associated with SZ, including behavioral, cognitive, and cytoarchitectural abnormalities

(Agerbo et al., 2015; French et al., 2015; Oertel-Knöchel et al., 2015; Power et al., 2015).

In typical applications of genetic risk scores, only the linear additive sum of risk alleles, with optional weighting of allelic effect sizes, is pursued. The additive effects of all

common variants explain upwards of 7% of phenotypic variance on the liability scale

(Lee, Goddard, Wray, & Visscher, 2012) for SZ, about half of which is accounted for by the 108 genome-wide significant loci (Ripke et al., 2014). Overall heritability of SZ is estimated at 70–80% from family and twin studies; thus, a large share of heritability remains unaccounted for by common variants (dubbed “missing heritability”). Investigating rare variants is relevant to the missing heritability issue. It is plausible that a significant fraction of missing heritability can be explained by rare variants, though conditional analyses would be needed to distinguish effects of common variants from those of nearby rare variants. Re-sequencing of candidate loci is one of the most reliable methods for identifying rare variants that are missed by GWAS technologies.

Merging of the two technologies for reevaluation of candidate loci with conditional analyses will become an important next step in psychiatric genetics. Chip-based technologies for cost-effective genotyping of both common and rare variants have made their way to market, and thus may be an optimal means for performing genome-wide conditional analyses.
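The additive risk score discussed in this section reduces to a sum of risk-allele counts, optionally weighted by allelic effect size. The sketch below is illustrative only: the function name `risk_score`, the SNP identifiers, and the odds ratios are hypothetical, and the use of the natural log of the odds ratio as the weight is a common convention assumed here rather than a detail taken from the studies cited above.

```python
import math
from typing import Mapping, Optional


def risk_score(dosages: Mapping[str, float],
               log_odds: Optional[Mapping[str, float]] = None) -> float:
    """Linear additive risk score for one individual.

    dosages  -- risk-allele count per SNP (0, 1, 2, or an imputed dosage)
    log_odds -- optional per-SNP weights (log odds ratios); when omitted,
                every risk allele counts equally
    SNPs without a weight contribute nothing to the weighted score.
    """
    total = 0.0
    for rsid, dosage in dosages.items():
        weight = 1.0 if log_odds is None else log_odds.get(rsid, 0.0)
        total += weight * dosage
    return total


# Hypothetical individual and effect sizes, for illustration only.
example_dosages = {"rsA": 2, "rsB": 1, "rsC": 0}
example_weights = {"rsA": math.log(1.10), "rsB": math.log(1.05)}

unweighted = risk_score(example_dosages)
weighted = risk_score(example_dosages, example_weights)
```

Because the score is strictly additive, it cannot capture interactions between variants; any non-linear contribution would have to be modeled separately.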

Aside from the relevance of rare variants, attention must also be paid to non-linear effects of variants (i.e., epistasis). Notably, the PGC assessed non-linear interactions between the most strongly associated independent SNPs, but failed to detect a significant interaction. However, insights from secondary analyses suggest that non-linear interactions of SNPs contribute to SZ. One study reported improvements in explanatory power of polygenic models after accounting for SNP-SNP interactions in the

ZNF804A pathway identified by a machine learning algorithm (Nicodemus et al., 2014).

This finding is particularly striking considering that ZNF804A represents a top SZ risk gene from previous GWAS (O’Donovan et al., 2008; Ripke et al., 2014), and that one pair of interacting SNPs identified in the STAC gene is located in a genome-wide significant locus reported by Ripke et al. (2014).

Advancements in statistical approaches have been instrumental in studying the genetic architecture of SZ. The value of GWAS comes down to unbiased screening of the genome for identification of putative risk-conferring genes/loci, as well as molecular-genetic interplay that converges on biological pathways. Computational and functional studies have converged on transcriptional dysregulation as a core pathway involved in SZ pathophysiology. Enrichment of SZ GWAS signals in regulatory DNA regions, including regions that manage the three-dimensional loop structure of DNA via epigenetic modifications, has been reported (Roussos et al., 2014). Additional supporting evidence has come from studies of SNP-based heritability via stratified analyses of regulatory

DNA motifs (i.e., DHS regions, enhancers, repressors, regions enriched with histone modifications, introns, transcription start site, 3’ and 5’ untranslated regions) (Finucane et al., 2015). Altogether, the literature points to a set of functionally relevant risk SNPs that may exert consequences on transcription regulation. The insights described above have all been largely gleaned from primary or secondary analyses of common variants, but there are other opportunities to find convergent gene-to-phenotype relationships from studies of rare, de novo, and copy number variants.

INSIGHTS FROM RARE AND DE NOVO MUTATION EVENTS

The study of rare and de novo mutations presents opportunities for identifying highly penetrant variants associated with SZ. Moreover, uncovering pedigrees with rare variants linked with SZ offers a unique chance to investigate the interplay between genes and environment in shaping one’s risk for the disorder. In this section, we review epidemiological evidence and findings from high-throughput sequencing studies that describe the burden and role of rare and de novo mutations (point mutations, small insertions/deletions, and CNVs) in SZ.

The “common disease – rare variant” hypothesis of SZ was voiced as an alternative to the “common disease – common variant” hypothesis, which holds that SZ risk is largely due to the cumulative effects of common variants in the population (McClellan, Susser, & King, 2007). The rare variant hypothesis rests on three chief epidemiological pillars for SZ: (1) SZ occurs in families, wherein closely related individuals are at disproportionately high risk for developing the disorder; (2) SZ is linked with advanced paternal age, the suspicion being that DNA repair mechanisms in aging sperm degrade, leading to transmission of deleterious variants to offspring; and (3) SZ relates to decreased fertility, suggesting that deleterious variants face negative selection pressures that prevent disease-causing alleles from lingering in the population.

However, rare and common variants might be interwoven constructs at the molecular level. Evidence suggests that common and rare variants converge on the same biological pathways to modify SZ risk (Ambalavanan et al., 2015; Gulsuner et al., 2013; Shaun M Purcell et al., 2014). The cumulative power of multiple common variants has been emphasized over numerous studies; thus, contributions of common variants may be as relevant to SZ as those of rare variants. It is premature to conclude whether a particular class of risk factors supersedes others in mediating disease. We refer readers to a separate review article that evaluates potential genetic models for complex traits, which offers insights into this issue (Gibson, 2012).

As alluded to above, epidemiological studies of SZ have highlighted the relevance of parental age as a modifier of SZ risk. A large population-based study that followed psychiatric records for all registered citizens of Denmark between 1955 and 2006 identified that offspring born of older fathers (ages > 45 years) were at increased risk for developing SZ (McGrath et al., 2014). The authors of this study reported a similar relationship between advanced parental age and risk for personality disorders, pervasive developmental disorders, and neurotic/stress-related disorders; however, no significant association between parental age and bipolar disorder risk was detected. Another relationship found by McGrath et al. was an increased risk for SZ among offspring born of teenaged parents. The authors speculated that early parenthood might act as a mediator of parental educational attainment and socioeconomic status, which are also risk factors for SZ (Byrne, Agerbo, Eaton, & Mortensen, 2004). Are there genetic factors that could explain parental age effects on SZ risk? Variability in de novo mutations across parental ages has been explored through a sequencing study of Icelandic families as part of the deCODE Genetics project (Kong et al., 2012), which demonstrated a linear increase with father’s age in the number of de novo mutations found in offspring. The authors suspected that an alternative relationship between mutation rate and father’s age may exist, wherein mutation burden increases exponentially as father’s age increases, but additional data from older parents would be required to validate that suspicion (Kong et al., 2012).

A higher frequency of de novo mutations has been shown in SZ cases relative to unaffected comparison subjects, with an even larger burden present across early-onset cases of SZ, which takes into account severe mutations that may only be present in one or a few individuals (Walsh et al., 2008). Copy number variants in SZ may have high clinical relevance due to their strong pathogenicity. A study examined frequencies and penetrance of CNVs (37 deletions, 32 reciprocal duplications) previously implicated in

SZ, and demonstrated that many of the examined CNVs occur at higher rates in neurodevelopmental disorders compared to SZ (Kirov et al., 2014). The authors also demonstrated that SZ-associated CNVs (8 deletions, 5 duplications) have higher penetrance for early-onset neurodevelopmental disorders (i.e., developmental delays, autism, and variable congenital malformations), based on estimates that controlled for the population prevalence of these disorders. Penetrance of SZ-associated CNVs ranged from 8.4–88% for neurodevelopmental disorders versus 2.2–18% for SZ. The SZ-associated CNV with the highest penetrance was the 3q29 deletion, stretching approximately 1.4 megabases (Kirov et al., 2014; Mulle, 2015). Microdeletions at 3q29, which affect approximately 19 protein-coding genes, have been associated with SZ (Mulle et al., 2010). Collective evidence from CNV and linkage studies (Ng et al.,

2009) points to 3q29 as a susceptibility locus for SZ. We examined the entire 3q29 locus for gene-wide associations with SZ based on GWAS meta-analysis results (Ripke et al.,

2014), and found no genes surpassing genome-wide significance threshold (Figure 2, left panel). However, the top ranking gene in 3q29 was PAK2 (p21-activated kinase 2), which promotes formation of dendritic spines through phosphorylation of MLC (Mao et al.,

2009; Zhang, 2005). These converging lines of evidence highlight PAK2 as a potentially

important candidate gene for SZ and offer insight into a potential biological mechanism whereby loss of PAK2 through hemizygous deletion or deleterious SNPs may lead to dendritic spine loss, which is a hallmark defect in SZ (Garey et al., 1998; Glantz &

Lewis, 2000).
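The penetrance estimates discussed in this paragraph depend on the population prevalence of the disorder as well as on carrier frequencies. One standard way to combine these quantities is Bayes' rule, P(disorder | CNV) = P(CNV | disorder) × P(disorder) / P(CNV). The sketch below illustrates that calculation under simplifying assumptions (a single disorder, and a carrier frequency in unaffected individuals standing in for everyone without the disorder); it is not a reproduction of the procedure used by Kirov et al., and the function name and all input values are hypothetical.

```python
def penetrance(freq_in_cases: float,
               freq_in_unaffected: float,
               prevalence: float) -> float:
    """Estimate P(disorder | CNV carrier) with Bayes' rule.

    freq_in_cases      -- P(CNV | disorder), carrier frequency among cases
    freq_in_unaffected -- P(CNV | no disorder), carrier frequency among
                          individuals without the disorder
    prevalence         -- P(disorder), lifetime population prevalence
    """
    for value in (freq_in_cases, freq_in_unaffected, prevalence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be probabilities in [0, 1]")
    # Total carrier frequency in the population (law of total probability).
    carrier_freq = (freq_in_cases * prevalence
                    + freq_in_unaffected * (1.0 - prevalence))
    if carrier_freq == 0.0:
        raise ValueError("CNV not observed in cases or unaffected individuals")
    return freq_in_cases * prevalence / carrier_freq


# Hypothetical inputs: a CNV carried by 0.3% of cases and 0.02% of
# unaffected individuals, for a disorder with 1% prevalence.
example = penetrance(0.003, 0.0002, 0.01)
```

When a CNV is equally frequent in cases and unaffected individuals, the estimate collapses to the population prevalence, which makes explicit why an enrichment in cases is required before a CNV can be called penetrant.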

Among the best-characterized CNVs linked with SZ to date are microdeletions at 22q11.2 (which cause 22q11.2 deletion syndrome), which occur in about 1 out of 4000 live births and can be diagnosed using fluorescence in situ hybridization (FISH), or by microarray for a higher-resolution assay. Approximately 23% of individuals with 22q11.2 deletion syndrome develop SZ or schizoaffective disorder; however, this chromosomal abnormality exists in only about 1% of all SZ patients (Kirov et al., 2014). Despite the low prevalence of 22q11.2 deletion syndrome in the SZ population (~1%), studying this chromosomal region has provided fundamental knowledge of pathogenic mechanisms that affect brain development, in addition to pathways affected by a strong de novo risk factor for SZ.

Interestingly, recent evidence indicates that duplication of genes in the 22q11.2 locus, arising from non-allelic homologous recombination, is a protective factor for SZ (Rees et al., 2014). However, 22q11.2 duplication syndrome has adverse outcomes as well, including increased susceptibility to intellectual disability and autism spectrum disorders, which are also disproportionately represented in 22q11.2 deletion syndrome (De Smedt et al., 2007; Vorstman et al., 2006). Changes to the copy number of genes in the 22q11.2 locus have clear pathogenic consequences, but may be woven into a compensatory mechanism in SZ. No genes in the 22q11.2 locus achieved genome-wide significance in the GWAS meta-analysis (Figure 2, right panel). Nevertheless, 22q11.2 is a pathogenic locus and points to specific neurodevelopmental and immunological deficits in a significant portion of the SZ population.

Deleterious protein-coding mutations that arise de novo have been examined in the context of sporadic (non-familial) cases of SZ using high-throughput exome sequencing (Xu et al., 2011). The authors uncovered de novo mutations in 40 protein-coding genes and identified at least one de novo mutation in about half of the SZ cases examined (~1.6 fold higher than controls, though not statistically significant).

Mutation events thought to be pathogenic were found in DGCR2, a gene located in the

22q11.2 locus. This represents a recurring event in SZ genetics wherein a subset of loci have been repeatedly associated with SZ. This emphasizes the importance of merging results or data, whenever possible, to resolve genetic hotspots of SZ. In a similar effort, determining the cumulative effect of protein-coding mutations and chromosomal rearrangements (as polygenic signals), and their interaction with variability in the backdrop of common SNPs, may help to provide further resolution of the genetic architecture of SZ.

DISCUSSION

From the outset, there was limited success of genome-wide linkage studies, but these studies provided fundamental insights into the genetic architecture of SZ and laid the groundwork for large-scale collaborative GWASs. At present, there are > 108 independent loci associated with SZ from the largest GWAS meta-analysis to date, which implicated > 300 protein-coding genes (~1.4% of genes in the genome) in SZ. These loci need to be re-sequenced to identify the one or few susceptibility variants that explain each GWAS signal, followed by experimental work to determine the biological consequences of disease-relevant variants. This is an arduous process but one that can be simplified with systematic and comprehensive biological interpretations of GWAS signals via bioinformatics and integration of transcriptomic/epigenomic data to pinpoint regions that are likely to contain functionally relevant variants. Individual common variants contribute incremental risk to SZ, though collectively they exert powerful effects that can discriminate between SZ cases and controls with moderate performance.

Modeling SZ polygenic effects of all cataloged variants through in vitro or in vivo studies will be a supremely difficult challenge to overcome, if not insurmountable. Thus, prioritizing disease-relevant variants from studies of large human subject cohorts and translating their functional relevance through poly-omic approaches can narrow the search pool down to a subset of highly actionable variants.

These searches are being supplemented by genome-wide rare variant studies. Dozens of rare variants and chromosomal rearrangements have been associated with SZ, and some of these events occur at longstanding susceptibility genes/regions.

Discovering highly pathogenic rare variants related to SZ has strong clinical relevance as it can be more straightforward to interpret the biological effects of a single penetrant variant compared to hundreds of non-coding variants spread across numerous genes. The drug design process is well suited to diseases related to gain-of-function mutations, because drugs can be readily made to precisely dampen activation of a protein or block ectopic effects. However, it is uncertain whether drug treatments can effectively rescue the function of a protein that has been ablated by loss-of-function mutations (Ségalat,

2007). Therefore, additional work is needed both in biomedical research and the

pharmaceutical design process to decipher the underlying etiology of SZ and understand how to effectively treat its symptoms on a person-by-person basis. We are encouraged by the collection of findings to date, and anticipate that work in the near future will provide additional resolution to the role of SZ risk variants on fundamental dimensions of mental health and illness. Given the well-documented relationship between SZ and BD, it is possible that these disorders can be investigated in parallel using complementary and/or overlapping studies, which is the type of framework currently being applied by the PGC,

CommonMind Consortium, and the PsychENCODE project for neuropsychiatric disorders. Furthermore, the provocative connection between SZ and the immune system has potential to shed light on pleiotropic genes and their functional role in mediating SZ risk, such as the recent report of the complement component encoded by the C4 gene playing a role in synaptic pruning deficits and increasing risk for SZ (Sekar et al., 2016).

Our knowledge of the human genome is growing at a rapid rate, which is promising for the future of complex trait studies. We have yet to fully grasp its multitude of layers, but technological advances will soon make it possible to swiftly and affordably sequence thousands of genomes, bringing us toward a fruitful era of personalized genomics and person-based medicine. It is unlikely that a single unified model of SZ will emerge as we dig through the many layers of the (epi)genome, but we anticipate future research will provide us with concrete knowledge of susceptibility genes/pathways and drug targets.

FIGURE 1. GWAS ASSOCIATIONS IN LINKAGE PEAKS. Maximum gene-level associations in chromosomal regions (left) 8q21 and 8q24.1, and (right) 10q12 reported in the largest GWAS meta-analysis of SZ to date (Ripke et al., 2014). SNPs were assigned to genes with a 20 kb window, then ranked according to their significance (-log10 P). The association signal for the top ranking SNP was designated as the gene-level association score. Genes with a p-value < 1.0E-4 are labeled.

FIGURE 2. GWAS ASSOCIATIONS IN CNV LOCI. Maximum gene-level associations in chromosomal regions (left) 3q29 and (right) 22q11.2. See Figure 1 legend for a description of the layout.

BIBLIOGRAPHY

Agerbo, E., Sullivan, P. F., Vilhjálmsson, B. J., Pedersen, C. B., Mors, O., Børglum, A.

D., … Mortensen, P. B. (2015). Polygenic Risk Score, Parental Socioeconomic

Status, Family History of Psychiatric Disorders, and the Risk for Schizophrenia: A

Danish Population-Based Study and Meta-analysis. JAMA Psychiatry, 72(7), 635–

41. http://doi.org/10.1001/jamapsychiatry.2015.0346

Ambalavanan, A., Girard, S. L., Ahn, K., Zhou, S., Dionne-Laporte, A., Spiegelman, D.,

… Rouleau, G. A. (2015). De novo variants in sporadic cases of childhood onset

schizophrenia. European Journal of Human Genetics, 1–5.

http://doi.org/10.1038/ejhg.2015.218

Badner, J. A., & Gershon, E. S. (2002a). Meta-analysis of whole-genome linkage scans

of bipolar disorder and schizophrenia. Mol Psychiatry, 7(4), 405–411.

http://doi.org/10.1038/sj.mp.4001012

Badner, J. A., & Gershon, E. S. (2002b). Regional meta-analysis of published data

supports linkage of autism with markers on chromosome 7. Molecular Psychiatry,

7(1), 56–66. http://doi.org/10.1038/sj.mp.4000922

Bergon, A., Belzeaux, R., Comte, M., Pelletier, F., Hervé, M., Gardiner, E. J., …

Ibrahim, E. C. (2015). CX3CR1 is dysregulated in blood and brain from

schizophrenia patients. Schizophrenia Research.

http://doi.org/10.1016/j.schres.2015.08.010

Bulik-Sullivan, B., Finucane, H. K., Anttila, V., Gusev, A., Day, F. R., Consortium, R.,

… Neale, B. M. (2015). An Atlas of Genetic Correlations across Human Diseases

and Traits. bioRxiv, 1–44. http://doi.org/10.1101/014498

Byrne, M., Agerbo, E., Eaton, W. W., & Mortensen, P. B. (2004). Parental socio-

economic status and risk of first admission with schizophrenia - A Danish national

register based study. Social Psychiatry and Psychiatric Epidemiology, 39(2), 87–96.

http://doi.org/10.1007/s00127-004-0715-y

Chang, E. H., Kirtley, A., Chandon, T. S. S., Borger, P., Husain-Krautter, S., Vingtdeux,

V., & Malhotra, A. K. (2015). Postnatal neurodevelopmental expression and

glutamate-dependent regulation of the ZNF804A rodent homologue. Schizophrenia

Research, 168(1–2), 402–410. http://doi.org/10.1016/j.schres.2015.06.023

Cohen, O. S., Weickert, T. W., Hess, J. L., Paish, L. M., McCoy, S. Y., Rothmond, D. A.,

… Glatt, S. J. (2015). A splicing-regulatory polymorphism in DRD2 disrupts

ZRANB2 binding, impairs cognitive functioning and increases risk for

schizophrenia in six Han Chinese samples. Molecular Psychiatry, (August 2014), 1–

8. http://doi.org/10.1038/mp.2015.137

Collins, A. L., Kim, Y., Bloom, R. J., Kelada, S. N., Sethupathy, P., & Sullivan, P. F.

(2014). Transcriptional targets of the schizophrenia risk gene MIR137.

Translational Psychiatry, 4, e404. http://doi.org/10.1038/tp.2014.42

Cross Disorder Group of the Psychiatric Genomics Consortium. (2013). Identification of

risk loci with shared effects on five major psychiatric disorders: a genome-wide

analysis. Lancet, 381(9875), 1371–9. http://doi.org/10.1016/S0140-6736(12)62129-1

De Smedt, B., Devriendt, K., Fryns, J.-P., Vogels, A., Gewillig, M., & Swillen, A.

(2007). Intellectual abilities in a large sample of children with Velo-Cardio-Facial

Syndrome: an update. Journal of Intellectual Disability Research, 51(Pt 9), 666–

670. http://doi.org/10.1111/j.1365-2788.2007.00955.x

Fillman, S. G., Sinclair, D., Fung, S. J., Webster, M. J., & Shannon Weickert, C. (2014).

Markers of inflammation and stress distinguish subsets of individuals with

schizophrenia and bipolar disorder. Translational Psychiatry, 4, e365.

http://doi.org/10.1038/tp.2014.8

Finucane, H. K., Bulik-Sullivan, B., Gusev, A., Trynka, G., Reshef, Y., Loh, P.-R., …

Price, A. L. (2015). Partitioning heritability by functional category using GWAS

summary statistics. bioRxiv, (1), 14241. http://doi.org/10.1101/014241

French, L., Gray, C., Leonard, G., Perron, M., Pike, G. B., Richer, L., … Paus, T. (2015).

Early Cannabis Use, Polygenic Risk Score for Schizophrenia and Brain Maturation

in Adolescence. JAMA Psychiatry, 72(10), 1002–1011.

http://doi.org/10.1001/jamapsychiatry.2015.1131

Ganguli, R., Brar, J. S., & Rabin, B. S. (1994). Immune abnormalities in schizophrenia:

evidence for the autoimmune hypothesis. Harvard Review of Psychiatry, 2(2), 70–

83. http://doi.org/10.3109/10673229409017120

Gardiner, E. J., Cairns, M. J., Liu, B., Beveridge, N. J., Carr, V., Kelly, B., … Tooney, P.

A. (2013). Gene expression analysis reveals schizophrenia-associated dysregulation

of immune pathways in peripheral blood mononuclear cells. Journal of Psychiatric

Research, 47(4), 425–437. http://doi.org/10.1016/j.jpsychires.2012.11.007

Garey, L. J., Ong, W. Y., Patel, T. S., Kanani, M., Davis, A., Mortimer, A. M., … Hirsch,

S. R. (1998). Reduced dendritic spine density on cerebral cortical pyramidal neurons

in schizophrenia. J Neurol Neurosurg Psychiatry, 65(4), 446–453.

http://doi.org/10.1136/jnnp.65.4.446

Gibson, G. (2012). Rare and common variants: twenty arguments. Nature Reviews

Genetics, 13(2), 135–145. http://doi.org/10.1038/nrg3118

Glantz, L. A., & Lewis, D. A. (2000). Decreased dendritic spine density on prefrontal

cortical pyramidal neurons in schizophrenia. Archives of General Psychiatry, 57(1),

65–73. http://doi.org/10.1001/archpsyc.57.1.65

Guan, F., Zhang, B., Yan, T., Li, L., Liu, F., Li, T., … Li, S. (2013). MIR137 gene and

target gene CACNA1C of miR-137 contribute to schizophrenia susceptibility in Han

Chinese. Schizophrenia Research. http://doi.org/10.1016/j.schres.2013.11.004

Gulsuner, S., Walsh, T., Watts, A. C., Lee, M. K., Thornton, A. M., Casadei, S., …

McClellan, J. M. (2013). Spatial and temporal mapping of de novo mutations in

schizophrenia to a fetal prefrontal cortical network. Cell, 154(3), 518–529.

http://doi.org/10.1016/j.cell.2013.06.049

Hess, J. L., & Glatt, S. J. (2014). How might ZNF804A variants influence risk for

schizophrenia and bipolar disorder? A literature review, synthesis, and bioinformatic

analysis. American Journal of Medical Genetics. Part B, Neuropsychiatric

Genetics : The Official Publication of the International Society of Psychiatric

Genetics, 165(1), 28–40. http://doi.org/10.1002/ajmg.b.32207

Hess, J. L., Quinn, T. P., Akbarian, S., & Glatt, S. J. (2015). Bioinformatic analyses and

conceptual synthesis of evidence linking ZNF804A to risk for schizophrenia and

bipolar disorder. American Journal of Medical Genetics. Part B, Neuropsychiatric

Genetics : The Official Publication of the International Society of Psychiatric

Genetics, 168(1), 14–35. http://doi.org/10.1002/ajmg.b.32284

Holmans, P. A., Riley, B., Pulver, A. E., Owen, M. J., Wildenauer, D. B., Gejman, P. V,

… Levinson, D. F. (2009). Genomewide linkage scan of schizophrenia in a large

multicenter pedigree sample using single nucleotide polymorphisms. Mol

Psychiatry, 14(8), 786–795. http://doi.org/10.1038/mp.2009.11

Jones, A. L., Mowry, B. J., Pender, M. P., & Greer, J. M. (2005). Immune dysregulation

and self-reactivity in schizophrenia: Do some cases of schizophrenia have an

autoimmune basis? Immunology and Cell Biology. http://doi.org/10.1111/j.1440-1711.2005.01305.x

Kavanagh, D. H., Dwyer, S., O’Donovan, M. C., & Owen, M. J. (2013). The ENCODE

project: implications for psychiatric genetics. Molecular Psychiatry, 18(5), 540–2.

http://doi.org/10.1038/mp.2013.13

Kent, W. J., Sugnet, C. W., Furey, T. S., Roskin, K. M., Pringle, T. H., Zahler, A. M., &

Haussler, D. (2002). The Human Genome Browser at UCSC. Genome Research,

12(6), 996–1006. http://doi.org/10.1101/gr.229102

Kim, A. H., Parker, E. K., Williamson, V., McMichael, G. O., Fanous, A. H., &

Vladimirov, V. I. (2012). Experimental validation of candidate schizophrenia gene

ZNF804A as target for hsa-miR-137. Schizophrenia Research, 141(1), 60–64.

http://doi.org/10.1016/j.schres.2012.06.038

Kirov, G., Rees, E., Walters, J. T. R., Escott-Price, V., Georgieva, L., Richards, A. L., …

Owen, M. J. (2014). The penetrance of copy number variations for schizophrenia

and developmental delay. Biological Psychiatry, 75(5), 378–385.

http://doi.org/10.1016/j.biopsych.2013.07.022

Kirov, G., Zaharieva, I., Georgieva, L., Moskvina, V., Nikolov, I., Cichon, S., …

O’Donovan, M. C. (2009). A genome-wide association study in 574 schizophrenia

trios using DNA pooling. Molecular Psychiatry, 14(8), 796–803.

http://doi.org/10.1038/mp.2008.33

Kong, A., Frigge, M. L., Masson, G., Besenbacher, S., Sulem, P., Magnusson, G., …

Stefansson, K. (2012). Rate of de novo mutations and the importance of father’s age

to disease risk. Nature, 488(7412), 471–475. http://doi.org/10.1038/nature11396

Lavedan, C., Licamele, L., Volpi, S., Hamilton, J., Heaton, C., Mack, K., …

Polymeropoulos, M. H. (2009). Association of the NPAS3 gene and five other loci

with response to the antipsychotic iloperidone identified in a whole genome

association study. Molecular Psychiatry, 14(8), 804–19.

http://doi.org/10.1038/mp.2008.56

Lee, S. H., Goddard, M. E., Wray, N. R., & Visscher, P. M. (2012). A better coefficient

of determination for genetic profile analysis. Genetic Epidemiology, 36(3), 214–224.

http://doi.org/10.1002/gepi.21614

Lencz, T., Morgan, T. V, Athanasiou, M., Dain, B., Reed, C. R., Kane, J. M., …

Malhotra, A. K. (2007). Converging evidence for a pseudoautosomal cytokine

receptor gene locus in schizophrenia. Molecular Psychiatry, 12(6), 572–580.

http://doi.org/10.1038/sj.mp.4001983

Levinson, D. F., Levinson, M. D., Segurado, R., & Lewis, C. M. (2003). Genome scan

meta-analysis of schizophrenia and bipolar disorder, part I: Methods and power

analysis. American Journal of Human Genetics, 73(1), 17–33. http://doi.org/10.1086/376548

Lewis, C. M., Levinson, D. F., Wise, L. H., DeLisi, L. E., Straub, R. E., Hovatta, I., …

Helgason, T. (2003). Genome scan meta-analysis of schizophrenia and bipolar

disorder, part II: Schizophrenia. American Journal of Human Genetics, 73(1), 34–

48. http://doi.org/10.1086/376549

Lonsdale, J., Thomas, J., Salvatore, M., Phillips, R., Lo, E., Shad, S., … Moore, H. F.

(2013). The Genotype-Tissue Expression (GTEx) project. Nature Genetics, 45(6),

580–5. http://doi.org/10.1038/ng.2653

Mah, S., Nelson, M. R., Delisi, L. E., Reneland, R. H., Markward, N., James, M. R., …

Braun, A. (2006). Identification of the semaphorin receptor PLXNA2 as a candidate

for susceptibility to schizophrenia. Molecular Psychiatry, 11(5), 471–478.

http://doi.org/10.1038/sj.mp.4001785

Mao, Y., Ge, X., Frank, C. L., Madison, J. M., Koehler, A. N., Doud, M. K., … Tsai, L.

H. (2009). Disrupted in Schizophrenia 1 Regulates Neuronal Progenitor

Proliferation via Modulation of GSK3β/β-Catenin Signaling. Cell, 136(6), 1017–

1031. http://doi.org/10.1016/j.cell.2008.12.044

McClellan, J. M., Susser, E., & King, M. C. (2007). Schizophrenia: A common disease

caused by multiple rare alleles. British Journal of Psychiatry.

http://doi.org/10.1192/bjp.bp.106.025585

McGrath, J. J., Petersen, L., Agerbo, E., Mors, O., Mortensen, P. B., & Pedersen, C. B.

(2014). A comprehensive assessment of parental age and psychiatric disorders.

JAMA Psychiatry, 71(3), 301–9. http://doi.org/10.1001/jamapsychiatry.2013.4081

Mistry, M., Gillis, J., & Pavlidis, P. (2013). Meta-analysis of gene coexpression networks

in the post-mortem prefrontal cortex of patients with schizophrenia and unaffected

controls. BMC Neuroscience, 14, 105. http://doi.org/10.1186/1471-2202-14-105

Moskvina, V., Craddock, N., Holmans, P., Nikolov, I., Pahwa, J. S., Green, E., …

O’Donovan, M. C. (2009). Gene-wide analyses of genome-wide association data

sets: evidence for multiple common risk alleles for schizophrenia and bipolar

disorder and for overlap in genetic risk. Molecular Psychiatry, 14(3), 252–60.

http://doi.org/10.1038/mp.2008.133

Mulle, J. G. (2015). The 3q29 deletion confers >40-fold increase in risk for

schizophrenia. Molecular Psychiatry, 20(9), 1028–9.

http://doi.org/10.1038/mp.2015.76

Mulle, J. G., Dodd, A. F., McGrath, J. A., Wolyniec, P. S., Mitchell, A. A., Shetty, A. C.,

… Warren, S. T. (2010). Microdeletions of 3q29 confer high risk for schizophrenia.

American Journal of Human Genetics, 87(2), 229–236.

http://doi.org/10.1016/j.ajhg.2010.07.013

Müller, N., & Schwarz, M. J. (2007). The immunological basis of glutamatergic

disturbance in schizophrenia: Towards an integrated view. Journal of Neural

Transmission, Supplementa. http://doi.org/10.1007/978-3-211-73574-9-33

Ng, M. Y. M., Levinson, D. F., Faraone, S. V, Suarez, B. K., DeLisi, L. E., Arinami, T.,

… Lewis, C. M. (2009). Meta-analysis of 32 genome-wide linkage studies of

schizophrenia. Molecular Psychiatry, 14(8), 774–785.

http://doi.org/10.1038/mp.2008.135

Nicodemus, K. K., Hargreaves, A., Morris, D., Anney, R., Gill, M., Corvin, A., &

Donohoe, G. (2014). Variability in working memory performance explained by

epistasis vs polygenic scores in the ZNF804A pathway. JAMA Psychiatry, 71(7),

778–85. http://doi.org/10.1001/jamapsychiatry.2014.528

O’Donovan, M. C., Craddock, N., Norton, N., Williams, H., Peirce, T., Moskvina, V., …

Cloninger, C. R. (2008). Identification of loci associated with schizophrenia by

genome-wide association and follow-up. Nature Genetics, 40(9), 1053–5.

http://doi.org/10.1038/ng.201

Oertel-Knöchel, V., Lancaster, T. M., Knöchel, C., Stäblein, M., Storchak, H., Reinke,

B., … Linden, D. E. J. (2015). Schizophrenia risk variants modulate white matter

volume across the psychosis spectrum: evidence from two independent cohorts.

NeuroImage. Clinical, 7, 764–70. http://doi.org/10.1016/j.nicl.2015.03.005

Panoutsopoulou, K., Tachmazidou, I., & Zeggini, E. (2013). In search of low-frequency

and rare variants affecting complex traits. Human Molecular Genetics, 22(R1).

http://doi.org/10.1093/hmg/ddt376

Potkin, S. G., Turner, J. A., Fallon, J. A., Lakatos, A., Keator, D. B., Guffanti, G., &

Macciardi, F. (2009). Gene discovery through imaging genetics: identification of

two novel genes associated with schizophrenia. Molecular Psychiatry, 14(4), 416–

28. http://doi.org/10.1038/mp.2008.127

Potkin, S. G., Turner, J. A., Guffanti, G., Lakatos, A., Fallon, J. H., Nguyen, D. D., …

Macciardi, F. (2009). A genome-wide association study of schizophrenia using brain

activation as a quantitative phenotype. Schizophrenia Bulletin, 35(1), 96–108.

http://doi.org/10.1093/schbul/sbn155

Power, R. A., Steinberg, S., Bjornsdottir, G., Rietveld, C. A., Abdellaoui, A., Nivard, M.

M., … Stefansson, K. (2015). Polygenic risk scores for schizophrenia and bipolar

disorder predict creativity. Nature Neuroscience, 18(7), 953–5.

http://doi.org/10.1038/nn.4040

Psychiatric GWAS Consortium Steering Committee. (2009). A framework for

interpreting genome-wide association studies of psychiatric disorders. Molecular

Psychiatry, 14(1), 10–7. http://doi.org/10.1038/mp.2008.126

Purcell, S. M., Moran, J. L., Fromer, M., Ruderfer, D., Solovieff, N., Roussos, P., …

Sklar, P. (2014). A polygenic burden of rare disruptive mutations in schizophrenia.

Nature, 506(7487), 185–90. http://doi.org/10.1038/nature12975

Purcell, S. M., Wray, N. R., Stone, J. L., Visscher, P. M., O’Donovan, M. C., Sullivan, P.

F., & Sklar, P. (2009). Common polygenic variation contributes to risk of

schizophrenia and bipolar disorder. Nature, 460(7256), 748–752.

http://doi.org/10.1038/nature08185

Rees, E., Kirov, G., Sanders, A., Walters, J. T. R., Chambert, K. D., Shi, J., … Owen, M.

J. (2014). Evidence that duplications of 22q11.2 protect against schizophrenia.

Molecular Psychiatry, 19(1), 37–40. http://doi.org/10.1038/mp.2013.156

Ripke, S., Neale, B. M., Corvin, A., Walters, J. T. R., Farh, K.-H., Holmans, P. A., …

O’Donovan, M. C. (2014). Biological insights from 108 schizophrenia-associated

genetic loci. Nature, 511(7510), 421–7. http://doi.org/10.1038/nature13595

Ripke, S., O’Dushlaine, C., Chambert, K., Moran, J. L., Kähler, A. K., Akterin, S., …

Sullivan, P. F. (2013). Genome-wide association analysis identifies 13 new risk loci

for schizophrenia. Nature Genetics, 45(10), 1150–9. http://doi.org/10.1038/ng.2742

Ripke, S., Sanders, A. R., Kendler, K. S., Levinson, D., Sklar, P., & Holmans, P. A.

(2011). Genome-wide association study identifies five new schizophrenia loci.

Nature Genetics, 43(10), 969–76. http://doi.org/10.1038/ng.940

Roussos, P., Mitchell, A. C., Voloudakis, G., Fullard, J. F., Pothula, V. M., Tsang, J., …

Sklar, P. (2014). A Role for Noncoding Variation in Schizophrenia. Cell Reports,

9(4), 1417–29. http://doi.org/10.1016/j.celrep.2014.10.015

Schultz, C. C., Nenadic, I., Riley, B., Vladimirov, V. I., Wagner, G., Koch, K., … Sauer,

H. (2013). ZNF804A and Cortical Structure in Schizophrenia: In Vivo and

Postmortem Studies. Schizophrenia Bulletin, 3–5.

http://doi.org/10.1093/schbul/sbt123

Ségalat, L. (2007). Loss-of-function genetic diseases and the concept of pharmaceutical

targets. Orphanet Journal of Rare Diseases, 2, 30. http://doi.org/10.1186/1750-1172-2-30

Segurado, R., Detera-Wadleigh, S. D., Levinson, D. F., et al. (2003). Genome scan

meta-analysis of schizophrenia and bipolar disorder, part III: bipolar disorder. American

Journal of Human Genetics, 73(1), 49–62.

Sekar, A., Bialas, A., de Rivera, H., Davis, H., Hammond, T., Kamitaki, N., …

McCarroll, S. A. (2016). Schizophrenia risk from complex variation of complement

component 4. Nature, 530(7589), 177–83.

Shi, J., Levinson, D. F., Duan, J., Sanders, A. R., Zheng, Y., Pe’er, I., … Gejman, P. V.

(2009). Common variants on chromosome 6p22.1 are associated with schizophrenia.

Nature, 460(7256), 753–7. http://doi.org/10.1038/nature08192

Shields, J., & Gottesman, I. I. (1972). Cross-national diagnosis of schizophrenia in twins.

The heritability and specificity of schizophrenia. Archives of General Psychiatry,

27(6), 725–30. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/4637890

Shifman, S., Johannesson, M., Bronstein, M., Chen, S. X., Collier, D. A., Craddock, N.

J., … Darvasi, A. (2008). Genome-wide association identifies a common variant in

the reelin gene that increases the risk of schizophrenia only in women. PLoS

Genetics, 4(2), e28. http://doi.org/10.1371/journal.pgen.0040028

Stamatoyannopoulos, J. A. (2012). What does our genome encode? Genome Research,

22(9), 1602–1611. http://doi.org/10.1101/gr.146506.112

Stefansson, H., Ophoff, R. A., Steinberg, S., Andreassen, O. A., Cichon, S., Rujescu, D.,

… Collier, D. A. (2009). Common variants conferring risk of schizophrenia. Nature,

460(7256), 744–7. http://doi.org/10.1038/nature08186

Sullivan, P. F., Lin, D., Tzeng, J.-Y., van den Oord, E., Perkins, D., Stroup, T. S., …

Close, S. L. (2008). Genomewide association for schizophrenia in the CATIE study:

results of stage 1. Molecular Psychiatry, 13(6), 570–84.

http://doi.org/10.1038/mp.2008.25

Tao, R., Cousijn, H., Jaffe, A. E., Burnet, P. W. J., Edwards, F., Eastwood, S. L., …

Kleinman, J. E. (2014). Expression of ZNF804A in Human Brain and Alterations in

Schizophrenia, Bipolar Disorder, and Major Depressive Disorder: A Novel

Transcript Fetally Regulated by the Psychosis Risk Variant rs1344706. JAMA

Psychiatry. http://doi.org/10.1001/jamapsychiatry.2014.1079

Tsuang, M. (2000). Schizophrenia: genes and environment. Biological Psychiatry, 47(3),

210–20. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/10682218

Vorstman, J. A., Morcus, M. E., Duijff, S. N., Klaassen, P. W., Heineman-de Boer, J. A.,

Beemer, F. A., … van Engeland, H. (2006). The 22q11.2 deletion in children: high

rate of autistic disorders and early onset of psychotic symptoms. Journal of the American

Academy of Child and Adolescent Psychiatry, 45(9), 1104–1113.

Walsh, T., McClellan, J. M., McCarthy, S. E., Addington, A. M., Pierce, S. B., Cooper,

G. M., … Sebat, J. (2008). Rare structural variants disrupt multiple genes in

neurodevelopmental pathways in schizophrenia. Science (New York, N.Y.),

320(5875), 539–43. http://doi.org/10.1126/science.1155174

Xu, B., Roos, J. L., Dexheimer, P., Boone, B., Plummer, B., Levy, S., … Karayiorgou,

M. (2011). Exome sequencing supports a de novo mutational paradigm for

schizophrenia. Nature Genetics, 43(9), 864–868. http://doi.org/10.1038/ng.902

Xu, J., Sun, J., Chen, J., Wang, L., Li, A., Helm, M., … Chen, X. (2012). RNA-Seq

analysis implicates dysregulation of the immune system in schizophrenia. BMC

Genomics, 13(Suppl 8), S2. http://doi.org/10.1186/1471-2164-13-S8-S2

Zhang, H. (2005). A GIT1/PIX/Rac/PAK Signaling Module Regulates Spine

Morphogenesis and Synapse Formation through MLC. Journal of Neuroscience,

25(13), 3379–3388. http://doi.org/10.1523/JNEUROSCI.3553-04.2005

GENETICS OF BIPOLAR DISORDER: CLUES FROM GENOME-WIDE STUDIES OF COMMON AND RARE VARIANTS

ABSTRACT

Bipolar disorder (BD) is a relatively common and highly heritable neuropsychiatric disorder. The etiology of BD remains poorly understood, and progress toward more effective treatments has been slow over the past 50 years. Identifying risk genes for BD has been a significant challenge. Candidate gene and linkage studies struggled to find significant genes and to reproduce signals; nevertheless, several promising genes emerged and provided insight into potential mechanisms underlying BD pathophysiology. Technological advances set the stage for rapid growth in genome-wide association studies (GWASs) over the past decade, which offered a cutting-edge and unbiased approach to discovering risk genes without prior knowledge of pathophysiology. The promise of GWAS is coming to fruition, with 73 SNPs (implicating about 60 genes) surpassing genome-wide significance (p < 5e-08) in association with BD (Welter et al., 2014). Expanding on this, genome-wide rare variant studies have provided a complementary line of evidence implicating risk genes that have been of long-standing interest to researchers. In this review, I highlight (1) the underlying biology (if known) of top candidate genes that emerged from large-scale candidate gene and genome-wide analyses and their relevance to BD pathophysiology, (2) the evolving perspective of BD etiology in light of accumulating evidence, and (3) my perspective on issues that will be of importance in the next stage of BD genetics research.

INTRODUCTION

Bipolar disorder (BD) is a relatively common mental illness with a lifetime prevalence of approximately 1%. Onset typically occurs around early adulthood, and BD can become a chronic condition that gets progressively worse with time. The core characteristic of BD is severe episodic changes in mood, with switches between mania and depression; however, individuals with BD can also present with sleep disturbances, cognitive and motor abnormalities, and psychosis (Marvel & Paradiso, 2004). The overall disability burden associated with BD (disability-adjusted life-year, DALY) is the third highest among all neuropsychiatric and neurodegenerative disorders (Eriksen et al., 2002).

Individuals with BD are at elevated risk for co-morbidities including anxiety, behavioral, and substance use disorders (Merikangas et al., 2011). The lifetime rate of suicides among individuals with BD (29%) is significantly higher than that among individuals with unipolar depression and all other DSM psychiatric disorders combined. It is important to advance our understanding of BD etiology in order to improve detection, treatment, and outcomes.

Many twin and family studies have provided strong support that BD is highly heritable (60-80%) but not completely concordant, indicating that genetic factors play an important role but that genes are not deterministic (i.e., incomplete penetrance) (Polderman et al., 2015; Smoller & Finn, 2003). Several attractive candidate genes have been proposed for BD based on findings from human studies and animal models. However, classical linkage and association studies yielded weak or inconsistent findings. It was postulated that there may be genetically distinct but clinically indistinguishable sub-groups of BD-affected individuals (Gershon, 1977), which could obscure detection of genetic associations. Genome-wide association studies (GWAS) have had success in identifying susceptibility genes for BD; however, there were significant challenges to overcome, and many remain to be considered in follow-up studies. It is clear that no single gene or small set of genes explains BD; rather, risk arises from complex interactions among many susceptibility genes and possibly environmental factors.

The next major challenge is deciphering the underlying biology of genetic loci associated with BD. Our lab and others are working to characterize BD genetic signals using bioinformatics approaches that integrate data from GWAS with knowledge of functional elements in the human genome. These efforts, coupled with fine-mapping strategies such as the study of synaptic pruning abnormalities linked with schizophrenia (SZ) risk variants in the complement component 4 (C4) gene (Sekar et al., 2016), will help provide a richer understanding of BD etiology. In this review, I highlight major findings from the past decade leading up to and immediately following large-scale GWAS meta-analyses. I enumerate challenges and considerations for the future, and describe work that I have done to address burgeoning questions in BD genetics.

CANDIDATE GENES FOR BD RELATED TO NEUROTRANSMISSION, PLASTICITY, AND CELLULAR RESILIENCE

Prior to the emergence of high-density genome-wide SNP genotyping arrays, our understanding of BD etiology was limited to evidence from (1) association studies that genotyped common variants from a small pool of candidate genes, and later (2) genome-wide linkage studies that screened for BD causal genes using the high-throughput method of restriction fragment length polymorphism (RFLP) genotyping for hundreds of microsatellites. Candidate genes were nominated using a priori hypotheses about the underlying neurobiology of BD (i.e., gene sets circumscribing pathways related to lithium signaling, neurodevelopment, and neurotransmission) or the position of genes under or proximal to linkage peaks (i.e., 6q and 8q) associated with BD (Fallin et al., 2005; Perlis et al., 2008; Sklar et al., 2002).

Lithium has been used as a first-line mood stabilizer for treating mania and depression in BD for over fifty years. Clinical studies show that lithium is one of the most effective treatments for BD and significantly reduces risk for suicide compared to other mood stabilizers (Cruceanu, Alda, & Turecki, 2009). However, lithium is effective in only a subset of patients, and severe side effects are reported with long-term usage (e.g., renal toxicity, hypothyroidism, weight gain, or developmental defects in offspring if taken during pregnancy) (Cruceanu et al., 2009). Investigators have looked for susceptibility genes for BD among the widespread targets of lithium (e.g., GSK-3 pathway, phosphoinositol cycle, ERK/MAPK pathway, CREB1, BCL2, BDNF, etc.) (Machado-Vieira, Manji, & Zarate, 2009).

In addition, there was a burst of association studies focusing on neurotransmitter systems (i.e., dopaminergic, serotonergic, and GABAergic), which were thought to harbor susceptibility genes for mood disorders based on supporting evidence from animal models and human studies (Manji et al., 2003; Perona et al., 2008; Petty, 1995). There is convincing evidence that dysregulation of biogenic amines could predispose individuals to BD; thus, early conceptualizations of BD etiology held that abnormal neurotransmission was key to pathophysiology (Bunney, 1970; Luchins, 1976; Schildkraut, 1974). This conceptualization evolved into a multi-system interest in the complex interplay of genes, molecules, cells, and circuits (Manji et al., 2003; Schloesser, Huang, Klein, & Manji, 2008), and in determining the role of neuroplasticity, synaptic plasticity, and cellular resilience in BD pathophysiology. Evidence from neuroimaging, postmortem brain gene expression, histopathological, and pharmacological studies provides converging support for the hypothesis that perturbation of signaling cascades underlies BD pathophysiology (Schloesser et al., 2008). This hypothesis is complementary to the notion that direct targets and signaling cascades modulated by lithium comprise susceptibility genes for BD.

Few of the hundreds of candidate genes examined have shown a promising association with BD. The following genes showed strong association with BD in three of the largest association studies (Fallin et al., 2005; Perlis et al., 2008; Sklar et al., 2002): BDNF, DAO, GRM3, GRM4, GRIN2B, IL2RB, TUBA8, DPYSL2, NOS1, GRID1, TACR1, GABRB2, and DISC1. These genes exhibit widespread expression in the brain, encode multiple transcripts, and exhibit biological links to (a) central nervous system development, (b) neurotransmitter signaling, or (c) cellular resilience, as outlined in Table 1.

EVIDENCE FROM GWAS AND FUNCTIONAL ANALYSES

Here I review promising candidate genes that emerged from GWAS and have been extensively studied through follow-up studies to determine their potential role in BD pathophysiology.

CAV1.2 L-TYPE CALCIUM CHANNEL

I collated a list of 217 candidate genes for BD from three family-based association studies that examined pools of candidate genes based on a priori hypotheses about BD etiology, including genes that play a role in central nervous system development, neurotransmission, and pathways that interact with lithium, as well as a handful of risk genes for SZ that might have shared association with BD (Fallin et al., 2005; Perlis et al., 2008; Sklar et al., 2002). I then cross-referenced this list with evidence from three GWAS of BD and one GWAS of the major psychoses (BD and SZ combined) to determine which, if any, of the candidate genes approached or surpassed genome-wide significance (p-value < 5e-08, indicative of a robust signal) in association with BD (Cross Disorder Group of the Psychiatric Genomics Consortium, 2013; Ruderfer et al., 2013; Sklar et al., 2011). Interestingly, only one candidate gene surpassed genome-wide significance: CACNA1C, located on chromosome 12p13.33, which encodes the L-type Voltage-gated Ca2+ Channel Subunit Alpha 1C (Cav1.2). One association peak is present in the same region of the CACNA1C gene in all four GWASs (Figure 1), and was significantly associated with BD and SZ (rs1006737[A], Odds ratio = 1.124, P = 5.51e-13). This risk variant is located in intron 3 of CACNA1C, and is in modest to strong linkage disequilibrium with only non-coding variants. Genome-wide linkage meta-analysis did not reveal a strong association of chromosome 12 with BD or the major psychoses (Badner & Gershon, 2002). However, the region 12p13 showed suggestive evidence of association with BD in a follow-up linkage scan in an Ashkenazi Jewish sample (Avramopoulos et al., 2007). In an independent GWAS, CACNA1C was in one of four peak loci that showed genome-wide significant cross-disorder association across five major psychiatric disorders (SZ, BD, major depression, autism, and attention-deficit/hyperactivity disorder) (Cross Disorder Group of the Psychiatric Genomics Consortium, 2013). Thus, the association of CACNA1C with BD and related psychiatric disorders is well supported, and follow-up mechanistic work may reveal the role of this risk gene in the etiology of several mental illnesses. It is possible that risk mutations in CACNA1C contribute to BD pathophysiology by altering expression of the calcium channel or its functionality. This may lead to alterations in calcium-mediated signaling or calcium flux, which regulate multiple processes including neurotransmitter release, neuronal differentiation, transcription, and potentially axonal outgrowth (Rosenberg & Spitzer, 2011).
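The cross-referencing step described above can be illustrated with a minimal Python sketch. It is not the procedure actually used for this chapter: the function name, the input dictionary, and every p-value other than the CACNA1C value quoted in the text are hypothetical. The derivation of 5e-08 as a Bonferroni correction for roughly one million independent tests is the conventional rationale for the threshold and is stated here as an assumption rather than drawn from the cited studies.

```python
# Conventional rationale (assumption): Bonferroni correction of alpha = 0.05
# for roughly one million independent common-variant tests.
ALPHA = 0.05
N_INDEPENDENT_TESTS = 1_000_000
GENOME_WIDE_THRESHOLD = ALPHA / N_INDEPENDENT_TESTS  # approximately 5e-08


def genome_wide_significant(candidates, best_p, threshold=GENOME_WIDE_THRESHOLD):
    """Return the candidate genes whose best GWAS p-value is below the threshold.

    candidates: iterable of gene symbols nominated by candidate-gene studies
    best_p: dict mapping gene symbol -> smallest p-value of any SNP in the gene
    Genes absent from best_p are treated as having no association (p = 1.0).
    """
    return sorted(g for g in candidates if best_p.get(g, 1.0) < threshold)


# Hypothetical inputs; only the CACNA1C p-value is taken from the text.
candidates = ["CACNA1C", "BDNF", "DISC1"]
best_p = {"CACNA1C": 5.51e-13, "BDNF": 3.0e-03, "DISC1": 0.2}

print(genome_wide_significant(candidates, best_p))
```

In practice the gene-level p-value would come from mapping SNPs to gene boundaries, a step that this sketch deliberately leaves out.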

One study reported that the BD-risk variant rs1006737 is associated with decreased expression of CACNA1C in the human cerebellum, suggesting that this risk SNP is a cis-expression quantitative trait locus (eQTL). This association of rs1006737 with decreased CACNA1C expression was replicated by two independent studies (Eckart et al., 2016; Roussos et al., 2014). The rs1006737 variant was found to be in strong linkage disequilibrium with nearby variants located in a predicted enhancer in intron 3 of CACNA1C, and the enhancer region was shown to physically interact with the transcription start site through long-range interaction in chromosome conformation capture assays of human postmortem brain tissue (Roussos et al., 2014). Cav1.2 is important for neuronal survival and thus may be relevant to cellular resilience (Berridge, 1998; Ebbers et al., 2015). Furthermore, Cav1.2 has been shown to play a role in circadian regulation, which is notable given that sleep disturbances are common among psychiatric disorders, including disorders associated with CACNA1C in GWAS. CACNA1C down-regulation or pharmacological inhibition of Cav1.2 was found to disrupt circadian rhythm amplitudes and diminish the ability of lithium to modulate circadian rhythms in experiments on cultured fibroblasts harvested from BD patients and unaffected comparison subjects (McCarthy et al., 2016).

ZINC FINGER 804A

The gene ZNF804A, encoding zinc-finger 804A, a putative transcription factor, was one of the first genes to show a significant cross-disorder association with SZ and BD (O’Donovan et al., 2008), with a peak association found at the non-coding variant rs1344706 in intron 2 of ZNF804A. This gene has been extensively studied in recent years through genetic, neuroimaging, and functional analyses to reveal the role it plays in pathogenesis. The literature on ZNF804A is far too extensive to cover in this article; thus, I refer readers to my two review articles (Hess & Glatt, 2014; Hess, Quinn, Akbarian, & Glatt, 2015).

The ZNF804A gene and surrounding locus were fine-mapped, confirming the association of rs1344706 with SZ and BD (Odds ratio = 1.11, P = 4.10e-13) (Williams et al., 2011). ZNF804A mRNA and protein are widely expressed in the brain, and show preferentially high expression during prenatal development and in pyramidal neurons (Hess & Glatt, 2014; Hess et al., 2015; Tao et al., 2014). Interestingly, the rs1344706 risk-conferring T-allele was significantly associated with decreased expression of ZNF804A in the prenatal brain (Hill & Bray, 2012). The same group profiled neural progenitor cells with whole-genome gene expression microarrays after knock-down of ZNF804A expression, and found significant down-regulation of genes involved in cell adhesion, suggesting that ZNF804A could play a role in processes involved in neurite outgrowth or synapse formation. Recently, ZNF804A was found to (1) localize in the nucleus and dendrites of CTX0E16 human neural stem cells, (2) co-localize with synaptic markers, and (3) regulate neurite outgrowth, as shown by deficits in neurite length following siRNA-mediated knock-down of ZNF804A (Deans et al., 2016).

ZNF804A is predicted to have a C2H2-zinc finger domain, suggesting that it possesses DNA-binding capabilities. It is hypothesized that ZNF804A mediates transcriptional regulation. ZNF804A was shown to alter mRNA expression of, and bind to the promoter regions of, PRSS16, a candidate gene for SZ, and COMT, a candidate gene for SZ and BD (Girgenti, LoTurco, & Maher, 2012). There is emerging interest in the role of gene-gene interactions (i.e., “epistasis”) in mediating susceptibility for psychiatric disorders. ZNF804A and its pathways have been examined for evidence of gene-gene interactions underlying psychosis and BD risk (Hess et al., 2015; Nicodemus et al., 2014). One study used a machine learning approach to identify potentially interacting SNPs (first interaction: STAC and MAPK8IP2; second interaction: two SNPs flanking FAM46A) that, when added as an interaction predictor in a multivariate regression, significantly increased the amount of variance explained in “narrow psychosis” (i.e., SZ or schizoaffective disorder) and “broad psychosis” (i.e., SZ, schizoaffective disorder, BD, major depression, and psychosis not-otherwise-specified) phenotypes relative to polygenic risk scores alone (Nicodemus et al., 2014). Note that the two-SNP interaction flanking FAM46A is potentially driven by linkage disequilibrium, considering that these two SNPs show a high degree of correlation in the 1000 Genomes reference panel (r² = 0.79, D′ = 0.93). Nevertheless, ZNF804A is one of the most promising susceptibility genes identified to date for SZ and BD, with several lines of evidence supporting its role in early brain development and pathogenesis.
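For readers less familiar with the two linkage disequilibrium statistics quoted above, the following Python sketch shows how r² and D′ are conventionally computed for a pair of biallelic SNPs from haplotype and allele frequencies. The function name and the example frequencies are hypothetical and are not the 1000 Genomes values behind the figures reported in the text.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D_prime, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    # D: deviation of the observed haplotype frequency from independence
    d = p_ab - p_a * p_b
    # D_max: largest |D| attainable given the allele frequencies and the sign of D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    # r^2: squared correlation between the allele indicators at the two loci
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r_squared


# Hypothetical frequencies: allele A is only ever seen together with allele B.
print(ld_stats(p_ab=0.3, p_a=0.3, p_b=0.4))
```

With these hypothetical frequencies D′ is 1 while r² is roughly 0.64, which illustrates why the two statistics can differ when allele frequencies at the two SNPs are unequal.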

ANKYRIN G

Ankyrin G is a spectrin-actin cytoskeletal protein encoded by the gene ANK3, located on chromosome 10q21.2. ANK3 is another leading candidate gene for BD, and has been extensively studied to determine its functional role in the brain and the potential mechanisms linking ANK3 disruption to BD susceptibility. Evidence suggests that Ankyrin G regulates neuronal excitability through clustering of voltage-gated sodium channels (Nav) along axons (Barry et al., 2014) and by modifying sodium current by gating Nav1.6 channels (Shirahata et al., 2006). An association of ANK3 with BD was first identified in a collaborative GWAS published in 2008 (Ferreira et al., 2008), and was then twice replicated by the Psychiatric Genomics Consortium (Cross Disorder Group of the Psychiatric Genomics Consortium, 2013; Sklar et al., 2011). RNA-sequencing and immunohistochemistry show that ANK3 is expressed throughout the body, with preferentially high protein expression in endocrine, urinary, and male reproductive organs (Uhlen et al., 2015). In the brain, ANK3 is highly transcribed in the cerebellar cortex and shows region-specific profiles of isoform expression. Investigators used a custom ANK3 microarray probe set and count-based detection procedure (Richard et al., 2014) and found that more than half of the ANK3 transcripts in the cerebellum represent the long splice isoform (Rueckert et al., 2013). The rs1938526 risk-conferring C-allele (Odds ratio = 1.32, P = 1.851e-09) (Sklar et al., 2011) was associated with down-regulation of several ANK3 isoforms, including the long isoform, specifically in the cerebellum (Quinn et al., 2010).
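To make the odds ratios reported in this section more concrete, the following Python sketch computes an allelic odds ratio and an approximate 95% confidence interval (Woolf's logit method) from a 2 × 2 table of allele counts. The function name and the counts are hypothetical, chosen only to give an odds ratio near the values quoted here, and do not come from any of the cited studies. Published GWAS estimates are typically derived from regression models with covariates rather than from raw counts.

```python
import math


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other):
    """Return (odds_ratio, (ci_low, ci_high)) from allele counts.

    case_risk / case_other: risk and non-risk allele counts among cases
    ctrl_risk / ctrl_other: risk and non-risk allele counts among controls
    """
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) under Woolf's approximation
    se_log_or = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other
    )
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, (ci_low, ci_high)


# Hypothetical allele counts (1,000 case alleles and 1,000 control alleles).
or_, (lo, hi) = allelic_odds_ratio(200, 800, 160, 840)
print(f"OR = {or_:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```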

Neuroimaging reports have investigated structural and functional correlates between the cerebellum and BD, with a preponderance of studies reporting deficits in cerebellar volumes (Mills, DelBello, Adler, & Strakowski, 2005; Soares & Mann, 1997) and connectivity patterns (Shinn et al., 2016) among individuals with BD. However, other reports found no significant associations (Laidi et al., 2015; Monkul et al., 2008). Beyond the role of the cerebellum in coordination of motor circuitry, multiple lines of evidence support its role in regulating cognition and affective processing (Middleton & Strick, 2000; Schutter & van Honk, 2005; Stoodley, 2012). Abnormal cerebellar development may contribute to the pathogenesis of BD or to illness severity, though more work is needed to determine whether atrophy predisposes to BD or is a sequela of the illness. Conversely, unaffected “resilient” relatives of individuals with BD (i.e., individuals who carry a genetic burden for BD but are asymptomatic) present with larger volumes in the cerebellar vermis compared to their affected relatives, suggesting that there may be underlying adaptive mechanisms linked with the cerebellum that mitigate genetic susceptibility to BD (Frangou, 2012). Evidence of ANK3 genetic effects on cerebellar morphology is limited to two studies that show little agreement (Ota et al., 2016; Tesli et al., 2013); thus, additional follow-up studies will be needed to clarify the association of ANK3 with brain structures. Taken together, dysregulation of ANK3 gene and isoform expression related to BD risk-conferring genotypes is a promising finding, and may shed light on mechanistic links underlying BD pathogenesis.

TETRATRICOPEPTIDE REPEAT AND ANKYRIN REPEAT CONTAINING 1

TRANK1 is located on chromosome 3p22.2 and encodes a 336 kiloDalton protein called Tetratricopeptide Repeat and Ankyrin Repeat Containing 1. The primary transcript encodes nine ankyrin repeat domains, which potentially mediate protein-protein interactions. Multiple GWAS support the association of TRANK1 with BD, including trans-ancestry analyses of European and Asian samples (Chen et al., 2013; Forstner et al., 2017; Ikeda et al., 2017). The 3ʹ region of TRANK1 is consistently associated with BD, suggesting that a susceptibility variant is located in or around that segment. This gene is also associated with SZ (Ripke et al., 2014), suggesting that it confers pleiotropic effects. There is suggestive evidence that TRANK1 is associated with lithium response in BD patients; however, this is not supported by a genome-wide significant peak (Song et al., 2016). Fine-mapping of this locus (with special attention to the 3ʹ region) is needed to pinpoint the true susceptibility variant in TRANK1 and identify the underlying mechanism driving the GWAS risk signal. Thus far, the biology of TRANK1 is largely unknown. It is possible that TRANK1 plays a role in cellular innate immunity based on (1) evidence that it is up-regulated by interferon- and Hepatitis C-mediated stimulation of human pluripotent stem cell-derived hepatic cells (Ignatius Irudayam et al., 2015), and (2) knowledge that TRANK1 is one of 200 genes that are up-regulated in activated CD4+ T-cells (Constantinides, Picard, Savage, & Bendelac, 2011; Subramanian et al., 2005). Additional studies are needed to determine the function of TRANK1 and its role in BD pathophysiology.

TENEURIN TRANSMEMBRANE PROTEIN 4

TENM4 (also referred to as ODZ4) is located at 11q14.1 and encodes a 307 kiloDalton transmembrane protein that is preferentially expressed in neurons and regulates differentiation of oligodendrocytes, myelination, and axon guidance (Hor et al., 2015). It is conserved across vertebrates and shows a significant probability of loss-of-function intolerance based on the number of predicted versus observed mutations in a human exome-sequencing data collection (Lek et al., 2016). According to the Mouse Genome Informatics database, multiple mutations in TENM4 are known to cause prenatal lethality, growth arrest, abnormal neurological phenotypes (e.g., tremors), and severe neurodevelopmental disruptions (e.g., abnormal anterior-posterior axis patterning, oligodendrocyte morphology, and oligodendrocyte apoptosis) (Eppig et al., 2017). The association of TENM4 with BD is supported by multiple GWASs (Ikeda et al., 2017; Sklar et al., 2011) and by a functional neuroimaging analysis that revealed an association between a BD-risk conferring variant in TENM4 and processing (Heinrich et al., 2013). The functional role of TENM4 in the organization of the nervous system provides additional support for the hypothesis of a neurodevelopmental basis to BD, and it is possible that this gene plays a role in the behavioral problems that underlie BD.

SPECTRIN REPEAT CONTAINING NUCLEAR ENVELOPE PROTEIN 1

SYNE1 is located on chromosome 6q25 and encodes a member of the linker of the nucleoskeleton and cytoskeleton (LINC) complex, which is involved in stabilizing and positioning the nucleus. This gene demonstrates pleiotropic relationships with neurological and neuropsychiatric disorders. There are multiple highly pathogenic SYNE1 protein-truncating mutations linked with autosomal recessive cerebellar ataxia, and a putatively damaging de novo missense mutation potentially causative for autism (Dupré, Gros-Louis, Bouchard, Noreau, & Rouleau, 1993; Mademan et al., 2016; O’Roak et al., 2011). Furthermore, SYNE1 is a significant susceptibility gene for BD (Sklar et al., 2011; Xu et al., 2014) and recurrent major depression (Green et al., 2013). The top SNP (rs9371601) associated with BD in the Psychiatric Genomics Consortium phase 1 GWAS resides in SYNE1 and CPG2 (candidate plasticity gene 2), a brain-specific alternative splice variant of SYNE1 first identified in the rat hippocampal dentate gyrus (Nedivi, Hevroni, Naot, Israeli, & Citri, 1993). The protein product of CPG2 localizes to dendritic spines and is embedded in the F-actin cytoskeleton, which could help regulate the morphology of synapses. CPG2 is functionally relevant to BD because it regulates synaptic glutamate receptor internalization via postsynaptic endocytosis (Loebrich et al., 2013). In the BRAINEAC database (a resource of the UK Brain Expression Consortium) (Ramasamy et al., 2014), the top BD-associated SNP rs9371601 is nominally associated with SYNE1 expression levels in the human thalamus (top P = 1.4e-03), temporal cortex (top P = 1.8e-04), and intralobular white matter (top P = 1.4e-02), and shows a significant brain-wide association with SYNE1 expression (top P = 3.1e-03). Based on Roadmap Epigenomics Consortium data, rs9371601 is in a region of open chromatin defined by DNase I hypersensitivity peaks in eight brain tissues (hippocampus, anterior caudate, substantia nigra, cingulate gyrus, inferior temporal lobe, angular gyrus, dorsolateral prefrontal cortex, and germinal matrix). Several of these brain tissues show peak enrichment of histone marks that are common at poised or strong enhancers (i.e., H3K4me3, H3K27ac, and H3K9ac). Interestingly, knock-down of Cpg2 expression in cultured hippocampal neurons resulted in smaller dendritic spine heads (Cottrell, Borok, Horvath, & Nedivi, 2004), a characteristic of weak synapses, despite the spines having a higher abundance of glutamate receptors. Expanding on this finding, CPG2 down-regulation increases mini excitatory postsynaptic cell potential amplitudes, suggesting that neurons are hyperexcitable with loss of CPG2 expression (Loebrich et al., 2013). It is possible that synaptic plasticity is affected by the BD-risk SNP rs9371601 via changes to CPG2 expression, which trigger alterations to synaptic morphology, postsynaptic glutamate receptor density, and neuronal excitability. Lithium might restore glutamate receptor internalization, and subsequently neuronal excitability, by activating the cAMP-PKA pathway (Tsuji, Morinobu, Tanaka, Kawano, & Yamawaki, 2003), which might increase PKA-mediated phosphorylation of CPG2, leading to endocytosis of glutamatergic receptors. Re-sequencing and eQTL mapping of the SYNE1 locus can help reveal the true susceptibility variant and potential mechanisms underlying BD etiology. Investigating potential SNP-SNP interactions between the cAMP-PKA pathway, F-actin cytoskeletal genes, and SYNE1 could provide biological insight into BD susceptibility at the level of postsynaptic regulation.

INTERFERON INDUCED PROTEIN 44 LIKE

The interferon-stimulated gene IFI44L resides on chromosome 1p31.1 and is a paralog of IFI44; both genes play a role in the cellular response to viral infection. IFI44L reached genome-wide significance in the most recent GWAS of BD (Ruderfer et al., 2013). Interestingly, IFI44L is also significantly associated with febrile seizures in children related to MMR (measles, mumps, rubella) vaccination (Feenstra et al., 2014), and the top SNPs from the BD and febrile seizure GWASs are mildly correlated (r² = 0.124, Dʹ = 0.859). Since there is evidence that IFI44L may be a pleiotropic gene, it is a fitting candidate for a phenome-wide association study (PheWAS) to identify other traits that it potentially mediates.
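A modest r² alongside a high Dʹ, as reported for the two top SNPs above, is typical when two variants sit on a shared haplotype background but differ in allele frequency. The following is a minimal sketch of the standard definitions of both linkage disequilibrium measures, computed from haplotype frequencies. The function name and the example frequencies are hypothetical and are not the IFI44L data.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    # Raw disequilibrium coefficient: observed minus expected haplotype frequency
    d = p_ab - p_a * p_b
    # Largest |D| attainable given the allele frequencies and the sign of D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # Squared correlation between the two allele indicators
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical example: a rare allele (10%) that sits mostly on a common allele (50%)
d, d_prime, r_squared = ld_measures(p_ab=0.09, p_a=0.10, p_b=0.50)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r_squared:.3f}")
```

With these made-up frequencies, Dʹ is 0.8 while r² is only about 0.07, which illustrates how two SNPs can be tightly linked historically yet only weakly correlated.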

MAD1 MITOTIC ARREST DEFICIENT LIKE 1

A component of the mitotic spindle-assembly checkpoint is encoded by the gene MAD1L1, which is found at the locus 7p22.3 and has been associated with BD in three GWASs to date (Hou et al., 2016; Ruderfer et al., 2013; Sklar et al., 2011). MAD1L1 has been extensively studied for its role in mitosis regulation and is a potential tumor-suppressor gene (Kops, Weaver, & Cleveland, 2005; Tsukasaki et al., 2001). In addition, a crystal structure is available for the Mad1l1 protein (Kim, Sun, Tomchick, Yu, & Luo, 2012), which is noteworthy for investigating interactions with other proteins, molecules, or compounds. The therapeutic potential of MAD1L1 is uncertain, and it is not a known drug target (according to Ingenuity Pathway Analysis and the Therapeutic Target Database). The BD-risk allele at rs11764590 was recently associated with reduced activation and diminished functional connectivity between prefrontal and mesolimbic circuits in adults free of psychiatric illness (Trost et al., 2016). These regions are intimately related to reward system processing. If this finding can be reproduced, then it is possible that MAD1L1 is related to an intermediate phenotype of BD. One of the regions that displayed hypoactivation among healthy rs11764590 risk allele carriers was the ventral tegmental area (VTA), which is composed of ~30% GABAergic neurons (Creed, Ntamati, & Tan, 2014). GABA neuron cell cycle regulation and post-mitotic genomic stability are thought to play a role in neuropsychiatric susceptibility (Benes, 2011). It is possible that risk variants in MAD1L1 predispose to selective neuronal vulnerability in GABA cells and consequently alter the functionality of mesolimbic circuits. Optogenetic experiments in transgenic mice demonstrated that GABA cells in the VTA functionally regulate neighboring dopaminergic neurons and affect reward consummatory behaviors (van Zessen, Phillips, Budygin, & Stuber, 2012). It would be of interest to investigate whether MAD1L1 risk variants interact with the network of genes involved in maintaining the functional and genomic stability of GABA neurons to alter BD susceptibility. This follow-up analysis may help identify a role that MAD1L1 plays in BD etiology and shed additional light on a BD risk pathway (comprising GABA and dopaminergic regulation) that has strong therapeutic potential.

RARE AND DE NOVO MUTATIONS IN BD CANDIDATE GENES

Rare deleterious mutations such as copy number variants (CNVs) and smaller protein-disrupting de novo mutations have been investigated for their etiological relevance to BD. Statistical power is a major obstacle for rare variant studies in BD, particularly when investigators are interested in testing for differences in the frequencies of specific rare variants between groups, such as CNVs at certain breakpoints. To combat this problem, investigators often group rare variants by class (e.g., CNV deletions or duplications) or by biological function (e.g., pathway-level tests), which boosts statistical power at the expense of resolution. To date, reports from well-powered analyses have demonstrated that the global burden of CNVs in BD-affected individuals is approximately equal to that of controls, and that the rates of CNVs are higher in SZ-affected cases than in BD cases (Green et al., 2016; Grozeva et al., 2010). Interestingly, one study reported lower frequencies of de novo CNVs in BD patients compared to SZ (Georgieva et al., 2014), supporting the view that CNVs have a stronger etiological relevance to SZ. However, it is possible that CNVs increase risk for early-onset BD, wherein symptoms emerge before adulthood (< 18 years old); this was supported by one report that found a higher frequency of de novo CNVs in BD patients with early-onset mania compared to controls (Zhang et al., 2009). Taken together, these findings suggest that CNVs may have a minor or targeted role in BD etiology, and additional follow-up analyses are required to determine the role that CNVs play in functional or neurochemical deficits in BD. Genes disrupted by BD-implicated CNVs are of interest for follow-up investigations. It may be less challenging to illuminate underlying disease mechanisms related to rare protein-disrupting mutations and CNVs that increase BD risk than those related to association signals from GWAS, which are typically small in effect size and difficult to localize to a single gene. Synthesizing findings from GWAS and rare variant studies is one approach to determine which, if any, candidate genes or pathways can be prioritized for functional analyses.

One of the largest CNV studies of BD to date, with a combined sample of approximately 9,000 cases with BD (from three published studies and one new sample) and 82,000 controls, examined 15 CNV loci previously implicated in SZ and identified three that were significantly associated with BD risk (Green et al., 2016): (1) duplications at 1q21.1 (Odds ratio = 2.64, P = 0.022), (2) deletions at 3q29 (Odds ratio = 17.31, P = 0.03), and (3) duplications at 16p11.2 (Odds ratio = 4.37, P = 2.3E-04). No genes in these loci showed genome-wide significant association with BD; however, there are strong GWAS signals in the 16p11.2 locus at the genes SETD1A (rs9788865, P = 1.11E-07) and ZNF668 (rs2359674, P = 1.17E-07). It is possible that these genes are etiologically related to BD via common and rare susceptibility variants, which makes them important targets for follow-up analyses and for evaluating their therapeutic potential. In a whole-exome sequencing study of 20,804 subjects (Singh et al., 2016), SETD1A showed a significant enrichment of rare loss-of-function mutations in SZ (10 cases versus 0 controls, P = 5.6e-09). Based on whole-exome aggregation data, only two subjects out of 60,706 possessed a heterozygous loss-of-function mutation in SETD1A (Lek et al., 2016), which strongly suggests that selective constraint protects this gene from protein-disrupting mutations. Additional evidence supports the hypothesis that SETD1A loss-of-function mutations may be associated with deleterious outcomes: 8 of the 10 SZ cases carrying loss-of-function mutations exhibited chronic illness requiring long-term hospitalization, and seven additional SETD1A loss-of-function carriers without SZ had intellectual disability (Singh et al., 2016). The enzyme encoded by SETD1A catalyzes methylation of lysine residues in histone H3 complexes, which is important for chromatin remodeling and regulation of gene expression. Experimental evidence from transgenic mice exhibiting loss of Setd1a expression shows deficits in gastrulation and neural stem cell survival (Bledau et al., 2014), which provides key insight into the role of SETD1A in early development and a potential mechanism underlying BD pathogenesis. For sporadic cases of BD or SZ related to SETD1A mutations, it may be important to tailor drugs to target this gene or its pathway to alleviate symptoms of mental illness. Given its association with intellectual disability and hospitalizations, SETD1A mutations may be useful as prognostic biomarkers. Additional work will be required to identify disease-specific mutations in this gene or its pathway to use as diagnostic biomarkers for SZ or BD.

Green et al. also examined CNVs across all genes in the genome (excluding those from SZ-implicated loci) using a restricted sample of ~2600 BD cases and ~8900 controls, yielding 55 genes with a nominally significant difference in CNV rates between the groups, including the glutamate ionotropic receptor subunit encoded by GRIN2A and the transcription factor encoded by ATF7IP2, which interacts with the histone H3K9 methyltransferase SETDB1. None of these genes remained significant after multiple testing corrections were applied; thus, a larger sample size will be needed to achieve genome-wide significance. By comparison, a genome-wide mega-analysis of SZ in 41,321 subjects (case:control ratio of 1.04) yielded 8 significant CNV associations (Marshall et al., 2016). Remarkably, one candidate gene from the list of 217 (PLCG2, referenced in Sklar et al., 2002) exhibited a higher frequency of deletions in BD (3 cases versus 0 controls, P = 0.012) in this genome-wide CNV analysis. PLCG2 encodes the transmembrane signaling enzyme Phospholipase C Gamma 2, which promotes the inositol-1,4,5-triphosphate and diacylglycerol second messenger signaling pathways. These are known to be lithium-sensitive pathways, indicating that genetic disruption of PLCG2 function may be key to BD pathogenesis and treatment.
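To illustrate how a carrier count such as "3 cases versus 0 controls" translates into a P value, the sketch below computes a one-sided Fisher's exact test directly from the hypergeometric distribution. It assumes, purely for illustration, exactly 2,600 cases and 8,900 controls (the text gives only approximate sample sizes). The function name is hypothetical, and this is not necessarily the test used in the original CNV analysis.

```python
from math import comb


def fisher_one_sided(case_carriers, control_carriers, n_cases, n_controls):
    """One-sided Fisher's exact P value.

    Probability, under no association, that at least `case_carriers` of all
    observed carriers fall among the cases (hypergeometric upper tail).
    """
    carriers = case_carriers + control_carriers
    total = n_cases + n_controls
    denom = comb(total, carriers)
    upper = min(carriers, n_cases)
    p = 0.0
    for k in range(case_carriers, upper + 1):
        # k carriers among cases, the remaining carriers among controls
        p += comb(n_cases, k) * comb(n_controls, carriers - k) / denom
    return p


# Assumed sample sizes (approximate values from the text, treated as exact here)
p_value = fisher_one_sided(case_carriers=3, control_carriers=0,
                           n_cases=2600, n_controls=8900)
print(f"one-sided P = {p_value:.3f}")
```

Under these assumed sample sizes the result is about 0.012, in line with the reported value, which shows how little evidence three carriers can provide even when no controls carry the deletion.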

DISCUSSION

Large-scale genetic studies are helping us gain insight into the complex etiology of BD. We are starting to piece together clues from genome-wide common and rare variant analyses and to prioritize genes for follow-up analysis. BD has a strong heritable component; however, it is difficult to identify susceptibility genes for BD, potentially due to the small sample sizes of studies to date or to factors that investigators cannot control (e.g., heterogeneity). Despite these challenges, a handful of promising candidate genes have emerged over the past two decades of genetic studies of BD, each of which individually accounts for a fraction of BD heritability. Averaging the effects of common BD risk variants across hundreds to thousands of genes accounts for ~2% of BD liability (Purcell et al., 2009), which is similar to the amount of variance in SZ captured by BD risk alleles. Identifying risk variants associated with illness sub-types or intermediate phenotypes (also called “endophenotypes”) might help reveal different molecular substrates of BD and advance us toward a personal genomics approach to psychiatric disorders. Several approaches to identify molecular substrates of psychiatric disorders have been proposed, such as an investigation of the polygenic architecture of clinical dimensions of BD (Ruderfer et al., 2013) using SZ-associated risk alleles. One study claimed that unsupervised clustering is capable of identifying SZ sub-types using SZ-associated risk variants (Arnedo et al., 2014). However, uncontrolled technical artefacts may have confounded these findings. Taken together, several genetic approaches now in place hold promise for our understanding of BD etiology. A critical next step for research is the identification of the biology underlying GWAS risk signals for BD. There has been rapidly growing interest in applying poly-omic approaches to identify susceptibility pathways for BD. Recently, international collaborations such as the PsychENCODE project and the CommonMind Consortium released transcriptomic, epigenetic, and genome-wide SNP genotype data for postmortem brain collections of psychiatric and non-psychiatric samples. These projects are continuing to collect and process data, and these efforts have the potential to significantly impact our knowledge of BD etiology.

In the interim, bioinformatics can capitalize on currently available genome-wide association results to offer biological insights. Recently, a method called stratified linkage disequilibrium score (LD score) regression demonstrated that heritability for complex phenotypes, including BD, can be partitioned across genomic annotations such as transcriptional regulation, chromatin remodeling, and cell-specific enhancers (Finucane et al., 2015). This method can help determine which functional elements in the genome are enriched for heritability directly from summary statistics, which are typically easier to access (or are made freely available) than individual-level genotype data. This is a promising approach for investigating the underlying biology of complex polygenic phenotypes from all common SNPs in the genome. However, stratified LD score regression may not be amenable to some of the questions that investigators have because of practical limitations, namely the computational time and resources required to compute LD scores across newly acquired functional annotations for integration into the LD score model to test for heritability enrichment. I propose an alternative method in this dissertation adapted for GWAS summary statistics and genome-wide functional annotations obtained from the ENCODE and Roadmap projects. My method uses a logistic regression framework to estimate the probability that a functional annotation will overlap an LD-clumped interval based on its overall z-score. Simply put, this method determines whether functional annotations are generally enriched in GWAS risk signals versus the rest of the genome. It is also possible to examine enrichment of functional annotations among a subset of highly significant susceptibility variants (i.e., SNPs with P < 5.0e-08) using a “local enrichment” analysis such as the one proposed by Coetzee et al. for biological insights into Parkinson’s disease risk variants (Coetzee et al., 2016). That local enrichment method suffers from one major drawback in that it does not account for differences in linkage disequilibrium between risk SNPs in the enrichment analysis; thus, there may be upward bias in the resulting test statistics.
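The logistic regression idea described above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the implementation used in this dissertation: each LD-clumped interval is assumed to carry one summary z-score and one binary flag recording whether a functional annotation overlaps it, and the model is fit by Newton-Raphson using only the Python standard library. The function names and the data are hypothetical.

```python
import math


def fit_logistic(x, y, n_iter=25):
    """Fit logit(P(y = 1)) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Accumulate the gradient (g) and negative Hessian (h) of the log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0:
            break
        # Newton step: solve the 2x2 system h * delta = g in closed form
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def overlap_probability(b0, b1, z):
    """Predicted probability that the annotation overlaps an interval with z-score z."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * z)))


# Made-up data: one z-score and one overlap flag per LD-clumped interval
z_scores = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
overlaps = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(z_scores, overlaps)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
print(f"P(overlap | z = 5.0) = {overlap_probability(b0, b1, 5.0):.2f}")
```

In this framing, a positive slope would indicate that intervals with stronger association signals are more likely to overlap the annotation, that is, that the annotation is enriched in GWAS risk signals. A real analysis would also need a standard error for the slope and covariates such as interval length, which this sketch omits.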

Furthermore, there is an abundance of transcriptome-wide gene expression data available for SZ and BD, which can offer insight into their etiologies and common molecular signatures. Based on multiple converging lines of work, it is hypothesized that functionally relevant non-coding variation involved in the regulation of gene expression strongly contributes to BD risk. Investigating relationships between risk-conferring genotypes, genetic/epigenetic regulatory elements, and gene expression profiles may help reveal mechanisms relevant to pathophysiology. In addition, there is significant genetic overlap between SZ and BD reported in family studies and GWAS; thus, transcriptome-wide studies can help reveal molecular substrates that are disrupted in both disorders. Identification of disorder-specific signatures in gene expression profiles holds promise for developing better biomarkers and therapeutics. Moreover, overlaying transcriptome-wide expression profiles and GWAS risk signals can help to pinpoint risk-conferring genes in regions of complex LD (e.g., the MHC locus) and infer functional consequences of risk signals. In this dissertation, I describe my work using transcriptome-wide expression profiles from multiple laboratories to identify molecular signatures of SZ and BD, including machine learning models that accurately distinguish these two disorders using blood-based RNA profiles. This work has the potential to impact our fundamental understanding of psychiatric etiology and has clinical relevance. The compendium of findings we have obtained from genomic, epigenomic, and transcriptomic analyses will have considerable utility for future research. In addition, this work provides a solid foundation for personalized genomics, in which a main focus will be the biological interpretation of disease-relevant mutations to identify clinically actionable variation and the development of rational drug treatments in a person-focused approach to alleviate suffering from debilitating illnesses.

TABLE 1. PROMISING BD RISK GENES THAT EMERGED FROM LARGE FAMILY-BASED CANDIDATE GENE STUDIES OR GENOME-WIDE ASSOCIATION STUDIES.

Gene symbol | Evidence | Brain region(s) with highest probeset expression (BRAINEAC)† | Number of transcripts§ | Functional relationship(s)
BDNF | Family-based | Cerebellar cortex | 36 | Neurodevelopment, neurotransmission
DAO | Family-based | Medulla | 20 | Glutamate transmission and metabolism
GRM3 | Family-based | Putamen | 5 | Glutamate signaling
GRM4 | Family-based | Cerebellar cortex | 15 | Glutamate signaling
GRIN2B | Family-based | Hippocampus | 9 | Glutamate signaling
IL2RB | Family-based | Thalamus | 7 | Immune response, mitogenic signal transduction
TUBA8 | Family-based | Cerebellar cortex | 5 | Cytoskeletal remodeling, neurodevelopment
DPYSL2 | Family-based | Intralobular white matter | 8 | Neuron growth and polarity
NOS1 | Family-based | Putamen | 15 | Neurotransmission and plasticity
GRID1 | Family-based | Intralobular white matter | 6 | Glutamate signaling
TACR1 | Family-based | Putamen | 3 | Neurotransmission, growth, immune response, calcium mobilization
GABRB2 | Family-based | Occipital cortex | 9 | GABA signaling
DISC1 | Family-based | Intralobular white matter | 39 | Neurogenesis, synaptic formation
CACNA1C | Family-based/GWAS | Cerebellar cortex / Thalamus | 42 | Neurotransmission, cell growth, motility
ODZ4 (TENM4) | GWAS | Thalamus | 13 | Neurodevelopment, myelination
SYNE1 | GWAS | Cerebellar cortex | 41 | Cytoskeletal organization
IFI44L | GWAS | Medulla / Intralobular white matter | 10 | Cellular response to viral infection
TRANK1 | GWAS | Cerebellar cortex | 3 | Unknown function / possible role in innate immune response
MAD1L1 | GWAS | Intralobular white matter | 23 | Cell cycle regulation
PIK3C2A | GWAS | Putamen | 5 | Cell growth, apoptosis, metabolism
ZNF804A | GWAS | Cerebellar cortex | 2 | Possible zinc-finger transcription factor
ANK3 | GWAS | Cerebellar cortex | 32 | Synaptic receptor organization, cellular trafficking

† Whole-gene expression data (mean log2 expression of probe sets) assayed from 134 postmortem brain samples across 10 regions (http://www.braineac.org). The region with the highest mean expression level for each gene was assigned to column 3.
§ Number of Ensembl transcripts (Ensembl Genome browser version 87, accessed March 2017).

FIGURE 1. CACNA1C REGIONAL ASSOCIATION PLOT. Results are shown across four genome-wide association studies by the Psychiatric Genomics Consortium (2012: cases = 11,974, controls = 51,792; 2013: cases = 6,990, controls = 4,820; 2014: cases = 19,779, controls = 19,432; and 2014 BD + SZ: cases = 10,410, controls = 10,700).

BIBLIOGRAPHY

Arnedo, J., Svrakic, D. M., Del Val, C., Romero-Zaliz, R., Hernández-Cuervo, H., Fanous, A. H., … Zwir, I. (2014). Uncovering the Hidden Risk Architecture of the Schizophrenias: Confirmation in Three Independent Genome-Wide Association Studies. The American Journal of Psychiatry, 172(2), 139–53. http://doi.org/10.1176/appi.ajp.2014.14040435

Avramopoulos, D., Lasseter, V. K., Fallin, M. D., Wolyniec, P. S., McGrath, J. A., Nestadt, G., … Pulver, A. E. (2007). Stage II follow-up on a linkage scan for bipolar disorder in the Ashkenazim provides suggestive evidence for chromosome 12p and the GRIN2B gene. Genetics in Medicine: Official Journal of the American College of Medical Genetics, 9(11), 745–51. http://doi.org/10.1097/GIM.0b013e318159a37c

Badner, J. A., & Gershon, E. S. (2002). Meta-analysis of whole-genome linkage scans of bipolar disorder and schizophrenia. Molecular Psychiatry, 7(4), 405–411. http://doi.org/10.1038/sj.mp.4001012

Barry, J., Gu, Y., Jukkola, P., O’Neill, B., Gu, H., Mohler, P. J., … Gu, C. (2014). Ankyrin-G Directly Binds to Kinesin-1 to Transport Voltage-Gated Na+ Channels into Axons. Developmental Cell, 28(2), 117–131. http://doi.org/10.1016/j.devcel.2013.11.023

Benes, F. M. (2011). Regulation of cell cycle and DNA repair in post-mitotic GABA neurons in psychotic disorders. Neuropharmacology, 60(7–8), 1232–42. http://doi.org/10.1016/j.neuropharm.2010.12.011

Berridge, M. J. (1998). Neuronal calcium signaling. Neuron, 21(1), 13–26. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/9697848

Bledau, A. S., Schmidt, K., Neumann, K., Hill, U., Ciotta, G., Gupta, A., … Anastassiadis, K. (2014). The H3K4 methyltransferase Setd1a is first required at the epiblast stage, whereas Setd1b becomes essential after gastrulation. Development (Cambridge, England), 141(5), 1022–35. http://doi.org/10.1242/dev.098152

Bunney, W. (1970). The switch process from depression to mania: Relationship to drugs which alter brain amines. The Lancet, 295(7655), 1022–1027. http://doi.org/10.1016/S0140-6736(70)91151-7

Chen, D. T., Jiang, X., Akula, N., Shugart, Y. Y., Wendland, J. R., Steele, C. J. M., … Strauss, J. (2013). Genome-wide association study meta-analysis of European and Asian-ancestry samples identifies three novel loci associated with bipolar disorder. Molecular Psychiatry, 18(2), 195–205. http://doi.org/10.1038/mp.2011.157

Coetzee, S. G., Pierce, S., Brundin, P., Brundin, L., Hazelett, D. J., & Coetzee, G. A. (2016). Enrichment of risk SNPs in regulatory regions implicate diverse tissues in Parkinson’s disease etiology. Scientific Reports, 6, 30509. http://doi.org/10.1038/srep30509

Constantinides, M. G., Picard, D., Savage, A. K., & Bendelac, A. (2011). A naive-like population of human CD1d-restricted T cells expressing intermediate levels of promyelocytic leukemia zinc finger. Journal of Immunology (Baltimore, Md.: 1950), 187(1), 309–15. http://doi.org/10.4049/jimmunol.1100761

Cottrell, J. R., Borok, E., Horvath, T. L., & Nedivi, E. (2004). CPG2: a brain- and synapse-specific protein that regulates the endocytosis of glutamate receptors. Neuron, 44(4), 677–90. http://doi.org/10.1016/j.neuron.2004.10.025

Creed, M. C., Ntamati, N. R., & Tan, K. R. (2014). VTA GABA neurons modulate specific learning behaviors through the control of dopamine and cholinergic systems. Frontiers in Behavioral Neuroscience, 8, 8. http://doi.org/10.3389/fnbeh.2014.00008

Cross Disorder Group of the Psychiatric Genomics Consortium. (2013). Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet, 381(9875), 1371–9. http://doi.org/10.1016/S0140-6736(12)62129-1

Cruceanu, C., Alda, M., & Turecki, G. (2009). Lithium: a key to the genetics of bipolar disorder. Genome Medicine, 1(8), 79. http://doi.org/10.1186/gm79

Deans, P. J. M., Raval, P., Sellers, K. J., Gatford, N. J. F., Halai, S., Duarte, R. R. R., … et al. (2016). Psychosis Risk Candidate ZNF804A Localizes to Synapses and Regulates Neurite Formation and Dendritic Spine Structure. Biological Psychiatry, 0(0), 1053–1055. http://doi.org/10.1016/j.biopsych.2016.08.038

Dupré, N., Gros-Louis, F., Bouchard, J.-P., Noreau, A., & Rouleau, G. A. (1993). SYNE1-Related Autosomal Recessive Cerebellar Ataxia. GeneReviews(®). University of Washington, Seattle. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/20301553

Ebbers, L., Satheesh, S. V, Janz, K., Rüttiger, L., Blosa, M., Hofmann, F., … Nothwang, H. G. (2015). L-type Calcium Channel Cav1.2 Is Required for Maintenance of Auditory Brainstem Nuclei. The Journal of Biological Chemistry, 290(39), 23692–710. http://doi.org/10.1074/jbc.M115.672675

Eckart, N., Song, Q., Yang, R., Wang, R., Zhu, H., McCallion, A. S., & Avramopoulos, D. (2016). Functional Characterization of Schizophrenia-Associated Variation in CACNA1C. PLOS ONE, 11(6), e0157086. http://doi.org/10.1371/journal.pone.0157086

Eppig, J. T., Smith, C. L., Blake, J. A., Ringwald, M., Kadin, J. A., Richardson, J. E., & Bult, C. J. (2017). Mouse Genome Informatics (MGI): Resources for Mining Mouse Genetic, Genomic, and Biological Data in Support of Primary and Translational Research. In Methods in molecular biology (Clifton, N.J.) (Vol. 1488, pp. 47–73). http://doi.org/10.1007/978-1-4939-6427-7_3

Eriksen, M., Ezzati, M., Holck, S., Lawes, C., Parag, V., Priest, P., & Vander Hoorn, S. (2002). The World Health Report 2002: Reducing Risks, Promoting Healthy Life, 14661–7000. Retrieved from http://www.who.int/whr/2002/en/whr02_en.pdf?ua=1

Fallin, M. D., Lasseter, V. K., Avramopoulos, D., Nicodemus, K. K., Wolyniec, P. S., McGrath, J. A., … Pulver, A. E. (2005). Bipolar I disorder and schizophrenia: a 440-single-nucleotide polymorphism screen of 64 candidate genes among Ashkenazi Jewish case-parent trios. American Journal of Human Genetics, 77(6), 918–36. http://doi.org/10.1086/497703

Feenstra, B., Pasternak, B., Geller, F., Carstensen, L., Wang, T., Huang, F., … Hviid, A. (2014).

Common variants associated with general and MMR vaccine-related febrile seizures. Nature

Genetics, 46(12), 1274–82. http://doi.org/10.1038/ng.3129

Ferreira, M. A. R., O’Donovan, M. C., Meng, Y. A., Jones, I. R., Ruderfer, D. M., Jones, L., …

Craddock, N. (2008). Collaborative genome-wide association analysis supports a role for

ANK3 and CACNA1C in bipolar disorder. Nature Genetics, 40(9), 1056–8.

http://doi.org/10.1038/ng.209

Finucane, H. K., Bulik-Sullivan, B., Gusev, A., Trynka, G., Reshef, Y., Loh, P.-R., … Price, A.

L. (2015). Partitioning heritability by functional annotation using genome-wide association

summary statistics. Nature Genetics, 47(11), 1228–1235. http://doi.org/10.1038/ng.3404

Forstner, A. J., Hecker, J., Hofmann, A., Maaser, A., Reinbold, C. S., Mühleisen, T. W., …

Nöthen, M. M. (2017). Identification of shared risk loci and pathways for bipolar disorder

and schizophrenia. PLOS ONE, 12(2), e0171595.

http://doi.org/10.1371/journal.pone.0171595

Frangou, S. (2012). Brain structural and functional correlates of resilience to Bipolar Disorder.

Frontiers in Human Neuroscience, 5, 184. http://doi.org/10.3389/fnhum.2011.00184

Georgieva, L., Rees, E., Moran, J. L., Chambert, K. D., Milanova, V., Craddock, N., … Kirov, G.

(2014). De novo CNVs in bipolar affective disorder and schizophrenia. Human Molecular

Genetics, 23(24), 6677–6683. http://doi.org/10.1093/hmg/ddu379

Gershon, E. S. (1977). Genetic and Biologic Studies of Affective Illness. In The Impact of

Biology on Modern Psychiatry (pp. 207–228). Boston, MA: Springer US.

http://doi.org/10.1007/978-1-4684-0778-5_16

Girgenti, M. J., LoTurco, J. J., & Maher, B. J. (2012). ZNF804a regulates expression of the

schizophrenia-associated genes PRSS16, COMT, PDE4B, and DRD2. PloS One, 7(2),

e32404. http://doi.org/10.1371/journal.pone.0032404

Green, E. K., Grozeva, D., Forty, L., Gordon-Smith, K., Russell, E., Farmer, A., … Craddock, N.

(2013). Association at SYNE1 in both bipolar disorder and recurrent major depression.

Molecular Psychiatry, 18(5), 614–617. http://doi.org/10.1038/mp.2012.48

Green, E. K., Rees, E., Walters, J. T. R., Smith, K.-G., Forty, L., Grozeva, D., … Kirov, G.

(2016). Copy number variation in bipolar disorder. Molecular Psychiatry, 21(1), 89–93.

http://doi.org/10.1038/mp.2014.174

Grozeva, D., Kirov, G., Ivanov, D., Jones, I. R., Jones, L., Green, E. K., … Wellcome Trust Case

Control Consortium. (2010). Rare Copy Number Variants: A Point of Rarity in Genetic Risk for

Bipolar Disorder and Schizophrenia. Archives of General Psychiatry,

67(4), 318. http://doi.org/10.1001/archgenpsychiatry.2010.25

Heinrich, A., Lourdusamy, A., Tzschoppe, J., Vollstädt-Klein, S., Bühler, M., Steiner, S., …

Nees, F. (2013). The risk variant in ODZ4 for bipolar disorder impacts on amygdala

activation during reward processing. Bipolar Disorders, 15(4), 440–5.

http://doi.org/10.1111/bdi.12068

Hess, J. L., & Glatt, S. J. (2014). How might ZNF804A variants influence risk for schizophrenia

and bipolar disorder? A literature review, synthesis, and bioinformatic analysis. American

Journal of Medical Genetics. Part B, Neuropsychiatric Genetics : The Official Publication

of the International Society of Psychiatric Genetics, 165(1), 28–40.

http://doi.org/10.1002/ajmg.b.32207

Hess, J. L., Quinn, T. P., Akbarian, S., & Glatt, S. J. (2015). Bioinformatic analyses and

conceptual synthesis of evidence linking ZNF804A to risk for schizophrenia and bipolar

disorder. American Journal of Medical Genetics. Part B, Neuropsychiatric Genetics : The

Official Publication of the International Society of Psychiatric Genetics, 168(1), 14–35.

http://doi.org/10.1002/ajmg.b.32284

Hill, M. J., & Bray, N. J. (2012). Evidence that schizophrenia risk variation in the ZNF804A gene

exerts its effects during fetal brain development. Am J Psychiatry, 169(12), 1301–1308.

http://doi.org/10.1176/appi.ajp.2012.11121845

Hor, H., Francescatto, L., Bartesaghi, L., Ortega-Cubero, S., Kousi, M., Lorenzo-Betancor, O., …

Estivill, X. (2015). Missense mutations in TENM4, a regulator of axon guidance and central

myelination, cause essential tremor. Human Molecular Genetics, 24(20), 5677–86.

http://doi.org/10.1093/hmg/ddv281

Hou, L., Bergen, S. E., Akula, N., Song, J., Hultman, C. M., Landén, M., … Lang, M. (2016).

Genome-wide association study of 40,000 individuals identifies two novel loci associated

with bipolar disorder. Human Molecular Genetics, 25(15), 3383–3394.

http://doi.org/10.1093/hmg/ddw181

Ignatius Irudayam, J., Contreras, D., Spurka, L., Subramanian, A., Allen, J., Ren, S., …

Arumugaswami, V. (2015). Characterization of type I interferon pathway during hepatic

differentiation of human pluripotent stem cells and hepatitis C virus infection. Stem Cell

Research, 15(2), 354–64. http://doi.org/10.1016/j.scr.2015.08.003

Ikeda, M., Takahashi, A., Kamatani, Y., Okahisa, Y., Kunugi, H., Mori, N., … Iwata, N. (2017).

A genome-wide association study identifies two novel susceptibility loci and trans

population polygenicity associated with bipolar disorder. Molecular Psychiatry.

http://doi.org/10.1038/mp.2016.259

Kim, S., Sun, H., Tomchick, D. R., Yu, H., & Luo, X. (2012). Structure of human Mad1 C-

terminal domain reveals its involvement in kinetochore targeting. Proceedings of the

National Academy of Sciences, 109(17), 6549–6554.

http://doi.org/10.1073/pnas.1118210109

Kops, G. J. P. L., Weaver, B. A. A., & Cleveland, D. W. (2005). On the road to cancer:

aneuploidy and the mitotic checkpoint. Nature Reviews Cancer, 5(10), 773–785.

http://doi.org/10.1038/nrc1714

Laidi, C., d’Albis, M.-A., Wessa, M., Linke, J., Phillips, M. L., Delavest, M., … Houenou, J.

(2015). Cerebellar volume in schizophrenia and bipolar I disorder with and without

psychotic features. Acta Psychiatrica Scandinavica, 131(3), 223–233.

http://doi.org/10.1111/acps.12363

Lek, M., Karczewski, K. J., Minikel, E. V., Samocha, K. E., Banks, E., Fennell, T., …

Consortium, E. A. (2016). Analysis of protein-coding genetic variation in 60,706 humans.

Nature, 536(7616), 285–291. http://doi.org/10.1038/nature19057

Loebrich, S., Djukic, B., Tong, Z. J., Cottrell, J. R., Turrigiano, G. G., & Nedivi, E. (2013).

Regulation of glutamate receptor internalization by the spine cytoskeleton is mediated by its

PKA-dependent association with CPG2. Proceedings of the National Academy of Sciences,

110(47), E4548–E4556. http://doi.org/10.1073/pnas.1318860110

Luchins, D. (1976). Biogenic amines and affective disorders. A critical analysis. International

Pharmacopsychiatry, 11(3), 135–49. Retrieved from

http://www.ncbi.nlm.nih.gov/pubmed/136425

Machado-Vieira, R., Manji, H. K., & Zarate, C. A., Jr. (2009). The role of lithium in the treatment

of bipolar disorder: convergent evidence for neurotrophic effects as a unifying hypothesis.

Bipolar Disorders, 11 Suppl 2(Suppl 2), 92–109. http://doi.org/10.1111/j.1399-

5618.2009.00714.x

Mademan, I., Harmuth, F., Giordano, I., Timmann, D., Magri, S., Deconinck, T., … Synofzik, M.

(2016). Multisystemic SYNE1 ataxia: confirming the high frequency and extending the

mutational and phenotypic spectrum. Brain, 139(8), e46–e46.

http://doi.org/10.1093/brain/aww115

Manji, H. K., Quiroz, J. A., Payne, J. L., Singh, J., Lopes, B. P., Viegas, J. S., & Zarate, C. A.

(2003). The underlying neurobiology of bipolar disorder. World Psychiatry : Official

Journal of the World Psychiatric Association (WPA), 2(3), 136–46. Retrieved from

http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1525098&tool=pmcentrez&rendertype=abstract

Marshall, C. R., Howrigan, D. P., Merico, D., Thiruvahindrapuram, B., Wu, W., Greer, D. S., …

Sebat, J. (2016). Contribution of copy number variants to schizophrenia from a genome-

wide study of 41,321 subjects. Nature Genetics. http://doi.org/10.1038/ng.3725

Marvel, C. L., & Paradiso, S. (2004). Cognitive and neurological impairment in mood disorders.

The Psychiatric Clinics of North America, 27(1), 19–36, vii–viii.

http://doi.org/10.1016/S0193-953X(03)00106-0

McCarthy, M. J., Le Roux, M. J., Wei, H., Beesley, S., Kelsoe, J. R., & Welsh, D. K. (2016).

Calcium channel genes associated with bipolar disorder modulate lithium’s amplification of

circadian rhythms. Neuropharmacology, 101, 439–48.

http://doi.org/10.1016/j.neuropharm.2015.10.017

Merikangas, K. R., Jin, R., He, J.-P., Kessler, R. C., Lee, S., Sampson, N. A., … Zarkov, Z.

(2011). Prevalence and Correlates of Bipolar Spectrum Disorder in the World Mental Health

Survey Initiative. Archives of General Psychiatry, 68(3), 241.

http://doi.org/10.1001/archgenpsychiatry.2011.12

Middleton, F. A., & Strick, P. L. (2000). Basal ganglia and cerebellar loops: motor and cognitive

circuits. Brain Research. Brain Research Reviews, 31(2–3), 236–50. Retrieved from

http://www.ncbi.nlm.nih.gov/pubmed/10719151

Mills, N. P., DelBello, M. P., Adler, C. M., & Strakowski, S. M. (2005). MRI Analysis of

Cerebellar Vermal Abnormalities in Bipolar Disorder. American Journal of Psychiatry,

162(8), 1530–1533. http://doi.org/10.1176/appi.ajp.162.8.1530

Monkul, E. S., Hatch, J. P., Sassi, R. B., Axelson, D., Brambilla, P., Nicoletti, M. A., … Soares,

J. C. (2008). MRI study of the cerebellum in young bipolar patients. Progress in Neuro-

Psychopharmacology and Biological Psychiatry, 32(3), 613–619.

http://doi.org/10.1016/j.pnpbp.2007.09.016

Nedivi, E., Hevroni, D., Naot, D., Israeli, D., & Citri, Y. (1993). Numerous candidate plasticity-

related genes revealed by differential cDNA cloning. Nature, 363(6431), 718–22.

http://doi.org/10.1038/363718a0

Nicodemus, K. K., Hargreaves, A., Morris, D., Anney, R., Gill, M., Corvin, A., & Donohoe, G.

(2014). Variability in working memory performance explained by epistasis vs polygenic

scores in the ZNF804A pathway. JAMA Psychiatry, 71(7), 778–85.

http://doi.org/10.1001/jamapsychiatry.2014.528

O’Donovan, M. C., Craddock, N., Norton, N., Williams, H., Peirce, T., Moskvina, V., …

Cloninger, C. R. (2008). Identification of loci associated with schizophrenia by genome-wide

association and follow-up. Nature Genetics, 40(9), 1053–1055. http://doi.org/10.1038/ng.201

O’Roak, B. J., Deriziotis, P., Lee, C., Vives, L., Schwartz, J. J., Girirajan, S., … Eichler, E. E.

(2011). Exome sequencing in sporadic autism spectrum disorders identifies severe de novo

mutations. Nature Genetics, 43(6), 585–9. http://doi.org/10.1038/ng.835

Ota, M., Hori, H., Sato, N., Yoshida, F., Hattori, K., Teraishi, T., & Kunugi, H. (2016). Effects of

ankyrin 3 gene risk variants on brain structures in patients with bipolar disorder and healthy

subjects. Psychiatry and Clinical Neurosciences, 70(11), 498–506.

http://doi.org/10.1111/pcn.12431

Perlis, R. H., Purcell, S., Fagerness, J., Kirby, A., Petryshen, T. L., Fan, J., … T, H. (2008).

Family-Based Association Study of Lithium-Related and Other Candidate Genes in Bipolar

Disorder. Archives of General Psychiatry, 65(1), 53.

http://doi.org/10.1001/archgenpsychiatry.2007.15

Perona, M. T. G., Waters, S., Hall, F. S., Sora, I., Lesch, K.-P., Murphy, D. L., … Uhl, G. R.

(2008). Animal models of depression in dopamine, serotonin, and norepinephrine

transporter knockout mice: prominent effects of dopamine transporter deletions.

Behavioural Pharmacology, 19(5–6), 566–574.

http://doi.org/10.1097/FBP.0b013e32830cd80f

Petty, F. (1995). GABA and mood disorders: a brief review and hypothesis. Journal of Affective

Disorders, 34(4), 275–81. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/8550953

Polderman, T. J. C., Benyamin, B., de Leeuw, C. A., Sullivan, P. F., van Bochoven, A., Visscher,

P. M., & Posthuma, D. (2015). Meta-analysis of the heritability of human traits based on

fifty years of twin studies. Nature Genetics, 47(7), 702–709. http://doi.org/10.1038/ng.3285

Purcell, S. M., Wray, N. R., Stone, J. L., Visscher, P. M., O’Donovan, M. C., Sullivan, P. F., &

Sklar, P. (2009). Common polygenic variation contributes to risk of schizophrenia and

bipolar disorder. Nature, 460(7256), 748–752. http://doi.org/10.1038/nature08185

Quinn, E. M., Hill, M., Anney, R., Gill, M., Corvin, A. P., & Morris, D. W. (2010). Evidence for

cis-acting regulation of ANK3 and CACNA1C gene expression. Bipolar Disorders, 12(4),

440–5. http://doi.org/10.1111/j.1399-5618.2010.00817.x

Ramasamy, A., Trabzuni, D., Guelfi, S., Varghese, V., Smith, C., Walker, R., … Weale, M. E.

(2014). Genetic variability in the regulation of gene expression in ten regions of the human

brain. Nature Neuroscience, 17(10), 1418–1428. http://doi.org/10.1038/nn.3801

Richard, A. C., Lyons, P. A., Peters, J. E., Biasci, D., Flint, S. M., Lee, J. C., … Smith, K. G.

(2014). Comparison of gene expression microarray data with count-based RNA

measurements informs microarray interpretation. BMC Genomics, 15(1), 649.

http://doi.org/10.1186/1471-2164-15-649

Ripke, S., Neale, B. M., Corvin, A., Walters, J. T. R., Farh, K.-H., Holmans, P. A., …

O’Donovan, M. C. (2014). Biological insights from 108 schizophrenia-associated genetic

loci. Nature, 511(7510), 421–427. http://doi.org/10.1038/nature13595

Rosenberg, S. S., & Spitzer, N. C. (2011). Calcium Signaling in Neuronal Development. Cold

Spring Harbor Perspectives in Biology, 3(10), a004259–a004259.

http://doi.org/10.1101/cshperspect.a004259

Roussos, P., Mitchell, A. C., Voloudakis, G., Fullard, J. F., Pothula, V. M., Tsang, J., … Sklar, P.

(2014). A Role for Noncoding Variation in Schizophrenia. Cell Reports, 9(4), 1417–29.

http://doi.org/10.1016/j.celrep.2014.10.015

Ruderfer, D. M., Fanous, A. H., Ripke, S., McQuillin, A., Amdur, R. L., Gejman, P. V, …

Kendler, K. S. (2013). Polygenic dissection of diagnosis and clinical dimensions of bipolar

disorder and schizophrenia. Molecular Psychiatry. http://doi.org/10.1038/mp.2013.138

Rueckert, E. H., Barker, D., Ruderfer, D., Bergen, S. E., O’Dushlaine, C., Luce, C. J., … Sklar, P.

(2013). Cis-acting regulation of brain-specific ANK3 gene expression by a genetic variant

associated with bipolar disorder. Molecular Psychiatry, 18(8), 922–9.

http://doi.org/10.1038/mp.2012.104

Schildkraut, J. J. (1974). Biogenic Amines and Affective Disorders. Annual Review of Medicine,

25(1), 333–348. http://doi.org/10.1146/annurev.me.25.020174.002001

Schloesser, R. J., Huang, J., Klein, P. S., & Manji, H. K. (2008). Cellular plasticity cascades in

the pathophysiology and treatment of bipolar disorder. Neuropsychopharmacology : Official

Publication of the American College of Neuropsychopharmacology, 33(1), 110–133.

http://doi.org/10.1038/sj.npp.1301575

Schutter, D. J. L. G., & van Honk, J. (2005). The cerebellum on the rise in human emotion. The

Cerebellum, 4(April), 290–294. http://doi.org/10.1080/14734220500348584

Sekar, A., Bialas, A., de Rivera, H., Davis, H., Hammond, T., Kamitaki, N., … McCarroll, S. A.

(2016). Schizophrenia risk from complex variation of complement component 4. Nature,

177–83.

Shinn, A. K., Roh, Y. S., Ravichandran, C. T., Baker, J. T., Öngür, D., Cohen, B. M., et al.

(2016). Aberrant Cerebellar Connectivity in Bipolar Disorder With Psychosis. Biological

Psychiatry: Cognitive Neuroscience and Neuroimaging, 79(0), 231–238.

http://doi.org/10.1016/j.bpsc.2016.07.002

Shirahata, E., Iwasaki, H., Takagi, M., Lin, C., Bennett, V., Okamura, Y., & Hayasaka, K. (2006).

Ankyrin-G Regulates Inactivation Gating of the Neuronal Sodium Channel, Nav1.6. Journal

of Neurophysiology, 96(3), 1347–1357. http://doi.org/10.1152/jn.01264.2005

Singh, T., Kurki, M. I., Curtis, D., Purcell, S. M., Crooks, L., McRae, J., … Barrett, J. C. (2016).

Rare loss-of-function variants in SETD1A are associated with schizophrenia and

developmental disorders. Nature Neuroscience, 19(4), 571–577.

http://doi.org/10.1038/nn.4267

Sklar, P., Gabriel, S. B., McInnis, M. G., Bennett, P., Lim, Y.-M., Tsan, G., … Lander, E. S.

(2002). Family-based association study of 76 candidate genes in bipolar disorder: BDNF is

a potential risk locus. Molecular Psychiatry, 7(6), 579–593.

http://doi.org/10.1038/sj.mp.4001058

Sklar, P., Ripke, S., Scott, L. J., Andreassen, O. A., Cichon, S., Craddock, N., … Psychiatric

GWAS Consortium Bipolar Disorder Working Group. (2011). Large-scale genome-wide

association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nat

Genet, 43(10), 977–983. http://doi.org/10.1038/ng.943

Smoller, J. W., & Finn, C. T. (2003). Family, twin, and adoption studies of bipolar disorder.

American Journal of Medical Genetics Part C: Seminars in Medical Genetics, 123C(1), 48–

58. http://doi.org/10.1002/ajmg.c.20013

Soares, J. C., & Mann, J. J. (1997). The anatomy of mood disorders—review of structural

neuroimaging studies. Biological Psychiatry, 41(1), 86–106. http://doi.org/10.1016/S0006-

3223(96)00006-6

Song, J., Bergen, S. E., Di Florio, A., Karlsson, R., Charney, A., Ruderfer, D. M., … Belliveau,

R. A. (2016). Genome-wide association study identifies SESTD1 as a novel risk gene for

lithium-responsive bipolar disorder. Molecular Psychiatry, 21(9), 1290–7.

http://doi.org/10.1038/mp.2015.165

Stoodley, C. J. (2012). The cerebellum and cognition: Evidence from functional imaging studies.

In Cerebellum (Vol. 11, pp. 352–365). http://doi.org/10.1007/s12311-011-0260-7

Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. a, …

Mesirov, J. P. (2005). Gene set enrichment analysis: a knowledge-based approach for

interpreting genome-wide expression profiles. Proceedings of the National Academy of

Sciences of the United States of America, 102(43), 15545–50.

http://doi.org/10.1073/pnas.0506580102

Tao, R., Cousijn, H., Jaffe, A. E., Burnet, P. W. J., Edwards, F., Eastwood, S. L., … Kleinman, J.

E. (2014). Expression of ZNF804A in Human Brain and Alterations in Schizophrenia,

Bipolar Disorder, and Major Depressive Disorder: A Novel Transcript Fetally Regulated by

the Psychosis Risk Variant rs1344706. JAMA Psychiatry.

http://doi.org/10.1001/jamapsychiatry.2014.1079

Tesli, M., Egeland, R., Sønderby, I. E., Haukvik, U. K., Bettella, F., Hibar, D. P., … Andreassen,

O. A. (2013). No evidence for association between bipolar disorder risk gene variants and

brain structural phenotypes. Journal of Affective Disorders, 151(1), 291–297.

http://doi.org/10.1016/j.jad.2013.06.008

Trost, S., Diekhof, E. K., Mohr, H., Vieker, H., Krämer, B., Wolf, C., … Gruber, O. (2016).

Investigating the Impact of a Genome-Wide Supported Bipolar Risk Variant of MAD1L1

on the Human Reward System. Neuropsychopharmacology, 41(11), 2679–2687.

http://doi.org/10.1038/npp.2016.70

Tsuji, S., Morinobu, S., Tanaka, K., Kawano, K., & Yamawaki, S. (2003). Lithium, but not

valproate, induces the serine/threonine phosphatase activity of protein phosphatase 2A in

the rat brain, without affecting its expression. Journal of Neural Transmission, 110(4), 413–

425. http://doi.org/10.1007/s00702-002-0798-0

Tsukasaki, K., Miller, C. W., Greenspun, E., Eshaghian, S., Kawabata, H., Fujimoto, T., …

Koeffler, H. P. (2001). Mutations in the mitotic check point gene, MAD1L1, in human

cancers. Oncogene, 20(25), 3301–3305. http://doi.org/10.1038/sj.onc.1204421

Uhlen, M., Fagerberg, L., Hallstrom, B. M., Lindskog, C., Oksvold, P., Mardinoglu, A., …

Ponten, F. (2015). Tissue-based map of the human proteome. Science, 347(6220), 1260419–

1260419. http://doi.org/10.1126/science.1260419

van Zessen, R., Phillips, J. L., Budygin, E. A., & Stuber, G. D. (2012). Activation of VTA GABA

Neurons Disrupts Reward Consumption. Neuron, 73(6), 1184–1194.

http://doi.org/10.1016/j.neuron.2012.02.016

Welter, D., MacArthur, J., Morales, J., Burdett, T., Hall, P., Junkins, H., … Parkinson, H. (2014).

The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids

Research, 42(D1). http://doi.org/10.1093/nar/gkt1229

Williams, H. J., Norton, N., Dwyer, S., Moskvina, V., Nikolov, I., Carroll, L., … O’Donovan, M.

C. (2011). Fine mapping of ZNF804A and genome-wide significant evidence for its

involvement in schizophrenia and bipolar disorder. Mol Psychiatry, 16(4), 429–441.

http://doi.org/10.1038/mp.2010.36

Xu, W., Cohen-Woods, S., Chen, Q., Noor, A., Knight, J., Hosang, G., … Vincent, J. B. (2014).

Genome-wide association study of bipolar disorder in Canadian and UK populations

corroborates disease loci including SYNE1 and CSMD1. BMC Medical Genetics, 15(1), 2.

http://doi.org/10.1186/1471-2350-15-2

ENRICHMENT OF COMMON RISK VARIANTS FOR SCHIZOPHRENIA AND BIPOLAR DISORDER IN PATHWAYS AND REGULATORY ELEMENTS

ABSTRACT

Genome-wide association studies (GWAS) provide a wealth of information that can serve as a backbone for making biological inferences. More than 160 loci have been reported with genome-wide significant association with schizophrenia (SZ) and bipolar disorder (BD), and thousands more nominally associated loci contribute to risk (Purcell et al., 2009; Welter et al., 2014). SNPs with the strongest association with SZ and BD are predominantly in non-coding or intergenic regions. It is challenging to decipher the function of non-coding risk SNPs, but there is mounting evidence that non-coding risk variants play a role in the regulation of gene expression. Functional and regulatory maps from the ENCODE and Roadmap projects offer detailed annotations of the genome and epigenome, which can help bridge the gap between GWAS signals and biological mechanisms. I created a software tool called Functional LD-clump EnrichmEnt Test (FLEET) to help decipher the underlying biology of GWAS signals by making use of pathway databases and detailed genomic annotations obtained from the ENCODE and Roadmap projects. Possible uses of this software include prioritizing risk genes for fine-mapping, illuminating potential regulatory mechanisms that underlie susceptibility, and identifying molecular substrates that are common to psychiatric disorders. Analysis of SZ and BD revealed widespread associations with histone markers and DNA accessibility in brain, immune, and other peripheral tissues. In addition, significant enrichment was detected for gene sets that circumscribe neurotransmission and neurodevelopment, histone modification enzymes, and target sites of transcription factors, microRNAs, and RNA-binding proteins.

INTRODUCTION

The heritability of SZ and BD is approximately 60–80% according to twin- and family-based studies. Despite this high heritability, identifying risk genes for SZ and BD remains a challenge. The largest GWAS of SZ to date (cases = 36,989, controls = 113,075) identified 108 independent loci surpassing genome-wide significance, which implicated over 300 protein-coding genes plus non-coding RNAs (19 microRNAs, 13 small nucleolar RNAs, and one long noncoding RNA) and together explain 7% of risk for SZ (Ripke et al., 2014). Further, the largest GWAS of BD to date (cases = 9,784, controls = 30,471) identified six SNPs with genome-wide significance within 100 kilobases of 16 protein-coding genes and 3 microRNAs. Lead SNPs related to SZ and BD are predominantly non-coding variants (i.e., found in introns or untranslated regions) or fall in intergenic regions (i.e., “gene deserts”). Endeavors such as the ENCODE project and the Roadmap Epigenomics project have yielded detailed annotations of the human genome, including cell- and tissue-specific epigenomic landscapes. This rich set of information can be used for deciphering functional relationships of GWAS signals. Determining the functional roles of non-coding risk variants is an arduous task that has been accomplished for only a few risk genes to date (Cohen et al., 2015; Roussos et al., 2014; Sekar et al., 2016). Genome-wide statistical approaches can help accelerate this effort by identifying positional relationships between GWAS signals and functional elements. This follows the assumption that GWAS signals are organized non-randomly with respect to, for example, tagging genes from a specific biological pathway or a type of histone modification. There is a growing demand for user-friendly bioinformatics tools that can bring this to fruition.

Individual-level GWAS data are typically under controlled access, which can be prohibitive for developing or testing hypotheses. There is a critical need for computational approaches that can be applied to publicly available GWAS summary statistics. To fill this gap, I created a software tool called FLEET as a framework for analyzing thousands of annotations and pathways, including cell- and tissue-specific information where available. A tutorial for installing and running FLEET is available on GitHub (https://github.com/hessJ/FLEET). FLEET requires a single GWAS file with two columns of summary data (SNP name and p-value) to operate. FLEET is divided into five distinct modules; the first four modules pre-process and format data and the last module executes the enrichment tests. I used FLEET to analyze summary statistics from the largest available GWAS meta-analyses of SZ (Ripke et al., 2014) and BD (Hou et al., 2016).

METHODS

PRE-PROCESSING GWASS OF SZ AND BD

I downloaded summary statistics for the largest available GWASs of SZ (Ripke et al., 2014) and BD (Hou et al., 2016) to date, which contained results for ~9 million single nucleotide polymorphisms (SNPs) that are common in the population (minor allele frequency > 5%). A two-step quality control process was used to ensure that enrichment tests were not confounded by linkage disequilibrium (LD). First, genome-wide SNP genotypes from the 1000 Genomes European reference panel were pruned using the Plink algorithm --indep (parameters: 50 5 2), which yielded a set of SNPs that are roughly independent based on short-range LD. GWAS signals were then further pruned for long-range LD using the --clump algorithm in Plink, which forms “clumps” of SNPs around a single index SNP in a 1,000-kilobase region. An index SNP is chosen to represent all tagged SNPs in a “clump” according to the significance of its GWAS association. This process yielded ~100,000 linkage-independent clumps for SZ and BD.
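
To make the clumping step concrete, the following is a minimal Python sketch of greedy LD clumping. It is not Plink's implementation. It assumes that the window is measured as the distance from the index SNP and that pairwise r² values are already available; the `Snp` type, the `clump` function, the `r2` lookup, and the `r2_min` value of 0.2 are hypothetical names and settings introduced only for illustration.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Snp(NamedTuple):
    name: str
    chrom: str
    pos: int    # base-pair position
    p: float    # GWAS association p-value


def clump(snps: List[Snp],
          r2: Dict[FrozenSet[str], float],
          window_kb: int = 1000,
          r2_min: float = 0.2) -> Dict[str, List[str]]:
    """Greedy clumping sketch: the most significant unassigned SNP becomes an
    index SNP and absorbs unassigned SNPs on the same chromosome that lie
    within the window and are in LD with it (r2 >= r2_min, an assumed cutoff)."""
    window_bp = window_kb * 1000
    assigned = set()
    clumps: Dict[str, List[str]] = {}
    for index in sorted(snps, key=lambda s: s.p):   # smallest p-value first
        if index.name in assigned:
            continue
        assigned.add(index.name)
        members = []
        for other in snps:
            if other.name in assigned or other.chrom != index.chrom:
                continue
            if abs(other.pos - index.pos) > window_bp:
                continue
            # Missing pairs are treated as unlinked (r2 = 0)
            if r2.get(frozenset((index.name, other.name)), 0.0) >= r2_min:
                members.append(other.name)
                assigned.add(other.name)
        clumps[index.name] = members
    return clumps


# Toy input: rs2 is tagged by rs1; rs3 is too far away; rs4 is on another chromosome
toy = [Snp("rs1", "1", 100_000, 1e-9), Snp("rs2", "1", 150_000, 1e-4),
       Snp("rs3", "1", 5_000_000, 1e-6), Snp("rs4", "2", 100_000, 0.5)]
toy_clumps = clump(toy, {frozenset(("rs1", "rs2")): 0.8})
```

Each key of the returned dictionary is an index SNP and each value lists the SNPs it tags, so the number of keys corresponds to the number of linkage-independent clumps.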

REFERENCE DATA FOR REGULATORY ELEMENTS AND POPULATION-LEVEL

GENOTYPES

I retrieved reference maps for the following gene regions from the database: exons, introns, 3ʹ/5ʹ untranslated regions, exon-intron splice junctions (including a 20-base upstream/downstream window), and promoters. Pathway-level annotations for genes were acquired from the Gene Ontology (GO) and PANTHER databases and pruned down to pathways with more than 20 and fewer than 200 genes, yielding 558 and 85 pathways for GO and PANTHER, respectively. I expanded the borders of each gene by 20 kilobases to capture potential regulatory variants across pathway-level annotations (Veyrieras et al., 2008). I obtained genomic coordinates for cell- and organ-specific enhancers (including a 100-base upstream/downstream window) derived from an atlas of Cap Analysis of Gene Expression (CAGE) data that were acquired by the FANTOM3 and FANTOM4 projects, produced by the RIKEN Omics Science Center, and released through the package FANTOM3and4CAGE in Bioconductor. The remaining annotations were downloaded directly from the ENCODE data matrix webpage. DNA- and RNA-binding protein target sites were retrieved as narrowPeak BED files, which contained genomic positions of ChIP-seq and eCLIP peaks, respectively. Transcription factor target sites with optimal irreproducible discovery rate (IDR) thresholded peaks (i.e., reliable peaks) were kept for analysis. Biological replicates for eCLIP data were compared to identify RNA-binding protein target sites that overlapped by a minimum of 10%. The most up-to-date data for DNA accessibility regions (status “released” by ENCODE) generated by DNase-seq and for histone modification markers generated by ChIP-seq were used for this analysis. Genomic coordinates for each (epi)genomic annotation were converted into an Rdata file formatted as a GenomicRanges object, then uploaded to a Synapse project portal (https://www.synapse.org/) under the Synapse ID syn8547501. The 1000 Genomes reference data for European founders in Plink format were also uploaded there after quality control to remove uncommon variants (minor allele frequency < 5%), variants deviating from Hardy-Weinberg equilibrium at a p-value < 1e-06, and samples or variants exhibiting genotype missingness greater than 2%. SNPs from the 1000 Genomes reference data sets were annotated with regulatory elements pulled from ENCODE, Roadmap, and Entrez. Index SNPs were added to an annotation category if they displayed high LD with a gene set or regulatory element (r² > 0.6). Categorical assignments for index SNPs were coded as 0 (not linked to annotation) or 1 (linked to annotation).
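
The 0/1 coding of index SNPs can be illustrated with a short Python sketch. It reflects one plausible reading of the rule above, namely that an index SNP is linked to an annotation when the index SNP itself, or a SNP in LD with it at r² > 0.6, falls inside an annotated interval. The function names (`pad_intervals`, `in_annotation`, `code_index_snps`) and the input layout are hypothetical and are not taken from the FLEET source code.

```python
from typing import Dict, FrozenSet, List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive base-pair coordinates


def pad_intervals(intervals: Dict[str, List[Interval]],
                  flank_bp: int) -> Dict[str, List[Interval]]:
    """Extend every interval by flank_bp on both sides (e.g., 20 kb around genes)."""
    return {chrom: [(max(0, start - flank_bp), end + flank_bp) for start, end in spans]
            for chrom, spans in intervals.items()}


def in_annotation(chrom: str, pos: int,
                  intervals: Dict[str, List[Interval]]) -> bool:
    """True if the position lies inside any annotated interval on its chromosome."""
    return any(start <= pos <= end for start, end in intervals.get(chrom, []))


def code_index_snps(clumps: Dict[str, List[str]],
                    positions: Dict[str, Tuple[str, int]],
                    r2: Dict[FrozenSet[str], float],
                    intervals: Dict[str, List[Interval]],
                    r2_min: float = 0.6) -> Dict[str, int]:
    """Code each index SNP as 1 if it, or a tagged SNP with r2 > r2_min to it,
    overlaps the annotation; otherwise 0."""
    codes = {}
    for index, tagged in clumps.items():
        candidates = [index] + [snp for snp in tagged
                                if r2.get(frozenset((index, snp)), 0.0) > r2_min]
        codes[index] = int(any(in_annotation(*positions[snp], intervals)
                               for snp in candidates))
    return codes


# Toy input: rsB tags rsA strongly and sits 15 kb upstream of a gene
positions = {"rsA": ("1", 1_000), "rsB": ("1", 30_000), "rsC": ("1", 500_000)}
clumps = {"rsA": ["rsB"], "rsC": []}
ld = {frozenset(("rsA", "rsB")): 0.9}
genes = {"1": [(45_000, 60_000)]}

codes_padded = code_index_snps(clumps, positions, ld, pad_intervals(genes, 20_000))
codes_unpadded = code_index_snps(clumps, positions, ld, genes)
```

In this toy input, the 20-kilobase padding is what brings the tagged SNP inside the gene's window, which shows why expanding gene borders changes which index SNPs are counted as linked.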

ENRICHMENT TESTS

FLEET uses a linear regression framework to test for a global enrichment of regulatory elements across GWAS loci. This method assumes that the phenotype of interest is polygenic, i.e., numerous loci in the genome contributing to a phenotype, and is untested for Mendelian-like phenotypes. SNPs in the extended major

95 histocompatibility locus (MHCx, chr6:24MB – 35MB) are discarded from analysis due to complex LD and high gene density in this region. A weighted linear regression model framework is used that adjusts for confounding variation due to minor allele frequency and LD. FLEET regresses the categorical assignments of index SNPs (coded 0 or 1) onto their z-scores transformed from the p-value column of the GWAS summary statistics file.

I controlled for between-clump variation in minor allele frequency (sum total for SNPs in LD with the index SNP) and in the number of SNPs tagged by an index SNP by including both as covariates in the regression. The inverse of the sum of minor allele frequency within LD-clumps was used to weight the regression (i.e., applying more weight to lower frequency variants). This follows the assumption that lower frequency variants yield stronger effects than more common variants, and may be functionally relevant (Moore et al., 2013; Zeggini & Morris, 2015). If an annotation showed strong global enrichment (P < 5e-05), then FLEET automatically performed a permutation-based test on genome-wide significant SNPs (P < 5e-08) to determine whether the annotation is over-represented among the most significant risk variants. The proportion of index SNPs in each bin linked with the annotation was compared to the proportion of randomly selected index SNPs, equal in number and matched by the sum total minor allele frequency of their clumps, linked with the same annotation. Matching on minor allele frequency is important for ensuring that index markers within the same allele frequency spectrum and LD score are being compared, as these two variables are highly correlated (Bulik-Sullivan et al., 2015). A total of 1,000 permutations were performed to generate an empirical p-value of enrichment. FLEET produces two plots that summarize enrichment of each annotation category analyzed, and comma-separated tables containing annotation-level summary statistics (one file per category) from the weighted regression models and permutations. A column in these tables lists the genome-wide significant variants (P < 5e-08) that overlap an annotation. A significance threshold of Benjamini-Hochberg adjusted p-value (FDRp) < 0.05 was used to account for multiple comparisons and the non-independence between tests (i.e., correlated epigenetic markers and overlapping gene sets).
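The regression and multiple-testing steps described above can be illustrated with a minimal sketch. This is not the FLEET source code. The function names, the use of unsigned z-scores, the choice of z-score as the outcome with the 0/1 annotation indicator as the predictor, and the normal approximation for the coefficient p-value are all assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist

STD_NORMAL = NormalDist()


def p_to_z(p_values):
    """Convert two-sided p-values to unsigned z-scores (illustrative choice)."""
    p = np.clip(np.asarray(p_values, dtype=float), 1e-300, 1.0)
    # -inv_cdf(p / 2) equals inv_cdf(1 - p / 2) but stays accurate for tiny p.
    return np.array([-STD_NORMAL.inv_cdf(pi / 2.0) for pi in p])


def weighted_enrichment_fit(z, in_annotation, maf_sum, n_tagged):
    """Weighted least squares of z on a 0/1 annotation flag plus two covariates.

    Weights are the inverse of the summed minor allele frequency per LD clump,
    so clumps of lower-frequency variants count more (as described in the text).
    Returns (beta, standard error, p-value) for the annotation term.
    """
    z = np.asarray(z, dtype=float)
    maf_sum = np.asarray(maf_sum, dtype=float)
    X = np.column_stack([np.ones_like(z), in_annotation, maf_sum, n_tagged]).astype(float)
    sqrt_w = np.sqrt(1.0 / maf_sum)
    Xw, zw = X * sqrt_w[:, None], z * sqrt_w      # rescale rows by sqrt(weight)
    coef, *_ = np.linalg.lstsq(Xw, zw, rcond=None)
    resid = zw - Xw @ coef
    sigma2 = resid @ resid / (len(z) - X.shape[1])
    cov = sigma2 * np.linalg.inv(Xw.T @ Xw)
    beta, se = coef[1], np.sqrt(cov[1, 1])
    # Normal approximation to the t distribution (an assumption; fine for large n).
    p = 2.0 * STD_NORMAL.cdf(-abs(beta / se))
    return beta, se, p


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted
```

In use, one regression would be fitted per annotation, and the resulting vector of p-values would be passed to `benjamini_hochberg` to obtain the FDRp values that are compared against 0.05.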

CONTROL AND OTHER PSYCHIATRIC GWASS

I applied FLEET to GWAS summary statistics for a control phenotype (type II diabetes (Morris et al., 2012)) and compared the enrichment results with those for SZ and BD. This follows evidence that type II diabetes has little genetic relationship with psychiatric disorders (Bulik-Sullivan et al., 2015). In addition, summary statistics for the largest GWAS of autism spectrum disorder (ASD) conducted by the Psychiatric Genomics Consortium (cases = 5,305, pseudocontrols = 5,305) were analyzed in FLEET. Pair-wise correlations (Pearson’s r) were calculated between GWASs based on enrichment statistics to evaluate similarities between phenotypes.
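As a rough illustration of the cross-phenotype comparison, the sketch below computes a Pearson correlation matrix from per-annotation enrichment statistics. The text does not specify which statistic was correlated, so the input (for example regression betas or -log10 p-values, one value per annotation in the same order for every GWAS) is an assumption, as is the function name.

```python
import numpy as np


def pairwise_pearson(stats_by_gwas):
    """Pearson correlation matrix between GWASs across shared annotations.

    stats_by_gwas: dict mapping a GWAS label to a sequence of enrichment
    statistics, one per annotation, in the same annotation order.
    Annotations with a missing value (NaN) in any GWAS are dropped.
    """
    names = list(stats_by_gwas)
    matrix = np.vstack([np.asarray(stats_by_gwas[name], dtype=float) for name in names])
    complete = ~np.isnan(matrix).any(axis=0)
    return names, np.corrcoef(matrix[:, complete])


# Toy values only; these are not results from the study.
names, r = pairwise_pearson({
    "SZ": [1.0, 2.0, 3.0, 4.0],
    "BD": [2.0, 4.0, 6.0, 8.0],
    "T2D": [4.0, 3.0, 2.0, 1.0],
})
```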

RESULTS

FLEET uncovered 558 and 548 annotations that are globally enriched with GWAS signals associated with SZ and BD, respectively (FDRp < 0.05), and 329 annotations enriched in both disorders. Summary statistics for all 2,089 annotations tested in GWASs of SZ and BD are visually displayed in Figure 1. The top 10% of annotations that showed joint association with SZ and BD are reported in Table 1 (full table of results can be made available upon request). Enrichment statistics for gene sets and annotations were highly correlated between SZ and BD (r = 0.61) and with autism spectrum disorder (r = 0.54, Figure 1C). Type II diabetes showed a modest correlation (r = 0.14 – 0.22) with these psychiatric disorders, suggesting that there is partial overlap among molecular substrates (Figure 1D). The number of annotations with a shared association with SZ and BD is displayed in Figure 1E. Histone modifications were over-represented among annotations that emerged as globally significant from analyses of SZ and BD (~38%). Nine (out of 12) and 25 (out of 33) of the individual histone modifications associated with SZ and BD originated from brain and immune system tissues, respectively. Figure 2 displays the enrichment p-values for histone modifications stratified by tissue type. In general, histone modifications from the central nervous system were more strongly associated with SZ than those from other tissues (Figure 2A and 2C), whereas histone modifications from the immune system exhibited the strongest overall association with BD (Figure 2D). The top-ranking histone modification from brain that was jointly associated with SZ and BD was histone 3 lysine 4 mono-methylation (H3K4me1) from temporal lobe (FDRp = 2.12e-36 and FDRp = 3.03e-16, respectively).

Open chromatin marks designated by DNase I hypersensitivity sites were also investigated for their association with SZ and BD variants. I found several DNase I hypersensitivity sites from the central nervous system (astrocytes of hippocampus, astrocytes of the spinal cord, choroid plexus, globus pallidus, midbrain, and middle frontal gyrus), the immune system (B cells, CD14+ monocytes, myeloid dendritic cells, T-helper cells, and thymus), and multiple tissues in the periphery that were globally enriched with SZ and BD risk loci (FDRp < 0.05). DNase I hypersensitivity sites in globus pallidus and spinal cord were significantly enriched with genome-wide significant index SNPs for SZ (permutation P < 0.001). In addition, DNase I hypersensitivity sites in whole brain and thymus were enriched with genome-wide significant markers from BD (permutation P = 0.018 and permutation P = 0.009, respectively).
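For readers who want to see how a permutation P of this kind can arise, the following is a minimal sketch of a frequency-matched permutation test. It is not FLEET's implementation. The quantile binning of the summed minor allele frequency, the function and argument names, and the definition of the empirical p-value as the fraction of permutations whose annotated proportion is at least the observed proportion are assumptions. Under that definition, 1,000 permutations with no null draw reaching the observed value would be reported as P < 0.001.

```python
import numpy as np


def permutation_enrichment_p(in_annotation, maf_sum, is_top,
                             n_perm=1000, n_bins=10, seed=0):
    """Empirical enrichment p-value for top index SNPs within an annotation.

    in_annotation: bool per index SNP, True if its clump overlaps the annotation.
    maf_sum:       summed minor allele frequency of each index SNP's clump.
    is_top:        bool per index SNP, True if genome-wide significant.
    Random index SNPs are drawn from the same maf_sum bin as each top SNP.
    """
    rng = np.random.default_rng(seed)
    in_annotation = np.asarray(in_annotation, dtype=bool)
    maf_sum = np.asarray(maf_sum, dtype=float)
    is_top = np.asarray(is_top, dtype=bool)

    # Assign every index SNP to a quantile bin of summed allele frequency.
    edges = np.quantile(maf_sum, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(maf_sum, edges)

    observed = in_annotation[is_top].mean()
    top_bins, top_counts = np.unique(bins[is_top], return_counts=True)
    pools = {b: np.flatnonzero(bins == b) for b in top_bins}

    null = np.empty(n_perm)
    for i in range(n_perm):
        # Draw as many random index SNPs per bin as there are top SNPs in it.
        draw = np.concatenate([
            rng.choice(pools[b], size=k, replace=False)
            for b, k in zip(top_bins, top_counts)
        ])
        null[i] = in_annotation[draw].mean()

    return observed, float(np.mean(null >= observed))
```

A common alternative adds one to both the numerator and the denominator so that the empirical p-value can never be exactly zero; the simpler ratio is used here because zero values appear in the permutation column of Table 1.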

Based on analysis of FANTOM data, active enhancers from blood cells exhibited a strong shared association with SZ (FDRp = 1.87e-07) and BD (FDRp = 6.73e-15). Active enhancers from cerebrum (FDRp = 1.24e-04), whole brain (FDRp = 9.48e-03), adrenal gland (FDRp = 1.05e-02), cerebellum (FDRp = 1.1e-02), parietal lobe (FDRp = 1.17e-02), and liver (FDRp = 4.1e-02) showed a significant association with SZ. Genome-wide significant index markers were not enriched in FANTOM enhancers for SZ or BD; however, there was a borderline significant enrichment of genome-wide significant SZ risk SNPs in cerebellum enhancers (permutation P = 0.061).

Pathways related to neurotransmitter receptor activity (ionotropic/metabotropic glutamate receptors, calcium channel activity, endogenous cannabinoid signaling, and sodium channel activity), dendritic spines, and presynaptic structure were strongly and significantly associated with SZ and BD. Multiple gene sets that play a role in transcriptional regulation, neuronal function, and cell differentiation/growth were significantly associated with SZ but not as strongly with BD, including: histone binding (FDRp = 2.46e-08), postsynaptic density (FDRp = 5.44e-06), axonal growth cone (FDRp = 8.59e-03), clathrin-coated vesicles (FDRp = 0.039), ephrin receptor binding (FDRp = 1.05e-06), GABA B receptor II signaling (FDRp = 2.36e-03), histone methyltransferase activity (FDRp = 8.59e-03), histone deacetylase complex (FDRp = 0.023), insulin/IGF-pathway signaling cascade (FDRp = 3.07e-02), and SMAD targets (FDRp = 7.84e-04). Genes related to the postsynaptic density had the highest density of genome-wide significant index SNPs for SZ among the gene sets analyzed (n SNPs = 21, permutation P < 0.001). Conversely, multiple gene sets involved in protein sorting, energy homeostasis, and response to oxidative or metabolic stress showed a stronger association with BD compared to SZ, including: endoplasmic reticulum-to-Golgi transport vesicles (FDRp = 9.7e-03), cytokine binding (FDRp = 0.022), mitochondrial respiratory transport chain (FDRp = 0.016), mast cell granules (FDRp = 2.47e-03), activin beta signaling (FDRp = 1.87e-03), p53 pathway (FDRp = 5.35e-03), hypoxia response by HIF activation (FDRp = 4.3e-03), and glass bottom boat (GBB) signaling (FDRp = 0.014).

There were 66 transcription factor proteins whose target sites exhibited a shared enrichment across SZ and BD, several of which are highly expressed in the brain [median RPKM > 10 in multiple regions (The GTEx Consortium, 2015)], including: CHD2, CTCF, BHLHE40, ARID4B, BCL6, JUND, L3MBTL2, MXI1, RBFOX2, and ZEB2. Target sites of three RNA-binding proteins showed a shared enrichment across disorders: CSTF2T, PCBP2, and RBFOX2. There were 11 RNA-binding proteins whose target sites were significantly associated with BD but not SZ: XRN2 (FDRp = 0.47), DDX3X (FDRp = 0.043), NONO (FDRp = 0.42), HNRNPM (0.038), SFPQ (FDRp = 0.035), SF3B4 (P = 0.026), AGGF1 (FDRp = 0.02), NCBP2 (FDRp = 9.93e-03), FAM120A (FDRp = 5.4e-03), NKRF (FDRp = 2.7e-03), and TAF15 (FDRp = 1.9e-03). Further, target sites of six RNA-binding proteins were significantly associated with SZ but not BD: ILF3 (FDRp = 1.8e-03), DROSHA (FDRp = 0.012), RPS11 (FDRp = 0.014), METAP (FDRp = 0.016), FXR1 (FDRp = 0.021), and KHSRP (FDRp = 0.045). One microRNA family showed a significant association with BD, namely miR-34-5p/449-5p (FDRp = 8.0e-04), which showed no evidence of association with SZ (P = 0.96). A broader analysis of gene regions revealed significant enrichments of SZ and BD risk variants in exons, introns, and 3ʹ untranslated regions.

DISCUSSION

OVERVIEW OF FLEET

In this study, I introduced a software tool called FLEET to aid in the biological interpretation of GWAS signals obtained for SZ and BD. Inferring mechanisms from individual GWAS signals can be problematic due to high gene density, extensive LD, or lack of nearby genes, which can obfuscate analyses. Most of the GWAS signals for SZ and BD reside in non-coding DNA, implying that abnormalities in gene expression or splicing might account for risk. However, regulation of gene expression and splicing is multifactorial and can be challenging to investigate in the context of GWAS signals without prior knowledge of regulatory molecules or elements that might be disrupted by risk variants. Moreover, identifying gene networks that are enriched with risk variants can offer insight into pathways that may be affected by transcriptional or splicing dysregulation. FLEET is meant to fill a critical gap between collection of GWAS results and experimental follow-up. It has advantages over other methods like FunciSNP (Coetzee et al., 2016) or GARFIELD (Iotchkova et al., 2016), which focus on only the strongest variants from GWAS. FLEET is capable of modeling genome-wide enrichment statistics and performing permutation tests on top GWAS markers; both tests include adjustments for confounding variation (i.e., LD and minor allele frequency). Installation of FLEET comes with a pre-formatted reference database with >2,000 annotations; however, users can provide their own custom annotations for analysis. In addition, FLEET can be readily modified to assess continuous measures as opposed to categorical information (e.g., gene expression values of a gene set or fold-change in histone modifications), which will be of value when “omics” summary data are released by PsychENCODE.

BIOLOGICAL RELEVANCE OF FINDINGS

This study provides key evidence of gene sets and molecular substrates that are unique to and common between SZ and BD. Beyond this study, there are converging lines of evidence that chromatin dynamics and transcriptional regulation are fundamental to the etiology of psychiatric disorders. After screening thousands of annotations, I found 329 that were similarly associated with SZ and BD, including 160 cell- or tissue-specific histone modifications, 53 cell- and tissue-specific DNase I hypersensitive sites, 2 cell-specific FANTOM enhancers, 66 transcription factors, 40 gene sets and pathways, 3 RNA-binding proteins, and 3 gene regions. These findings are encouraging and informative for future studies of SZ and BD. The genetic overlap between SZ and BD is well established by epidemiological studies, in addition to GWAS evidence implicating shared genes and pathways (Cross Disorder Group of the Psychiatric Genomics Consortium, 2013; Power et al., 2015; Purcell et al., 2009; Schulze et al., 2014; The Network and Pathway Analysis Subgroup of the Psychiatric Genomics Consortium, 2015). My study takes this one step further to implicate specific regulatory molecules and pathways that are shared and different between SZ and BD. This knowledge can be used to extend the general framework of common polygenic scoring analysis (Purcell et al., 2009), wherein disorder-specific polygenic scores could be developed and tested by selecting variants from pathways/regulatory elements that are most dissimilar between SZ and BD. A group recently generated disorder-specific GWASs for SZ and BD (Ruderfer et al., 2013), which can be used to inform pathway-level risk scoring analyses. This would be another step toward elucidating the pathophysiological trajectories of these disorders.
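The idea of a pathway-level polygenic score can be sketched as follows. This is a hypothetical illustration of the proposal, not an analysis performed in this study. The function name, the dosage coding (0 to 2 copies of the effect allele), and the use of GWAS effect sizes as weights are assumptions.

```python
import numpy as np


def annotation_restricted_score(dosages, effect_sizes, in_annotation):
    """Polygenic score using only variants that fall within a chosen annotation.

    dosages:       (individuals x variants) effect-allele dosages, 0 to 2.
    effect_sizes:  per-variant GWAS effect sizes (e.g., log odds ratios).
    in_annotation: bool per variant, True if linked with the pathway or element.
    """
    dosages = np.asarray(dosages, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    mask = np.asarray(in_annotation, dtype=bool)
    # Weighted sum of dosages over the selected variants, one score per person.
    return dosages[:, mask] @ effect_sizes[mask]


# Toy example: two individuals, three variants, the second variant excluded.
scores = annotation_restricted_score(
    dosages=[[0, 1, 2], [2, 0, 1]],
    effect_sizes=[0.1, -0.2, 0.3],
    in_annotation=[True, False, True],
)
```

Scores built from annotations that diverge most between SZ and BD could then be compared between diagnostic groups, which requires individual-level genotypes rather than summary statistics.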

MicroRNA family members miR-34a and miR-449a showed a significant association with BD but not SZ. ANK3 and CACNB3, two genome-wide significant risk genes for BD, are experimentally verified targets of miR-34a, and over-expression of miR-34a has been shown to disrupt neuronal differentiation and synaptogenesis (Bavamian et al., 2015). Although the genetic evidence of association of miR-34a/449a with SZ is not strong, downstream targets of these microRNAs are significantly over-represented among differentially expressed genes in SZ, including those identified in postmortem brain and ex vivo blood analyses (Hess et al., 2016). In addition, miR-34a was reported to be over-expressed in SZ individuals in the brain (Kim et al., 2010) and blood (Lai et al., 2011). MiR-34a may help to explain deficits in neurodevelopment and neurobehavioral-cognitive changes that are stereotypical of these disorders.

H3K4me1 histone modifications are functionally linked with the transcriptional regulator SIN3A (Cheng et al., 2014), which were jointly associated with SZ and BD in this study. There is growing interest within psychiatry in the effect of environmental exposures and genetic variations on chromatin dynamics, especially with the recent discovery of a deleterious rare variant in the histone methyltransferase gene SETD1A associated with SZ risk (Singh et al., 2016). The loss of SETD1A dysregulates the H3K4 methylation pathway during brain development, which may lead to abnormalities in gene expression such as untimely activation/repression of genes or ectopic expression of genes in the wrong cell type (Vallianatos & Iwase, 2015). Expanding on this, target sites of the transcriptional repressor protein REST were significantly enriched among GWAS signals for SZ and BD. REST plays a critical role in neurogenesis by repressing neuronal genes (neurotransmitter receptors, ion channels, and synaptic vesicle proteins) in progenitor cells until differentiation, when loss of REST permits their activation (Ballas, Grunseich, Lu, Speh, & Mandel, 2005). REST interacts with co-repressors (SIN3A and CoREST complex) to suppress transcription through recruitment of histone deacetylases, which enzymatically remove acetyl groups from histone tails to create more positively charged nucleosomes and tighter DNA wrapping (Ballas et al., 2005). This affords valuable insight into the underlying role of chromatin dynamics and transcriptional regulation in risk for SZ and BD.

The transcriptional regulator ARID4B showed cross-disorder enrichment in this study and is of interest given that this gene is significantly up-regulated in the peripheral blood of BD-affected individuals, but is not differentially expressed in SZ (presented later in the chapter “Transcriptomic Abnormalities in Bipolar Disorder and Discrimination of the Major Psychoses”). This finding is consistent with evidence from a recent blood-based transcriptomic analysis of youth at high risk for BD, which designated ARID4B as a potential risk gene based on differential expression compared to unaffected comparison subjects (Fries et al., 2017). Furthermore, ARID4B shows strong evidence of dysregulation in the blood of individuals affected with autism (Kong et al., 2012). The protein ARID4B is a subunit of the SIN3A co-repressor complex and participates in cell cycle regulation, oncogenesis, protection from DNA damage, and apoptosis (Winter, Lukes, Walker, Welch, & Hunter, 2012; Wu, Eldin, & Beaudet, 2008). This evidence suggests that genetic or transcriptomic alteration to ARID4B is relevant for BD pathophysiology. I found additional evidence of overlapping genomic and transcriptomic abnormalities between this and other studies. Four out of the 66 transcription factors that were jointly associated with SZ and BD in this study (REST, KLF1, JUN, and MYC) are major upstream regulators in the interactomes of differentially expressed genes associated with SZ (Hess et al., 2016). Interestingly, these four transcriptional regulators do not possess strong GWAS associations with SZ, but there are multiple genome-wide significant risk variants for SZ and BD linked with their target sites. In this study, I found that REST target sites are tagged by three genome-wide significant risk SNPs for BD, which was statistically greater than expected by chance (permutation P = 0.012), namely: rs2517959 (PGAP3, intronic), rs4236274 (MAD1L1, intronic), and rs7105749 (13kb from AP003498.3). According to evidence from the Genotype-Tissue Expression project, rs2517959 is a significant eQTL for PGAP3 (multiple tissues, P range = 3.7e-15 – 7.2e-05); therefore, exploring the mechanism of rs2517959-mediated regulation of PGAP3 is warranted. REST was one of 38 transcription factors with target sites in LD with rs2517959, others including CTCF, JUND, SIN3A, MLLT1, and ARID4B. Identifying risk variants that disrupt target sites of transcription regulators is an important next step for relating GWAS signals for SZ and BD to biological mechanisms.

SZ and BD shared many significant associations; however, regulatory elements and DNA- or RNA-regulatory factors that differed between these disorders are also of interest to this study. One noteworthy divergence was a significant association of target sites of the RNA-binding protein FXR1 with SZ that was not seen in BD. FXR1 encodes the protein Fragile X Mental Retardation Autosomal Homolog 1, which interacts with members of the fragile X gene family (FMR1 and FMR2) that have been implicated in autism and other neurodevelopmental disorders (Bailey, Hatton, Skinner, & Mesibov, 2001; Wall et al., 2009). Loss of FXR1 has been linked with deficits in eye and neural crest cell development in Xenopus laevis (Gessert, Bugner, Tecza, Pinker, & Kühl, 2010). In addition, there is a genome-wide significant risk signal in the upstream/promoter region of FXR1 associated with SZ (Ripke et al., 2014). There may be an epistatic relationship between FXR1 and risk variants in its target sites that mediates susceptibility for SZ, but this has not yet been reported. Epistasis is thought to be an important source of “missing heritability”, i.e., the gap between the heritability reported for twins and the amount of heritability captured by additive effects of common variants. It is not possible to model epistasis with summary statistics alone; individual-level genotypes are required.

At the level of gene sets, there also appears to be some divergence between SZ and BD. Differences between gene sets identified in this study expand on recent pathway-level analyses published by the Psychiatric Genomics Consortium (The Network and Pathway Analysis Subgroup of the Psychiatric Genomics Consortium, 2015). Namely, I found multiple gene sets related to neurodevelopment, axon/neuron structure, neurotransmitter regulation, and histone modification that were highly enriched with risk signals for SZ but did not have as strong of an association with BD. Conversely, gene sets related to protein trafficking, energy and metabolism, and cellular response to stress were typically seen among the top pathways associated with BD but did not seem to be as relevant for SZ. Furthermore, genome-wide significant risk variants for SZ and BD showed discordance with respect to the tissues in which they were over-represented: brain-derived histone modifications showed a stronger enrichment of top SZ risk variants, whereas histone modifications in immune cells exhibited a prominent enrichment of BD risk variants well above those in brain (Figure 2). This is striking given that SZ and BD exhibit strong genetic correlation, phenotypic resemblance, and co-occurrence in families. These data suggest that SZ and BD exhibit underlying differences in gene expression regulation coupled with unique vulnerabilities in biological pathways.

Investigating these relationships further at the level of transcriptomic signatures is warranted, as this may lead us toward an understanding of how molecular mechanisms modify pathophysiological trajectories toward SZ or BD. This work provides an early proof-of-concept that integrating poly-omics data can be used to distinguish shared and discordant molecular associations between etiologically similar phenotypes. Furthermore, FLEET is a promising tool for interpretation of GWAS signals that can be adapted to analyze a wide array of annotations and types of data. This study offers insight into potential biological pathways and regulatory mechanisms that are disrupted by risk variants for SZ and BD. This genome-wide-first approach helped to discern biological relationships that are not obvious from examining GWAS summary statistics alone or exclusively genome-wide significant variants.


TABLE 1. SUMMARY STATISTICS FOR TOP ANNOTATIONS THAT WERE ASSOCIATED WITH SZ AND BD (FDRP < 0.05, TOTAL OF 329).

Top 10% of enrichments for schizophrenia (ranked by p-value)

Annotation* | Beta | SE | P | FDR | Index SNPs (P < 5e-08) | Permutation P
H3K4me1 temporal lobe | 0.061 | 0.005 | 1.02E-39 | 2.12E-36 | 113 | < 0.001
H3K4me1 brain | 0.049 | 0.004 | 7.35E-31 | 5.10E-28 | 124 | 0.004
H3K4me1 cortex derived neurospheres | 0.050 | 0.005 | 3.36E-24 | 1.16E-21 | 96 | 0.114
INTRON | 0.032 | 0.003 | 4.24E-23 | 1.26E-20 | 133 | 0.002
H3K9ac cingulate gyrus | 0.056 | 0.006 | 1.09E-19 | 2.05E-17 | 71 | 0.001
H3K9ac middle frontal area 46 | 0.066 | 0.008 | 5.32E-17 | 6.92E-15 | 66 | < 0.001
H3K4me1 thymus | 0.043 | 0.005 | 6.11E-17 | 7.48E-15 | 81 | < 0.001
voltage gated cation channel activity | 0.125 | 0.015 | 4.23E-16 | 4.63E-14 | 14 | < 0.001
DHS astrocyte of the hippocampus | 0.053 | 0.007 | 6.16E-15 | 5.83E-13 | 84 | 0.027
H3K4me3 germinal matrix | 0.064 | 0.008 | 8.55E-15 | 7.13E-13 | 70 | < 0.001
H3K4me1 T cell | 0.040 | 0.005 | 8.56E-15 | 7.13E-13 | 78 | 0.011
voltage gated calcium channel complex | 0.182 | 0.024 | 1.72E-14 | 1.28E-12 | 9 | < 0.001
H3K27me3 common myeloid progenitor CD34 positive | 0.032 | 0.004 | 2.01E-13 | 1.16E-11 | 77 | 0.928
H3K4me1 common myeloid progenitor CD34 positive | 0.035 | 0.005 | 2.16E-13 | 1.21E-11 | 97 | 0.485
H3K4me1 B cell | 0.035 | 0.005 | 2.65E-13 | 1.45E-11 | 93 | 0.013
H3K4me3 cortex derived neurospheres | 0.066 | 0.009 | 7.20E-13 | 3.75E-11 | 57 | 0.007
calcium channel complex | 0.140 | 0.020 | 2.03E-12 | 8.94E-11 | 10 | < 0.001

Top 10% of enrichments for bipolar disorder (ranked by p-value)

Annotation* | Beta | SE | P | FDR | Index SNPs (P < 5e-08) | Permutation P
INTRON | 0.02597673 | 0.002800771 | 1.80E-20 | 1.25E-17 | 9 | 0.186
H3K4me1 brain | 0.03254493 | 0.003633455 | 3.37E-19 | 1.49E-16 | 9 | 0.036
H3K4me1 temporal lobe | 0.03457078 | 0.003905726 | 8.73E-19 | 3.04E-16 | 10 | 0
FANTOM blood | 0.03009953 | 0.003661522 | 2.04E-16 | 4.26E-14 | 11 | 0
H3K4me1 B cell | 0.03099275 | 0.004086399 | 3.36E-14 | 4.13E-12 | 9 | 0
H3K4me1 cortex derived neurospheres | 0.03106746 | 0.004118886 | 4.63E-14 | 5.13E-12 | 7 | 0.105
H3K4me1 neutrophil | 0.0347606 | 0.004750321 | 2.54E-13 | 2.30E-11 | 8 | 0.01
H3K9ac cingulate gyrus | 0.03769568 | 0.005276341 | 9.08E-13 | 7.03E-11 | 5 | 0.097
DHS brain | 0.03460847 | 0.004985114 | 3.87E-12 | 2.53E-10 | 8 | 0.018
H3K4me1 common myeloid progenitor CD34 positive | 0.02843757 | 0.004101716 | 4.13E-12 | 2.62E-10 | 8 | 0.032
H3K4me1 natural killer cell | 0.03014287 | 0.004359411 | 4.71E-12 | 2.90E-10 | 8 | 0.004
H3K4me1 thymus | 0.03001746 | 0.004353052 | 5.38E-12 | 3.21E-10 | 8 | 0.002
H3K4me1 CD14 positive monocyte | 0.03078288 | 0.004506313 | 8.46E-12 | 4.53E-10 | 7 | 0.052
DHS astrocyte of the hippocampus | 0.03976075 | 0.005824123 | 8.70E-12 | 4.55E-10 | 6 | 0.066
H3K4me1 T cell | 0.02751715 | 0.004314304 | 1.80E-10 | 7.51E-09 | 9 | 0.001
H3K27me3 common myeloid progenitor CD34 positive | 0.02297687 | 0.003620148 | 2.20E-10 | 9.02E-09 | 5 | 0.401
H3K27ac natural killer cell | 0.03943461 | 0.006404597 | 7.42E-10 | 2.67E-08 | 6 | 0.006

*Does not include cell- and tissue-level annotations outside of the brain or immune system


FIGURE 2. ENRICHMENT OF INDIVIDUAL HISTONE MODIFICATIONS ACROSS CELLS AND TISSUES WITH RISK VARIANTS FOR SZ AND BD. (A and B) Global enrichment patterns across histone modifications, separated by cell type, obtained from weighted linear regression. (C and D) Enrichment significance of genome-wide significant risk variants across histone modification sites.


BIBLIOGRAPHY

Bailey, D. B., Hatton, D. D., Skinner, M., & Mesibov, G. (2001). Autistic Behavior,

FMR1 Protein, and Developmental Trajectories in Young Males with Fragile X

Syndrome. Journal of Autism and Developmental Disorders, 31(2), 165–174.

http://doi.org/10.1023/A:1010747131386

Ballas, N., Grunseich, C., Lu, D. D., Speh, J. C., & Mandel, G. (2005). REST and its

corepressors mediate plasticity of neuronal gene chromatin throughout neurogenesis.

Cell, 121(4), 645–657. http://doi.org/10.1016/j.cell.2005.03.013

Bavamian, S., Mellios, N., Lalonde, J., Fass, D. M., Wang, J., Sheridan, S. D., …

Haggarty, S. J. (2015). Dysregulation of miR-34a links neuronal development to

genetic risk factors for bipolar disorder. Molecular Psychiatry, 20(5), 573–84.

http://doi.org/10.1038/mp.2014.176

Bulik-Sullivan, B., Finucane, H. K., Anttila, V., Gusev, A., Day, F. R., Loh, P.-R., …

Neale, B. M. (2015). An atlas of genetic correlations across human diseases and

traits. Nature Genetics, 47(11), 1236–1241. http://doi.org/10.1038/ng.3406

Bulik-Sullivan, B. K., Loh, P.-R., Finucane, H. K., Ripke, S., Yang, J., Schizophrenia Working Group of the Psychiatric Genomics Consortium, … Neale, B. M. (2015). LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature Genetics, 47(3), 291–295. http://doi.org/10.1038/ng.3211

Cheng, J., Blum, R., Bowman, C., Hu, D., Shilatifard, A., Shen, S., & Dynlacht, B. D.

(2014). A role for H3K4 monomethylation in gene repression and partitioning of

chromatin readers. Molecular Cell, 53(6), 979–92.

http://doi.org/10.1016/j.molcel.2014.02.032

Coetzee, S. G., Pierce, S., Brundin, P., Brundin, L., Hazelett, D. J., & Coetzee, G. A.

(2016). Enrichment of risk SNPs in regulatory regions implicate diverse tissues in

Parkinson’s disease etiology. Scientific Reports, 6, 30509.

http://doi.org/10.1038/srep30509

Cohen, O. S., Weickert, T. W., Hess, J. L., Paish, L. M., McCoy, S. Y., Rothmond, D. A.,

… Glatt, S. J. (2015). A splicing-regulatory polymorphism in DRD2 disrupts

ZRANB2 binding, impairs cognitive functioning and increases risk for

schizophrenia in six Han Chinese samples. Molecular Psychiatry, (August 2014), 1–

8. http://doi.org/10.1038/mp.2015.137

Cross Disorder Group of the Psychiatric Genomics Consortium. (2013). Identification of

risk loci with shared effects on five major psychiatric disorders: a genome-wide

analysis. Lancet, 381(9875), 1371–9. http://doi.org/10.1016/S0140-6736(12)62129-

1

Fries, G. R., Quevedo, J., Zeni, C. P., Kazimi, I. F., Zunta-Soares, G., Spiker, D. E., …

Soares, J. C. (2017). Integrated transcriptome and methylome analysis in youth at

high risk for bipolar disorder: a preliminary analysis. Translational Psychiatry, 7(3),

e1059. http://doi.org/10.1038/tp.2017.32

Gessert, S., Bugner, V., Tecza, A., Pinker, M., & Kühl, M. (2010). FMR1/FXR1 and the

miRNA pathway are required for eye and neural crest development. Developmental

Biology, 341(1), 222–235. http://doi.org/10.1016/j.ydbio.2010.02.031

Hess, J. L., Tylee, D. S., Barve, R., de Jong, S., Ophoff, R. A., Kumarasinghe, N., …

Glatt, S. J. (2016). Transcriptome-wide mega-analyses reveal joint dysregulation of

immunologic genes and transcription regulators in brain and blood in schizophrenia.

Schizophrenia Research, 176(2–3), 114–124.

http://doi.org/10.1016/j.schres.2016.07.006

Hou, L., Bergen, S. E., Akula, N., Song, J., Hultman, C. M., Landén, M., … Lang, M.

(2016). Genome-wide association study of 40,000 individuals identifies two novel

loci associated with bipolar disorder. Human Molecular Genetics, 25(15), 3383–

3394. http://doi.org/10.1093/hmg/ddw181

Iotchkova, V., Ritchie, G. R. S., Geihs, M., Morganella, S., Min, J. L., Walter, K., …

Soranzo, N. (2016). GARFIELD - GWAS Analysis of Regulatory or Functional

Information Enrichment with LD correction, 5. http://doi.org/10.1101/085738

Kim, A. H., Reimers, M., Maher, B., Williamson, V., McMichael, O., McClay, J. L., …

Vladimirov, V. I. (2010). MicroRNA expression profiling in the prefrontal cortex of

individuals affected with schizophrenia and bipolar disorders. Schizophr Res, 124(1–

3), 183–191. http://doi.org/10.1016/j.schres.2010.07.002

Kong, S. W., Collins, C. D., Shimizu-Motohashi, Y., Holm, I. A., Campbell, M. G., Lee,

I. H., … Kohane, I. S. (2012). Characteristics and Predictive Value of Blood

Transcriptome Signature in Males with Autism Spectrum Disorders. PLoS ONE,

7(12). http://doi.org/10.1371/journal.pone.0049475

Lai, C. Y., Yu, S. L., Hsieh, M. H., Chen, C. H., Chen, H. Y., Wen, C. C., … Chen, W. J.

(2011). MicroRNA expression Aberration as potential peripheral blood biomarkers

for Schizophrenia. PLoS ONE, 6(6). http://doi.org/10.1371/journal.pone.0021635

Moore, C. B., Wallace, J. R., Wolfe, D. J., Frase, A. T., Pendergrass, S. A., Weiss, K. M.,

& Ritchie, M. D. (2013). Low Frequency Variants, Collapsed Based on Biological

Knowledge, Uncover Complexity of Population Stratification in 1000 Genomes

Project Data. PLoS Genetics, 9(12). http://doi.org/10.1371/journal.pgen.1003959

Morris, A. P., Voight, B. F., Teslovich, T. M., Ferreira, T., Segrè, A. V, Steinthorsdottir,

V., … McCarthy, M. I. (2012). Large-scale association analysis provides insights

into the genetic architecture and pathophysiology of type 2 diabetes. Nature

Genetics, 44(9), 981–990. http://doi.org/10.1038/ng.2383

Power, R. A., Steinberg, S., Bjornsdottir, G., Rietveld, C. A., Abdellaoui, A., Nivard, M.

M., … Stefansson, K. (2015). Polygenic risk scores for schizophrenia and bipolar

disorder predict creativity. Nature Neuroscience, 18(7), 953–5.

http://doi.org/10.1038/nn.4040

Purcell, S. M., Wray, N. R., Stone, J. L., Visscher, P. M., O’Donovan, M. C., Sullivan, P.

F., & Sklar, P. (2009). Common polygenic variation contributes to risk of

schizophrenia and bipolar disorder. Nature, 460(7256), 748–752.

http://doi.org/10.1038/nature08185

Ripke, S., Neale, B. M., Corvin, A., Walters, J. T. R., Farh, K.-H., Holmans, P. A., …

O’Donovan, M. C. (2014). Biological insights from 108 schizophrenia-associated

genetic loci. Nature, 511(7510), 421–7. http://doi.org/10.1038/nature13595

Roussos, P., Mitchell, A. C., Voloudakis, G., Fullard, J. F., Pothula, V. M., Tsang, J., …

Sklar, P. (2014). A Role for Noncoding Variation in Schizophrenia. Cell Reports,

9(4), 1417–29. http://doi.org/10.1016/j.celrep.2014.10.015

Ruderfer, D. M., Fanous, A. H., Ripke, S., McQuillin, A., Amdur, R. L., Gejman, P. V,

… Kendler, K. S. (2013). Polygenic dissection of diagnosis and clinical dimensions

of bipolar disorder and schizophrenia. Molecular Psychiatry, 19(9), 1017–1024.

http://doi.org/10.1038/mp.2013.138

Schulze, T. G., Akula, N., Breuer, R., Steele, J., Nalls, M. A., Singleton, A. B., …

McMahon, F. J. (2014). Molecular genetic overlap in bipolar disorder,

schizophrenia, and major depressive disorder. The World Journal of Biological

Psychiatry, 15(3), 200–208. http://doi.org/10.3109/15622975.2012.662282

Sekar, A., Bialas, A., de Rivera, H., Davis, H., Hammond, T., Kamitaki, N., …

McCarroll, S. A. (2016). Schizophrenia risk from complex variation of complement

component 4. Nature, 530, 177–83.

Singh, T., Kurki, M. I., Curtis, D., Purcell, S. M., Crooks, L., McRae, J., … Barrett, J. C.

(2016). Rare loss-of-function variants in SETD1A are associated with schizophrenia

and developmental disorders. Nature Neuroscience, 19(4), 571–577.

http://doi.org/10.1038/nn.4267

The GTEx Consortium. (2015). The Genotype-Tissue Expression (GTEx) pilot analysis:

Multitissue gene regulation in humans. Science, 348(6235), 648–660.

http://doi.org/10.1126/science.1262110

The Network and Pathway Analysis Subgroup of the Psychiatric Genomics Consortium.

(2015). Psychiatric genome-wide association study analyses implicate neuronal,

immune and histone pathways. Nature Neuroscience, 18(2), 199–209.

http://doi.org/10.1038/nn.3922

Vallianatos, C. N., & Iwase, S. (2015). Disrupted intricacy of histone H3K4 methylation

in neurodevelopmental disorders. Epigenomics, 7(3), 503–19.

http://doi.org/10.2217/epi.15.1

Veyrieras, J.-B., Kudaravalli, S., Kim, S. Y., Dermitzakis, E. T., Gilad, Y., Stephens, M.,

& Pritchard, J. K. (2008). High-resolution mapping of expression-QTLs yields

insight into human gene regulation. PLoS Genetics, 4(10), e1000214.

http://doi.org/10.1371/journal.pgen.1000214

Wall, D. P., Esteban, F. J., DeLuca, T. F., Huyck, M., Monaghan, T., Velez de

Mendizabal, N., … Kohane, I. S. (2009). Comparative analysis of neurological

disorders focuses genome-wide search for autism genes. Genomics, 93(2), 120–129.

http://doi.org/10.1016/j.ygeno.2008.09.015

Welter, D., MacArthur, J., Morales, J., Burdett, T., Hall, P., Junkins, H., … Parkinson, H.

(2014). The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.

Nucleic Acids Research, 42(D1). http://doi.org/10.1093/nar/gkt1229

Winter, S. F., Lukes, L., Walker, R. C., Welch, D. R., & Hunter, K. W. (2012). Allelic

variation and differential expression of the mSIN3A histone deacetylase complex

gene Arid4b promote mammary tumor growth and metastasis. PLoS Genetics, 8(5).

http://doi.org/10.1371/journal.pgen.1002735

Wu, M. Y., Eldin, K. W., & Beaudet, A. L. (2008). Identification of chromatin

remodeling genes Arid4a and Arid4b as leukemia suppressor genes. Journal of the

National Cancer Institute, 100(17), 1247–1259. http://doi.org/10.1093/jnci/djn253

Zeggini, E., & Morris, A. (2015). Assessing rare variation in complex traits: Design and

analysis of genetic studies. In Design and Analysis of Genetic Studies (pp. 1–261).

http://doi.org/10.1007/978-1-4939-2824-8

TRANSCRIPTOME-WIDE MEGA-ANALYSES REVEAL JOINT DYSREGULATION OF IMMUNOLOGIC GENES AND TRANSCRIPTION REGULATORS IN BRAIN AND BLOOD IN SCHIZOPHRENIA

Authors: Jonathan L. Hess1*, Daniel S. Tylee1*, Rahul Barve1, Simone de Jong2, Roel A. Ophoff2,3, Nishantha Kumarasinghe4,5,6,7, Paul Tooney6,8,9,10, Ulrich Schall4,6,9,10, Erin Gardiner6,8,10, Natalie Jane Beveridge6,8,9, Rodney J. Scott8,9, Surangi Yasawardene5, Antionette Perera5, Jayan Mendis5, Vaughan Carr6,11, Brian Kelly4,9,10, Murray Cairns6,8,9,10, the Neurobehavioural Genetics Unit, Ming T. Tsuang12, and Stephen J. Glatt1†

1 Psychiatric Genetic Epidemiology & Neurobiology Laboratory (PsychGENe Lab); Departments of Psychiatry and Behavioral Sciences & Neuroscience and Physiology; SUNY Upstate Medical University; Syracuse, NY, U.S.A. 2 Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Behavior, David Geffen School of Medicine at the University of California Los Angeles, Los Angeles, California, U.S.A. 3 Department of Psychiatry, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, The Netherlands 4 School of Medicine & Public Health, The University of Newcastle, Callaghan, Newcastle, Australia. 5 Department of Anatomy, Faculty of Medical Sciences, University of Sri Jayawardenepura, Nugegoda, Sri Lanka 6 Schizophrenia Research Institute, Sydney, New South Wales, Australia 7 Faculty of Medicine, Sir John Kotelawala Defence University, Ratmalana, Sri Lanka 8 School of Biomedical Sciences & Pharmacy, Faculty of Health, The University of Newcastle, New South Wales, Australia 9 Hunter Medical Research Institute, Newcastle, Australia 10 Centre for Translational Neuroscience & Mental Health, University of Newcastle, Callaghan, Newcastle, Australia. 11 School of Psychiatry, University of New South Wales, Kensington, New South Wales, Australia 12 Center for Behavioral Genomics, Department of Psychiatry, Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA, USA; Harvard Institute of Psychiatric Epidemiology and Genetics, Boston, USA.

* These authors contributed equally to this work.

ABSTRACT

The application of microarray technology in schizophrenia research was heralded as paradigm-shifting, as it allowed for high-throughput assessment of cell and tissue function. This technology was widely adopted, initially in studies of postmortem brain tissue, and later in studies of peripheral blood. The collective body of schizophrenia microarray literature contains apparent inconsistencies between studies, with failures to replicate top hits, in part due to small sample sizes, cohort-specific effects, differences in array types, and other confounders. In an attempt to summarize existing studies of schizophrenia cases and non-related comparison subjects, we performed two mega-analyses of combined sets of microarray data from postmortem prefrontal cortices (n = 315) and from ex vivo blood tissues (n = 578). We adjusted regression models per gene to remove non-significant covariates, providing best estimates of transcripts dysregulated in schizophrenia. We also examined dysregulation of functionally related gene sets and gene co-expression modules, and assessed enrichment of cell types and genetic risk factors. The identities of the most significantly dysregulated genes were largely distinct for each tissue, but the findings indicated common emergent biological functions (e.g., immunity) and regulatory factors (e.g., predicted targets of transcription factors and miRNA species) across tissues. Our network-based analyses converged upon similar patterns of heightened innate immune gene expression in both brain and blood in schizophrenia. We also constructed generalizable machine-learning classifiers using the blood-based microarray data. Our study provides an informative atlas for future pathophysiologic and biomarker studies of schizophrenia.

INTRODUCTION

The molecular bases of schizophrenia (SZ) remain unresolved despite decades of intensifying research. This situation impedes progress toward biologically based risk assessment and diagnostic testing, early detection, and the development of rationally selected therapeutics to alter disease progression and clinical trajectory. As such, characterizing molecular correlates of SZ is of great potential interest and value. In the past 15 years, the transcriptome has received growing attention in SZ research, particularly in the effort to identify biomarkers, that is, objective biological indicators of normal functioning or illness. Whole-transcriptome quantification (e.g., by microarray) offers several attractive features: (1) it provides a relatively efficient and unbiased means of screening many analytes of a single molecule type (i.e., messenger RNAs, mRNAs); (2) RNAs can be mapped reliably onto genes and proteins to assess a wide range of biological processes; (3) the measurement of large numbers of biological features allows for the assessment of network function; and (4) differences in mRNA expression reflect the combination of genetic and environmental factors, making mRNA expression a more dynamic and responsive readout of biological function than static genetic variants. Indeed, transcriptome-wide studies of postmortem brain tissues have revealed altered molecular pathways and helped generate new hypotheses about the biological underpinnings of SZ.

Similarly, studies of blood samples from individuals with SZ shed light on disturbances in circulating immune cells and could provide a basis for easily assessable SZ biomarkers.

Despite the vast potential and initial enthusiasm surrounding the use of microarrays in SZ research, the cross-study replication of genes and pathways found to be disrupted in SZ is mixed (Mirnics, Levitt, & Lewis, 2006). A variety of practical and technical limitations may contribute to this, including the evolution of new array technologies over time, the likelihood of etiologic heterogeneity of SZ (Arnedo et al., 2014; M T Tsuang & Faraone, 1995), the use of small sample sizes, and the inability to adequately protect against type-I errors, all of which exacerbate the “winner’s-curse” phenomenon that undermines replication. In light of these issues, several studies have sought to consolidate the knowledge of transcriptomic abnormalities in SZ via meta-analysis (Bergon et al., 2015; M Mistry & Pavlidis, 2010; Meeta Mistry, Gillis, & Pavlidis, 2013; Pérez-Santiago et al., 2012). These studies bolstered confidence by employing consistent pre-processing methods and demonstrating some similar dysregulated genes and network features across different studies; the implicated biological functions included oxidative phosphorylation, protein and nucleotide metabolism, synaptic transmission, myelination and glial function, and immune function, each of which has been implicated in previous work (Aberg et al., 2006; Dean, 2011; Devor & Waziri, 1993; Fineberg & Ellman, 2013; Middleton, Mirnics, Pierri, Lewis, & Levitt, 2002; Potvin et al., 2008). However, the approaches employed in previous meta-analytic studies had some limitations: (1) for the detection of differentially expressed genes, meta-analysis of summary statistics is relatively underpowered compared with combined-samples re-analysis of individual-level data; (2) summary-statistic meta-analysis does not allow flexible and transcript-specific modeling of clinical covariates across the entire sample; and (3) meta-analysis is not amenable to co-expression network analyses.

We use the term mega-analysis to refer to the strategy of pooling individual-level clinical and biological data from multiple studies for statistical modeling with appropriate correction for between-study variations (M Mistry, Gillis, & Pavlidis, 2013; Seifuddin et al., 2013). This strategy allows for explicit modeling of factors that are consistently reported across studies (i.e., gender, age), as well as factors that are inconsistently reported across studies (e.g., subject medication status). In this study, we conducted two separate mega-analyses to summarize existing microarray-based transcriptomic studies of SZ in postmortem brain and in blood tissue using mixed-effect linear modeling. We extended previous approaches by employing network and annotation-based analyses to assess emergent biological functions. Furthermore, we performed a systematic cross-tissue comparison of dysregulated functional gene sets and co-expression networks in SZ. We also characterized gene co-expression networks that were preserved across brain and blood samples in unaffected comparison subjects. Finally, we examined the cross-study generalizability of blood-based transcriptomic classifiers that differentiated SZ cases from unaffected comparison subjects.

METHODS

LITERATURE SEARCH AND STUDY SELECTION

We searched public databases (i.e., NCBI dbGaP, PubMed, SCOPUS, and EMBL-EBI ArrayExpress) for microarray-based studies of gene expression in subjects with SZ, schizoaffective disorder, or psychosis. We conducted this literature search for eligible data up to January 1, 2015; otherwise-eligible studies published after this date were not included in our analyses, but are shown in Table 1 and compared to our findings qualitatively in the Discussion section. Twenty-five studies of blood-based gene expression and 19 studies of brain-based gene expression were identified. The following criteria for study inclusion and sample inclusion were used: (1) we only included studies that compared cases with unaffected non-related controls, (2) we only included cases classified as SZ or schizoaffective disorder, depressive subtype, based on the original investigators’ determinations, (3) we only included studies for which raw probe-level data and gene annotations were available, (4) we only included studies that utilized non-custom microarray platforms developed by Affymetrix or Illumina to minimize technical sources of heterogeneity, and (5) we only included postmortem brain studies with samples consisting of tissue homogenates. Ultimately, nine blood studies and nine brain studies (postmortem prefrontal cortex, PFC, only) were retained for analysis (Table 1). The rationale for excluding each of the 26 other studies is provided in Table 2.

DATA IMPORT, NORMALIZATION, QUALITY CONTROL AND PROBE MATCHING

Among the selected studies, two different array manufacturers were represented (Table 1). Brain studies were all run on Affymetrix array platforms (GeneChip® Human Genome U133A, U133A Plus 2.0, or Exon 1.0 ST). Blood studies were run on several different Affymetrix and Illumina array platforms. Data from each study were processed and normalized independently. For a minority of studies, two different array versions were used (Table 1); data from each version were also processed independently. Raw Affymetrix CEL files were imported using the affy package from the R/Bioconductor library (Gautier, Cope, Bolstad, & Irizarry, 2004). Oligonucleotide probes were imported, and corrections for background signal were applied using the robust multi-array average (RMA) method (Irizarry et al., 2003) with additional corrections applied for the GC-content of probes whenever possible. The set of GeneChips was standardized using quantile normalization, and expression levels of each probe underwent log-2 transformation to yield distributions of data that more closely approximated normality. For Illumina arrays, bead-summarized data files were imported and processed in R; data received default Illumina background correction and were quantile-normalized and log-2 transformed. Affymetrix quality-control indices were inspected, and principal components analysis (PCA) plots were constructed for all datasets, allowing for the detection and removal of outliers (> 4 s.d. on each of the first three PCA dimensions). Outliers were rare (n = 10 for brain studies, n = 3 for blood studies) and sporadically distributed across datasets. Data were visually inspected for batch effects; these were corrected when scan date information was available.
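The quantile-normalization and log-2 steps described above can be illustrated on a toy example. This is a minimal Python sketch, not the affy/RMA implementation used in the study; the function names and the two three-probe "arrays" are hypothetical, and ties are handled naively (by input order), which is an assumption of this sketch.

```python
import math

def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one column per array)."""
    n = len(columns[0])
    # Sort each array, then average across arrays at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(sc[i] for sc in sorted_cols) / len(columns) for i in range(n)]
    normalized = []
    for col in columns:
        # Indices of this array's values in ascending order (ties keep input order).
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]  # replace each value by the mean for its rank
        normalized.append(out)
    return normalized

def log2_transform(columns):
    """Apply a log-2 transformation to every (strictly positive) value."""
    return [[math.log2(v) for v in col] for col in columns]

# Two hypothetical arrays measured on the same three probes.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
logged = log2_transform(normalized)
```

After normalization every array shares the same set of values (here 1.5, 3.5, and 5.5), assigned according to each probe's rank within its own array.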

Probeset-matching across manufacturers and platforms was carried out in R using the biomaRt package from the R/Bioconductor library (Durinck, Spellman, Birney, & Huber, 2009). For each dataset, manufacturer probe IDs were first converted to HGNC gene symbols. Probes matching no gene symbol were omitted from analysis. In instances where several probes were matched to the same gene symbol, the expression values were summarized by taking the median value. Within each individual study, gene expression values were then z-transformed in order to normalize the range and variance of each gene’s expression across datasets generated on different array platforms; a graphical representation of the effects of normalization and transformation is depicted in Figure S1. Finally, all datasets for a given tissue type were merged based on common gene symbols, and genes that were not measured within a given study were coded as “NA”.
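The probe-to-gene summarization, within-study z-transformation, and merging steps can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the biomaRt-based R code used in the study: the probe IDs, gene symbols, and function names are hypothetical, the sample standard deviation is assumed for the z-transform, and Python's `None` stands in for "NA".

```python
from collections import defaultdict
from statistics import mean, median, stdev

def summarize_by_gene(probe_values, probe_to_gene):
    """Collapse probes to genes using the per-sample median across probes."""
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # probes matching no gene symbol are omitted
            grouped[gene].append(values)
    return {g: [median(col) for col in zip(*rows)] for g, rows in grouped.items()}

def z_transform(values):
    """Center and scale one gene's expression values within a single study."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def merge_studies(studies):
    """Merge per-study gene tables; unmeasured genes are filled with None ("NA")."""
    all_genes = sorted(set().union(*(s.keys() for s in studies)))
    merged = {}
    for gene in all_genes:
        row = []
        for s in studies:
            n_samples = len(next(iter(s.values())))
            row.extend(s.get(gene, [None] * n_samples))
        merged[gene] = row
    return merged

# Hypothetical study with three samples; probe p3 has no gene symbol.
probes = {"p1": [1.0, 2.0, 3.0], "p2": [3.0, 4.0, 5.0], "p3": [9.0, 9.0, 9.0]}
genes = summarize_by_gene(probes, {"p1": "GENE_A", "p2": "GENE_A"})
study1 = {g: z_transform(v) for g, v in genes.items()}
study2 = {"GENE_B": [0.5, -0.5]}  # second hypothetical study, already z-scaled
merged = merge_studies([study1, study2])
```

Performing the z-transformation before merging means that each gene has mean 0 and unit variance within every study, so platform-specific differences in scale do not carry over into the pooled dataset.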

The number of genes available for analysis in given blood and brain studies are shown in Table 1. The effects of normalization and transformation are depicted in Figure 1. In order to identify potential differences in the proportions of leukocyte subtypes between SZ cases and unaffected comparison subjects, we performed deconvolution analysis using previously described methods (Abbas, Wolslegel, Seshasayee, Modrusan, & Clark, 2009) followed by an independent samples t-test with family-wise Benjamini-Hochberg (BH) correction for multiple testing.

MIXED-EFFECT LINEAR MODELING AND GENE SET ANALYSIS

Expression and covariate data from individual studies were combined, creating separate brain (n = 315) and blood (n = 578) datasets. Independent mega-analyses were performed on these datasets using mixed-effect linear modeling. The brain analysis included covariates for age (continuous), ancestry (Caucasian, Asian, African-American), gender (male, female), postmortem interval (continuous) and tissue pH (continuous). The blood analysis included covariates for age, sample type (whole blood, leukocytes, peripheral blood mononuclear cells), ancestry (Caucasian, Asian), gender, and antipsychotic status (yes, no; as defined by original study authors). A total of 20,767 genes were analyzed from brain studies and 19,737 genes contained sufficient data for analysis in blood studies. For multiple-test correction, we examined Bonferroni-corrected p-values in order to conservatively define differentially expressed genes; for downstream analyses, we used a more permissive False Discovery Rate (FDR) q-value < 0.10, which controls the expected proportion of false discoveries at 10% while allowing more transcripts to move forward for cross-tissue comparison and enrichment analysis (Storey, 2003).
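The difference between the conservative and permissive thresholds can be seen on a toy list of p-values. This is a minimal Python sketch that assumes the Benjamini-Hochberg step-up adjustment as a stand-in for the Storey q-value cited above (the two coincide when the estimated proportion of true null hypotheses is 1); the p-values and function names are hypothetical.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values (controls the family-wise error rate)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def benjamini_hochberg(pvals):
    """BH-adjusted p-values (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

pvals = [0.001, 0.02, 0.03, 0.5]  # made-up per-gene p-values
strict = [p < 0.05 for p in bonferroni(pvals)]             # conservative gene list
permissive = [q < 0.10 for q in benjamini_hochberg(pvals)]  # list for downstream analyses
```

In this toy case only the first gene survives Bonferroni correction, whereas the first three pass the FDR threshold, which mirrors the rationale for carrying a larger gene list into the enrichment and cross-tissue analyses.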

Permutation-based gene set analyses were performed separately for each tissue using the Piano package (Väremo, Nielsen, & Nookaew, 2013). This approach uses summary statistics (p- and t-values) derived from the single-gene models to assess whether a priori-defined gene sets reflecting various functional and ontological themes were dysregulated. Specifically, we assessed the mean of summary statistics for the genes corresponding to a given gene set, comparing this value against 1×10^6 randomly selected gene sets of equal size in order to generate empirical p-values for tests of five distinct hypotheses. For the test of non-directional dysregulation, the mean p-value of the target set was compared to the mean p-value of the randomly permuted sets. For tests of the absolute directional hypotheses (i.e., all up-regulated and all down-regulated), the mean t-values for the target set were compared against the mean for permuted sets. For tests of the mixed directional hypotheses, each target gene set was subset to include only genes with positive (i.e., for mixed up-regulated effects) or negative (i.e., for mixed down-regulated effects) t-values; mean t-values for this target subset were compared against permuted t-values drawn from the background reflecting all same-signed test statistics. Six gene set databases were obtained from the Molecular Signature Database (Broad Institute): H: hallmark gene sets, C1: positional gene sets, C2: curated gene sets, C3: motif gene sets, C5: GO gene sets, and C7: immunologic signatures; together these sets capture collective knowledge of genes’ participation in functional pathways, known and predicted regulatory relationships, and chromosomal locations. Only gene sets intersecting with the target list of test statistics (with 4 to 500 genes) were analyzed. For each family of tests (reflecting the combination of database and hypothesis type), the test results were Bonferroni-corrected to control the family-wise error rate at 5%. For gene sets where multiple test hypotheses were significant (e.g., non-directional, mixed up-regulated, and all up-regulated effects), we elected to report the directional effects, such that all up-regulated was given the highest preference.
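The logic of the permutation test for the "all up-regulated" hypothesis can be sketched as below. This is a simplified Python illustration, not the Piano implementation: the function name, the add-one correction to the empirical p-value, the default of 10,000 permutations, and the toy t-values are all assumptions of this sketch.

```python
import random

def gene_set_pvalue(stats, gene_set, n_perm=10000, seed=0):
    """Empirical p-value that a set's mean t-value exceeds that of random sets.

    stats maps gene symbol -> t-value from the single-gene models.
    """
    rng = random.Random(seed)
    genes = list(stats)
    members = [g for g in gene_set if g in stats]
    observed = sum(stats[g] for g in members) / len(members)
    hits = 0
    for _ in range(n_perm):
        # Draw a random gene set of equal size from the background.
        sample = rng.sample(genes, len(members))
        if sum(stats[g] for g in sample) / len(members) >= observed:
            hits += 1
    # Add-one correction avoids reporting an empirical p-value of exactly zero.
    return (hits + 1) / (n_perm + 1)

# Toy background of 100 hypothetical genes with t-values 0..99.
toy_stats = {f"g{i}": float(i) for i in range(100)}
p_top = gene_set_pvalue(toy_stats, [f"g{i}" for i in range(95, 100)], n_perm=2000)
p_bottom = gene_set_pvalue(toy_stats, [f"g{i}" for i in range(5)], n_perm=2000)
```

The same scheme extends to the other hypotheses by swapping the statistic: mean p-values for the non-directional test, or same-signed t-values drawn from a same-signed background for the mixed directional tests.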

EXPRESSION QUANTITATIVE TRAIT LOCI AND GWAS ENRICHMENT ANALYSIS

For various gene lists of interest (i.e., differentially expressed or participants in a network module), we sought to assess whether those genes disproportionately represented: (1) expression quantitative trait loci (eQTLs) previously identified in brain or blood cells (National Center for Biotechnology Information [NCBI] eQTL Browser); and (2) SNPs associated with SZ based on prior association studies (Ripke et al., 2014).

We next sought to determine whether the lists of differentially expressed genes derived from our mega-analyses were enriched with genes known to be regulated by eQTLs. To assess the significance of these overlaps, we permuted gene labels from the background and re-calculated enrichment ratios for each permuted gene set. Finally, p-values were calculated as the number of permuted enrichment ratios that equaled or exceeded those for the dysregulated gene sets, divided by the number of permutations (n = 10,000).

CONSTRUCTING NETWORKS OF CO-EXPRESSED GENES IN BRAIN AND BLOOD

We performed unsupervised weighted unsigned gene co-expression network analysis (WGCNA; Langfelder et al, 2011) using the blockwiseModules function separately in brain and blood datasets. These networks were detected based on Pearson correlation coefficients for all pairs of genes common to brain and blood datasets. Expression matrices (median-summarized gene expression transformed to z-scale) were quality-checked to remove genes with too many missing values. The absolute correlation coefficients between all gene pairs were raised to the power β, selected as the soft-threshold power that approximated a scale-free topology network (brain = 8, blood = 5). In addition, we set the following parameters to construct the networks (kept default if not specified) within the blockwiseModules command: deepSplit = 2, TOMType = "unsigned", minModuleSize = 30, minCoreKME = 0.5, minCoreKMESize = 10, minKMEtoStay = 0, reassignThreshold = 1e-6, mergeCutHeight = 0.20, detectCutHeight = 0.995, and maxBlockSize = 5000. We detected modules of highly correlated genes and summarized each module by its module eigengene, defined as the first principal component of the module's expression data.
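The soft-thresholding step, in which absolute Pearson correlations are raised to the power β, can be illustrated with a small example. This Python sketch covers only the unsigned adjacency calculation and is not the WGCNA implementation (topological overlap and module detection are omitted); the gene labels, expression vectors, and function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def unsigned_adjacency(expr, beta):
    """Return a_ij = |cor(i, j)| ** beta for every unordered pair of genes."""
    genes = sorted(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g < h
    }

# Four hypothetical genes measured in four samples.
expr = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with A
    "C": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with A
    "D": [1.0, 3.0, 2.0, 4.0],   # r = 0.8 with A
}
adjacency = unsigned_adjacency(expr, beta=8)  # beta = 8 was the power used for brain
```

Because the network is unsigned, the anti-correlated pair (A, C) receives the same adjacency as the positively correlated pair (A, B), while the moderate correlation of 0.8 is shrunk to about 0.17, which is how the power β suppresses weak connections.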

We tested each module eigengene for association with SZ using independent linear mixed-effect models that controlled for continuous and categorical fixed effects of samples and categorical random effects of study. We constructed these models using sets of predictors identical to those used in the mega-analyses of brain and blood samples. P-values for random and fixed effects were estimated according to Satterthwaite’s approximation for degrees of freedom; all regression models and calculations were performed using the lmerTest package in R. For SZ-associated modules, we identified highly connected hub genes and examined biological annotations enriched among the module’s genes using a pathway-based approach described below. We compared SZ-associated networks identified in the brain with those identified in the blood in terms of common genes and significantly enriched annotations (FDR q-value ≤ 0.05). Hypergeometric tests were used to determine the significance of overlap while adjusting for the total number of background genes and biological annotations associated with the set of background genes, respectively. Additionally, we assessed network module preservation within each tissue (across diagnostic groups) and between tissues (only in unaffected comparison subjects).

EVALUATING PRESERVATION OF CO-EXPRESSION MODULES ACROSS DIAGNOSTIC GROUPS AND TISSUES

We tested the reproducibility of modules across diagnostic groups and between tissues using the modulePreservation command in the WGCNA R package. Separate networks were identified in each of three pairs of comparisons: (1) brain of SZ cases versus unaffected comparisons, (2) blood of SZ cases versus unaffected comparisons, and (3) brain versus blood in unaffected comparisons only. Networks were constructed using the blockwiseModules command with the following parameters: deepSplit = 2, TOMType = "unsigned", minModuleSize = 30, minCoreKME = 0.5, minCoreKMESize = 10, minKMEtoStay = 0, reassignThreshold = 1e-6, mergeCutHeight = 0.20, detectCutHeight = 0.995, and maxBlockSize = 5000. We set the power β = 6 when constructing all networks, which is the conventional soft-threshold power for constructing unsigned networks in WGCNA. We implemented 100 network permutations within the modulePreservation command and derived individual z-scores per module, which were summarized into a composite statistic called Zsummary. This composite statistic was used to indicate the significance of a module’s preservation compared to randomly sampled genes (Zsummary < 2, no preservation; 2 < Zsummary < 10, moderate evidence of preservation; and Zsummary > 10, strong evidence of preservation). We ranked modules by Zsummary and selected the least or most preserved modules to characterize using a pathway-based approach described below.
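The interpretation thresholds for Zsummary and the ranking of modules can be expressed compactly. This is a hypothetical Python sketch: the module names and Zsummary values are invented for illustration, and the assignment of the exact boundary values 2 and 10 to the "moderate" category is an assumption, since the text leaves those cases unspecified.

```python
def preservation_category(z_summary):
    """Map a Zsummary value to the preservation categories described above."""
    if z_summary < 2:
        return "none"
    if z_summary <= 10:  # boundary values 2 and 10 are assigned here by assumption
        return "moderate"
    return "strong"

# Invented Zsummary values for three hypothetical modules.
z_by_module = {"module_1": 1.5, "module_2": 6.0, "module_3": 12.3}

# Rank modules from least to most preserved.
ranked = sorted(z_by_module, key=z_by_module.get)
categories = {m: preservation_category(z) for m, z in z_by_module.items()}
```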

ENRICHMENT ANALYSIS OF BIOLOGICAL ANNOTATIONS AND CELL-TYPE SIGNATURES

Hypergeometric testing was used to assess network modules for: (1) functional enrichment based on the contents of the DAVID Knowledgebase (v.6.7; Huang et al, 2009); (2) enrichment with brain cell-specific signatures (Dougherty, Schmidt, Nakajima, & Heintz, 2010); and (3) enrichment with immune cell-specific signatures (Abbas et al., 2009; Watkins et al., 2009). Family-wise BH correction was applied per database to control for multiple testing. The following DAVID Knowledgebase categories were chosen for the pathway-based analysis: cytobands, GOTERM_BP_ALL (GO annotations for biological processes), GOTERM_MF_ALL (GO annotations for molecular functions), GOTERM_CC_ALL (GO annotations for cellular components), KEGG pathways, PANTHER pathways, Reactome pathways, Reactome interactions, Protein Information Resource (PIR) Superfamily and Keywords, and Simple Modular Architecture Research Tool (SMART).
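The hypergeometric enrichment test used for modules and gene lists can be written out directly from its definition. This is a minimal Python sketch with hypothetical parameter names and toy counts; it computes the upper-tail probability of observing at least k annotated genes in a module, and is not the specific implementation used in the study.

```python
from math import comb

def hypergeom_upper_tail(k, background, annotated, module_size):
    """P(X >= k): chance of at least k annotated genes in a random module.

    background   total number of background genes
    annotated    background genes carrying the annotation
    module_size  number of genes in the module being tested
    """
    total = comb(background, module_size)
    upper = min(annotated, module_size)
    favorable = sum(
        comb(annotated, i) * comb(background - annotated, module_size - i)
        for i in range(k, upper + 1)
    )
    return favorable / total

# Toy example: 10 background genes, 4 annotated, module of 3 genes, 2 annotated.
p_enrichment = hypergeom_upper_tail(2, background=10, annotated=4, module_size=3)
```

For this toy example the probability is 40/120, or one third; in practice the resulting p-values for all annotations within a database would then be corrected for multiple testing.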

GENE SET AND NETWORK MODULE HETEROGENEITY ANALYSES

For SZ-associated gene sets and network modules, we sought to assess whether the same SZ samples were driving the between-groups difference observed for each feature using previously described clustering methods (Lottaz, Toedling, & Spang, 2007).

MACHINE-LEARNING CLASSIFICATION USING BLOOD TRANSCRIPTOMIC DATA

We used Random Forest and Ensemble SVM approaches to construct and validate classifiers using independent data matrices carved from the blood mega-analysis dataset. Training (n = 413 samples run on Illumina arrays) and validation sets (n = 165 run on Affymetrix arrays) were generated; this manufacturer-based separation was chosen to pose a maximal challenge to classifier generalizability. Our intention was to evaluate several models, not to identify the single best classifier. Sets of the top significantly dysregulated genes (k = 20, 60, 150) were identified by linear mixed-modeling in the training set; these sets reflect the minimum, maximum, and average size of optimal classifiers of neuropsychiatric disorders from blood transcriptome data based on our past experience (Glatt et al., 2012, 2013; Ming T Tsuang et al., 2005; Tylee et al., 2015). Classifiers were fit to the training data and subsequently tested in the independently withheld validation set.

RESULTS

DYSREGULATED GENES AND GENE SETS IN BRAIN

For our brain mega-analysis, 92 genes were dysregulated in SZ at an FDR q < 0.10 and two genes (RHOBTB3 and ABCA1) reached a Bonferroni-corrected level of significance (Table 3). Among the 92 dysregulated genes, 73 genes were up-regulated and 19 genes were down-regulated in SZ (two-tailed sign test p < 3.9×10^-9). Gene set analysis identified 745 sets (among 9254 examined) with at least one significant test hypothesis (Bonferroni p < 0.05); the vast majority of gene sets (720) were up-regulated among SZ cases, whereas one gene set showed a non-directional effect, and 19 gene sets showed a down-regulated effect (Table 4). The list of up-regulated sets included innate immune and inflammatory signaling pathways (TNF-α, NF-kB, p38 MAPK, IL-6 via STAT3, IL-2 via STAT5, IFN-γ, several TLR signaling cascades, protozoal infection, genes implicated in lupus); cellular stress responses (hypoxia, UV exposure, unfolded protein response, apoptosis/p53 cascades); response to androgens; metabolism of cholesterol and fatty acids; RNA metabolism and binding; several pathways related to cell survival, growth, and oncogenesis (EGFR, ERBB2, Insulin, KRAS, MAPK/MEK, MTOR, MYC, PDGFR, PIGF, VEGF); many gene sets targeted by miRNAs and transcription factors; genes involved in development and cellular differentiation; and several chromosomal loci, among others. Down-regulated gene sets included olfactory signaling pathways, genes with promoter CpG-site methylation in neural precursor cells, and several chromosomal loci (most notably 22q11).

DYSREGULATED GENES AND GENE SETS IN BLOOD

We estimated the abundance of 17 leukocyte subtypes in SZ cases and unaffected comparison subjects and found no significant difference between groups, though a trend toward increased activated cytotoxic T cells was observed among SZ cases (uncorrected p = 0.054, Table 5).

Within our blood mega-analysis, 2,238 genes were dysregulated in SZ at an FDR q < 0.10 and 220 reached a Bonferroni-corrected level of significance (Table 6). Among the 2,238 genes, 1,110 were up-regulated and 1,128 were down-regulated in SZ (two-tailed sign test p = 0.66). The absence of a systematic directional effect in the single-gene analysis did not preclude a directional effect at the level of gene sets: gene set analysis identified 526 gene sets (among 9256 examined) with at least one significant test hypothesis (Bonferroni p < 0.05); the majority of gene sets (390) were up-regulated among SZ cases, whereas 21 gene sets showed a non-directional effect, and 115 gene sets showed a down-regulated effect (Table 7). The list of up-regulated sets included innate immune and inflammatory signaling pathways (TNF-α, NF-kB, IL-6, several TLR signaling cascades, protozoal infection, genes implicated in lupus); cellular stress responses (hypoxia, UV exposure); response to androgens; glycolytic metabolism; several pathways related to cell survival, growth, and oncogenesis (EGFR, ERBB2, PDGFR, PIGF, PTEN, VEGF); many gene sets targeted by miRNAs and transcription factors; genes involved in development and cellular differentiation; and several chromosomal loci, among others. Down-regulated gene sets included those involved in DNA repair; metabolism and nonsense-mediated decay of mRNA; influenza RNA replication; citric acid cycle, mitochondrial function, and oxidative phosphorylation; ribosomal function and the regulation of protein translation; genes whose expression is typically driven by MYC and EIF4E; and several chromosomal loci, among others.

CROSS-TISSUE COMPARISON OF DYSREGULATED GENES AND GENE SETS

Cross-tissue comparison of dysregulated gene lists is depicted in Figure 2. A total of 10 genes were dysregulated in both the brain and blood mega-analyses, reflecting a non-significant overlap (hypergeometric test p = 0.68). Seven genes were coordinately up-regulated in SZ across tissues (p = 0.15) and one gene was coordinately down-regulated. Two genes showed directionally discordant effects across tissues. The cross-tissue overlap of significantly dysregulated gene sets is depicted in Figure 3, Panel B. Two hundred and sixty-three gene sets were common to both tissues (Bonferroni p = 1.7×10^-158); 255 of these sets were commonly up-regulated (p = 2.8×10^-198) and 4 sets were commonly down-regulated (p = 4.6×10^-4). One gene set showed directionally discordant effects across tissues. Five gene sets showed significant evidence for heterogeneity, such that approximately 20% of SZ cases appeared to drive the group main effect, and different individuals drove the effects for different functional sets (Figure 3, Panel A). One gene set (MYC targets) showed significant evidence for heterogeneity, such that approximately 50% of SZ cases appeared to drive the group main effect (Figure 3, Panel B). The average correlation coefficient reflecting similarity of between-tissue differential expression statistics (gene level) among gene sets commonly up-regulated in the brain and blood of SZ cases was r = 0.133 (Stouffer-Liptak combined p-value = 2.3e-08, Figure 4).

ENRICHMENT OF EQTL AND GWAS ASSOCIATION SIGNALS AMONG DYSREGULATED GENES

We found that eQTLs (NCBI eQTL Browser) were significantly enriched among dysregulated genes (q < 0.1) identified in the blood mega-analysis, but not those from the brain analysis (Table 8). Specifically, 50 genes identified as dysregulated in the blood analysis previously showed evidence of eQTL regulation in lymphoblastoid cell lines (p < 0.007), while 607 showed evidence of eQTL regulation in the frontal cortex (p < 0.001); 20 genes were common to both lists, including TMEM30A, SCAMP3, HSD17B12, TRPV2, BCR, SLC2A8, SLK, BCAT2, NUP93, DDX55, ANP32A, RTN4, ETS2, MDH2, DHRS1, UROS, MRPL43, HERPUD2, CYB561, and RAE1. Within our lists of dysregulated genes in each tissue, we did not find significant quantitative enrichment with genes harboring (or located proximal to) SZ-associated SNPs (Ripke et al., 2014) as compared with randomly permuted gene lists of the same size (brain p = 0.07, blood p = 0.99). Among the 108 independent loci (located proximal to 311 genes) showing genome-wide significant association in the largest SZ GWAS from the Psychiatric Genomics Consortium (PGC; Ripke et al, 2014), we observed overlap with 1 differentially expressed brain gene (Clusterin, CLU, up-regulated in SZ cases) and 36 differentially expressed blood genes (hypergeometric overlap p < 0.99; Table 9).

NETWORK CO-EXPRESSION ANALYSIS IDENTIFIES SZ-ASSOCIATED MODULES IN BRAIN AND BLOOD

We detected 21 modules of co-expressed genes in the brain, and while none were associated with SZ at a BH-corrected threshold of significance, three (arbitrarily labeled “green”, “salmon”, and “yellow”) were nominally associated and were examined in downstream analyses (Figure 5). The green module was diminished in SZ cases and enriched with synapse-, neuronal projection-, and development-related genes. This module also over-represented signatures of neuronal cell types known to express D1 and D2 dopaminergic receptors, cortical neurons, and immune cells (Figure 5). The salmon module was enriched with immunologic terms and also over-represented signatures of cortical and cerebellar astrocytes, cerebellar oligodendrocytes, Bergmann glia, and brain stem cholinergic motor neurons (Figure 5). The yellow module was enriched with metabolic and electron transport functions and over-represented signatures of cortical and cerebellar astrocytes, cerebellar oligodendrocytes, Bergmann glia, and cortical oligodendrocyte progenitors.

We detected 33 modules in the blood, nine of which were associated with SZ at a corrected level of significance. Among these, “darkolivegreen”, “greenyellow”, “grey60”, “pink”, “turquoise”, and “yellow” were enhanced in SZ, while the “blue”, “steelblue”, and “cyan” modules were diminished among SZ cases. Notably, two enhanced modules (darkolivegreen and grey60) were enriched with innate immune function and over-represented granulocytes (Table 10). The yellow module also over-represented the granulocyte and B lymphocyte signatures. The steelblue module was associated with antigen presentation and self-recognition and over-represented natural killer cells. The pink module was enriched with mitochondrial functions. The turquoise and greenyellow modules were enriched with post-translational modifications and protein trafficking; the turquoise module also showed over-representation of B lymphocytes and enrichment in splicing complex function. Hub genes and functional enrichments of the darkolivegreen SZ-associated blood module are shown in Figure 6.
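Because module associations are judged against a Benjamini-Hochberg (BH) corrected threshold, a short sketch of the BH adjustment may be useful. It is a generic textbook implementation with hypothetical example p-values, not the code used to produce the results above.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (q-values) in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical module-level p-values; modules with q < 0.05 would be retained.
module_p = [0.01, 0.04, 0.03, 0.005]
module_q = benjamini_hochberg(module_p)
significant = [i for i, q in enumerate(module_q) if q < 0.05]
```

Under this procedure, a module can be nominally associated (p < 0.05) yet fail the corrected threshold, which is the situation described for the three brain modules.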

ENRICHMENT OF GWAS ASSOCIATION SIGNAL IN SZ-ASSOCIATED MODULES

The green and yellow brain modules, as well as the blue, grey60, turquoise, and yellow blood modules, contained genes with known SNPs more strongly associated with SZ compared with randomly permuted genes (family-wise BH p < 0.05; Figure 7).
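One way to compare a module's GWAS signal against randomly permuted genes is a simple permutation test on gene-level association scores. The sketch below is an assumption-laden illustration: the gene-level score (for example, a −log10 p-value per gene), the function name `permutation_enrichment_p`, and the toy data are all hypothetical and may differ from the procedure actually used for Figure 7.

```python
import random


def permutation_enrichment_p(scores, module_genes, n_perm=2000, seed=0):
    """Empirical p-value that a module's mean score exceeds random gene sets.

    scores:       dict mapping gene symbol -> association score (higher = stronger)
    module_genes: list of gene symbols in the module (all must be keys of scores)
    """
    rng = random.Random(seed)
    background = list(scores)
    size = len(module_genes)
    observed = sum(scores[g] for g in module_genes) / size
    hits = 0
    for _ in range(n_perm):
        # Draw a random gene set of the same size as the module.
        sampled = rng.sample(background, size)
        if sum(scores[g] for g in sampled) / size >= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy data: 100 genes with increasing scores; the "module" holds the top 10.
toy_scores = {f"g{i}": float(i) for i in range(100)}
toy_module = [f"g{i}" for i in range(90, 100)]
toy_p = permutation_enrichment_p(toy_scores, toy_module)
```

Matching the random sets on module size, as done here, is the minimum control; matching on gene length or SNP density would be a stricter design.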

CROSS-TISSUE OVERLAP OF GENES AND FUNCTIONAL ANNOTATIONS IN SZ-ASSOCIATED MODULES

Cross-tissue comparison revealed overlapping genes between the green brain module and the blue and cyan blood modules (hypergeometric p < 6.1×10⁻¹⁴ and 1.94×10⁻⁷, respectively); there was also significant sharing of genes between the salmon brain module and yellow blood module (p < 1.3×10⁻³; Table 11). The salmon brain module shared three functional annotations with two blood modules: genes mapping to the major histocompatibility complex region at cytoband 6p21.3 (cyan), and response to biotic stimulus and defense response (darkolivegreen).

NETWORK CO-EXPRESSION ANALYSIS IDENTIFIES CROSS-GROUP AND CROSS-TISSUE PRESERVATION

Among the 26 modules that were identified in brain samples from unaffected comparison subjects, 24 were highly preserved and two were moderately preserved among SZ samples (Figure 8); the least preserved module (“darkturquoise”, Z summary = 5.9) was enriched with genes related to extracellular space, immune functions, cytokine activity (CXCL1, CSF3, BMP2, CXCL2, IL12A, TNFSF9, CXCL10), growth factor activity (CXCL1, CSF3, BMP2, ENDOU, IL12A, HBEGF, HGF), and signal transducer activity. For the 32 modules identified in unaffected comparison subject blood samples, all were highly preserved among SZ samples (Figure 8). The “skyblue” module was the least preserved (Z summary = 14) and was enriched with genes functioning in RNA processing and metabolism. In a contrast of brain and blood, 13 of 30 network modules identified in unaffected comparison subjects’ brain samples showed strong evidence for preservation (Z summary > 10), 15 showed moderate preservation (2 < Z summary < 10), and two showed no evidence for preservation (Z summary < 2; Figure 8). The “pink” module identified in brain showed the strongest evidence of preservation in blood (Z summary = 25) and was enriched with genes associated with zinc finger transcription factors and nuclear components.
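The preservation thresholds stated above translate directly into a small classification rule. The sketch below encodes them; the function name `preservation_category` is hypothetical, and the handling of values exactly equal to 2 or 10 is an assumption, since the text does not specify it.

```python
def preservation_category(z_summary):
    """Map a module's Z summary statistic to the categories used in the text.

    Thresholds follow the text: > 10 strong, between 2 and 10 moderate, < 2 none.
    Boundary values (exactly 2 or 10) are assigned to the lower category here,
    which is an assumption made for illustration.
    """
    if z_summary > 10:
        return "strong"
    if z_summary > 2:
        return "moderate"
    return "none"


# Z summary values quoted in the text for three modules.
reported = {"darkturquoise": 5.9, "skyblue": 14, "pink": 25}
categories = {name: preservation_category(z) for name, z in reported.items()}
```

Applied to the values quoted in the text, this rule labels darkturquoise as moderately preserved and skyblue and pink as strongly preserved, in agreement with the description above.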

MACHINE-LEARNING CLASSIFICATION USING BLOOD TRANSCRIPTOME DATA

The results of blood-based transcriptomic classification analyses are shown in Table 12. Random Forest classifiers performed with high receiver operating characteristic area under the curve (AUC; 0.92 to 0.96) in the training matrix and retained moderate AUCs (0.72 to 0.77) in the independent validation matrix. Ensemble Support Vector Machine classifiers also performed with high AUCs in the training matrix (0.90 to 0.99) and moderate AUCs in the validation matrix (0.72 to 0.75). Assuming that a random binomial classifier (e.g., a coin flip) would perform no better than chance (AUC = 0.50) if employed within an identical bootstrapping and aggregation framework, the validation sample predictions reflect better-than-chance performance (binomial test p-values ranging from 5×10⁻⁶ to 2×10⁻⁹).
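The comparison against a coin-flip classifier can be sketched as an exact one-sided binomial test on the number of correct validation predictions. The function name `binomial_upper_tail` and the example counts (70 correct out of 100) are hypothetical and are not the validation sample sizes from this study.

```python
from math import comb


def binomial_upper_tail(successes, trials, chance=0.5):
    """Return P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, i) * chance**i * (1.0 - chance) ** (trials - i)
        for i in range(successes, trials + 1)
    )


# Hypothetical example: 70 of 100 validation subjects classified correctly.
p_better_than_chance = binomial_upper_tail(70, 100)
```

A small upper-tail probability indicates that the observed number of correct classifications would be unlikely if each prediction were an independent coin flip.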

DISCUSSION

LIMITATIONS OF GENE NETWORK PRESERVATION ANALYSIS

There are important limitations to discuss with respect to the co-expression preservation analysis that we performed to assess the reproducibility of brain-derived gene co-expression networks in the blood. Our analysis showed that roughly 50% of gene networks identified in the brain are strongly and significantly preserved in the blood. A major drawback to our analysis is that it does not provide a direct measure of how similarly any single gene is expressed in both tissues, because gene expression variability was not the focus of our network preservation analysis. A mistaken interpretation from our data would be that ~50% of genes expressed in brain are also highly expressed in blood. An appropriate interpretation is that gene-gene correlations and intra-network connectivity are similar between brain and blood. A plausible explanation for what could be driving similar co-expression of genes between tissues is the sharing of regulatory molecules (e.g., transcriptional regulators, microRNAs, long noncoding RNAs, etc.) involved in transcriptional or post-transcriptional regulation across brain and blood.

Based on analyses of human microarray data, there are hundreds of ubiquitously expressed transcription factors that are similarly expressed in brain and blood (Vaquerizas, Kummerfeld, Teichmann, & Luscombe, 2009). In addition, RNA-sequencing data from humans showed that there are thousands of small non-coding RNA regulatory molecules that exhibit positively correlated expression levels between the prefrontal cortex and whole blood (Leung et al., 2016). Additional support for this hypothesis comes from our gene enrichment analysis of differential expression data, which uncovered miRNA and transcription factor targets that were significantly associated with SZ across the brain and blood. Shared epigenetic variation would be an alternative explanation as to how pairs of genes are similarly co-regulated across tissues. However, comparisons of genome-wide methylation signatures indicated that the brain and blood have distinct epigenetic landscapes (Hannon, Lunnon, Schalkwyk, & Mill, 2015).

Identifying factors that underlie gene co-expression network preservation in the brain and periphery is an important next step for biomarker and drug target research. Our findings warrant further analyses using intra-individual RNA-sequencing from postmortem brain and blood, which would capture small non-coding regulatory molecules that are missed by microarrays and address another limitation of our preservation analysis – the absence of within-subject data. Furthermore, there is a critical need for cell-specific gene and protein expression data to have the ability to investigate preservation of gene networks across different cell types in the brain and periphery.

CONCLUSIONS AND REMARKS

Our study detected many dysregulated genes that surpassed rigorous corrections for multiple testing, particularly in the larger blood dataset. In combination with gene set and network-based analyses, we identified emergent biological functions robustly altered in the SZ brain and peripheral blood transcriptomes. Our study is preceded by a recent meta-analysis that sought to identify a cross-tissue signature of SZ using postmortem brain and ex vivo peripheral blood microarray data (Bergon et al., 2015). Comparing the differentially expressed genes identified in the present study with those reported by Bergon et al., we observed only eight brain and 40 blood genes implicated by both studies; differences in the results could be attributed to the following differences between studies: (1) inclusion of different postmortem brain regions (multiple regions vs. PFC-only in our analysis) and the number of blood studies included (more in the present study); (2) the total number of genes tested (more in the present study); (3) the approaches taken to reduce between-study variation; (4) the approach to statistical modeling and the specific covariates modeled; and (5) the thresholds for declaring significance. Despite these differences, we note broad similarities with respect to the biological pathways and functions implicated in each study (e.g., genes that mediate immunologic functions, mitochondrial processes, and protein metabolism). However, it is important to acknowledge that functional annotation-based approaches will necessarily be limited to the biological terms that are well-annotated within their respective databases, and that many databases over-represent domains of cancer biology and immunology, which could contribute to bias in studies such as ours.

In addition, we note similarities between our mega-analysis of postmortem PFC homogenates and a prior microarray study of SZ based on laser-microdissections of dorsolateral PFC (n = 24 schizophrenia, n = 12 schizoaffective, n = 24 unaffected comparison; Arion et al., 2015). From this study, we examined the top 35 differentially expressed genes detected in SZ cases in layers 3/5; among these, our PFC analyses also showed down-regulation of DEF8. However, the effect we observed in PFC homogenates for DEF8 was not strong enough to survive correction for multiple testing (uncorrected p-values < 6×10⁻⁴, q-values < 0.12). We also cross-referenced their findings with the results from our blood analysis and found four genes with consistent down-regulation in SZ at p < 0.05 (TINF2, RPS10, NDUFA8, and EMG1). These overlaps suggest that a subset of genes show generalizable differences across tissues and cell types.

In the present study, genes dysregulated in SZ brain tissue were associated with diverse biological functions, but featured prominently among these were up-regulated inflammatory and cellular stress responses, cell growth and oncogenesis pathways, and metabolic pathways. These findings were recapitulated and further resolved by network analysis, which implicated three modules reflecting neurodevelopment (diminished in SZ), inflammation (enhanced), and lipid metabolism (enhanced), with the latter two modules enriched with markers of glial cells and the former enriched with neuronal cell types. Our mega-analysis helps clarify conflicting reports of NF-kB dysregulation (Rao, Kim, Harry, Rapoport, & Reese, 2013; Roussos et al., 2013) by demonstrating that transcriptional targets of this signaling pathway are up-regulated in a large sample. Furthermore, our observations are consistent with the idea that excessive expression and signaling via damage/pathogen-associated molecular pattern receptors may contribute to brain inflammation in SZ (Fillman et al., 2013; Venkatasubramanian & Debnath, 2013).

The blood mega-analysis yielded more significantly dysregulated genes than the brain mega-analysis; this was a function of both more samples and larger effect sizes among the top 1% of genes ranked by p-value (|covariate-adjusted mean difference| brain = 0.41 ± 0.05; |covariate-adjusted mean difference| blood = 0.46 ± 0.05; t-test p-value < 2×10⁻²²). The observation that the transcriptomic signature of SZ is more prominent in blood is a curious one that remains open to interpretation. One explanation may be that blood simply allows for a wider range in the intensity of gene expression as compared with the brain. Alternatively, the blood may more prominently reflect the effects of inadequately controlled covariates (e.g., smoking, antipsychotics). Blood co-expression modules were generally preserved between SZ cases and unaffected comparison subjects, yet several modules identified in the full sample were associated with the SZ diagnosis and support the assertion that innate immunity, antigen presentation, and granulocyte, natural killer cell, and lymphocyte functions are altered in SZ.

Relatively little cross-tissue overlap was observed at the level of dysregulated gene lists, yet our gene set and network-based approaches identified cross-tissue transcriptomic convergence, particularly with respect to innate immune functions, antigen presentation, cellular growth pathways, and common regulatory mechanisms (particularly the up-regulation of many miRNA targets, suggesting that miRNA-based regulation of gene expression may be deficient in SZ). Taken together, these findings suggest that different genes are dysregulated in each tissue, but that cross-tissue convergence may be observable at the level of emergent function.

Notably, one gene from the brain analysis and 36 genes from the blood analysis harbored SNPs that reached genome-wide significance in the largest available association meta-analysis of SZ from the PGC (Ripke et al., 2014). We did not, however, observe significant enrichment of SZ GWAS signals within the list of dysregulated genes for either tissue, yet a network-based approach revealed that SZ-associated modules in both brain and blood were enriched with SZ GWAS association signal. Additionally, we observed significant over-representation of genes with lymphoblastoid cell and postmortem PFC eQTLs (NCBI eQTL Browser) within our lists of dysregulated genes in the SZ blood samples, allowing the possibility that some peripheral transcriptomic differences may be governed by genetic variants with known regulatory activity in both neural and immunologic cell types (Sanders et al., 2013). However, the majority of dysregulated genes in both tissues were not associated with known eQTLs or SZ-associated loci from GWAS, suggesting that genetic regulatory elements may play a relatively small, indirect, or developmentally dependent role in shaping SZ-associated transcriptomic differences.

Another essential outcome of our work is the identification of numerous gene co-expression modules that are preserved across brain and blood tissues in non-psychotic individuals. These results align with the findings of our previous review on the topic (Tylee, Kawaguchi, & Glatt, 2013) and also support the pursuit of blood-based transcriptomic classification tools for CNS disorders like SZ. Our machine-learning classifier work makes several important contributions to the psychiatric biomarker literature. To our knowledge, this was the first attempt at classifier construction within a sample composed of multiple independent studies, reflecting different distributions of sex, age, ancestry, and medication usage. Our classifiers performed with moderate (approximately 70%) accuracy in an independently withheld sample composed of distinct studies (rather than a withheld subset of the same study sample); thus these classifiers appear robust to differences in experimental factors that vary across study sites (e.g., microarray platform). Future studies should employ higher-resolution transcriptomic data (i.e., RNA sequencing) and more sophisticated feature-selection algorithms to explore the upper limits of blood-based classification accuracy and to assess classifier specificity when discriminating different psychiatric conditions (e.g., SZ vs. bipolar disorder); prospective transcriptomic studies could also be useful for predicting treatment response (Mamdani et al., 2011), thus paving the way for treatment selection biomarkers.

Accumulating evidence from various lines of research links immune dysregulation and SZ, including: (1) the most strongly implicated locus in the largest SZ GWAS lies within the major histocompatibility complex (MHC) region (Ripke et al., 2014), which encodes genes involved in cellular antigen presentation and reflects a critical bridge between innate and adaptive immune functions; (2) increased prevalence of autoimmune disorders is found among individuals with SZ and their relatives in epidemiologic studies (Eaton et al., 2006); (3) increased levels of cytokines in peripheral blood (Miller, Buckley, Seabolt, Mellor, & Kirkpatrick, 2011) and cerebrospinal fluid of SZ patients are correlated with elevated levels of an endogenous NMDA receptor antagonist (Schwieler et al., 2015) and with cytoarchitectural and structural changes in the brain (Ellman et al., 2010; Fung, Joshi, Fillman, & Weickert, 2014); and (4) pharmacological evidence that antipsychotics dampen inflammation and may act as a restoration loop into dopaminergic circuitry (Kumarasinghe et al., 2013; Müller & Schwarz, 2010; Sugino, Futamura, Mitsumoto, Maeda, & Marunaka, 2009). The present study demonstrated that many of the same inflammatory signaling cascades are up-regulated in both tissues, and also supports previous accounts of acute SZ-related changes in adaptive immune cells (Maino et al., 2007; Ripke et al., 2014; Steiner et al., 2010) and alterations in white matter and glial cell populations (Cotter et al., 2002; Duncan et al., 2014). In light of this growing body of literature, it seems likely that global dysregulation of immune and inflammatory function is present in a subset of individuals with SZ and it is plausible that some genetic risk factors for SZ may exert their effects through immunologic cell types and their emergent functions. A new landmark finding in SZ genetics demonstrated that risk for SZ increases linearly with expression of the MHC region gene C4, which in turn leads to excessive synaptic pruning, thus providing clear mechanistic evidence that immune genes can mediate and perturb brain development (Sekar et al., 2016). Acute dysregulation of inflammatory signaling systems could also contribute to the SZ phenotype within the fully developed brain, through changes in neuroplasticity and neurotransmission. The implication of dysregulated inflammatory functions in the postmortem SZ brain underscores the need for interdisciplinary basic science research characterizing the cross-talk between these signaling cascades and those controlling typical neurodevelopmental processes and normal functioning in the fully developed brain.

In summary, our study makes several important contributions, including statistical improvements to obtain the best estimate of SZ-associated differential expression, a thorough cross-tissue assessment of transcriptomic dysregulation in SZ, and generalizable classification of SZ cases and comparison subjects measured on different microarray chip technologies. However, the present study inherits many of the same limitations shared by all postmortem brain studies of SZ. These data are cross-sectional, representing transcriptomic profiles from a single time point, occurring after disease onset, death, and (probably in the vast majority of cases) years of antipsychotic treatment. As such, it is not possible to know whether the observed differences are causal contributors or downstream consequences of SZ pathophysiology. While we attempted to control for the effects of medication, we must acknowledge the likelihood that some observations currently attributed to diagnostic status were influenced by group differences in medication use or other uncontrolled covariates (e.g., tobacco use). Studies in animal models, or antipsychotic-naïve, smoking-matched subjects will be essential for resolving these possibilities. Despite these limitations, this study provides an atlas of dysregulated genes, biological processes, and co-expression networks associated with SZ in brain and blood tissues.

TABLE 1. SCHIZOPHRENIA MICROARRAY STUDIES INCLUDED IN THE MEGA-ANALYSES.

| Brain Studies | Array Type | Cases (n) | Controls (n) | % Female | % Medicated | Predominant Ancestry | Genes Analyzed |
|---|---|---|---|---|---|---|---|
| Altar - Stanley Medical Research Institute (SMRI) - Collection A | Affymetrix U133a | 8 | 10 | 22% | 100% | Caucasian | 12,153 |
| Altar - SMRI - Collection C | Affymetrix U133a | 11 | 11 | 32% | 100% | Caucasian | 12,366 |
| Dobrin - SMRI Collection | Affymetrix U133 Plus 2.0 | 24 | 25 | 27% | 100% | Caucasian | 20,286 |
| Cohen et al., 2009 | Affymetrix HG 1.0 ST | 4 | 4 | 0% | 100% | NA | 17,168 |
| Glatt et al., 2005 | Affymetrix U133a | 16 | 25 | 27% | 100% | Caucasian | 12,410 |
| Maycox et al., 2009 | Affymetrix U133 Plus 2.0 | 28 | 23 | 40% | NA | Caucasian | 20,767 |
| Narayan et al., 2008 | Affymetrix U133 Plus 2.0 | 28 | 28 | 16% | NA | NA | 20,766 |
| Lanz (GSE53987) | Affymetrix U133 Plus 2.0 | 14 | 19 | 48% | NA | Caucasian | 20,767 |
| Katsel et al., 2005 | Affymetrix U133a | 20 | 17 | 38% | NA | NA | 13,832 |
| Total: 9 | | 153 | 162 | | | | 20,767 |

| Blood Studies | Array Type | Cases (n) | Controls (n) | % Female | % Medicated | Predominant Ancestry | Genes Analyzed |
|---|---|---|---|---|---|---|---|
| Tsuang et al., 2005 | Affymetrix U133 Plus 2.0 | 10 | 15 | 48% | 36% | Asian | 22,014 |
| Tsuang et al., 2005 | Affymetrix U133a | 20 | 9 | 60% | 64% | Asian | 13,977 |
| Glatt et al., 2009 | Affymetrix Exon 1.0 ST | 13 | 8 | 33% | 57% | Caucasian | 18,850 |
| Glatt et al., 2011 | Affymetrix U133 Plus 2.0 | 8 | 12 | 50% | 40% | Caucasian | 22,014 |
| Van Beveren et al., 2012 | Affymetrix U133 Plus 2.0 | 41 | 29 | 0% | NA | NA | 22,014 |
| de Jong et al., 2012 | Illumina HumanHT-8 v3 | 15 | 21 | 27% | 0% | Caucasian | 18,516 |
| de Jong et al., 2012 | Illumina HumanHT-12 v3 | 106 | 95 | 41% | 45% | Caucasian | 20,156 |
| Kumarasinghe et al., 2013 | Illumina HumanHT-12 v3 | 9 | 11 | 40% | 0% | Asian | 20,156 |
| Gardiner et al., 2013 | Illumina HumanHT-12 v3 | 78 | 78 | 48% | 48% | Caucasian | 20,156 |
| Total: 9 | | 300 | 278 | | | | 19,737* |

NA = Not Available
* Among the 23,755 mixed-effect models assessed in blood studies, only 19,737 converged to produce valid statistical results; failures to converge were related to the number of missing values for a particular gene’s expression level.

TABLE 2. SCHIZOPHRENIA GENE EXPRESSION STUDIES EXCLUDED FROM MEGA-ANALYSES.

| Studies of Brain | Reason for Exclusion |
|---|---|
| Thomas et al., 2011 | Used a custom array |
| Iwamoto et al., 2008 | Samples overlapped with other SMRI collections |
| Harris et al., 2008 | Samples overlapped with other SMRI collections |
| de Baumont et al., 2015 | Samples overlapped with other SMRI collections |
| Mirnics et al., 2000 | Used an ineligible array platform |
| Hashimoto et al., 2008 | Used a custom array |
| Wong et al., 2013 | No microarray data for cases and controls |
| Kimoto et al., 2015 | Samples did not consist of tissue homogenates |
| Datta et al., 2015 | Samples did not consist of tissue homogenates |
| Arion et al., 2015 | Samples did not consist of tissue homogenates |

| Studies of Blood | Reason for Exclusion |
|---|---|
| Middleton et al., 2005 | Cases and unaffected comparison subjects were siblings |
| Bowden et al., 2006 | Used a custom array |
| Chertkow et al., 2007 | Used a custom array |
| Kuzman et al., 2009 | Unable to provide raw data |
| Drexhage et al., 2010 | Pooled samples for all cases |
| Takahashi et al., 2010 | Unable to provide raw data |
| Kurian et al., 2011 | No unaffected comparison subjects included |
| Maschietto et al., 2012 | Used a custom array |
| Sanders et al., 2013 | Used transformed lymphoblast cells |
| Stoll et al., 2013 | Cases and unaffected comparison subjects were family members |
| Xu et al., 2016 | Used an ineligible array platform |
| Wu et al., 2015 | Samples overlapped with Gardiner et al., 2013 |
| Zheutlin et al., 2016 | Cases and controls were discordant twins |
| Zhang et al., 2015 | Used an ineligible array platform |
| Sun et al., 2015 | Used an ineligible array platform |
| Santoro et al., 2015 | No confirmed diagnosis of schizophrenia |


TABLE 3. GENES SIGNIFICANTLY DYSREGULATED (FDR P < 0.05) IN SCHIZOPHRENIA ACROSS STUDIES OF POSTMORTEM BRAIN TISSUE

| Gene Symbol† | Gene Product | Estimated Marginal Mean Difference* | Diagnostic Group Main Effect F | Diagnostic Group Main Effect p | FDR q-Value |
|---|---|---|---|---|---|
| THAP2 | THAP domain containing, apoptosis associated protein 2 | 0.61 | 13.5 | 3.5E-04 | 0.09 |
| SLC44A3 | solute carrier family 44, member 3 | 0.60 | 19.9 | 1.4E-05 | 0.03 |
| BAG3 | BCL2-associated athanogene 3 | 0.59 | 20.9 | 7.9E-06 | 0.02 |
| HSPB1 | heat shock 27kDa protein-like 2 pseudogene; heat shock 27kDa protein 1 | 0.57 | 19.4 | 1.6E-05 | 0.03 |
| ARRDC3 | arrestin domain containing 3 | 0.56 | 16.5 | 7.0E-05 | 0.05 |
| IGDCC4 | heat shock 27kDa protein-like 2 pseudogene; heat shock 27kDa protein 1 | 0.54 | 17.9 | 3.7E-05 | 0.04 |
| RHOBTB3 | Rho-related BTB domain containing 3 | 0.54 | 24.6 | 1.2E-06 | 0.02 |
| DPY19L3 | dpy-19-like 3 (C. elegans) | 0.52 | 14.8 | 1.6E-04 | 0.06 |
| S100A8 | S100 calcium binding protein A8 | 0.52 | 16.7 | 6.1E-05 | 0.05 |
| SLC25A20 | solute carrier family 25 (carnitine/acylcarnitine translocase), member 20 | 0.51 | 22.2 | 3.7E-06 | 0.02 |
| ABCA1 | ATP-binding cassette, sub-family A (ABC1), member 1 | 0.51 | 22.5 | 3.1E-06 | 0.02 |
| ACSS3 | acyl-CoA synthetase short-chain family member 3 | 0.50 | 21.8 | 4.6E-06 | 0.02 |
| GABRA2 | gamma-aminobutyric acid (GABA) A receptor, alpha 2 | 0.50 | 21.7 | 4.7E-06 | 0.02 |
| PITPNC1 | phosphatidylinositol transfer protein, cytoplasmic 1 | 0.50 | 21.3 | 5.6E-06 | 0.02 |
| GLIS3 | GLIS family zinc finger 3 | 0.50 | 13.4 | 3.3E-04 | 0.09 |
| DHRS3 | dehydrogenase/reductase (SDR family) member 3 | 0.50 | 21.3 | 5.8E-06 | 0.02 |
| MIR568 | microRNA 568 | 0.48 | 19.3 | 1.5E-05 | 0.03 |
| PDLIM5 | PDZ and LIM domain 5 | 0.47 | 19.1 | 1.7E-05 | 0.03 |
| MEIS2 | Meis homeobox 2 | 0.47 | 18.8 | 1.9E-05 | 0.03 |
| KCNN3 | potassium intermediate/small conductance calcium-activated channel, subfamily N, member 3 | 0.47 | 17.1 | 4.6E-05 | 0.04 |
| WLS | G protein-coupled receptor 177 | 0.46 | 18.4 | 2.4E-05 | 0.03 |
| ALDH6A1 | aldehyde dehydrogenase 6 family, member A1 | 0.46 | 18.0 | 2.9E-05 | 0.04 |
| OXTR | oxytocin receptor | 0.46 | 17.6 | 3.5E-05 | 0.04 |
| SCAF11 | SR-related CTD-associated factor 11 | 0.45 | 18.0 | 3.0E-05 | 0.04 |
| CADM1 | cell adhesion molecule 1 | 0.45 | 17.5 | 3.7E-05 | 0.04 |
| STAG2 | stromal antigen 2 | 0.45 | 17.4 | 4.0E-05 | 0.04 |
| ESF1 | similar to ABT1-associated protein; ESF1, nucleolar pre-rRNA processing protein, homolog (S. cerevisiae) | 0.45 | 16.3 | 6.7E-05 | 0.05 |
| LPL | lipoprotein lipase | 0.44 | 17.0 | 4.9E-05 | 0.04 |
| DTNA | dystrobrevin, alpha | 0.44 | 16.5 | 6.1E-05 | 0.05 |
| BBX | bobby sox homolog (Drosophila) | 0.44 | 16.5 | 6.1E-05 | 0.05 |
| SLC1A4 | solute carrier family 1 (glutamate/neutral amino acid transporter), member 4 | 0.44 | 16.1 | 7.4E-05 | 0.05 |
| NFE2L2 | nuclear factor (erythroid-derived 2)-like 2 | 0.43 | 15.9 | 8.5E-05 | 0.05 |
| SLCO1C1 | solute carrier organic anion transporter family, member 1C1 | 0.43 | 15.8 | 8.8E-05 | 0.05 |
| PABPC3 | poly(A) binding protein, cytoplasmic 3 | 0.43 | 15.6 | 9.6E-05 | 0.05 |
| GRAMD3 | GRAM domain containing 3 | 0.43 | 15.7 | 9.2E-05 | 0.05 |
| KRAS | v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog | 0.43 | 14.9 | 1.4E-04 | 0.06 |
| SELENBP1 | selenium binding protein 1 | 0.43 | 15.2 | 1.2E-04 | 0.06 |
| PPAP2B | phosphatidic acid phosphatase type 2B | 0.42 | 15.2 | 1.2E-04 | 0.06 |
| INHBB | inhibin, beta B | 0.42 | 14.7 | 1.5E-04 | 0.06 |
| PPP1R3D | protein phosphatase 1, regulatory (inhibitor) subunit 3D | 0.42 | 15.2 | 1.2E-04 | 0.06 |
| MAP3K5 | mitogen-activated protein kinase kinase kinase 5 | 0.42 | 14.5 | 1.7E-04 | 0.06 |
| AQP1 | aquaporin 1 (Colton blood group) | 0.42 | 15.9 | 8.5E-05 | 0.05 |
| ADM | adrenomedullin | 0.42 | 15.0 | 1.3E-04 | 0.06 |
| PON2 | paraoxonase 2 | 0.42 | 14.7 | 1.5E-04 | 0.06 |
| AK4 | adenylate kinase 4 | 0.42 | 14.6 | 1.6E-04 | 0.06 |
| EFEMP1 | EGF-containing fibulin-like extracellular matrix protein 1 | 0.42 | 15.0 | 1.3E-04 | 0.06 |
| FTLP3 | ferritin, light polypeptide-like 1 | 0.42 | 14.1 | 2.1E-04 | 0.07 |
| SERP1 | stress-associated endoplasmic reticulum protein 1 | 0.41 | 14.1 | 2.0E-04 | 0.07 |
| PMP2 | peripheral myelin protein 2 | 0.41 | 12.7 | 4.3E-04 | 0.10 |
| ETNPPL | ethanolamine-phosphate phospho-lyase | 0.41 | 14.1 | 2.1E-04 | 0.07 |
| SLC15A2 | solute carrier family 15 (H+/peptide transporter), member 2 | 0.41 | 13.6 | 2.7E-04 | 0.08 |
| NTRK2 | neurotrophic tyrosine kinase, receptor, type 2 | 0.41 | 14.0 | 2.2E-04 | 0.07 |
| SMURF2 | SMAD specific E3 ubiquitin protein ligase 2 | 0.41 | 13.8 | 2.4E-04 | 0.08 |
| GFAP | glial fibrillary acidic protein | 0.41 | 14.8 | 1.5E-04 | 0.06 |
| TMEM47 | transmembrane protein 47 | 0.40 | 13.7 | 2.6E-04 | 0.08 |
| CYBRD1 | cytochrome b reductase 1 | 0.40 | 13.5 | 2.9E-04 | 0.08 |
| MT1E | metallothionein 1L (gene/pseudogene); metallothionein 1E; metallothionein 1 pseudogene 3; metallothionein 1J (pseudogene) | 0.40 | 13.7 | 2.6E-04 | 0.08 |
| CLU | clusterin | 0.40 | 13.6 | 2.6E-04 | 0.08 |
| PCDH9 | protocadherin 9 | 0.40 | 13.3 | 3.1E-04 | 0.08 |
| KAL1 | Kallmann syndrome 1 sequence | 0.40 | 13.5 | 2.9E-04 | 0.08 |
| PPIG | peptidylprolyl isomerase G (cyclophilin G) | 0.40 | 13.2 | 3.2E-04 | 0.09 |
| WIF1 | WNT inhibitory factor 1 | 0.40 | 13.2 | 3.2E-04 | 0.09 |
| AQP4 | aquaporin 4 | 0.40 | 13.4 | 2.9E-04 | 0.08 |
| CASP7 | caspase 7, apoptosis-related cysteine peptidase | 0.39 | 13.0 | 3.6E-04 | 0.09 |
| HSD17B6 | hydroxysteroid (17-beta) dehydrogenase 6 homolog (mouse) | 0.39 | 13.0 | 3.6E-04 | 0.09 |
| PDLIM3 | PDZ and LIM domain 3 | 0.39 | 13.0 | 3.6E-04 | 0.09 |
| PTPN13 | protein tyrosine phosphatase, non-receptor type 13 (APO-1/CD95 (Fas)-associated phosphatase) | 0.39 | 12.9 | 3.9E-04 | 0.10 |
| ZNF292 | zinc finger protein 292 | 0.39 | 12.8 | 4.0E-04 | 0.10 |
| FTL | similar to ferritin, light polypeptide; ferritin, light polypeptide | 0.39 | 12.7 | 4.3E-04 | 0.10 |
| TMEM176A | transmembrane protein 176A | 0.39 | 12.8 | 4.0E-04 | 0.10 |
| GRAMD1C | GRAM domain containing 1C | 0.39 | 12.7 | 4.3E-04 | 0.10 |
| ATP6V0E1 | ATPase, H+ transporting, lysosomal 9kDa, V0 subunit e1 | 0.39 | 12.7 | 4.3E-04 | 0.10 |
| DECR1 | 2,4-dienoyl CoA reductase 1, mitochondrial | 0.39 | 12.8 | 4.0E-04 | 0.10 |
| CERK | ceramide kinase | -0.39 | 12.9 | 3.8E-04 | 0.10 |
| CACNB3 | calcium channel, voltage-dependent, beta 3 subunit | -0.40 | 14.4 | 1.8E-04 | 0.07 |
| IKBKG | inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase gamma | -0.40 | 13.6 | 2.6E-04 | 0.08 |
| SLC29A1 | solute carrier family 29 (nucleoside transporters), member 1 | -0.40 | 12.8 | 4.0E-04 | 0.10 |
| APBA2 | amyloid beta (A4) precursor protein-binding, family A, member 2 | -0.41 | 13.9 | 2.3E-04 | 0.08 |
| DEPDC5 | DEP domain containing 5 | -0.41 | 14.3 | 1.9E-04 | 0.07 |
| PPP1R17 | protein phosphatase 1, regulatory subunit 17 | -0.41 | 14.3 | 1.8E-04 | 0.07 |
| MCAT | malonyl CoA:ACP acyltransferase (mitochondrial) | -0.41 | 13.7 | 2.6E-04 | 0.08 |
| NEK2 | NIMA (never in mitosis gene a)-related kinase 2 | -0.42 | 15.8 | 9.0E-05 | 0.05 |
| TPMT | thiopurine S-methyltransferase | -0.42 | 14.7 | 1.5E-04 | 0.06 |
| EMX1 | empty spiracles homeobox 1 | -0.42 | 14.5 | 1.7E-04 | 0.06 |
| UBL4A | ubiquitin-like 4A | -0.43 | 15.7 | 9.4E-05 | 0.05 |
| TOPBP1 | topoisomerase (DNA) II binding protein 1 | -0.43 | 15.2 | 1.2E-04 | 0.06 |
| ST3GAL6 | ST3 beta-galactoside alpha-2,3-sialyltransferase 6 | -0.43 | 15.5 | 1.0E-04 | 0.05 |
| TYRP1 | tyrosinase-related protein 1 | -0.43 | 16.0 | 8.0E-05 | 0.05 |
| OPN3 | opsin 3 | -0.45 | 17.0 | 4.7E-05 | 0.04 |
| PDCD6 | aryl-hydrocarbon receptor repressor; programmed cell death 6 | -0.46 | 17.7 | 3.4E-05 | 0.04 |
| MPPE1 | metallophosphoesterase 1 | -0.50 | 21.7 | 4.7E-06 | 0.02 |
| AQP11 | aquaporin 11 | -0.53 | 15.1 | 1.4E-04 | 0.06 |

Estimated Marginal Mean Differences are reported in units of the standard deviation for the gene’s expression values. Positive values for estimated marginal means reflect genes more highly expressed among SZ cases.
* Rows are sorted by decreasing value of the difference in estimated marginal means between groups.
† Rows in bold survived a Bonferroni correction for family-wise error inflation.

150 TABLE 4. GENE SETS SIGNIFICANTLY DYSREGULATED AT A BONFERRONI P < 0.05 BASED ON PERMUTATIONS OF SINGLE-GENE TEST STATISTICS FROM THE BRAIN MEGA-ANALYSIS.

Brain - Brain - Blood – Corresponding Gene Set Bonferroni- Database Gene Set Name† Hypothesis Test with Significant Test Result Corrected (Bonferroni p < 0.05). p -Value All Up- msigdb_h_hallmarksets HALLMARK_TNFA_SIGNALING_VIA_NFKB Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_h_hallmarksets HALLMARK_HYPOXIA Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_h_hallmarksets HALLMARK_CHOLESTEROL_HOMEOSTASIS Regulated 0.001 -- All Up- msigdb_h_hallmarksets HALLMARK_IL6_JAK_STAT3_SIGNALING Regulated < 1.0E-6 -- All Up- msigdb_h_hallmarksets HALLMARK_APOPTOSIS Regulated 0.006 -- All Up- msigdb_h_hallmarksets HALLMARK_ADIPOGENESIS Regulated < 1.0E-6 -- All Up- msigdb_h_hallmarksets HALLMARK_ANDROGEN_RESPONSE Regulated 0.002 All Up-Regulated All Up- msigdb_h_hallmarksets HALLMARK_INTERFERON_GAMMA_RESPONSE Regulated < 1.0E-6 -- Mixed Up- msigdb_h_hallmarksets HALLMARK_APICAL_SURFACE Regulated 0.006 Mixed Up-Regulated All Up- msigdb_h_hallmarksets HALLMARK_COMPLEMENT Regulated 0.001 All Up-Regulated All Up- msigdb_h_hallmarksets HALLMARK_UNFOLDED_PROTEIN_RESPONSE Regulated 0.003 -- All Up- msigdb_h_hallmarksets HALLMARK_MTORC1_SIGNALING Regulated < 1.0E-6 -- All Up- msigdb_h_hallmarksets HALLMARK_MYC_TARGETS_V1 Regulated < 1.0E-6 -- All Up- msigdb_h_hallmarksets HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION Regulated 0.009 -- All Up- msigdb_h_hallmarksets HALLMARK_INFLAMMATORY_RESPONSE Regulated 0.001 All Up-Regulated All Up- msigdb_h_hallmarksets HALLMARK_XENOBIOTIC_METABOLISM Regulated 0.012 -- All Up- msigdb_h_hallmarksets HALLMARK_FATTY_ACID_METABOLISM Regulated < 1.0E-6 -- All Up- msigdb_h_hallmarksets HALLMARK_P53_PATHWAY Regulated 0.000 -- All Up- msigdb_h_hallmarksets HALLMARK_UV_RESPONSE_DN Regulated 0.011 All Up-Regulated All Up- msigdb_h_hallmarksets HALLMARK_IL2_STAT5_SIGNALING Regulated 0.000 --

msigdb_h_hallmarksets | HALLMARK_ALLOGRAFT_REJECTION | All Up-Regulated | 0.025 | --
msigdb_h_hallmarksets | HALLMARK_KRAS_SIGNALING_UP | All Up-Regulated | 0.002 | --
msigdb_c1_chrregions | chr1p31 | All Up-Regulated | 0.006 | --
msigdb_c1_chrregions | chr2p11 | All Up-Regulated | 0.033 | --
msigdb_c1_chrregions | chr8q21 | All Up-Regulated | 0.003 | --
msigdb_c1_chrregions | chr4q32 | All Up-Regulated | 0.016 | --
msigdb_c2_curatedsets | KEGG_VALINE_LEUCINE_AND_ISOLEUCINE_DEGRADATION | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | KEGG_ADHERENS_JUNCTION | All Up-Regulated | 0.037 | --
msigdb_c2_curatedsets | KEGG_INSULIN_SIGNALING_PATHWAY | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | KEGG_LEISHMANIA_INFECTION | All Up-Regulated | 0.005 | All Up-Regulated
msigdb_c2_curatedsets | BIOCARTA_MAPK_PATHWAY | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | BIOCARTA_P38MAPK_PATHWAY | All Up-Regulated | 0.014 | --
msigdb_c2_curatedsets | BIOCARTA_IL1R_PATHWAY | All Up-Regulated | 0.009 | --
msigdb_c2_curatedsets | PID_BCR_5PATHWAY | All Up-Regulated | 0.018 | --
msigdb_c2_curatedsets | PID_PDGFRBPATHWAY | All Up-Regulated | 0.005 | All Up-Regulated
msigdb_c2_curatedsets | PID_VEGFR1_2_PATHWAY | All Up-Regulated | 0.014 | --
msigdb_c2_curatedsets | REACTOME_GENERIC_TRANSCRIPTION_PATHWAY | All Up-Regulated | 0.009 | --
msigdb_c2_curatedsets | REACTOME_TRAF6_MEDIATED_INDUCTION_OF_NFKB_AND_MAP_KINASES_UPON_TLR7_8_OR_9_ACTIVATION | All Up-Regulated | 0.028 | --
msigdb_c2_curatedsets | REACTOME_NFKB_AND_MAP_KINASES_ACTIVATION_MEDIATED_BY_TLR4_SIGNALING_REPERTOIRE | All Up-Regulated | 0.014 | --
msigdb_c2_curatedsets | REACTOME_MYD88_MAL_CASCADE_INITIATED_ON_PLASMA_MEMBRANE | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | REACTOME_ACTIVATED_TLR4_SIGNALLING | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | PICCALUGA_ANGIOIMMUNOBLASTIC_LYMPHOMA_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | PICCALUGA_ANGIOIMMUNOBLASTIC_LYMPHOMA_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | HOLLMANN_APOPTOSIS_VIA_CD40_DN | All Up-Regulated | 0.028 | --
msigdb_c2_curatedsets | LIU_PROSTATE_CANCER_DN | All Up-Regulated | < 1.0E-6 | --

msigdb_c2_curatedsets | SCHUETZ_BREAST_CANCER_DUCTAL_INVASIVE_UP | All Up-Regulated | 0.009 | --
msigdb_c2_curatedsets | SENGUPTA_NASOPHARYNGEAL_CARCINOMA_UP | All Up-Regulated | 0.018 | --
msigdb_c2_curatedsets | DAVICIONI_TARGETS_OF_PAX_FOXO1_FUSIONS_UP | All Up-Regulated | 0.009 | --
msigdb_c2_curatedsets | SENGUPTA_NASOPHARYNGEAL_CARCINOMA_WITH_LMP1_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | TURASHVILI_BREAST_DUCTAL_CARCINOMA_VS_DUCTAL_NORMAL_DN | Mixed Up-Regulated | 0.005 | Mixed Up-Regulated
msigdb_c2_curatedsets | FULCHER_INFLAMMATORY_RESPONSE_LECTIN_VS_LPS_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | GARY_CD5_TARGETS_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | GARY_CD5_TARGETS_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | ZHOU_INFLAMMATORY_RESPONSE_LIVE_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | PRAMOONJAGO_SOX4_TARGETS_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | THUM_SYSTOLIC_HEART_FAILURE_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | THUM_SYSTOLIC_HEART_FAILURE_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | DEURIG_T_CELL_PROLYMPHOCYTIC_LEUKEMIA_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | LIU_CMYB_TARGETS_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | LIU_VMYB_TARGETS_UP | All Up-Regulated | 0.046 | --
msigdb_c2_curatedsets | CHARAFE_BREAST_CANCER_LUMINAL_VS_BASAL_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | CHARAFE_BREAST_CANCER_LUMINAL_VS_MESENCHYMAL_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | LAIHO_COLORECTAL_CANCER_SERRATED_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | BORCZUK_MALIGNANT_MESOTHELIOMA_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | HORIUCHI_WTAP_TARGETS_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | BASAKI_YBX1_TARGETS_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | RODRIGUES_DCC_TARGETS_DN | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | WANG_LMO4_TARGETS_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | WANG_LMO4_TARGETS_DN | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | WANG_CLIM2_TARGETS_DN | All Up-Regulated | < 1.0E-6 | --

msigdb_c2_curatedsets | VECCHI_GASTRIC_CANCER_EARLY_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | SMIRNOV_CIRCULATING_ENDOTHELIOCYTES_IN_CANCER_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | OSMAN_BLADDER_CANCER_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | OSMAN_BLADDER_CANCER_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | GINESTIER_BREAST_CANCER_ZNF217_AMPLIFIED_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | GINESTIER_BREAST_CANCER_20Q13_AMPLIFICATION_UP | All Up-Regulated | 0.046 | --
msigdb_c2_curatedsets | OSWALD_HEMATOPOIETIC_STEM_CELL_IN_COLLAGEN_GEL_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | GARGALOVIC_RESPONSE_TO_OXIDIZED_PHOSPHOLIPIDS_BLUE_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | TAKEDA_TARGETS_OF_NUP98_HOXA9_FUSION_8D_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | RHEIN_ALL_GLUCOCORTICOID_THERAPY_UP | All Up-Regulated | 0.009 | --
msigdb_c2_curatedsets | RHEIN_ALL_GLUCOCORTICOID_THERAPY_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | MULLIGHAN_NPM1_MUTATED_SIGNATURE_1_UP | All Up-Regulated | 0.018 | --
msigdb_c2_curatedsets | MULLIGHAN_NPM1_SIGNATURE_3_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | TONKS_TARGETS_OF_RUNX1_RUNX1T1_FUSION_HSC_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | TONKS_TARGETS_OF_RUNX1_RUNX1T1_FUSION_ERYTHROCYTE_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | DITTMER_PTHLH_TARGETS_DN | All Up-Regulated | 0.014 | --
msigdb_c2_curatedsets | UDAYAKUMAR_MED1_TARGETS_UP | All Up-Regulated | 0.046 | --
msigdb_c2_curatedsets | UDAYAKUMAR_MED1_TARGETS_DN | All Up-Regulated | 0.005 | All Up-Regulated
msigdb_c2_curatedsets | SENESE_HDAC1_TARGETS_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | SENESE_HDAC3_TARGETS_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | TIEN_INTESTINE_PROBIOTICS_24HR_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | KIM_WT1_TARGETS_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | KIM_WT1_TARGETS_8HR_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | KIM_WT1_TARGETS_12HR_UP | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | ELVIDGE_HYPOXIA_UP | All Up-Regulated | 0.005 | --

msigdb_c2_curatedsets | ELVIDGE_HYPOXIA_BY_DMOG_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | JAATINEN_HEMATOPOIETIC_STEM_CELL_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | JAATINEN_HEMATOPOIETIC_STEM_CELL_DN | All Up-Regulated | 0.018 | --
msigdb_c2_curatedsets | GRAHAM_CML_DIVIDING_VS_NORMAL_QUIESCENT_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | GRAHAM_NORMAL_QUIESCENT_VS_NORMAL_DIVIDING_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | BIDUS_METASTASIS_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | WAMUNYOKOLI_OVARIAN_CANCER_LMP_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | RODRIGUES_THYROID_CARCINOMA_ANAPLASTIC_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | HAHTOLA_MYCOSIS_FUNGOIDES_CD4_UP | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | HAHTOLA_MYCOSIS_FUNGOIDES_CD4_DN | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | PACHER_TARGETS_OF_IGF1_AND_IGF2_UP | All Up-Regulated | 0.023 | --
msigdb_c2_curatedsets | KOKKINAKIS_METHIONINE_DEPRIVATION_48HR_UP | All Up-Regulated | 0.014 | --
msigdb_c2_curatedsets | ENK_UV_RESPONSE_EPIDERMIS_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | ENK_UV_RESPONSE_KERATINOCYTE_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | DELYS_THYROID_CANCER_UP | All Up-Regulated | 0.028 | --
msigdb_c2_curatedsets | CHIARADONNA_NEOPLASTIC_TRANSFORMATION_KRAS_CDC25_DN | Mixed Up-Regulated | < 1.0E-6 | Mixed Up-Regulated
msigdb_c2_curatedsets | CHIARADONNA_NEOPLASTIC_TRANSFORMATION_KRAS_DN | Mixed Up-Regulated | 0.023 | Mixed Up-Regulated
msigdb_c2_curatedsets | CHIARADONNA_NEOPLASTIC_TRANSFORMATION_CDC25_DN | All Up-Regulated | 0.032 | --
msigdb_c2_curatedsets | BERENJENO_ROCK_SIGNALING_NOT_VIA_RHOA_DN | All Up-Regulated | 0.014 | --
msigdb_c2_curatedsets | LINDGREN_BLADDER_CANCER_CLUSTER_1_UP | All Up-Regulated | 0.009 | --
msigdb_c2_curatedsets | LINDGREN_BLADDER_CANCER_CLUSTER_2A_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | MARKEY_RB1_ACUTE_LOF_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | CONCANNON_APOPTOSIS_BY_EPOXOMICIN_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | GAUSSMANN_MLL_AF4_FUSION_TARGETS_F_UP | All Up-Regulated | 0.009 | --
msigdb_c2_curatedsets | BERENJENO_TRANSFORMED_BY_RHOA_DN | All Up-Regulated | 0.005 | --

msigdb_c2_curatedsets | DUNNE_TARGETS_OF_AML1_MTG8_FUSION_UP | All Up-Regulated | 0.023 | All Up-Regulated
msigdb_c2_curatedsets | MCBRYAN_PUBERTAL_TGFB1_TARGETS_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | KAN_RESPONSE_TO_ARSENIC_TRIOXIDE | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | SEIDEN_ONCOGENESIS_BY_MET | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | LINDGREN_BLADDER_CANCER_CLUSTER_2B | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | FARMER_BREAST_CANCER_BASAL_VS_LULMINAL | All Up-Regulated | 0.009 | --
msigdb_c2_curatedsets | SCHLOSSER_MYC_TARGETS_REPRESSED_BY_SERUM | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | DAUER_STAT3_TARGETS_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | DACOSTA_UV_RESPONSE_VIA_ERCC3_COMMON_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | AMUNDSON_RESPONSE_TO_ARSENITE | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | GRUETZMANN_PANCREATIC_CANCER_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | TOMLINS_PROSTATE_CANCER_DN | All Up-Regulated | 0.014 | --
msigdb_c2_curatedsets | GROSS_HYPOXIA_VIA_ELK3_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | GROSS_HYPOXIA_VIA_HIF1A_DN | All Up-Regulated | 0.014 | --
msigdb_c2_curatedsets | GROSS_HYPOXIA_VIA_ELK3_AND_HIF1A_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | LIU_COMMON_CANCER_GENES | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | WEI_MIR34A_TARGETS | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | CAFFAREL_RESPONSE_TO_THC_UP | All Up-Regulated | 0.032 | --
msigdb_c2_curatedsets | RICKMAN_TUMOR_DIFFERENTIATED_WELL_VS_POORLY_UP | All Up-Regulated | 0.046 | --
msigdb_c2_curatedsets | SCHAEFFER_PROSTATE_DEVELOPMENT_6HR_UP | All Up-Regulated | 0.005 | All Up-Regulated
msigdb_c2_curatedsets | SCHAEFFER_PROSTATE_DEVELOPMENT_6HR_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | KOYAMA_SEMA3B_TARGETS_DN | All Up-Regulated | 0.023 | --
msigdb_c2_curatedsets | WU_CELL_MIGRATION | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | COLIN_PILOCYTIC_ASTROCYTOMA_VS_GLIOBLASTOMA_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | GOTZMANN_EPITHELIAL_TO_MESENCHYMAL_TRANSITION_UP | All Up-Regulated | 0.005 | --

msigdb_c2_curatedsets | BENPORATH_ES_1 | All Up-Regulated | 0.014 | --
msigdb_c2_curatedsets | BENPORATH_OCT4_TARGETS | All Up-Regulated | 0.018 | --
msigdb_c2_curatedsets | BENPORATH_NOS_TARGETS | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | STARK_PREFRONTAL_CORTEX_22Q11_DELETION_UP | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | SHEN_SMARCA2_TARGETS_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | ZHANG_RESPONSE_TO_IKK_INHIBITOR_AND_TNF_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | MORI_LARGE_PRE_BII_LYMPHOCYTE_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | MORI_MATURE_B_LYMPHOCYTE_UP | All Up-Regulated | 0.037 | --
msigdb_c2_curatedsets | NIKOLSKY_BREAST_CANCER_8Q12_Q22_AMPLICON | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | DING_LUNG_CANCER_EXPRESSION_BY_COPY_NUMBER | All Up-Regulated | 0.018 | --
msigdb_c2_curatedsets | LE_EGR2_TARGETS_DN | All Up-Regulated | 0.046 | --
msigdb_c2_curatedsets | ASTON_MAJOR_DEPRESSIVE_DISORDER_DN | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | GOLDRATH_HOMEOSTATIC_PROLIFERATION | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | FLECHNER_BIOPSY_KIDNEY_TRANSPLANT_REJECTED_VS_OK_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | UEDA_PERIFERAL_CLOCK | All Up-Regulated | 0.023 | --
msigdb_c2_curatedsets | WIELAND_UP_BY_HBV_INFECTION | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | DER_IFN_BETA_RESPONSE_UP | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | BROCKE_APOPTOSIS_REVERSED_BY_IL6 | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | OKUMURA_INFLAMMATORY_RESPONSE_LPS | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | MANALO_HYPOXIA_UP | All Up-Regulated | 0.009 | --
msigdb_c2_curatedsets | GALINDO_IMMUNE_RESPONSE_TO_ENTEROTOXIN | All Up-Regulated | 0.009 | --
msigdb_c2_curatedsets | ZHAN_MULTIPLE_MYELOMA_CD1_AND_CD2_UP | All Up-Regulated | 0.009 | --
msigdb_c2_curatedsets | LENAOUR_DENDRITIC_CELL_MATURATION_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | VERHAAK_AML_WITH_NPM1_MUTATED_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | PETROVA_ENDOTHELIUM_LYMPHATIC_VS_BLOOD_DN | All Up-Regulated | 0.009 | --

msigdb_c2_curatedsets | LI_WILMS_TUMOR_VS_FETAL_KIDNEY_1_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | THEILGAARD_NEUTROPHIL_AT_SKIN_WOUND_DN | All Up-Regulated | 0.005 | All Up-Regulated
msigdb_c2_curatedsets | MENSE_HYPOXIA_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | IGLESIAS_E2F_TARGETS_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | BASSO_CD40_SIGNALING_UP | All Up-Regulated | 0.041 | --
msigdb_c2_curatedsets | REN_ALVEOLAR_RHABDOMYOSARCOMA_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | BLALOCK_ALZHEIMERS_DISEASE_INCIPIENT_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | JIANG_AGING_CEREBRAL_CORTEX_UP | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | DEBIASI_APOPTOSIS_BY_REOVIRUS_INFECTION_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | ONGUSAHA_TP53_TARGETS | All Up-Regulated | 0.014 | --
msigdb_c2_curatedsets | HARRIS_HYPOXIA | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | RAMALHO_STEMNESS_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | WANG_SMARCE1_TARGETS_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | RODWELL_AGING_KIDNEY_NO_BLOOD_DN | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | YAMAZAKI_TCEB3_TARGETS_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | BROWNE_INTERFERON_RESPONSIVE_GENES | All Up-Regulated | 0.023 | --
msigdb_c2_curatedsets | JI_RESPONSE_TO_FSH_DN | All Up-Regulated | 0.018 | --
msigdb_c2_curatedsets | BROWNE_HCMV_INFECTION_20HR_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | RUAN_RESPONSE_TO_TNF_DN | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | LEONARD_HYPOXIA | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | CHEN_LVAD_SUPPORT_OF_FAILING_HEART_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | LU_AGING_BRAIN_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | TAKAO_RESPONSE_TO_UVB_RADIATION_DN | All Up-Regulated | 0.018 | --
msigdb_c2_curatedsets | ZHENG_RESPONSE_TO_ARSENITE_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | GENTILE_UV_HIGH_DOSE_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated

msigdb_c2_curatedsets | KANG_DOXORUBICIN_RESISTANCE_DN | All Up-Regulated | 0.041 | --
msigdb_c2_curatedsets | MCLACHLAN_DENTAL_CARIES_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | RODWELL_AGING_KIDNEY_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | WELCSH_BRCA1_TARGETS_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | BURTON_ADIPOGENESIS_7 | Mixed Up-Regulated | 0.009 | Mixed Up-Regulated
msigdb_c2_curatedsets | KEEN_RESPONSE_TO_ROSIGLITAZONE_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | HEDENFALK_BREAST_CANCER_BRCA1_VS_BRCA2 | All Up-Regulated | 0.009 | --
msigdb_c2_curatedsets | DAZARD_RESPONSE_TO_UV_NHEK_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | DAZARD_RESPONSE_TO_UV_SCC_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | JIANG_HYPOXIA_NORMAL | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | BAELDE_DIABETIC_NEPHROPATHY_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | WANG_SMARCE1_TARGETS_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | MARTINEZ_RESPONSE_TO_TRABECTEDIN_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | MCLACHLAN_DENTAL_CARIES_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | BILD_HRAS_ONCOGENIC_SIGNATURE | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | KRIGE_AMINO_ACID_DEPRIVATION | All Up-Regulated | 0.023 | --
msigdb_c2_curatedsets | DURCHDEWALD_SKIN_CARCINOGENESIS_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | MONNIER_POSTRADIATION_TUMOR_ESCAPE_DN | All Up-Regulated | 0.018 | --
msigdb_c2_curatedsets | CREIGHTON_ENDOCRINE_THERAPY_RESISTANCE_4 | All Up-Regulated | 0.028 | --
msigdb_c2_curatedsets | CREIGHTON_ENDOCRINE_THERAPY_RESISTANCE_5 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | LEIN_OLIGODENDROCYTE_MARKERS | All Up-Regulated | 0.041 | --
msigdb_c2_curatedsets | ZHENG_BOUND_BY_FOXP3 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | ZHENG_FOXP3_TARGETS_IN_THYMUS_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | RIGGINS_TAMOXIFEN_RESISTANCE_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | FOSTER_TOLERANT_MACROPHAGE_UP | All Up-Regulated | 0.037 | --

msigdb_c2_curatedsets | FOSTER_TOLERANT_MACROPHAGE_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | KYNG_DNA_DAMAGE_DN | All Up-Regulated | 0.014 | --
msigdb_c2_curatedsets | STEARMAN_LUNG_CANCER_EARLY_VS_LATE_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | HELLER_SILENCED_BY_METHYLATION_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | HELLER_HDAC_TARGETS_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | HELLER_HDAC_TARGETS_SILENCED_BY_METHYLATION_UP | All Up-Regulated | 0.046 | --
msigdb_c2_curatedsets | HELLER_HDAC_TARGETS_SILENCED_BY_METHYLATION_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | SARRIO_EPITHELIAL_MESENCHYMAL_TRANSITION_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | DE_YY1_TARGETS_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | MITSIADES_RESPONSE_TO_APLIDIN_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | SMID_BREAST_CANCER_RELAPSE_IN_BONE_DN | All Up-Regulated | 0.032 | --
msigdb_c2_curatedsets | SMID_BREAST_CANCER_NORMAL_LIKE_UP | All Up-Regulated | 0.046 | --
msigdb_c2_curatedsets | BASSO_HAIRY_CELL_LEUKEMIA_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | ZHANG_BREAST_CANCER_PROGENITORS_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | BOQUEST_STEM_CELL_UP | Mixed Up-Regulated | < 1.0E-6 | Mixed Up-Regulated
msigdb_c2_curatedsets | BOQUEST_STEM_CELL_CULTURED_VS_FRESH_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | PODAR_RESPONSE_TO_ADAPHOSTIN_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | CHEN_HOXA5_TARGETS_9HR_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | QI_PLASMACYTOMA_UP | All Up-Regulated | 0.046 | --
msigdb_c2_curatedsets | BLUM_RESPONSE_TO_SALIRASIB_UP | All Up-Regulated | 0.032 | --
msigdb_c2_curatedsets | WINTER_HYPOXIA_METAGENE | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | ALONSO_METASTASIS_UP | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | WANG_TUMOR_INVASIVENESS_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | WANG_TUMOR_INVASIVENESS_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | LEE_LIVER_CANCER_SURVIVAL_DN | All Up-Regulated | < 1.0E-6 | --

msigdb_c2_curatedsets | BOYLAN_MULTIPLE_MYELOMA_C_D_DN | All Up-Regulated | 0.041 | --
msigdb_c2_curatedsets | TOOKER_GEMCITABINE_RESISTANCE_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | SEKI_INFLAMMATORY_RESPONSE_LPS_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | LINDSTEDT_DENDRITIC_CELL_MATURATION_D | All Up-Regulated | 0.023 | --
msigdb_c2_curatedsets | NUTT_GBM_VS_AO_GLIOMA_UP | All Up-Regulated | 0.009 | --
msigdb_c2_curatedsets | YOSHIMURA_MAPK8_TARGETS_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | RUIZ_TNC_TARGETS_UP | All Up-Regulated | 0.028 | --
msigdb_c2_curatedsets | RUTELLA_RESPONSE_TO_CSF2RB_AND_IL4_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | RUTELLA_RESPONSE_TO_CSF2RB_AND_IL4_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | RUTELLA_RESPONSE_TO_HGF_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | RUTELLA_RESPONSE_TO_HGF_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | RUTELLA_RESPONSE_TO_HGF_VS_CSF2RB_AND_IL4_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | RUTELLA_RESPONSE_TO_HGF_VS_CSF2RB_AND_IL4_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | CHANG_CORE_SERUM_RESPONSE_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | SWEET_LUNG_CANCER_KRAS_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | LEE_RECENT_THYMIC_EMIGRANT | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | LEE_DIFFERENTIATING_T_LYMPHOCYTE | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | ZHANG_TLX_TARGETS_36HR_DN | All Up-Regulated | 0.009 | --
msigdb_c2_curatedsets | HAN_SATB1_TARGETS_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | COLINA_TARGETS_OF_4EBP1_AND_4EBP2 | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | SHAFFER_IRF4_TARGETS_IN_MYELOMA_VS_MATURE_B_LYMPHOCYTE | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | HSIAO_HOUSEKEEPING_GENES | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | MILI_PSEUDOPODIA_CHEMOTAXIS_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | MILI_PSEUDOPODIA_HAPTOTAXIS_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | ROME_INSULIN_TARGETS_IN_MUSCLE_UP | All Up-Regulated | < 1.0E-6 | --

msigdb_c2_curatedsets | BOYAULT_LIVER_CANCER_SUBCLASS_G5_DN | All Up-Regulated | 0.009 | --
msigdb_c2_curatedsets | CHIANG_LIVER_CANCER_SUBCLASS_UNANNOTATED_DN | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | MARSON_FOXP3_TARGETS_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | SCHOEN_NFKB_SIGNALING | Mixed Up-Regulated | 0.009 | Mixed Up-Regulated
msigdb_c2_curatedsets | SAKAI_CHRONIC_HEPATITIS_VS_LIVER_CANCER_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | KOBAYASHI_EGFR_SIGNALING_24HR_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | HOSHIDA_LIVER_CANCER_SUBCLASS_S1 | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | HOSHIDA_LIVER_CANCER_SUBCLASS_S2 | All Up-Regulated | 0.023 | --
msigdb_c2_curatedsets | HOSHIDA_LIVER_CANCER_SUBCLASS_S3 | Mixed Up-Regulated | 0.005 | Mixed Up-Regulated
msigdb_c2_curatedsets | DANG_REGULATED_BY_MYC_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | NAKAMURA_ADIPOGENESIS_EARLY_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | KARLSSON_TGFB1_TARGETS_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | VERHAAK_GLIOBLASTOMA_MESENCHYMAL | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | HIRSCH_CELLULAR_TRANSFORMATION_SIGNATURE_UP | All Up-Regulated | 0.005 | All Up-Regulated
msigdb_c2_curatedsets | WANG_RESPONSE_TO_GSK3_INHIBITOR_SB216763_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | GREGORY_SYNTHETIC_LETHAL_WITH_IMATINIB | All Up-Regulated | 0.014 | --
msigdb_c2_curatedsets | LU_EZH2_TARGETS_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | HOELZEL_NF1_TARGETS_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | DEMAGALHAES_AGING_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | DUTERTRE_ESTRADIOL_RESPONSE_24HR_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | GABRIELY_MIR21_TARGETS | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | JOHNSTONE_PARVB_TARGETS_1_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | JOHNSTONE_PARVB_TARGETS_2_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | JOHNSTONE_PARVB_TARGETS_3_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_DN | All Up-Regulated | 0.005 | --

msigdb_c2_curatedsets | PASINI_SUZ12_TARGETS_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | TORCHIA_TARGETS_OF_EWSR1_FLI1_FUSION_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | IKEDA_MIR30_TARGETS_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | SERVITJA_ISLET_HNF1A_TARGETS_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | PEDERSEN_METASTASIS_BY_ERBB2_ISOFORM_7 | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | WAKABAYASHI_ADIPOGENESIS_PPARG_RXRA_BOUND_WITH_H4K20ME1_MARK | All Up-Regulated | 0.009 | --
msigdb_c2_curatedsets | PLASARI_TGFB1_TARGETS_10HR_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | PLASARI_TGFB1_SIGNALING_VIA_NFIC_1HR_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | FORTSCHEGGER_PHF8_TARGETS_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | PHONG_TNF_RESPONSE_VIA_P38_COMPLETE | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | PHONG_TNF_RESPONSE_VIA_P38_PARTIAL | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | PHONG_TNF_RESPONSE_NOT_VIA_P38 | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | RAO_BOUND_BY_SALL4_ISOFORM_B | All Up-Regulated | 0.028 | --
msigdb_c2_curatedsets | RAO_BOUND_BY_SALL4 | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | PECE_MAMMARY_STEM_CELL_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | ALTEMEIER_RESPONSE_TO_LPS_WITH_MECHANICAL_VENTILATION | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | LIM_MAMMARY_LUMINAL_MATURE_DN | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | GHANDHI_DIRECT_IRRADIATION_UP | All Up-Regulated | 0.005 | --
msigdb_c2_curatedsets | GHANDHI_BYSTANDER_IRRADIATION_UP | All Up-Regulated | < 1.0E-6 | --
msigdb_c2_curatedsets | ZWANG_CLASS_1_TRANSIENTLY_INDUCED_BY_EGF | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c2_curatedsets | ZWANG_CLASS_3_TRANSIENTLY_INDUCED_BY_EGF | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | V$E4BP4_01 | All Up-Regulated | 0.036 | --
msigdb_c3_regulatory | V$CDP_02 | All Up-Regulated | 0.017 | --
msigdb_c3_regulatory | V$CEBPB_02 | All Up-Regulated | < 1.0E-6 | --
msigdb_c3_regulatory | V$USF_01 | All Up-Regulated | 0.012 | --

msigdb_c3_regulatory | V$USF_02 | All Up-Regulated | 0.006 | --
msigdb_c3_regulatory | V$SRY_01 | All Up-Regulated | 0.041 | --
msigdb_c3_regulatory | V$ARNT_01 | All Up-Regulated | 0.008 | --
msigdb_c3_regulatory | V$CHOP_01 | All Up-Regulated | 0.003 | --
msigdb_c3_regulatory | V$HLF_01 | All Up-Regulated | 0.035 | --
msigdb_c3_regulatory | GCGSCMNTTT_UNKNOWN | All Up-Regulated | 0.029 | --
msigdb_c3_regulatory | V$SRF_Q5_01 | All Up-Regulated | 0.021 | --
msigdb_c3_regulatory | AGCACTT,MIR-93,MIR-302A,MIR-302B,MIR-302C,MIR-302D,MIR-372,MIR-373,MIR-520E,MIR-520A,MIR-526B,MIR-520B,MIR-520C,MIR-520D | All Up-Regulated | 0.008 | All Up-Regulated
msigdb_c3_regulatory | GTGCAAT,MIR-25,MIR-32,MIR-92,MIR-363,MIR-367 | All Up-Regulated | 0.025 | All Up-Regulated
msigdb_c3_regulatory | TGAATGT,MIR-181A,MIR-181B,MIR-181C,MIR-181D | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | TGCACTG,MIR-148A,MIR-152,MIR-148B | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | TTGCACT,MIR-130A,MIR-301,MIR-130B | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | TGCACTT,MIR-519C,MIR-519B,MIR-519A | All Up-Regulated | 0.009 | All Up-Regulated
msigdb_c3_regulatory | CAGTATT,MIR-200B,MIR-200C,MIR-429 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | ACATTCC,MIR-1,MIR-206 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | AAAGGGA,MIR-204,MIR-211 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | AATGTGA,MIR-23A,MIR-23B | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | ACTGAAA,MIR-30A-3P,MIR-30E-3P | All Up-Regulated | 0.014 | All Up-Regulated
msigdb_c3_regulatory | TACTTGA,MIR-26A,MIR-26B | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | CAGTGTT,MIR-141,MIR-200A | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | ACTGTGA,MIR-27A,MIR-27B | All Up-Regulated | < 1.0E-6 | --
msigdb_c3_regulatory | GACTGTT,MIR-212,MIR-132 | All Up-Regulated | 0.002 | --
msigdb_c3_regulatory | AAGCCAT,MIR-135A,MIR-135B | All Up-Regulated | < 1.0E-6 | --
msigdb_c3_regulatory | ATGTTAA,MIR-302C | All Up-Regulated | < 1.0E-6 | All Up-Regulated

msigdb_c3_regulatory | CATTTCA,MIR-203 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | AGTCAGC,MIR-345 | All Up-Regulated | 0.041 | --
msigdb_c3_regulatory | CTATGCA,MIR-153 | All Up-Regulated | < 1.0E-6 | --
msigdb_c3_regulatory | GTGCCAT,MIR-183 | All Up-Regulated | 0.001 | --
msigdb_c3_regulatory | TATTATA,MIR-374 | All Up-Regulated | 0.024 | --
msigdb_c3_regulatory | CTTGTAT,MIR-381 | All Up-Regulated | 0.007 | All Up-Regulated
msigdb_c3_regulatory | AAAGGAT,MIR-501 | All Up-Regulated | < 1.0E-6 | --
msigdb_c3_regulatory | CTACTGT,MIR-199A | All Up-Regulated | 0.008 | --
msigdb_c3_regulatory | GTGTTGA,MIR-505 | All Up-Regulated | 0.009 | All Up-Regulated
msigdb_c3_regulatory | ATACTGT,MIR-144 | All Up-Regulated | < 1.0E-6 | --
msigdb_c3_regulatory | GTATTAT,MIR-369-3P | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | GGGCATT,MIR-365 | All Up-Regulated | 0.007 | --
msigdb_c3_regulatory | GTGACTT,MIR-224 | All Up-Regulated | 0.001 | All Up-Regulated
msigdb_c3_regulatory | GAGACTG,MIR-452 | All Up-Regulated | 0.014 | --
msigdb_c3_regulatory | ATAGGAA,MIR-202 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | TTGCCAA,MIR-182 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | TAGCTTT,MIR-9 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | ATGTTTC,MIR-494 | All Up-Regulated | 0.011 | --
msigdb_c3_regulatory | GTACTGT,MIR-101 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | AGGAAGC,MIR-516-3P | All Up-Regulated | 0.028 | --
msigdb_c3_regulatory | ATATGCA,MIR-448 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | ATGAAGG,MIR-205 | All Up-Regulated | 0.006 | All Up-Regulated
msigdb_c3_regulatory | CAGCTTT,MIR-320 | All Up-Regulated | < 1.0E-6 | --
msigdb_c3_regulatory | CATGTAA,MIR-496 | All Up-Regulated | 0.002 | All Up-Regulated
msigdb_c3_regulatory | AAGCACA,MIR-218 | All Up-Regulated | 0.008 | --

msigdb_c3_regulatory | ATACCTC,MIR-202 | All Up-Regulated | 0.006 | --
msigdb_c3_regulatory | AGCATTA,MIR-155 | All Up-Regulated | 0.024 | All Up-Regulated
msigdb_c3_regulatory | GCAAAAA,MIR-129 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | CTTTGTA,MIR-524 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | CTGTTAC,MIR-194 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | ACATATC,MIR-190 | All Up-Regulated | 0.011 | --
msigdb_c3_regulatory | AAGCAAT,MIR-137 | All Up-Regulated | 0.019 | All Up-Regulated
msigdb_c3_regulatory | ACCAAAG,MIR-9 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | TTTGTAG,MIR-520D | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | ATAAGCT,MIR-21 | All Up-Regulated | 0.001 | All Up-Regulated
msigdb_c3_regulatory | TCTGATA,MIR-361 | All Up-Regulated | 0.005 | --
msigdb_c3_regulatory | ACTTTAT,MIR-142-5P | All Up-Regulated | < 1.0E-6 | --
msigdb_c3_regulatory | TTTGCAG,MIR-518A-2 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | AACATTC,MIR-409-3P | All Up-Regulated | 0.018 | --
msigdb_c3_regulatory | GTGCCAA,MIR-96 | All Up-Regulated | < 1.0E-6 | --
msigdb_c3_regulatory | TCATCTC,MIR-143 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | AGTCTTA,MIR-499 | All Up-Regulated | 0.047 | --
msigdb_c3_regulatory | ACCATTT,MIR-522 | All Up-Regulated | 0.001 | --
msigdb_c3_regulatory | AACTGGA,MIR-145 | All Up-Regulated | 0.002 | All Up-Regulated
msigdb_c3_regulatory | AAGCACT,MIR-520F | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | ACTGTAG,MIR-139 | All Up-Regulated | < 1.0E-6 | --
msigdb_c3_regulatory | GTGCAAA,MIR-507 | All Up-Regulated | 0.036 | --
msigdb_c3_regulatory | TTTTGAG,MIR-373 | All Up-Regulated | < 1.0E-6 | --
msigdb_c3_regulatory | ATGTACA,MIR-493 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | GCATTTG,MIR-105 | All Up-Regulated | < 1.0E-6 | All Up-Regulated

msigdb_c3_regulatory | GTTATAT,MIR-410 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | ATTCTTT,MIR-186 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | ATCATGA,MIR-433 | All Up-Regulated | 0.043 | --
msigdb_c3_regulatory | CTTTGCA,MIR-527 | All Up-Regulated | 0.006 | --
msigdb_c3_regulatory | ATTACAT,MIR-380-3P | All Up-Regulated | 0.002 | --
msigdb_c3_regulatory | TGCAAAC,MIR-452 | All Up-Regulated | 0.004 | --
msigdb_c3_regulatory | TAATGTG,MIR-323 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c3_regulatory | SMTTTTGT_UNKNOWN | All Up-Regulated | < 1.0E-6 | --
msigdb_c3_regulatory | TTCYRGAA_UNKNOWN | All Up-Regulated | 0.011 | --
msigdb_c3_regulatory | TTAYRTAA_V$E4BP4_01 | All Up-Regulated | 0.038 | --
msigdb_c3_regulatory | TTCNRGNNNNTTC_V$HSF_Q6 | All Up-Regulated | 0.001 | --
msigdb_c3_regulatory | RGAANNTTC_V$HSF1_01 | All Up-Regulated | < 1.0E-6 | --
msigdb_c3_regulatory | CATTGTYY_V$SOX9_B1 | All Up-Regulated | 0.009 | --
msigdb_c5_GOterms | CELLULAR_BIOSYNTHETIC_PROCESS | All Up-Regulated | 0.049 | --
msigdb_c5_GOterms | PROTEIN_KINASE_CASCADE | All Up-Regulated | 0.013 | --
msigdb_c5_GOterms | MRNA_METABOLIC_PROCESS | All Up-Regulated | 0.043 | --
msigdb_c5_GOterms | REGULATION_OF_RNA_METABOLIC_PROCESS | All Up-Regulated | 0.035 | --
msigdb_c5_GOterms | RNA_BINDING | All Up-Regulated | < 1.0E-6 | --
msigdb_c6_oncogenic | E2F1_UP.V1_DN | All Up-Regulated | 0.000 | --
msigdb_c6_oncogenic | EGFR_UP.V1_DN | All Up-Regulated | 0.000 | --
msigdb_c6_oncogenic | EGFR_UP.V1_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
msigdb_c6_oncogenic | ERB2_UP.V1_DN | All Up-Regulated | 0.001 | All Up-Regulated
msigdb_c6_oncogenic | GCNP_SHH_UP_EARLY.V1_UP | All Up-Regulated | 0.003 | --
msigdb_c6_oncogenic | HINATA_NFKB_IMMU_INF | All Up-Regulated | 0.010 | --
msigdb_c6_oncogenic | CSR_EARLY_UP.V1_UP | All Up-Regulated | 0.002 | --

167 All Up- msigdb_c6_oncogenic PIGF_UP.V1_UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c6_oncogenic VEGF_A_UP.V1_DN Regulated 0.012 All Up-Regulated All Up- msigdb_c6_oncogenic ATF2_UP.V1_DN Regulated 0.001 -- All Up- msigdb_c6_oncogenic CAMP_UP.V1_DN Regulated 0.013 -- All Up- msigdb_c6_oncogenic LTE2_UP.V1_DN Regulated 0.000 -- All Up- msigdb_c6_oncogenic LTE2_UP.V1_UP Regulated 0.000 All Up-Regulated All Up- msigdb_c6_oncogenic MEK_UP.V1_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c6_oncogenic RAF_UP.V1_UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c6_oncogenic MTOR_UP.N4.V1_UP Regulated 0.044 -- Mixed Up- msigdb_c6_oncogenic ESC_V6.5_UP_EARLY.V1_DN Regulated 0.000 Mixed Up-Regulated Mixed Up- msigdb_c6_oncogenic BMI1_DN_MEL18_DN.V1_UP Regulated 0.014 Mixed Up-Regulated All Up- msigdb_c6_oncogenic EIF4E_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c6_oncogenic NRL_DN.V1_DN Regulated 0.011 -- All Up- msigdb_c6_oncogenic RB_P130_DN.V1_DN Regulated 0.023 -- All Up- msigdb_c6_oncogenic CAHOY_ASTROCYTIC Regulated 0.019 -- All Up- msigdb_c6_oncogenic RPS14_DN.V1_UP Regulated 0.001 All Up-Regulated All Up- msigdb_c6_oncogenic HOXA9_DN.V1_UP Regulated 0.034 All Up-Regulated All Up- msigdb_c6_oncogenic STK33_NOMO_UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c6_oncogenic STK33_SKM_UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c6_oncogenic STK33_UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c6_oncogenic TBK1.DF_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c6_oncogenic TBK1.DF_UP Regulated 0.013 -- All Up- msigdb_c6_oncogenic JAK2_DN.V1_DN Regulated 0.007 -- All Up- msigdb_c6_oncogenic KRAS.KIDNEY_UP.V1_UP Regulated 0.001 -- All Up- msigdb_c7_immuno KAECH_NAIVE_VS_DAY8_EFF_CD8_TCELL_UP Regulated 0.008 --

168 All Up- msigdb_c7_immuno KAECH_NAIVE_VS_DAY8_EFF_CD8_TCELL_DN Regulated 0.010 -- All Up- msigdb_c7_immuno KAECH_NAIVE_VS_DAY15_EFF_CD8_TCELL_UP Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno KAECH_NAIVE_VS_DAY15_EFF_CD8_TCELL_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno KAECH_NAIVE_VS_MEMORY_CD8_TCELL_UP Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno KAECH_DAY8_EFF_VS_MEMORY_CD8_TCELL_UP Regulated 0.002 -- All Up- msigdb_c7_immuno GOLDRATH_NAIVE_VS_MEMORY_CD8_TCELL_DN Regulated 0.008 -- All Up- msigdb_c7_immuno GSE10239_NAIVE_VS_MEMORY_CD8_TCELL_UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE10239_MEMORY_VS_KLRG1HIGH_EFF_CD8_TCELL_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE10239_KLRG1INT_VS_KLRG1HIGH_EFF_CD8_TCELL_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE10325_CD4_TCELL_VS_MYELOID_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE10325_BCELL_VS_MYELOID_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE10325_LUPUS_CD4_TCELL_VS_LUPUS_MYELOID_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE10325_LUPUS_BCELL_VS_LUPUS_MYELOID_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE10325_BCELL_VS_LUPUS_BCELL_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE10325_MYELOID_VS_LUPUS_MYELOID_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE11057_NAIVE_VS_EFF_MEMORY_CD4_TCELL_UP Regulated 0.002 -- All Up- msigdb_c7_immuno GSE11057_NAIVE_VS_EFF_MEMORY_CD4_TCELL_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE11057_NAIVE_VS_CENT_MEMORY_CD4_TCELL_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE11057_NAIVE_CD4_VS_PBMC_CD4_TCELL_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE11057_CD4_EFF_MEM_VS_PBMC_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE11057_CD4_CENT_MEM_VS_PBMC_DN Regulated 0.008 All Up-Regulated All Up- msigdb_c7_immuno GSE11057_NAIVE_VS_MEMORY_CD4_TCELL_DN Regulated < 
1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE12366_GC_VS_NAIVE_BCELL_UP Regulated 0.002 -- All Up- msigdb_c7_immuno GSE12366_GC_VS_MEMORY_BCELL_DN Regulated 0.019 -- All Up- msigdb_c7_immuno GSE12366_PLASMA_CELL_VS_NAIVE_BCELL_DN Regulated 0.002 --

169 All Up- msigdb_c7_immuno GSE12366_PLASMA_CELL_VS_MEMORY_BCELL_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE12366_NAIVE_VS_MEMORY_BCELL_DN Regulated 0.002 -- All Up- msigdb_c7_immuno GSE12845_IGD_NEG_BLOOD_VS_NAIVE_TONSIL_BCELL_DN Regulated 0.036 -- GSE12845_IGD_NEG_BLOOD_VS_DARKZONE_GC_TONSIL_BC All Up- msigdb_c7_immuno ELL_DN Regulated < 1.0E-6 -- Mixed Up- msigdb_c7_immuno GSE12845_NAIVE_VS_DARKZONE_GC_TONSIL_BCELL_UP Regulated 0.031 Mixed Up-Regulated All Up- msigdb_c7_immuno GSE13306_RA_VS_UNTREATED_MEM_CD4_TCELL_DN Regulated 0.002 -- All Up- msigdb_c7_immuno GSE13484_3H_UNSTIM_VS_YF17D_VACCINE_STIM_PBMC_UP Regulated 0.040 -- All Up- msigdb_c7_immuno GSE13484_3H_UNSTIM_VS_YF17D_VACCINE_STIM_PBMC_DN Regulated 0.019 -- All Up- msigdb_c7_immuno GSE13484_UNSTIM_VS_YF17D_VACCINE_STIM_PBMC_UP Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE13484_UNSTIM_VS_YF17D_VACCINE_STIM_PBMC_DN Regulated 0.013 -- All Up- msigdb_c7_immuno GSE13485_CTRL_VS_DAY1_YF17D_VACCINE_PBMC_UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE13485_CTRL_VS_DAY7_YF17D_VACCINE_PBMC_DN Regulated 0.004 -- All Up- msigdb_c7_immuno GSE13485_DAY1_VS_DAY3_YF17D_VACCINE_PBMC_DN Regulated 0.010 All Up-Regulated All Up- msigdb_c7_immuno GSE13485_DAY1_VS_DAY7_YF17D_VACCINE_PBMC_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE13485_DAY1_VS_DAY21_YF17D_VACCINE_PBMC_DN Regulated 0.013 -- All Up- msigdb_c7_immuno GSE13485_DAY3_VS_DAY7_YF17D_VACCINE_PBMC_DN Regulated 0.025 -- All Up- msigdb_c7_immuno GSE13485_PRE_VS_POST_YF17D_VACCINATION_PBMC_UP Regulated 0.019 All Up-Regulated All Up- msigdb_c7_immuno GSE13738_RESTING_VS_TCR_ACTIVATED_CD4_TCELL_UP Regulated 0.046 -- GSE13738_RESTING_VS_BYSTANDER_ACTIVATED_CD4_TCEL All Up- msigdb_c7_immuno L_UP Regulated 0.002 -- GSE13738_TCR_VS_BYSTANDER_ACTIVATED_CD4_TCELL_D All Up- msigdb_c7_immuno N Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE14000_UNSTIM_VS_4H_LPS_DC_TRANSLATED_RNA_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno 
GSE14000_UNSTIM_VS_16H_LPS_DC_TRANSLATED_RNA_DN Regulated 0.015 -- All Up- msigdb_c7_immuno GSE14000_UNSTIM_VS_4H_LPS_DC_DN Regulated 0.004 -- All Up- msigdb_c7_immuno GSE14026_TH1_VS_TH17_DN Regulated 0.050 -- All Up- msigdb_c7_immuno GSE14308_TH17_VS_INDUCED_TREG_UP Regulated 0.011 --

170 All Up- msigdb_c7_immuno GSE14308_INDUCED_VS_NATURAL_TREG_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE1432_CTRL_VS_IFNG_6H_MICROGLIA_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE1432_CTRL_VS_IFNG_24H_MICROGLIA_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE1432_1H_VS_6H_IFNG_MICROGLIA_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE1432_1H_VS_24H_IFNG_MICROGLIA_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE1432_6H_VS_24H_IFNG_MICROGLIA_UP Regulated 0.036 -- GSE1460_INTRATHYMIC_T_PROGENITOR_VS_THYMIC_STRO All Up- msigdb_c7_immuno MAL_CELL_DN Regulated < 1.0E-6 -- GSE1460_NAIVE_CD4_TCELL_CORD_BLOOD_VS_THYMIC_ST All Up- msigdb_c7_immuno ROMAL_CELL_DN Regulated 0.004 -- All Up- msigdb_c7_immuno GSE14769_UNSTIM_VS_20MIN_LPS_BMDM_DN Regulated 0.008 -- All Up- msigdb_c7_immuno GSE14769_UNSTIM_VS_40MIN_LPS_BMDM_DN Regulated 0.002 -- All Up- msigdb_c7_immuno GSE14769_UNSTIM_VS_60MIN_LPS_BMDM_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE14769_UNSTIM_VS_80MIN_LPS_BMDM_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE14769_UNSTIM_VS_120MIN_LPS_BMDM_DN Regulated 0.011 -- All Up- msigdb_c7_immuno GSE15767_MED_VS_SCS_MAC_LN_UP Regulated 0.011 All Up-Regulated All Up- msigdb_c7_immuno GSE15930_STIM_VS_STIM_AND_IL-12_48H_CD8_T_CELL_DN Regulated 0.019 -- All Up- msigdb_c7_immuno GSE16755_CTRL_VS_IFNA_TREATED_MAC_DN Regulated 0.002 -- All Up- msigdb_c7_immuno GSE17580_TREG_VS_TEFF_S_MANSONI_INF_DN Regulated 0.042 -- All Up- msigdb_c7_immuno GSE17721_CTRL_VS_LPS_8H_BMDM_DN Regulated 0.023 -- All Up- msigdb_c7_immuno GSE17721_LPS_VS_POLYIC_4H_BMDM_UP Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE17721_LPS_VS_POLYIC_8H_BMDM_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE17721_POLYIC_VS_PAM3CSK4_1H_BMDM_DN Regulated 0.031 -- All Up- msigdb_c7_immuno GSE17721_PAM3CSK4_VS_CPG_8H_BMDM_DN Regulated 0.002 -- All Up- msigdb_c7_immuno GSE17721_CPG_VS_GARDIQUIMOD_16H_BMDM_DN Regulated 0.006 -- All Up- msigdb_c7_immuno 
GSE17721_LPS_VS_PAM3CSK4_6H_BMDM_UP Regulated 0.044 -- All Up- msigdb_c7_immuno GSE17721_LPS_VS_PAM3CSK4_8H_BMDM_DN Regulated 0.017 --

171 All Up- msigdb_c7_immuno GSE17721_POLYIC_VS_CPG_2H_BMDM_DN Regulated 0.050 -- All Up- msigdb_c7_immuno GSE17721_PAM3CSK4_VS_GADIQUIMOD_1H_BMDM_DN Regulated 0.006 -- All Up- msigdb_c7_immuno GSE17721_PAM3CSK4_VS_GADIQUIMOD_8H_BMDM_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE17721_LPS_VS_CPG_1H_BMDM_DN Regulated 0.011 -- All Up- msigdb_c7_immuno GSE17721_POLYIC_VS_GARDIQUIMOD_1H_BMDM_DN Regulated 0.002 -- All Up- msigdb_c7_immuno GSE17721_LPS_VS_GARDIQUIMOD_2H_BMDM_UP Regulated 0.015 -- All Up- msigdb_c7_immuno GSE17721_0.5H_VS_12H_LPS_BMDM_DN Regulated 0.015 -- All Up- msigdb_c7_immuno GSE17721_12H_VS_24H_LPS_BMDM_UP Regulated 0.048 -- All Up- msigdb_c7_immuno GSE17721_0.5H_VS_12H_POLYIC_BMDM_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE17721_0.5H_VS_4H_POLYIC_BMDM_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE17721_0.5H_VS_8H_POLYIC_BMDM_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE17721_12H_VS_24H_POLYIC_BMDM_UP Regulated 0.002 -- All Up- msigdb_c7_immuno GSE17721_0.5H_VS_24H_PAM3CSK4_BMDM_DN Regulated 0.048 -- All Up- msigdb_c7_immuno GSE17721_ALL_VS_24H_PAM3CSK4_BMDM_UP Regulated 0.031 -- All Up- msigdb_c7_immuno GSE17721_12H_VS_24H_PAM3CSK4_BMDM_UP Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE17721_0.5H_VS_12H_CPG_BMDM_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE17721_0.5H_VS_4H_CPG_BMDM_DN Regulated 0.006 -- All Up- msigdb_c7_immuno GSE17721_0.5H_VS_8H_CPG_BMDM_DN Regulated 0.015 -- All Up- msigdb_c7_immuno GSE17721_0.5H_VS_24H_CPG_BMDM_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE17721_0.5H_VS_4H_GARDIQUIMOD_BMDM_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE17721_0.5H_VS_8H_GARDIQUIMOD_BMDM_DN Regulated 0.004 -- All Up- msigdb_c7_immuno GSE17974_0H_VS_6H_IN_VITRO_ACT_CD4_TCELL_UP Regulated < 1.0E-6 -- GSE17974_CTRL_VS_ACT_IL4_AND_ANTI_IL12_6H_CD4_TCEL All Up- msigdb_c7_immuno L_UP Regulated 0.042 -- GSE17974_CTRL_VS_ACT_IL4_AND_ANTI_IL12_12H_CD4_TCE All Up- 
msigdb_c7_immuno LL_UP Regulated 0.025 -- GSE17974_CTRL_VS_ACT_IL4_AND_ANTI_IL12_24H_CD4_TCE All Up- msigdb_c7_immuno LL_UP Regulated < 1.0E-6 --

172 GSE17974_0.5H_VS_72H_IL4_AND_ANTI_IL12_ACT_CD4_TCEL All Up- msigdb_c7_immuno L_UP Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE18791_CTRL_VS_NEWCASTLE_VIRUS_DC_6H_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE18791_CTRL_VS_NEWCASTLE_VIRUS_DC_10H_DN Regulated < 1.0E-6 -- GSE20151_CTRL_VS_FUSOBACT_NUCLEATUM_NEUTROPHIL_ All Up- msigdb_c7_immuno DN Regulated 0.011 -- All Up- msigdb_c7_immuno GSE20366_CD103_KLRG1_DP_VS_DN_TREG_DN Regulated 0.008 -- All Up- msigdb_c7_immuno GSE20366_TREG_VS_NAIVE_CD4_TCELL_UP Regulated 0.021 -- All Up- msigdb_c7_immuno GSE20715_WT_VS_TLR4_KO_LUNG_UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE20715_WT_VS_TLR4_KO_24H_OZONE_LUNG_UP Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE20715_0H_VS_6H_OZONE_LUNG_DN Regulated 0.011 -- All Up- msigdb_c7_immuno GSE20715_0H_VS_48H_OZONE_LUNG_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE20715_0H_VS_6H_OZONE_TLR4_KO_LUNG_DN Regulated 0.034 All Up-Regulated All Up- msigdb_c7_immuno GSE20715_0H_VS_24H_OZONE_TLR4_KO_LUNG_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE22886_NAIVE_CD8_TCELL_VS_MEMORY_TCELL_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE22886_NAIVE_CD4_TCELL_VS_MEMORY_TCELL_DN Regulated 0.046 -- All Up- msigdb_c7_immuno GSE22886_NAIVE_TCELL_VS_NKCELL_DN Regulated 0.002 All Up-Regulated All Up- msigdb_c7_immuno GSE22886_NAIVE_VS_MEMORY_TCELL_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE22886_NAIVE_CD8_TCELL_VS_NKCELL_DN Regulated 0.010 All Up-Regulated GSE22886_IGG_IGA_MEMORY_BCELL_VS_BM_PLASMA_CE All Up- msigdb_c7_immuno LL_UP Regulated 0.006 All Up-Regulated All Up- msigdb_c7_immuno GSE22886_DAY0_VS_DAY1_MONOCYTE_IN_CULTURE_DN Regulated 0.025 -- All Up- msigdb_c7_immuno GSE22886_NAIVE_TCELL_VS_DC_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE22886_NAIVE_TCELL_VS_MONOCYTE_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE22886_NAIVE_BCELL_VS_NEUTROPHIL_DN 
Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE22886_NAIVE_BCELL_VS_MONOCYTE_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE22886_NAIVE_CD8_TCELL_VS_NEUTROPHIL_DN Regulated 0.006 All Up-Regulated All Up- msigdb_c7_immuno GSE22886_NAIVE_CD8_TCELL_VS_DC_DN Regulated < 1.0E-6 All Up-Regulated

173 All Up- msigdb_c7_immuno GSE22886_NAIVE_CD8_TCELL_VS_MONOCYTE_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE22886_NAIVE_CD4_TCELL_VS_NEUTROPHIL_DN Regulated 0.010 All Up-Regulated All Up- msigdb_c7_immuno GSE22886_NAIVE_CD4_TCELL_VS_DC_DN Regulated 0.031 All Up-Regulated All Up- msigdb_c7_immuno GSE22886_NAIVE_CD4_TCELL_VS_MONOCYTE_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE22886_TH1_VS_TH2_48H_ACT_DN Regulated 0.008 All Up-Regulated All Up- msigdb_c7_immuno GSE22886_CTRL_VS_LPS_24H_DC_DN Regulated < 1.0E-6 -- GSE24142_EARLY_THYMIC_PROGENITOR_VS_DN2_THYM All Up- msigdb_c7_immuno OCYTE_UP Regulated < 1.0E-6 All Up-Regulated GSE24142_EARLY_THYMIC_PROGENITOR_VS_DN3_THYM All Up- msigdb_c7_immuno OCYTE_UP Regulated 0.013 All Up-Regulated All Up- msigdb_c7_immuno GSE24142_DN2_VS_DN3_THYMOCYTE_UP Regulated 0.044 -- GSE24142_EARLY_THYMIC_PROGENITOR_VS_DN2_THYM All Up- msigdb_c7_immuno OCYTE_ADULT_UP Regulated 0.038 All Up-Regulated GSE24142_EARLY_THYMIC_PROGENITOR_VS_DN3_THYMOC All Up- msigdb_c7_immuno YTE_ADULT_UP Regulated 0.025 -- GSE24142_EARLY_THYMIC_PROGENITOR_VS_DN2_THYM All Up- msigdb_c7_immuno OCYTE_FETAL_UP Regulated < 1.0E-6 All Up-Regulated GSE24142_EARLY_THYMIC_PROGENITOR_VS_DN3_THYM All Up- msigdb_c7_immuno OCYTE_FETAL_UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE24142_DN2_VS_DN3_THYMOCYTE_FETAL_UP Regulated 0.010 All Up-Regulated All Up- msigdb_c7_immuno GSE24142_ADULT_VS_FETAL_DN3_THYMOCYTE_UP Regulated 0.002 -- GSE24634_TREG_VS_TCONV_POST_DAY3_IL4_CONVERSIO All Up- msigdb_c7_immuno N_DN Regulated < 1.0E-6 All Up-Regulated GSE24634_TREG_VS_TCONV_POST_DAY7_IL4_CONVERSIO All Up- msigdb_c7_immuno N_DN Regulated < 1.0E-6 All Up-Regulated GSE24634_TREG_VS_TCONV_POST_DAY10_IL4_CONVERSIO All Up- msigdb_c7_immuno N_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE24634_TEFF_VS_TCONV_DAY3_IN_CULTURE_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno 
GSE24634_TEFF_VS_TCONV_DAY7_IN_CULTURE_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE24634_TEFF_VS_TCONV_DAY10_IN_CULTURE_DN Regulated < 1.0E-6 All Up-Regulated GSE24634_IL4_VS_CTRL_TREATED_NAIVE_CD4_TCELL_DAY3 All Up- msigdb_c7_immuno _DN Regulated 0.004 -- GSE24634_IL4_VS_CTRL_TREATED_NAIVE_CD4_TCELL_DAY5 All Up- msigdb_c7_immuno _DN Regulated 0.006 -- GSE24634_IL4_VS_CTRL_TREATED_NAIVE_CD4_TCELL_DAY1 All Up- msigdb_c7_immuno 0_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE25087_TREG_VS_TCONV_FETUS_UP Regulated < 1.0E-6 All Up-Regulated

174 All Up- msigdb_c7_immuno GSE25087_TREG_VS_TCONV_FETUS_DN Regulated 0.027 -- All Up- msigdb_c7_immuno GSE25087_TREG_VS_TCONV_ADULT_UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE26495_NAIVE_VS_PD1HIGH_CD8_TCELL_DN Regulated 0.006 -- All Up- msigdb_c7_immuno GSE26495_NAIVE_VS_PD1LOW_CD8_TCELL_DN Regulated 0.021 -- All Up- msigdb_c7_immuno GSE26669_CTRL_VS_COSTIM_BLOCK_MLR_CD4_TCELL_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE2706_UNSTIM_VS_2H_R848_DC_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE2706_UNSTIM_VS_8H_R848_DC_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE2706_UNSTIM_VS_2H_LPS_DC_DN Regulated 0.002 -- All Up- msigdb_c7_immuno GSE2706_UNSTIM_VS_8H_LPS_DC_DN Regulated 0.023 -- All Up- msigdb_c7_immuno GSE2706_UNSTIM_VS_2H_LPS_AND_R848_DC_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE2706_R848_VS_R848_AND_LPS_8H_STIM_DC_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE2706_2H_VS_8H_R848_AND_LPS_STIM_DC_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE27786_LSK_VS_NKCELL_DN Regulated 0.027 -- All Up- msigdb_c7_immuno GSE27786_BCELL_VS_NKCELL_DN Regulated 0.017 All Up-Regulated All Up- msigdb_c7_immuno GSE27786_CD4_TCELL_VS_NKCELL_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE27786_CD8_TCELL_VS_NKCELL_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE27786_NKCELL_VS_NKTCELL_UP Regulated < 1.0E-6 All Up-Regulated GSE29617_CTRL_VS_DAY7_TIV_FLU_VACCINE_PBMC_2008_ All Up- msigdb_c7_immuno UP Regulated < 1.0E-6 All Up-Regulated GSE29617_DAY3_VS_DAY7_TIV_FLU_VACCINE_PBMC_2008_U All Up- msigdb_c7_immuno P Regulated 0.025 -- All Up- msigdb_c7_immuno GSE29618_BCELL_VS_MONOCYTE_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE29618_BCELL_VS_PDC_UP Regulated 0.002 All Up-Regulated All Up- msigdb_c7_immuno GSE29618_BCELL_VS_MDC_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE29618_MONOCYTE_VS_PDC_UP Regulated < 1.0E-6 All Up-Regulated All Up- 
msigdb_c7_immuno GSE29618_MONOCYTE_VS_MDC_UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE29618_PDC_VS_MDC_DN Regulated < 1.0E-6 All Up-Regulated

175 GSE29618_BCELL_VS_MONOCYTE_DAY7_FLU_VACCINE_D All Up- msigdb_c7_immuno N Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE29618_BCELL_VS_MDC_DAY7_FLU_VACCINE_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE29618_MONOCYTE_VS_PDC_DAY7_FLU_VACCINE_UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE29618_MONOCYTE_VS_MDC_DAY7_FLU_VACCINE_UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE29618_PDC_VS_MDC_DAY7_FLU_VACCINE_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE29618_PRE_VS_DAY7_FLU_VACCINE_PDC_DN Regulated 0.019 -- All Up- msigdb_c7_immuno GSE29618_PRE_VS_DAY7_POST_TIV_FLU_VACCINE_PDC_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE30083_SP1_VS_SP4_THYMOCYTE_DN Regulated 0.034 -- All Up- msigdb_c7_immuno GSE30083_SP2_VS_SP4_THYMOCYTE_DN Regulated 0.006 -- All Up- msigdb_c7_immuno GSE31082_DP_VS_CD4_SP_THYMOCYTE_DN Regulated 0.023 -- All Up- msigdb_c7_immuno GSE32423_MEMORY_VS_NAIVE_CD8_TCELL_DN Regulated 0.008 All Up-Regulated All Up- msigdb_c7_immuno GSE32423_MEMORY_VS_NAIVE_CD8_TCELL_IL7_IL4_DN Regulated 0.002 All Up-Regulated All Up- msigdb_c7_immuno GSE32423_CTRL_VS_IL7_IL4_MEMORY_CD8_TCELL_UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE32423_IL7_VS_IL7_IL4_MEMORY_CD8_TCELL_UP Regulated 0.011 All Up-Regulated All Up- msigdb_c7_immuno GSE32423_IL7_VS_IL7_IL4_NAIVE_CD8_TCELL_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE339_EX_VIVO_VS_IN_CULTURE_CD4POS_DC_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE34205_HEALTHY_VS_FLU_INF_INFANT_PBMC_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE360_CTRL_VS_M_TUBERCULOSIS_DC_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE360_CTRL_VS_L_DONOVANI_MAC_DN Regulated 0.021 -- All Up- msigdb_c7_immuno GSE360_DC_VS_MAC_L_DONOVANI_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE360_DC_VS_MAC_B_MALAYI_HIGH_DOSE_UP Regulated 0.029 -- All Up- msigdb_c7_immuno 
GSE360_DC_VS_MAC_B_MALAYI_HIGH_DOSE_DN Regulated 0.006 -- All Up- msigdb_c7_immuno GSE360_L_DONOVANI_VS_B_MALAYI_HIGH_DOSE_DC_UP Regulated 0.008 -- All Up- msigdb_c7_immuno GSE360_L_DONOVANI_VS_B_MALAYI_LOW_DOSE_DC_UP Regulated 0.036 -- All Up- msigdb_c7_immuno GSE360_T_GONDII_VS_M_TUBERCULOSIS_DC_DN Regulated 0.015 --

176 GSE360_HIGH_DOSE_B_MALAYI_VS_M_TUBERCULOSIS_DC_ All Up- msigdb_c7_immuno DN Regulated < 1.0E-6 -- GSE360_LOW_DOSE_B_MALAYI_VS_M_TUBERCULOSIS_DC_ All Up- msigdb_c7_immuno DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE360_L_DONOVANI_VS_B_MALAYI_HIGH_DOSE_MAC_UP Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE360_L_MAJOR_VS_B_MALAYI_HIGH_DOSE_MAC_UP Regulated < 1.0E-6 -- GSE360_HIGH_DOSE_B_MALAYI_VS_M_TUBERCULOSIS_MAC All Up- msigdb_c7_immuno _DN Regulated < 1.0E-6 -- GSE36476_CTRL_VS_TSST_ACT_16H_MEMORY_CD4_TCELL_Y All Up- msigdb_c7_immuno OUNG_UP Regulated 0.002 -- GSE36476_CTRL_VS_TSST_ACT_40H_MEMORY_CD4_TCELL_Y All Up- msigdb_c7_immuno OUNG_UP Regulated 0.006 -- GSE37416_CTRL_VS_3H_F_TULARENSIS_LVS_NEUTROPHIL_D All Up- msigdb_c7_immuno N Regulated 0.006 -- GSE37416_CTRL_VS_6H_F_TULARENSIS_LVS_NEUTROPHIL All Up- msigdb_c7_immuno _UP Regulated < 1.0E-6 All Up-Regulated GSE37416_CTRL_VS_6H_F_TULARENSIS_LVS_NEUTROPHIL_D All Up- msigdb_c7_immuno N Regulated < 1.0E-6 -- GSE37416_CTRL_VS_24H_F_TULARENSIS_LVS_NEUTROPHIL_ All Up- msigdb_c7_immuno UP Regulated 0.019 -- GSE37416_CTRL_VS_24H_F_TULARENSIS_LVS_NEUTROPHIL_ All Up- msigdb_c7_immuno DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE37416_0H_VS_3H_F_TULARENSIS_LVS_NEUTROPHIL_DN Regulated 0.002 -- GSE37416_0H_VS_6H_F_TULARENSIS_LVS_NEUTROPHIL_U All Up- msigdb_c7_immuno P Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE37416_0H_VS_6H_F_TULARENSIS_LVS_NEUTROPHIL_DN Regulated 0.006 -- GSE37416_0H_VS_12H_F_TULARENSIS_LVS_NEUTROPHIL_ All Up- msigdb_c7_immuno UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE37416_0H_VS_12H_F_TULARENSIS_LVS_NEUTROPHIL_DN Regulated 0.006 -- GSE37416_0H_VS_24H_F_TULARENSIS_LVS_NEUTROPHIL_ All Up- msigdb_c7_immuno UP Regulated 0.002 All Up-Regulated GSE37416_0H_VS_48H_F_TULARENSIS_LVS_NEUTROPHIL_ All Up- msigdb_c7_immuno UP Regulated 0.002 All Up-Regulated All Up- msigdb_c7_immuno GSE37416_0H_VS_48H_F_TULARENSIS_LVS_NEUTROPHIL_DN Regulated 
0.011 -- GSE37416_12H_VS_24H_F_TULARENSIS_LVS_NEUTROPHIL_ All Up- msigdb_c7_immuno UP Regulated 0.015 All Up-Regulated GSE37416_12H_VS_48H_F_TULARENSIS_LVS_NEUTROPHIL_ All Up- msigdb_c7_immuno UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE3982_CTRL_VS_LPS_4H_MAC_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE3982_EOSINOPHIL_VS_EFF_MEMORY_CD4_TCELL_UP Regulated 0.050 All Up-Regulated GSE3982_EOSINOPHIL_VS_CENT_MEMORY_CD4_TCELL_U All Up- msigdb_c7_immuno P Regulated 0.006 All Up-Regulated

177 All Up- msigdb_c7_immuno GSE3982_EOSINOPHIL_VS_NKCELL_UP Regulated 0.008 All Up-Regulated All Up- msigdb_c7_immuno GSE3982_EOSINOPHIL_VS_TH1_UP Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE3982_MAST_CELL_VS_BCELL_UP Regulated 0.025 All Up-Regulated All Up- msigdb_c7_immuno GSE3982_MAST_CELL_VS_EFF_MEMORY_CD4_TCELL_UP Regulated 0.017 -- All Up- msigdb_c7_immuno GSE3982_MAST_CELL_VS_NKCELL_UP Regulated 0.015 -- All Up- msigdb_c7_immuno GSE3982_MAC_VS_TH2_UP Regulated 0.006 All Up-Regulated All Up- msigdb_c7_immuno GSE3982_NEUTROPHIL_VS_EFF_MEMORY_CD4_TCELL_UP Regulated 0.004 All Up-Regulated GSE3982_NEUTROPHIL_VS_CENT_MEMORY_CD4_TCELL_U All Up- msigdb_c7_immuno P Regulated 0.002 All Up-Regulated All Up- msigdb_c7_immuno GSE3982_NEUTROPHIL_VS_NKCELL_UP Regulated 0.031 All Up-Regulated All Up- msigdb_c7_immuno GSE3982_BCELL_VS_BASOPHIL_DN Regulated 0.046 All Up-Regulated All Up- msigdb_c7_immuno GSE3982_BASOPHIL_VS_EFF_MEMORY_CD4_TCELL_UP Regulated 0.006 All Up-Regulated All Up- msigdb_c7_immuno GSE3982_BASOPHIL_VS_NKCELL_UP Regulated 0.006 All Up-Regulated All Up- msigdb_c7_immuno GSE3982_BASOPHIL_VS_TH1_UP Regulated 0.010 All Up-Regulated All Up- msigdb_c7_immuno GSE3982_BASOPHIL_VS_TH2_UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE39820_CTRL_VS_IL1B_IL6_CD4_TCELL_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE39820_CTRL_VS_IL1B_IL6_IL23A_CD4_TCELL_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE39820_CTRL_VS_TGFBETA1_IL6_CD4_TCELL_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE39820_CTRL_VS_TGFBETA1_IL6_IL23A_CD4_TCELL_UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE39820_CTRL_VS_TGFBETA3_IL6_IL23A_CD4_TCELL_UP Regulated 0.004 -- GSE39820_TGFBETA3_IL6_VS_TGFBETA3_IL6_IL23A_TREA All Up- msigdb_c7_immuno TED_CD4_TCELL_UP Regulated 0.008 All Up-Regulated All Up- msigdb_c7_immuno GSE6269_HEALTHY_VS_STREP_AUREUS_INF_PBMC_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno 
GSE6269_HEALTHY_VS_STREP_PNEUMO_INF_PBMC_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE7460_TCONV_VS_TREG_THYMUS_DN Regulated 0.002 -- All Up- msigdb_c7_immuno GSE7460_TREG_VS_TCONV_ACT_UP Regulated 0.015 All Up-Regulated All Up- msigdb_c7_immuno GSE7460_FOXP3_MUT_VS_WT_ACT_TCONV_DN Regulated 0.025 --

178 GSE7460_WT_VS_FOXP3_HET_ACT_WITH_TGFB_TCONV_D Mixed Up- msigdb_c7_immuno N Regulated 0.031 Mixed Up-Regulated All Up- msigdb_c7_immuno GSE7764_IL15_NK_CELL_24H_VS_SPLENOCYTE_DN Regulated 0.013 All Up-Regulated All Up- msigdb_c7_immuno GSE7852_TREG_VS_TCONV_LN_UP Regulated 0.010 -- All Up- msigdb_c7_immuno GSE7852_TREG_VS_TCONV_THYMUS_UP Regulated 0.013 -- All Up- msigdb_c7_immuno GSE7852_LN_VS_THYMUS_TREG_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE7852_LN_VS_FAT_TCONV_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE7852_THYMUS_VS_FAT_TCONV_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE7852_TREG_VS_TCONV_UP Regulated < 1.0E-6 -- GSE9006_HEALTHY_VS_TYPE_1_DIABETES_PBMC_1MONTH_ All Up- msigdb_c7_immuno POST_DX_UP Regulated 0.044 -- GSE9006_HEALTHY_VS_TYPE_1_DIABETES_PBMC_4MONTH_ All Up- msigdb_c7_immuno POST_DX_UP Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE9006_TYPE_1_VS_TYPE_2_DIABETES_PBMC_AT_DX_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE9037_CTRL_VS_LPS_1H_STIM_BMDM_UP Regulated 0.002 -- All Up- msigdb_c7_immuno GSE9037_CTRL_VS_LPS_4H_STIM_BMDM_UP Regulated 0.002 -- All Up- msigdb_c7_immuno GSE9037_CTRL_VS_LPS_4H_STIM_BMDM_DN Regulated 0.002 -- All Up- msigdb_c7_immuno GSE9037_WT_VS_IRAK4_KO_BMDM_UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE9650_NAIVE_VS_EFF_CD8_TCELL_UP Regulated 0.015 -- All Up- msigdb_c7_immuno GSE9650_NAIVE_VS_MEMORY_CD8_TCELL_UP Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE9988_ANTI_TREM1_VS_LPS_MONOCYTE_DN Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE9988_ANTI_TREM1_VS_LOW_LPS_MONOCYTE_DN Regulated < 1.0E-6 All Up-Regulated GSE9988_ANTI_TREM1_VS_ANTI_TREM1_AND_LPS_MONOCY All Up- msigdb_c7_immuno TE_DN Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno GSE9988_ANTI_TREM1_VS_CTRL_TREATED_MONOCYTES_UP Regulated < 1.0E-6 -- GSE9988_ANTI_TREM1_VS_VEHICLE_TREATED_MONOCYTES All Up- msigdb_c7_immuno _UP Regulated < 1.0E-6 -- All Up- msigdb_c7_immuno 
GSE9988_LPS_VS_LPS_AND_ANTI_TREM1_MONOCYTE_UP Regulated < 1.0E-6 All Up-Regulated All Up- msigdb_c7_immuno GSE9988_LPS_VS_CTRL_TREATED_MONOCYTE_UP Regulated 0.002 -- All Up- msigdb_c7_immuno GSE9988_LPS_VS_VEHICLE_TREATED_MONOCYTE_UP Regulated 0.004 --

All Up-Regulated | msigdb_c7_immuno | GSE9988_LOW_LPS_VS_ANTI_TREM1_AND_LPS_MONOCYTE_UP | < 1.0E-6 | All Up-Regulated
All Up-Regulated | msigdb_c7_immuno | GSE9988_LOW_LPS_VS_ANTI_TREM1_AND_LPS_MONOCYTE_DN | < 1.0E-6 | --
All Up-Regulated | msigdb_c7_immuno | GSE9988_LOW_LPS_VS_CTRL_TREATED_MONOCYTE_UP | 0.002 | --
All Up-Regulated | msigdb_c7_immuno | GSE9988_LOW_LPS_VS_VEHICLE_TREATED_MONOCYTE_UP | < 1.0E-6 | --
All Up-Regulated | msigdb_c7_immuno | GSE9988_ANTI_TREM1_AND_LPS_VS_CTRL_TREATED_MONOCYTES_UP | 0.006 | All Up-Regulated
All Up-Regulated | msigdb_c7_immuno | GSE9988_ANTI_TREM1_AND_LPS_VS_VEHICLE_TREATED_MONOCYTES_UP | 0.021 | --
Non-Directionally Dysregulated | msigdb_c1_chrregions | chr11p14 | 0.013 | --
All Down-Regulated | msigdb_c1_chrregions | chr19p13 | 0.000 | All Down-Regulated
All Down-Regulated | msigdb_c1_chrregions | chr22q11 | 0.001 | --
All Down-Regulated | msigdb_c1_chrregions | chryq11 | 0.007 | --
All Down-Regulated | msigdb_c1_chrregions | chr16p13 | 0.001 | All Down-Regulated
All Down-Regulated | msigdb_c2_curatedsets | KEGG_DRUG_METABOLISM_OTHER_ENZYMES | 0.023 | --
All Down-Regulated | msigdb_c2_curatedsets | KEGG_OLFACTORY_TRANSDUCTION | < 1.0E-6 | --
All Down-Regulated | msigdb_c2_curatedsets | REACTOME_OLFACTORY_SIGNALING_PATHWAY | < 1.0E-6 | --
All Down-Regulated | msigdb_c2_curatedsets | NIKOLSKY_BREAST_CANCER_16P13_AMPLICON | 0.018 | All Down-Regulated
All Down-Regulated | msigdb_c2_curatedsets | MEISSNER_NPC_HCP_WITH_H3K4ME2_AND_H3K27ME3 | 0.009 | --
All Down-Regulated | msigdb_c2_curatedsets | MIKKELSEN_MCV6_HCP_WITH_H3K27ME3 | < 1.0E-6 | --
All Down-Regulated | msigdb_c2_curatedsets | MIKKELSEN_NPC_HCP_WITH_H3K27ME3 | < 1.0E-6 | --
All Down-Regulated | msigdb_c2_curatedsets | KIM_ALL_DISORDERS_DURATION_CORR_DN | < 1.0E-6 | All Down-Regulated
All Down-Regulated | msigdb_c5_GOterms | G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY | 0.003 | --
All Down-Regulated | msigdb_c6_oncogenic | SRC_UP.V1_UP | 0.041 | --
All Down-Regulated | msigdb_c6_oncogenic | KRAS.300_UP.V1_DN | 0.009 | --
All Down-Regulated | msigdb_c6_oncogenic | KRAS.600_UP.V1_DN | 0.017 | --
All Down-Regulated | msigdb_c7_immuno | GSE13485_CTRL_VS_DAY1_YF17D_VACCINE_PBMC_DN | < 1.0E-6 | --

All Down-Regulated | msigdb_c7_immuno | GSE13485_DAY1_VS_DAY3_YF17D_VACCINE_PBMC_UP | < 1.0E-6 | --
All Down-Regulated | msigdb_c7_immuno | GSE29618_PRE_VS_DAY7_POST_TIV_FLU_VACCINE_PDC_UP | < 1.0E-6 | --

* Rows are sorted broadly into up-regulated, non-directionally dysregulated, and down-regulated gene sets; within each group, the original order of annotations in the respective database was preserved so that related annotations remain adjacent.
† Reported gene sets (k = 745) are those reaching a Bonferroni-corrected p-value < 0.05 for at least one test hypothesis. Gene sets in bold font denote those also showing evidence of dysregulation in the blood analysis.
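
The Bonferroni criterion cited in the table footnote can be illustrated with a minimal Python sketch. The function names and the four example p-values are hypothetical and are not taken from the analysis. The total number of gene sets tested is not stated in this section, so the sketch uses the length of its input list as the number of tests.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of tests, capping at 1.0."""
    m = len(p_values)  # number of tests (assumed equal to the list length)
    return [min(1.0, p * m) for p in p_values]


def passes_bonferroni(p_values, alpha=0.05):
    """Flag tests whose Bonferroni-adjusted p-value falls below alpha."""
    return [p_adj < alpha for p_adj in bonferroni_adjust(p_values)]


# Hypothetical raw p-values, for illustration only
raw = [1e-6, 0.0004, 0.02, 0.3]
adjusted = bonferroni_adjust(raw)
significant = passes_bonferroni(raw)
print(adjusted)
print(significant)
```

With a real analysis, the multiplier would be the full number of gene sets (and hypotheses) tested, which is typically much larger than the number of sets that end up being reported.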

TABLE 5. RELATIVE PROPORTION OF 17 CIRCULATING IMMUNE CELLS IN SZ CASES AND UNAFFECTED COMPARISON SUBJECTS ESTIMATED FROM EXPRESSION LEVELS OF CELL-SPECIFIC GENES.

Abbas et al. (2009) Cell Types | Cell Description | SZ cases (mean relative proportion) | Controls (mean relative proportion) | T | p-value | q-value
Tc act | Activated cytotoxic T cell | 0.077 | 0.094 | -1.93 | 0.05 | 0.37
B | Resting B cell | 0.026 | 0.037 | -1.43 | 0.15 | 0.37
B aIgM | B cell | 0.038 | 0.029 | 1.51 | 0.13 | 0.37
NK act | Natural Killer (active) | 0.033 | 0.045 | -1.55 | 0.12 | 0.37
mono | Resting monocyte | 0.069 | 0.053 | 1.68 | 0.09 | 0.37
DC | Dendritic cell | 0.078 | 0.099 | -2.28 | 0.02 | 0.37
neutro | Neutrophils | 0.111 | 0.092 | 1.77 | 0.08 | 0.37
Mem IgM | IgM memory B cells | 0.020 | 0.015 | 1.32 | 0.19 | 0.40
PC | Plasma cells | 0.051 | 0.041 | 1.04 | 0.30 | 0.56
mono act | Activated monocyte | 0.126 | 0.116 | 0.98 | 0.33 | 0.56
Th | Resting helper T cells | 0.056 | 0.063 | -0.80 | 0.43 | 0.66
Th act | Activated helper T cells | 0.040 | 0.044 | -0.55 | 0.58 | 0.76
Mem IgG | IgG memory B cells | 0.018 | 0.022 | -0.58 | 0.56 | 0.76
B act | Activated B cells | 0.056 | 0.053 | 0.45 | 0.65 | 0.78
NK | Resting Natural Killer cells | 0.045 | 0.041 | 0.40 | 0.69 | 0.78
Tc | Resting cytotoxic T cells | 0.052 | 0.052 | -0.01 | 0.99 | 0.99
DC act | Activated dendritic cells | 0.104 | 0.103 | 0.06 | 0.95 | 0.99

Abbreviations: schizophrenia (SZ)
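
For readers unfamiliar with q-values, the following Python sketch shows one common way of deriving them from a set of p-values, the Benjamini-Hochberg step-up procedure. This section does not state which false discovery rate method produced the q-value column of Table 5, so the choice of Benjamini-Hochberg is an assumption, and the function name is hypothetical. The input list holds the p-values as printed in Table 5 (rounded to two decimals, in table order), so the output will not necessarily match the reported q-values exactly.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# p-values as printed in Table 5 (rounded), in table order
table5_p = [0.05, 0.15, 0.13, 0.12, 0.09, 0.02, 0.08, 0.19, 0.30,
            0.33, 0.43, 0.58, 0.56, 0.65, 0.69, 0.99, 0.95]
table5_q = benjamini_hochberg(table5_p)
for p, q in zip(table5_p, table5_q):
    print(f"p = {p:.2f}  q = {q:.2f}")
```

Because the adjustment depends on both the rank of each p-value and the total number of tests, several cell types can share the same q-value even when their p-values differ.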

TABLE 6. GENES SIGNIFICANTLY DYSREGULATED (BONFERRONI P < 0.05) IN SCHIZOPHRENIA ACROSS STUDIES OF BLOOD TISSUE (K = 220).
Gene Symbol † | Gene Product | Estimated Marginal Mean Difference* | F | p | FDR q-Value
SULT1B1 | sulfotransferase family, cytosolic, 1B, member 1 | 0.58 | 50 | 4.70E-12 | 2.30E-08
TCN1 | transcobalamin I (vitamin B12 binding protein, R binder family) | 0.53 | 43.5 | 9.70E-11 | 2.40E-07
SLC40A1 | solute carrier family 40 (iron-regulated transporter), member 1 | 0.53 | 36.7 | 2.80E-09 | 1.60E-06
TMEM30A | transmembrane protein 30A | 0.52 | 41.9 | 2.10E-10 | 2.70E-07
PID1 | phosphotyrosine interaction domain containing 1 | 0.51 | 39.9 | 5.30E-10 | 5.20E-07
GPR160 | G protein-coupled receptor 160 | 0.51 | 33.3 | 1.40E-08 | 4.20E-06
BMPR2 | bone morphogenetic protein receptor, type II (serine/threonine kinase) | 0.5 | 38 | 1.30E-09 | 1.20E-06
DCUN1D5 | DCN1, defective in cullin neddylation 1, domain containing 5 | 0.5 | 37.1 | 2.10E-09 | 1.60E-06
POC1B-GALNT4 | POC1B-GALNT4 readthrough | 0.5 | 35.1 | 5.80E-09 | 2.50E-06
GALNT4 | polypeptide N-acetylgalactosaminyltransferase 4 | 0.5 | 35.1 | 5.80E-09 | 2.50E-06
NDUFA4 | NDUFA4, mitochondrial complex associated | 0.5 | 33.6 | 1.20E-08 | 3.90E-06
CD302 | CD302 molecule | 0.48 | 36.5 | 2.70E-09 | 1.60E-06
LY75-CD302 | LY75-CD302 readthrough | 0.48 | 36.5 | 2.70E-09 | 1.60E-06
MS4A3 | membrane-spanning 4-domains, subfamily A, member 3 (hematopoietic cell-specific) | 0.48 | 34.9 | 6.20E-09 | 2.60E-06
IFT74 | intraflagellar transport 74 | 0.48 | 34.7 | 6.50E-09 | 2.60E-06
HRSP12 | heat-responsive protein 12 | 0.48 | 33.7 | 1.10E-08 | 3.60E-06
ZNF281 | zinc finger protein 281 | 0.48 | 33.4 | 1.30E-08 | 3.90E-06
NAIP | NLR family, apoptosis inhibitory protein | 0.48 | 32.3 | 2.20E-08 | 5.60E-06
PCMT1 | protein-L-isoaspartate (D-aspartate) O-methyltransferase | 0.48 | 31.4 | 3.60E-08 | 8.20E-06
GYG1 | glycogenin 1 | 0.48 | 31.1 | 4.10E-08 | 9.30E-06
HMGB2 | high mobility group box 2 | 0.48 | 30.2 | 6.40E-08 | 1.30E-05
PTEN | phosphatase and tensin homolog | 0.47 | 34.7 | 6.50E-09 | 2.60E-06
UBE2J1 | ubiquitin-conjugating enzyme E2, J1 | 0.47 | 34.4 | 7.60E-09 | 2.90E-06
SKAP2 | src kinase associated phosphoprotein 2 | 0.47 | 34.3 | 7.80E-09 | 2.90E-06

CCPG1 | cell cycle progression 1 | 0.47 | 34 | 9.30E-09 | 3.20E-06
S100A12 | S100 calcium binding protein A12 | 0.47 | 33.6 | 1.10E-08 | 3.70E-06
VCAN | versican | 0.47 | 32.5 | 1.90E-08 | 5.00E-06
ENTPD1 | ectonucleoside triphosphate diphosphohydrolase 1 | 0.47 | 29.8 | 7.70E-08 | 1.50E-05
CYP1B1 | cytochrome P450, family 1, subfamily B, polypeptide 1 | 0.47 | 28.6 | 1.40E-07 | 2.30E-05
CAPZA1 | capping protein (actin filament) muscle Z-line, alpha 1 | 0.47 | 26.7 | 3.60E-07 | 5.00E-05
F5 | coagulation factor V (proaccelerin, labile factor) | 0.46 | 33 | 1.50E-08 | 4.30E-06
ARG1 | arginase 1 | 0.46 | 32.5 | 1.90E-08 | 5.00E-06
CCDC88A | coiled-coil domain containing 88A | 0.46 | 31.9 | 2.60E-08 | 6.20E-06
CYSTM1 | cysteine-rich transmembrane module containing 1 | 0.46 | 29 | 1.10E-07 | 2.00E-05
CD58 | CD58 molecule | 0.46 | 28.5 | 1.40E-07 | 2.50E-05
HPSE | heparanase | 0.46 | 28.1 | 1.70E-07 | 2.90E-05
CASP1 | caspase 1, apoptosis-related cysteine peptidase | 0.46 | 27 | 3.00E-07 | 4.50E-05
MMP8 | matrix metallopeptidase 8 | 0.45 | 31.5 | 3.10E-08 | 7.30E-06
SLC38A6 | solute carrier family 38, member 6 | 0.45 | 30.5 | 5.10E-08 | 1.10E-05
AQP9 | aquaporin 9 | 0.45 | 30.4 | 5.30E-08 | 1.10E-05
CSF2RA | colony stimulating factor 2 receptor, alpha, low-affinity (granulocyte-macrophage) | 0.45 | 29.1 | 1.00E-07 | 1.90E-05
ASGR2 | asialoglycoprotein receptor 2 | 0.45 | 26.8 | 3.30E-07 | 4.70E-05
PPP2R3C | protein phosphatase 2, regulatory subunit B'', gamma | 0.45 | 26.6 | 3.60E-07 | 5.00E-05
TNFSF13B | tumor necrosis factor (ligand) superfamily, member 13b | 0.45 | 24.2 | 1.20E-06 | 1.40E-04
PAPOLA | poly(A) polymerase alpha | 0.44 | 30 | 6.40E-08 | 1.30E-05
HSD17B12 | hydroxysteroid (17-beta) dehydrogenase 12 | 0.44 | 29.7 | 7.60E-08 | 1.50E-05
LYST | lysosomal trafficking regulator | 0.44 | 29.5 | 8.10E-08 | 1.60E-05
IRAK3 | interleukin-1 receptor-associated kinase 3 | 0.44 | 28.9 | 1.10E-07 | 2.00E-05
DSC2 | desmocollin 2 | 0.44 | 27.6 | 2.30E-07 | 3.60E-05
LYPLAL1 | lysophospholipase-like 1 | 0.44 | 27 | 2.90E-07 | 4.40E-05
ZNF33A | zinc finger protein 33A | 0.43 | 27 | 2.90E-07 | 4.40E-05
TNFSF10 | tumor necrosis factor (ligand) superfamily, member 10 | 0.43 | 23.9 | 1.40E-06 | 1.50E-04

TLR10 | toll-like receptor 10 | 0.43 | 23.5 | 1.70E-06 | 1.70E-04
VAMP7 | vesicle-associated membrane protein 7 | 0.43 | 23.3 | 1.90E-06 | 1.80E-04
KBTBD11 | kelch repeat and BTB (POZ) domain containing 11 | 0.42 | 27.7 | 2.00E-07 | 3.30E-05
HSDL2 | hydroxysteroid dehydrogenase like 2 | 0.42 | 26.8 | 3.10E-07 | 4.60E-05
CYBRD1 | cytochrome b reductase 1 | 0.42 | 26.7 | 3.30E-07 | 4.70E-05
RPS3A | ribosomal protein S3A | 0.42 | 26.3 | 3.90E-07 | 5.30E-05
PRCP | prolylcarboxypeptidase (angiotensinase C) | 0.41 | 26.4 | 3.80E-07 | 5.20E-05
HMGB1 | high mobility group box 1 | 0.41 | 26 | 4.70E-07 | 6.10E-05
MCMBP | minichromosome maintenance complex binding protein | 0.41 | 25.6 | 5.80E-07 | 7.20E-05
CD24 | CD24 molecule | 0.41 | 24.8 | 8.30E-07 | 9.70E-05
GALNT1 | polypeptide N-acetylgalactosaminyltransferase 1 | 0.41 | 23.1 | 2.00E-06 | 1.90E-04
PROK2 | prokineticin 2 | 0.41 | 22.8 | 2.40E-06 | 2.20E-04
BEX1 | brain expressed, X-linked 1 | 0.4 | 24.9 | 7.90E-07 | 9.40E-05
UBE2D3 | ubiquitin-conjugating enzyme E2D 3 | 0.4 | 24.1 | 1.20E-06 | 1.30E-04
SUB1 | SUB1 homolog (S. cerevisiae) | 0.4 | 24.1 | 1.20E-06 | 1.30E-04
TGFA | transforming growth factor, alpha | 0.4 | 24 | 1.30E-06 | 1.40E-04
PCTP | phosphatidylcholine transfer protein | 0.4 | 23.6 | 1.50E-06 | 1.60E-04
RBM47 | RNA binding motif protein 47 | 0.4 | 23.1 | 1.90E-06 | 1.90E-04
SLC22A15 | solute carrier family 22, member 15 | 0.4 | 22.7 | 2.40E-06 | 2.20E-04
RPL7 | ribosomal protein L7 | 0.39 | 23.6 | 1.60E-06 | 1.60E-04
ARHGAP19 | Rho GTPase activating protein 19 | 0.39 | 23.2 | 1.90E-06 | 1.90E-04
FOXN2 | forkhead box N2 | 0.39 | 22.7 | 2.40E-06 | 2.20E-04
PPME1 | protein phosphatase methylesterase 1 | -0.39 | 23.4 | 1.70E-06 | 1.70E-04
RBM4B | RNA binding motif protein 4B | -0.39 | 23.3 | 1.80E-06 | 1.80E-04
STIP1 | stress-induced phosphoprotein 1 | -0.39 | 23.3 | 1.80E-06 | 1.80E-04
PRKX | protein kinase, X-linked | -0.39 | 23.3 | 1.80E-06 | 1.80E-04
ADRB2 | adrenoceptor beta 2, surface | -0.39 | 23.2 | 1.90E-06 | 1.80E-04
RARRES3 | retinoic acid receptor responder (tazarotene induced) 3 | -0.39 | 23.1 | 2.00E-06 | 1.90E-04
RNF5 | ring finger protein 5, E3 ubiquitin protein ligase | -0.39 | 22.9 | 2.20E-06 | 2.00E-04

CACNA2D2 | calcium channel, voltage-dependent, alpha 2/delta subunit 2 | -0.39 | 22.9 | 2.20E-06 | 2.00E-04
MED15 | mediator complex subunit 15 | -0.39 | 22.7 | 2.40E-06 | 2.20E-04
TMEM214 | transmembrane protein 214 | -0.4 | 24.9 | 8.00E-07 | 9.40E-05
CD4 | CD4 molecule | -0.4 | 24.3 | 1.10E-06 | 1.20E-04
TUBB | tubulin, beta class I | -0.4 | 24.3 | 1.10E-06 | 1.20E-04
ANXA6 | annexin A6 | -0.4 | 24.1 | 1.20E-06 | 1.30E-04
ATXN10 | ataxin 10 | -0.4 | 24.1 | 1.20E-06 | 1.30E-04
HECTD3 | HECT domain containing E3 ubiquitin protein ligase 3 | -0.4 | 24 | 1.20E-06 | 1.40E-04
CALM3 | calmodulin 3 (phosphorylase kinase, delta) | -0.4 | 23.9 | 1.30E-06 | 1.50E-04
MED29 | mediator complex subunit 29 | -0.4 | 23.8 | 1.40E-06 | 1.50E-04
HIC2 | hypermethylated in cancer 2 | -0.4 | 23.6 | 1.60E-06 | 1.60E-04
TEX264 | testis expressed 264 | -0.4 | 23.5 | 1.60E-06 | 1.70E-04
ORAI1 | ORAI calcium release-activated calcium modulator 1 | -0.4 | 22.8 | 2.30E-06 | 2.20E-04
RPUSD3 | RNA pseudouridylate synthase domain containing 3 | -0.4 | 22.7 | 2.40E-06 | 2.20E-04
AHSA1 | AHA1, activator of heat shock 90kDa protein ATPase homolog 1 (yeast) | -0.41 | 26.1 | 4.40E-07 | 5.90E-05
C16orf58 | chromosome 16 open reading frame 58 | -0.41 | 26 | 4.60E-07 | 6.10E-05
DNMT1 | DNA (cytosine-5-)-methyltransferase 1 | -0.41 | 25.8 | 5.00E-07 | 6.50E-05
GIPC1 | GIPC PDZ domain containing family, member 1 | -0.41 | 25.6 | 5.50E-07 | 7.00E-05
EIF4A3 | eukaryotic translation initiation factor 4A3 | -0.41 | 25.6 | 5.80E-07 | 7.20E-05
NAGPA | N-acetylglucosamine-1-phosphodiester alpha-N-acetylglucosaminidase | -0.41 | 25.5 | 5.80E-07 | 7.20E-05
PA2G4 | proliferation-associated 2G4, 38kDa | -0.41 | 25.1 | 7.40E-07 | 9.00E-05
CDK11B | cyclin-dependent kinase 11B | -0.41 | 25 | 7.80E-07 | 9.30E-05
EXOSC6 | exosome component 6 | -0.41 | 23.9 | 1.30E-06 | 1.50E-04
TUT1 | terminal uridylyl transferase 1, U6 snRNA-specific | -0.41 | 23.8 | 1.40E-06 | 1.50E-04
ABCB1 | ATP-binding cassette, sub-family B (MDR/TAP), member 1 | -0.41 | 23.7 | 1.50E-06 | 1.60E-04
EVI5L | ecotropic viral integration site 5-like | -0.41 | 23.6 | 1.60E-06 | 1.60E-04
MAGED1 | melanoma antigen family D1 | -0.41 | 23 | 2.10E-06 | 2.00E-04
ASMTL | acetylserotonin O-methyltransferase-like | -0.41 | 22.8 | 2.30E-06 | 2.20E-04

FAM222B | family with sequence similarity 222, member B | -0.41 | 22.8 | 2.40E-06 | 2.20E-04
EVL | Enah/Vasp-like | -0.41 | 22.7 | 2.50E-06 | 2.30E-04
MAF | v-maf avian musculoaponeurotic fibrosarcoma oncogene homolog | -0.42 | 27.5 | 2.20E-07 | 3.50E-05
RNF115 | ring finger protein 115 | -0.42 | 27.1 | 2.70E-07 | 4.20E-05
QPRT | quinolinate phosphoribosyltransferase | -0.42 | 26.8 | 3.10E-07 | 4.60E-05
HDAC1 | histone deacetylase 1 | -0.42 | 26.8 | 3.10E-07 | 4.60E-05
ARIH2 | ariadne RBR E3 ubiquitin protein ligase 2 | -0.42 | 26.8 | 3.10E-07 | 4.60E-05
SLC9A3R1 | solute carrier family 9, subfamily A (NHE3, cation proton antiporter 3), member 3 regulator 1 | -0.42 | 26.7 | 3.30E-07 | 4.70E-05
ATF5 | activating transcription factor 5 | -0.42 | 26.5 | 3.60E-07 | 5.00E-05
HPS6 | Hermansky-Pudlak syndrome 6 | -0.42 | 26.1 | 4.40E-07 | 5.90E-05
PYCR2 | pyrroline-5-carboxylate reductase family, member 2 | -0.42 | 25.5 | 6.10E-07 | 7.50E-05
SLC25A38 | solute carrier family 25, member 38 | -0.42 | 25.4 | 6.30E-07 | 7.70E-05
FBXO31 | F-box protein 31 | -0.42 | 25.4 | 6.30E-07 | 7.70E-05
GPI | glucose-6-phosphate isomerase | -0.42 | 25.4 | 6.40E-07 | 7.70E-05
ELAC2 | elaC ribonuclease Z 2 | -0.42 | 24.6 | 9.80E-07 | 1.10E-04
PUF60 | poly-U binding splicing factor 60KDa | -0.42 | 24 | 1.30E-06 | 1.40E-04
AARS | alanyl-tRNA synthetase | -0.42 | 23.5 | 1.70E-06 | 1.70E-04
TRIM28 | tripartite motif containing 28 | -0.42 | 23.4 | 1.80E-06 | 1.80E-04
ITPKB | inositol-trisphosphate 3-kinase B | -0.42 | 23.3 | 1.80E-06 | 1.80E-04
NUDC | nudC nuclear distribution protein | -0.42 | 23.2 | 1.90E-06 | 1.90E-04
CCDC92 | coiled-coil domain containing 92 | -0.43 | 28.7 | 1.20E-07 | 2.10E-05
TRPV2 | transient receptor potential cation channel, subfamily V, member 2 | -0.43 | 28.1 | 1.70E-07 | 2.80E-05
NMT1 | N-myristoyltransferase 1 | -0.43 | 27.8 | 1.90E-07 | 3.30E-05
IRF3 | interferon regulatory factor 3 | -0.43 | 27.7 | 2.00E-07 | 3.30E-05
EEF2K | eukaryotic elongation factor 2 kinase | -0.43 | 25.8 | 5.30E-07 | 6.80E-05
SIPA1L3 | signal-induced proliferation-associated 1 like 3 | -0.43 | 25.8 | 5.30E-07 | 6.80E-05
METTL16 | methyltransferase like 16 | -0.43 | 23.2 | 1.90E-06 | 1.90E-04
VPS11 | vacuolar protein sorting 11 homolog (S. cerevisiae) | -0.44 | 30 | 6.40E-08 | 1.30E-05

MFSD5 | major facilitator superfamily domain containing 5 | -0.44 | 29.5 | 8.20E-08 | 1.60E-05
MLEC | malectin | -0.44 | 29.4 | 8.60E-08 | 1.60E-05
TMEM179B | transmembrane protein 179B | -0.44 | 28.7 | 1.20E-07 | 2.10E-05
E2F4 | E2F transcription factor 4, p107/p130-binding | -0.44 | 28.7 | 1.30E-07 | 2.20E-05
PINX1 | PIN2/TERF1 interacting, telomerase inhibitor 1 | -0.44 | 27.1 | 2.80E-07 | 4.30E-05
RCC2 | regulator of chromosome condensation 2 | -0.44 | 26.8 | 3.10E-07 | 4.60E-05
CD5 | CD5 molecule | -0.44 | 26.7 | 3.50E-07 | 5.00E-05
RNF216 | ring finger protein 216 | -0.44 | 24.7 | 9.10E-07 | 1.10E-04
SCAMP2 | tubulin, beta 2A class IIa | -0.44 | 24.4 | 1.10E-06 | 1.30E-04
LRFN3 | leucine rich repeat and fibronectin type III domain containing 3 | -0.45 | 31.4 | 3.20E-08 | 7.50E-06
GIGYF2 | GRB10 interacting GYF protein 2 | -0.45 | 29.5 | 8.30E-08 | 1.60E-05
L3MBTL2 | l(3)mbt-like 2 (Drosophila) | -0.45 | 29.4 | 8.80E-08 | 1.60E-05
USP7 | ubiquitin specific peptidase 7 (herpes virus-associated) | -0.45 | 27.3 | 2.50E-07 | 4.00E-05
EIF3B | eukaryotic translation initiation factor 3, subunit B | -0.45 | 26.9 | 3.20E-07 | 4.60E-05
DDB1 | damage-specific DNA binding protein 1, 127kDa | -0.45 | 26.5 | 3.70E-07 | 5.10E-05
PACSIN1 | protein kinase C and casein kinase substrate in neurons 1 | -0.45 | 26.4 | 4.00E-07 | 5.40E-05
FBXW4 | F-box and WD repeat domain containing 4 | -0.45 | 26 | 4.90E-07 | 6.30E-05
METTL13 | methyltransferase like 13 | -0.46 | 32.6 | 1.80E-08 | 4.80E-06
S1PR5 | sphingosine-1-phosphate receptor 5 | -0.46 | 32.1 | 2.30E-08 | 5.80E-06
MORC2 | MORC family CW-type zinc finger 2 | -0.46 | 32 | 2.50E-08 | 6.10E-06
NKG7 | natural killer cell granule protein 7 | -0.46 | 31.3 | 3.40E-08 | 7.90E-06
TNPO2 | transportin 2 | -0.46 | 30.8 | 4.50E-08 | 1.00E-05
ADA | adenosine deaminase | -0.46 | 29.9 | 7.20E-08 | 1.40E-05
CNOT11 | CCR4-NOT transcription complex, subunit 11 | -0.46 | 29.7 | 7.60E-08 | 1.50E-05
URGCP | upregulator of cell proliferation | -0.46 | 29.6 | 8.20E-08 | 1.60E-05
GOT2 | glutamic-oxaloacetic transaminase 2, mitochondrial | -0.46 | 27.8 | 2.00E-07 | 3.30E-05
RDH13 | retinol dehydrogenase 13 (all-trans/9-cis) | -0.46 | 27.5 | 2.30E-07 | 3.70E-05
CHST12 | carbohydrate (chondroitin 4) sulfotransferase 12 | -0.47 | 34.1 | 8.70E-09 | 3.10E-06
SLC10A3 | solute carrier family 10, member 3 | -0.47 | 33.6 | 1.10E-08 | 3.70E-06

PPM1G | protein phosphatase, Mg2+/Mn2+ dependent, 1G | -0.47 | 33.3 | 1.30E-08 | 3.90E-06
AIP | aryl hydrocarbon receptor interacting protein | -0.47 | 33 | 1.50E-08 | 4.30E-06
RANGAP1 | Ran GTPase activating protein 1 | -0.47 | 30.8 | 4.60E-08 | 1.00E-05
POM121 | POM121 transmembrane nucleoporin | -0.47 | 30.6 | 5.20E-08 | 1.10E-05
KARS | lysyl-tRNA synthetase | -0.47 | 30.3 | 5.80E-08 | 1.20E-05
CDK2AP2 | cyclin-dependent kinase 2 associated protein 2 | -0.48 | 36.9 | 2.20E-09 | 1.60E-06
DDX54 | DEAD (Asp-Glu-Ala-Asp) box polypeptide 54 | -0.48 | 36.4 | 2.80E-09 | 1.60E-06
CXXC1 | CXXC finger protein 1 | -0.48 | 36.1 | 3.30E-09 | 1.80E-06
PAXIP1 | PAX interacting (with transcription-activation domain) protein 1 | -0.48 | 35.4 | 4.60E-09 | 2.10E-06
CCDC102A | coiled-coil domain containing 102A | -0.48 | 34.3 | 8.00E-09 | 2.90E-06
SCAMP3 | secretory carrier membrane protein 3 | -0.48 | 32.8 | 1.80E-08 | 4.80E-06
SRM | spermidine synthase | -0.48 | 32.1 | 2.50E-08 | 6.10E-06
RNF220 | ring finger protein 220 | -0.48 | 29.9 | 7.10E-08 | 1.40E-05
CDK11A | cyclin-dependent kinase 11A | -0.49 | 35.6 | 4.30E-09 | 2.10E-06
IL10RA | interleukin 10 receptor, alpha | -0.49 | 35.5 | 4.50E-09 | 2.10E-06
GPR114 | adhesion G protein-coupled receptor G5 | -0.49 | 34.6 | 7.10E-09 | 2.70E-06
NOP14 | NOP14 nucleolar protein | -0.49 | 33.6 | 1.20E-08 | 3.90E-06
CD6 | CD6 molecule | -0.49 | 33.2 | 1.40E-08 | 4.20E-06
TBC1D13 | TBC1 domain family, member 13 | -0.49 | 33.2 | 1.40E-08 | 4.20E-06
SLC35A4 | solute carrier family 35, member A4 | -0.49 | 33 | 1.50E-08 | 4.30E-06
FAM168B | family with sequence similarity 168, member B | -0.49 | 32.8 | 1.70E-08 | 4.80E-06
DOLPP1 | dolichyldiphosphatase 1 | -0.5 | 38.2 | 1.20E-09 | 1.10E-06
RUNX3 | runt-related transcription factor 3 | -0.5 | 37.5 | 1.70E-09 | 1.40E-06
IL2RB | interleukin 2 receptor, beta | -0.5 | 37.1 | 2.10E-09 | 1.60E-06
LRRC8A | leucine rich repeat containing 8 family, member A | -0.5 | 36.8 | 2.50E-09 | 1.60E-06
RASA3 | RAS p21 protein activator 3 | -0.5 | 35.6 | 4.50E-09 | 2.10E-06
CYP4F22 | cytochrome P450, family 4, subfamily F, polypeptide 22 | -0.5 | 32.7 | 1.90E-08 | 5.00E-06
ZNF362 | zinc finger protein 362 | -0.5 | 32.3 | 2.20E-08 | 5.70E-06
APBA3 | amyloid beta (A4) precursor protein-binding, family A, member 3 | -0.51 | 40 | 5.20E-10 | 5.20E-07

C17orf49 | chromosome 17 open reading frame 49 | -0.51 | 37.4 | 1.80E-09 | 1.40E-06
ECHS1 | enoyl CoA hydratase, short chain, 1, mitochondrial | -0.51 | 36.9 | 2.40E-09 | 1.60E-06
CLPTM1L | CLPTM1-like | -0.51 | 36.7 | 2.50E-09 | 1.60E-06
LAT | linker for activation of T cells | -0.51 | 36 | 3.70E-09 | 1.90E-06
MED24 | mediator complex subunit 24 | -0.51 | 35.8 | 4.30E-09 | 2.10E-06
COG2 | component of oligomeric golgi complex 2 | -0.51 | 34.9 | 6.60E-09 | 2.60E-06
INTS5 | integrator complex subunit 5 | -0.52 | 42.3 | 1.70E-10 | 2.50E-07
SEC24C | SEC24 family member C | -0.52 | 42 | 2.00E-10 | 2.70E-07
TAF15 | TAF15 RNA polymerase II, TATA box binding protein (TBP)-associated factor, 68kDa | -0.52 | 36.6 | 2.80E-09 | 1.60E-06
HCST | hematopoietic cell signal transducer | -0.52 | 34.5 | 8.10E-09 | 2.90E-06
ICAM2 | intercellular adhesion molecule 2 | -0.53 | 43.8 | 8.40E-11 | 2.40E-07
EOMES | eomesodermin | -0.53 | 42.9 | 1.30E-10 | 2.50E-07
PCSK7 | proprotein convertase subtilisin/kexin type 7 | -0.53 | 42.6 | 1.50E-10 | 2.50E-07
NAT10 | N-acetyltransferase 10 (GCN5-related) | -0.53 | 41 | 3.50E-10 | 4.30E-07
CD40LG | CD40 ligand | -0.53 | 38.9 | 9.80E-10 | 9.20E-07
INO80 | INO80 complex subunit | -0.53 | 35.3 | 5.50E-09 | 2.50E-06
PLEKHF1 | pleckstrin homology domain containing, family F (with FYVE domain) member 1 | -0.54 | 45.1 | 4.60E-11 | 1.50E-07
DHX30 | DEAH (Asp-Glu-Ala-His) box helicase 30 | -0.54 | 40.3 | 4.90E-10 | 5.20E-07
OSBPL5 | oxysterol binding protein-like 5 | -0.55 | 42.7 | 1.50E-10 | 2.50E-07
JADE2 | jade family PHD finger 2 | -0.55 | 42.8 | 1.50E-10 | 2.50E-07
POM121C | POM121 transmembrane nucleoporin C | -0.57 | 40.5 | 4.90E-10 | 5.20E-07
SCRN1 | secernin 1 | -0.59 | 48.8 | 9.10E-12 | 3.60E-08
SIGIRR | single immunoglobulin and toll-interleukin 1 receptor (TIR) domain | -0.6 | 58.9 | 7.00E-14 | 6.90E-10
SPOCK2 | sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 2 | -0.6 | 51.1 | 3.20E-12 | 2.10E-08
SRPR | signal recognition particle receptor (docking protein) | -0.66 | 59.8 | 6.00E-14 | 6.90E-10
Estimated marginal mean differences are reported in units of the standard deviation of each gene's expression values. Positive differences indicate genes more highly expressed among SZ cases.
* Rows are sorted by decreasing difference in estimated marginal means between groups.
† Rows in bold survived a Bonferroni correction for family-wise error inflation.

TABLE 7. GENE SETS SIGNIFICANTLY DYSREGULATED AT A BONFERRONI P < 0.05 BASED ON PERMUTATIONS OF SINGLE-GENE TEST STATISTICS FROM THE BLOOD MEGA-ANALYSIS.
Database | Gene Set Name | Blood: Hypothesis Test | Blood: Bonferroni-Corrected p-Value | Brain: Corresponding Gene Set with Significant Test Result (Bonferroni p < 0.05)
gs_h_hallmarksets | HALLMARK_TNFA_SIGNALING_VIA_NFKB | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_h_hallmarksets | HALLMARK_HYPOXIA | All Up-Regulated | 0.020 | All Up-Regulated
gs_h_hallmarksets | HALLMARK_G2M_CHECKPOINT | All Up-Regulated | 0.024 | --
gs_h_hallmarksets | HALLMARK_ANDROGEN_RESPONSE | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_h_hallmarksets | HALLMARK_COMPLEMENT | All Up-Regulated | 0.002 | All Up-Regulated
gs_h_hallmarksets | HALLMARK_INFLAMMATORY_RESPONSE | All Up-Regulated | 0.045 | All Up-Regulated
gs_h_hallmarksets | HALLMARK_GLYCOLYSIS | Mixed Up-Regulated | 0.023 | --
gs_h_hallmarksets | HALLMARK_UV_RESPONSE_DN | All Up-Regulated | 0.022 | All Up-Regulated
gs_h_hallmarksets | HALLMARK_HEME_METABOLISM | All Up-Regulated | 0.000 | --
gs_c1_chrregions | chr14q21 | All Up-Regulated | 0.001 | --
gs_c1_chrregions | chr12p12 | All Up-Regulated | 0.017 | --
gs_c1_chrregions | chr6q22 | Mixed Up-Regulated | < 1.0E-6 | --
gs_c2_curatedsets | KEGG_LEISHMANIA_INFECTION | All Up-Regulated | 0.028 | All Up-Regulated
gs_c2_curatedsets | KEGG_SYSTEMIC_LUPUS_ERYTHEMATOSUS | All Up-Regulated | 0.009 | --
gs_c2_curatedsets | BIOCARTA_PPARA_PATHWAY | All Up-Regulated | 0.023 | --
gs_c2_curatedsets | PID_FCER1PATHWAY | All Up-Regulated | 0.018 | --
gs_c2_curatedsets | PID_MET_PATHWAY | All Up-Regulated | 0.005 | --
gs_c2_curatedsets | PID_IL6_7PATHWAY | All Up-Regulated | 0.005 | --
gs_c2_curatedsets | PID_PDGFRBPATHWAY | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | PID_HIF1_TFPATHWAY | All Up-Regulated | 0.018 | --

gs_c2_curatedsets | PID_FAK_PATHWAY | All Up-Regulated | < 1.0E-6 | --
gs_c2_curatedsets | REACTOME_SIGNALING_BY_ILS | All Up-Regulated | 0.037 | --
gs_c2_curatedsets | REACTOME_HEMOSTASIS | All Up-Regulated | < 1.0E-6 | --
gs_c2_curatedsets | REACTOME_INNATE_IMMUNE_SYSTEM | All Up-Regulated | < 1.0E-6 | --
gs_c2_curatedsets | REACTOME_TOLL_RECEPTOR_CASCADES | All Up-Regulated | 0.032 | --
gs_c2_curatedsets | PICCALUGA_ANGIOIMMUNOBLASTIC_LYMPHOMA_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | SENGUPTA_NASOPHARYNGEAL_CARCINOMA_WITH_LMP1_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | WILCOX_RESPONSE_TO_PROGESTERONE_UP | All Up-Regulated | 0.018 | --
gs_c2_curatedsets | FULCHER_INFLAMMATORY_RESPONSE_LECTIN_VS_LPS_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | PUIFFE_INVASION_INHIBITED_BY_ASCITES_DN | All Up-Regulated | < 1.0E-6 | --
gs_c2_curatedsets | THUM_SYSTOLIC_HEART_FAILURE_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | CASORELLI_ACUTE_PROMYELOCYTIC_LEUKEMIA_UP | All Up-Regulated | < 1.0E-6 | --
gs_c2_curatedsets | CHARAFE_BREAST_CANCER_LUMINAL_VS_BASAL_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | CHARAFE_BREAST_CANCER_LUMINAL_VS_MESENCHYMAL_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | LAIHO_COLORECTAL_CANCER_SERRATED_UP | Mixed Up-Regulated | 0.032 | All Up-Regulated
gs_c2_curatedsets | BORCZUK_MALIGNANT_MESOTHELIOMA_UP | All Up-Regulated | 0.009 | All Up-Regulated
gs_c2_curatedsets | HORIUCHI_WTAP_TARGETS_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | GAL_LEUKEMIC_STEM_CELL_DN | All Up-Regulated | 0.014 | --
gs_c2_curatedsets | BASAKI_YBX1_TARGETS_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | WANG_LMO4_TARGETS_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | SMIRNOV_CIRCULATING_ENDOTHELIOCYTES_IN_CANCER_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | OSMAN_BLADDER_CANCER_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | GINESTIER_BREAST_CANCER_ZNF217_AMPLIFIED_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | GARGALOVIC_RESPONSE_TO_OXIDIZED_PHOSPHOLIPIDS_TURQUOISE_UP | All Up-Regulated | 0.014 | --
gs_c2_curatedsets | TAKEDA_TARGETS_OF_NUP98_HOXA9_FUSION_8D_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated

gs_c2_curatedsets | TONKS_TARGETS_OF_RUNX1_RUNX1T1_FUSION_HSC_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | TONKS_TARGETS_OF_RUNX1_RUNX1T1_FUSION_MONOCYTE_DN | All Up-Regulated | 0.037 | --
gs_c2_curatedsets | UDAYAKUMAR_MED1_TARGETS_DN | All Up-Regulated | 0.037 | All Up-Regulated
gs_c2_curatedsets | SENESE_HDAC1_TARGETS_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | SENESE_HDAC3_TARGETS_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | KIM_WT1_TARGETS_DN | All Up-Regulated | 0.009 | All Up-Regulated
gs_c2_curatedsets | ELVIDGE_HYPOXIA_UP | Mixed Up-Regulated | 0.005 | All Up-Regulated
gs_c2_curatedsets | ELVIDGE_HYPOXIA_BY_DMOG_UP | Mixed Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | JAATINEN_HEMATOPOIETIC_STEM_CELL_DN | Mixed Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | GRAHAM_CML_DIVIDING_VS_NORMAL_QUIESCENT_DN | All Up-Regulated | 0.009 | All Up-Regulated
gs_c2_curatedsets | GRAHAM_NORMAL_QUIESCENT_VS_NORMAL_DIVIDING_UP | All Up-Regulated | 0.046 | All Up-Regulated
gs_c2_curatedsets | BIDUS_METASTASIS_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | WAMUNYOKOLI_OVARIAN_CANCER_LMP_DN | All Up-Regulated | 0.009 | All Up-Regulated
gs_c2_curatedsets | HAHTOLA_SEZARY_SYNDROM_UP | All Up-Regulated | < 1.0E-6 | --
gs_c2_curatedsets | ENK_UV_RESPONSE_KERATINOCYTE_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | GAUSSMANN_MLL_AF4_FUSION_TARGETS_A_UP | All Up-Regulated | 0.023 | --
gs_c2_curatedsets | DUNNE_TARGETS_OF_AML1_MTG8_FUSION_UP | All Up-Regulated | 0.018 | All Up-Regulated
gs_c2_curatedsets | SEIDEN_ONCOGENESIS_BY_MET | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | DACOSTA_UV_RESPONSE_VIA_ERCC3_COMMON_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | GROSS_HYPOXIA_VIA_ELK3_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | GROSS_HYPOXIA_VIA_ELK3_AND_HIF1A_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | SCHAEFFER_PROSTATE_DEVELOPMENT_6HR_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | SHEN_SMARCA2_TARGETS_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | DORSAM_HOXA9_TARGETS_UP | All Up-Regulated | 0.037 | --
gs_c2_curatedsets | BROWN_MYELOID_CELL_DEVELOPMENT_UP | All Up-Regulated | < 1.0E-6 | --

gs_c2_curatedsets | HESS_TARGETS_OF_HOXA9_AND_MEIS1_DN | All Up-Regulated | < 1.0E-6 | --
gs_c2_curatedsets | LENAOUR_DENDRITIC_CELL_MATURATION_DN | All Up-Regulated | 0.032 | All Up-Regulated
gs_c2_curatedsets | VERHAAK_AML_WITH_NPM1_MUTATED_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | LI_WILMS_TUMOR_VS_FETAL_KIDNEY_1_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | THEILGAARD_NEUTROPHIL_AT_SKIN_WOUND_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | LIAN_NEUTROPHIL_GRANULE_CONSTITUENTS | All Up-Regulated | 0.014 | --
gs_c2_curatedsets | REN_ALVEOLAR_RHABDOMYOSARCOMA_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | IVANOVA_HEMATOPOIESIS_MATURE_CELL | All Up-Regulated | < 1.0E-6 | --
gs_c2_curatedsets | DEBIASI_APOPTOSIS_BY_REOVIRUS_INFECTION_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | LU_AGING_BRAIN_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | GENTILE_UV_HIGH_DOSE_DN | All Up-Regulated | 0.009 | All Up-Regulated
gs_c2_curatedsets | MCLACHLAN_DENTAL_CARIES_DN | All Up-Regulated | 0.005 | All Up-Regulated
gs_c2_curatedsets | DAZARD_RESPONSE_TO_UV_NHEK_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | BURTON_ADIPOGENESIS_11 | All Up-Regulated | 0.005 | --
gs_c2_curatedsets | DAZARD_RESPONSE_TO_UV_SCC_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | JIANG_HYPOXIA_NORMAL | All Up-Regulated | 0.009 | All Up-Regulated
gs_c2_curatedsets | BAELDE_DIABETIC_NEPHROPATHY_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | DAZARD_UV_RESPONSE_CLUSTER_G6 | All Up-Regulated | 0.009 | --
gs_c2_curatedsets | BROWNE_HCMV_INFECTION_6HR_DN | All Up-Regulated | 0.005 | --
gs_c2_curatedsets | MARTINEZ_RESPONSE_TO_TRABECTEDIN_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | MCLACHLAN_DENTAL_CARIES_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | DURCHDEWALD_SKIN_CARCINOGENESIS_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | CREIGHTON_ENDOCRINE_THERAPY_RESISTANCE_5 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | GAVIN_PDE3B_TARGETS | Mixed Up-Regulated | < 1.0E-6 | --
gs_c2_curatedsets | ZHENG_BOUND_BY_FOXP3 | All Up-Regulated | < 1.0E-6 | All Up-Regulated

gs_c2_curatedsets | ZHENG_FOXP3_TARGETS_IN_THYMUS_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | RIGGINS_TAMOXIFEN_RESISTANCE_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | FOSTER_TOLERANT_MACROPHAGE_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | SARRIO_EPITHELIAL_MESENCHYMAL_TRANSITION_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | DE_YY1_TARGETS_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | ACEVEDO_NORMAL_TISSUE_ADJACENT_TO_LIVER_TUMOR_UP | All Up-Regulated | < 1.0E-6 | --
gs_c2_curatedsets | CHUNG_BLISTER_CYTOTOXICITY_DN | All Up-Regulated | < 1.0E-6 | --
gs_c2_curatedsets | PODAR_RESPONSE_TO_ADAPHOSTIN_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | CHEN_HOXA5_TARGETS_9HR_UP | All Up-Regulated | 0.005 | All Up-Regulated
gs_c2_curatedsets | WANG_TUMOR_INVASIVENESS_DN | All Up-Regulated | 0.005 | All Up-Regulated
gs_c2_curatedsets | DAVIES_MULTIPLE_MYELOMA_VS_MGUS_DN | Mixed Up-Regulated | 0.028 | --
gs_c2_curatedsets | BOYLAN_MULTIPLE_MYELOMA_C_D_DN | Mixed Up-Regulated | 0.046 | All Up-Regulated
gs_c2_curatedsets | MARTINELLI_IMMATURE_NEUTROPHIL_UP | All Up-Regulated | < 1.0E-6 | --
gs_c2_curatedsets | RUTELLA_RESPONSE_TO_CSF2RB_AND_IL4_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | RUTELLA_RESPONSE_TO_HGF_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | RUTELLA_RESPONSE_TO_HGF_VS_CSF2RB_AND_IL4_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | MILI_PSEUDOPODIA_HAPTOTAXIS_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | KOBAYASHI_EGFR_SIGNALING_24HR_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | DANG_REGULATED_BY_MYC_DN | All Up-Regulated | 0.018 | All Up-Regulated
gs_c2_curatedsets | CHANDRAN_METASTASIS_UP | All Up-Regulated | 0.014 | --
gs_c2_curatedsets | KAMIKUBO_MYELOID_CEBPA_NETWORK | All Up-Regulated | 0.032 | --
gs_c2_curatedsets | VERHAAK_GLIOBLASTOMA_MESENCHYMAL | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | HIRSCH_CELLULAR_TRANSFORMATION_SIGNATURE_UP | All Up-Regulated | 0.014 | All Up-Regulated
gs_c2_curatedsets | WANG_RESPONSE_TO_GSK3_INHIBITOR_SB216763_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | LU_EZH2_TARGETS_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated

gs_c2_curatedsets | DUTERTRE_ESTRADIOL_RESPONSE_24HR_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | GABRIELY_MIR21_TARGETS | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | JOHNSTONE_PARVB_TARGETS_2_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | TORCHIA_TARGETS_OF_EWSR1_FLI1_FUSION_DN | All Up-Regulated | 0.018 | All Up-Regulated
gs_c2_curatedsets | IKEDA_MIR133_TARGETS_UP | All Up-Regulated | < 1.0E-6 | --
gs_c2_curatedsets | IKEDA_MIR30_TARGETS_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c2_curatedsets | HUANG_GATA2_TARGETS_UP | All Up-Regulated | < 1.0E-6 | --
gs_c2_curatedsets | ZWANG_CLASS_1_TRANSIENTLY_INDUCED_BY_EGF | All Up-Regulated | 0.005 | All Up-Regulated
gs_c2_curatedsets | ZWANG_CLASS_3_TRANSIENTLY_INDUCED_BY_EGF | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | V$CHOP_01 | Mixed Up-Regulated | 0.040 | All Up-Regulated
gs_c3_regulatory | V$MAF_Q6 | All Up-Regulated | 0.009 | --
gs_c3_regulatory | AGCACTT,MIR-93,MIR-302A,MIR-302B,MIR-302C,MIR-302D,MIR-372,MIR-373,MIR-520E,MIR-520A,MIR-526B,MIR-520B,MIR-520C,MIR-520D | All Up-Regulated | 0.018 | All Up-Regulated
gs_c3_regulatory | GTGCAAT,MIR-25,MIR-32,MIR-92,MIR-363,MIR-367 | All Up-Regulated | 0.008 | All Up-Regulated
gs_c3_regulatory | TGAATGT,MIR-181A,MIR-181B,MIR-181C,MIR-181D | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | TGCACTG,MIR-148A,MIR-152,MIR-148B | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | TTGCACT,MIR-130A,MIR-301,MIR-130B | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | TGCACTT,MIR-519C,MIR-519B,MIR-519A | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | CAGTATT,MIR-200B,MIR-200C,MIR-429 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | ACATTCC,MIR-1,MIR-206 | All Up-Regulated | 0.007 | All Up-Regulated
gs_c3_regulatory | AAAGGGA,MIR-204,MIR-211 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | AATGTGA,MIR-23A,MIR-23B | All Up-Regulated | 0.012 | All Up-Regulated
gs_c3_regulatory | ACTGAAA,MIR-30A-3P,MIR-30E-3P | All Up-Regulated | 0.018 | All Up-Regulated
gs_c3_regulatory | TACTTGA,MIR-26A,MIR-26B | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | CAGTGTT,MIR-141,MIR-200A | All Up-Regulated | 0.017 | All Up-Regulated

gs_c3_regulatory | TTTGCAC,MIR-19A,MIR-19B | All Up-Regulated | < 1.0E-6 | --
gs_c3_regulatory | ATGTTAA,MIR-302C | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | CATTTCA,MIR-203 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | TCCAGAT,MIR-516-5P | All Up-Regulated | 0.030 | --
gs_c3_regulatory | ACTGCAG,MIR-17-3P | All Up-Regulated | 0.001 | --
gs_c3_regulatory | CTTGTAT,MIR-381 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | GTGTTGA,MIR-505 | All Up-Regulated | 0.017 | All Up-Regulated
gs_c3_regulatory | GTATTAT,MIR-369-3P | All Up-Regulated | 0.014 | All Up-Regulated
gs_c3_regulatory | GTGACTT,MIR-224 | All Up-Regulated | 0.004 | All Up-Regulated
gs_c3_regulatory | ATAGGAA,MIR-202 | All Up-Regulated | 0.007 | All Up-Regulated
gs_c3_regulatory | TTGCCAA,MIR-182 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | TAGCTTT,MIR-9 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | GTACTGT,MIR-101 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | ATATGCA,MIR-448 | All Up-Regulated | 0.001 | All Up-Regulated
gs_c3_regulatory | ATGAAGG,MIR-205 | All Up-Regulated | 0.002 | All Up-Regulated
gs_c3_regulatory | CATGTAA,MIR-496 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | AGCATTA,MIR-155 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | GCAAAAA,MIR-129 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | CTTTGTA,MIR-524 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | CTGTTAC,MIR-194 | All Up-Regulated | 0.021 | All Up-Regulated
gs_c3_regulatory | ATGCAGT,MIR-217 | All Up-Regulated | < 1.0E-6 | --
gs_c3_regulatory | AAGCAAT,MIR-137 | All Up-Regulated | 0.009 | All Up-Regulated
gs_c3_regulatory | ACCAAAG,MIR-9 | All Up-Regulated | 0.015 | All Up-Regulated
gs_c3_regulatory | TTTGTAG,MIR-520D | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | AACTGAC,MIR-223 | All Up-Regulated | 0.001 | --

gs_c3_regulatory | ATAAGCT,MIR-21 | All Up-Regulated | 0.007 | All Up-Regulated
gs_c3_regulatory | TTTGCAG,MIR-518A-2 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | TCATCTC,MIR-143 | All Up-Regulated | 0.002 | All Up-Regulated
gs_c3_regulatory | AACTGGA,MIR-145 | All Up-Regulated | 0.001 | All Up-Regulated
gs_c3_regulatory | AAGCACT,MIR-520F | All Up-Regulated | 0.024 | All Up-Regulated
gs_c3_regulatory | TGCTTTG,MIR-330 | All Up-Regulated | 0.028 | --
gs_c3_regulatory | GGCACTT,MIR-519E | Mixed Up-Regulated | 0.012 | --
gs_c3_regulatory | ATGTACA,MIR-493 | All Up-Regulated | 0.001 | All Up-Regulated
gs_c3_regulatory | GCATTTG,MIR-105 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | TCCAGAG,MIR-518C | All Up-Regulated | 0.032 | --
gs_c3_regulatory | GTTATAT,MIR-410 | All Up-Regulated | 0.027 | All Up-Regulated
gs_c3_regulatory | ACACTAC,MIR-142-3P | All Up-Regulated | 0.004 | --
gs_c3_regulatory | ATTCTTT,MIR-186 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c3_regulatory | ATCTTGC,MIR-31 | Mixed Up-Regulated | 0.032 | --
gs_c3_regulatory | AAGGGAT,MIR-188 | Mixed Up-Regulated | 0.014 | --
gs_c3_regulatory | TAATAAT,MIR-126 | All Up-Regulated | 0.016 | --
gs_c3_regulatory | TAATGTG,MIR-323 | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c6_oncogenic | EGFR_UP.V1_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c6_oncogenic | ERB2_UP.V1_DN | All Up-Regulated | 0.003 | All Up-Regulated
gs_c6_oncogenic | ERB2_UP.V1_UP | All Up-Regulated | 0.009 | --
gs_c6_oncogenic | PIGF_UP.V1_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c6_oncogenic | VEGF_A_UP.V1_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c6_oncogenic | LTE2_UP.V1_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c6_oncogenic | MEK_UP.V1_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c6_oncogenic | MEK_UP.V1_UP | All Up-Regulated | 0.049 | --

gs_c6_oncogenic | RAF_UP.V1_UP | All Up-Regulated | 0.006 | All Up-Regulated
gs_c6_oncogenic | PTEN_DN.V2_DN | Mixed Up-Regulated | 0.009 | --
gs_c6_oncogenic | EIF4E_DN | All Up-Regulated | 0.000 | All Up-Regulated
gs_c6_oncogenic | RB_P130_DN.V1_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c6_oncogenic | RPS14_DN.V1_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c6_oncogenic | HOXA9_DN.V1_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c6_oncogenic | STK33_NOMO_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c6_oncogenic | STK33_SKM_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c6_oncogenic | STK33_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c6_oncogenic | TBK1.DF_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c7_immuno | GSE10239_NAIVE_VS_MEMORY_CD8_TCELL_UP | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c7_immuno | GSE10239_MEMORY_VS_KLRG1HIGH_EFF_CD8_TCELL_DN | All Up-Regulated | 0.002 | All Up-Regulated
gs_c7_immuno | GSE10325_CD4_TCELL_VS_BCELL_DN | All Up-Regulated | 0.011 | --
gs_c7_immuno | GSE10325_CD4_TCELL_VS_MYELOID_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c7_immuno | GSE10325_BCELL_VS_MYELOID_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c7_immuno | GSE10325_LUPUS_CD4_TCELL_VS_LUPUS_BCELL_DN | All Up-Regulated | < 1.0E-6 | --
gs_c7_immuno | GSE10325_LUPUS_CD4_TCELL_VS_LUPUS_MYELOID_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c7_immuno | GSE10325_LUPUS_BCELL_VS_LUPUS_MYELOID_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c7_immuno | GSE11057_NAIVE_VS_CENT_MEMORY_CD4_TCELL_DN | All Up-Regulated | 0.002 | All Up-Regulated
gs_c7_immuno | GSE11057_NAIVE_CD4_VS_PBMC_CD4_TCELL_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c7_immuno | GSE11057_CD4_EFF_MEM_VS_PBMC_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c7_immuno | GSE11057_CD4_CENT_MEM_VS_PBMC_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c7_immuno | GSE11057_NAIVE_VS_MEMORY_CD4_TCELL_DN | All Up-Regulated | < 1.0E-6 | All Up-Regulated
gs_c7_immuno | GSE11057_PBMC_VS_MEM_CD4_TCELL_UP | All Up-Regulated | < 1.0E-6 | --
gs_c7_immuno | GSE11864_CSF1_VS_CSF1_PAM3CYS_IN_MAC_UP | All Up-Regulated | < 1.0E-6 | --

199 All Up- gs_c7_immuno GSE11864_CSF1_IFNG_VS_CSF1_PAM3CYS_IN_MAC_UP Regulated 0.010 -- All Up- gs_c7_immuno GSE12366_GC_VS_NAIVE_BCELL_DN Regulated 0.004 -- All Up- gs_c7_immuno GSE12845_IGD_POS_BLOOD_VS_NAIVE_TONSIL_BCELL_DN Regulated 0.036 -- All Up- gs_c7_immuno GSE13411_IGM_VS_SWITCHED_MEMORY_BCELL_DN Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE13484_UNSTIM_VS_12H_YF17D_VACCINE_STIM_PBMC_DN Regulated 0.036 -- All Up- gs_c7_immuno GSE13484_12H_UNSTIM_VS_YF17D_VACCINE_STIM_PBMC_UP Regulated 0.004 -- All Up- gs_c7_immuno GSE13485_CTRL_VS_DAY1_YF17D_VACCINE_PBMC_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE13485_DAY1_VS_DAY3_YF17D_VACCINE_PBMC_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE13485_PRE_VS_POST_YF17D_VACCINATION_PBMC_UP Regulated 0.048 All Up-Regulated All Up- gs_c7_immuno GSE14000_4H_VS_16H_LPS_DC_TRANSLATED_RNA_UP Regulated < 1.0E-6 -- Mixed Up- gs_c7_immuno GSE14000_4H_VS_16H_LPS_DC_UP Regulated 0.048 -- All Up- gs_c7_immuno GSE14308_NAIVE_CD4_TCELL_VS_INDUCED_TREG_UP Regulated < 1.0E-6 -- GSE1460_CD4_THYMOCYTE_VS_NAIVE_CD4_TCELL_CORD_BLOO All Up- gs_c7_immuno D_UP Regulated < 1.0E-6 -- GSE1460_NAIVE_CD4_TCELL_CORD_BLOOD_VS_THYMIC_STR Mixed Up- gs_c7_immuno OMAL_CELL_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE14769_UNSTIM_VS_80MIN_LPS_BMDM_DN Regulated 0.040 All Up-Regulated All Up- gs_c7_immuno GSE15767_MED_VS_SCS_MAC_LN_UP Regulated 0.013 All Up-Regulated Mixed Up- gs_c7_immuno GSE17580_TREG_VS_TEFF_S_MANSONI_INF_DN Regulated 0.031 All Up-Regulated All Up- gs_c7_immuno GSE17721_CTRL_VS_GARDIQUIMOD_1H_BMDM_DN Regulated 0.017 -- All Up- gs_c7_immuno GSE17721_POLYIC_VS_PAM3CSK4_6H_BMDM_UP Regulated 0.032 -- All Up- gs_c7_immuno GSE17721_CPG_VS_GARDIQUIMOD_4H_BMDM_DN Regulated 0.008 -- Mixed Up- gs_c7_immuno GSE17721_CPG_VS_GARDIQUIMOD_16H_BMDM_DN Regulated 0.002 All Up-Regulated All Up- gs_c7_immuno GSE17721_LPS_VS_PAM3CSK4_8H_BMDM_UP Regulated < 1.0E-6 -- All Up- gs_c7_immuno 
GSE17721_0.5H_VS_12H_CPG_BMDM_DN Regulated 0.032 All Up-Regulated All Up- gs_c7_immuno GSE17721_0.5H_VS_4H_GARDIQUIMOD_BMDM_DN Regulated 0.013 All Up-Regulated All Up- gs_c7_immuno GSE17974_0H_VS_12H_IN_VITRO_ACT_CD4_TCELL_UP Regulated 0.008 --

200 All Up- gs_c7_immuno GSE17974_0H_VS_24H_IN_VITRO_ACT_CD4_TCELL_UP Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE17974_0H_VS_48H_IN_VITRO_ACT_CD4_TCELL_UP Regulated < 1.0E-6 -- GSE17974_CTRL_VS_ACT_IL4_AND_ANTI_IL12_12H_CD4_TCELL All Up- gs_c7_immuno _UP Regulated 0.002 All Up-Regulated GSE17974_CTRL_VS_ACT_IL4_AND_ANTI_IL12_24H_CD4_TCELL All Up- gs_c7_immuno _UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE17974_1.5H_VS_72H_IL4_AND_ANTI_IL12_ACT_CD4_TCELL_UP Regulated 0.032 -- All Up- gs_c7_immuno GSE20715_WT_VS_TLR4_KO_LUNG_UP Regulated 0.002 All Up-Regulated All Up- gs_c7_immuno GSE20715_0H_VS_6H_OZONE_TLR4_KO_LUNG_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE20715_0H_VS_24H_OZONE_TLR4_KO_LUNG_DN Regulated 0.034 All Up-Regulated All Up- gs_c7_immuno GSE22886_NAIVE_TCELL_VS_NKCELL_DN Regulated 0.002 All Up-Regulated All Up- gs_c7_immuno GSE22886_NAIVE_VS_MEMORY_TCELL_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE22886_NAIVE_CD8_TCELL_VS_NKCELL_DN Regulated 0.021 All Up-Regulated All Up- gs_c7_immuno GSE22886_NAIVE_CD4_TCELL_VS_NKCELL_DN Regulated 0.034 -- GSE22886_IGG_IGA_MEMORY_BCELL_VS_BLOOD_PLASMA_CELL All Up- gs_c7_immuno _UP Regulated 0.038 -- GSE22886_IGG_IGA_MEMORY_BCELL_VS_BM_PLASMA_CELL_ All Up- gs_c7_immuno UP Regulated 0.008 All Up-Regulated All Up- gs_c7_immuno GSE22886_DAY0_VS_DAY1_MONOCYTE_IN_CULTURE_UP Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE22886_DAY0_VS_DAY7_MONOCYTE_IN_CULTURE_UP Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE22886_DC_VS_MONOCYTE_DN Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE22886_NAIVE_TCELL_VS_NEUTROPHIL_DN Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE22886_NAIVE_TCELL_VS_DC_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE22886_NAIVE_TCELL_VS_MONOCYTE_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE22886_NAIVE_BCELL_VS_NEUTROPHIL_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE22886_NAIVE_BCELL_VS_DC_DN Regulated < 1.0E-6 
-- All Up- gs_c7_immuno GSE22886_NAIVE_BCELL_VS_MONOCYTE_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE22886_NAIVE_CD8_TCELL_VS_NEUTROPHIL_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE22886_NAIVE_CD8_TCELL_VS_DC_DN Regulated < 1.0E-6 All Up-Regulated

201 All Up- gs_c7_immuno GSE22886_NAIVE_CD8_TCELL_VS_MONOCYTE_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE22886_NAIVE_CD4_TCELL_VS_NEUTROPHIL_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE22886_NAIVE_CD4_TCELL_VS_DC_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE22886_NAIVE_CD4_TCELL_VS_MONOCYTE_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE22886_TH1_VS_TH2_48H_ACT_DN Regulated < 1.0E-6 All Up-Regulated GSE24142_EARLY_THYMIC_PROGENITOR_VS_DN2_THYMOCY All Up- gs_c7_immuno TE_UP Regulated < 1.0E-6 All Up-Regulated GSE24142_EARLY_THYMIC_PROGENITOR_VS_DN3_THYMOCY All Up- gs_c7_immuno TE_UP Regulated < 1.0E-6 All Up-Regulated GSE24142_EARLY_THYMIC_PROGENITOR_VS_DN2_THYMOCY All Up- gs_c7_immuno TE_ADULT_UP Regulated < 1.0E-6 All Up-Regulated GSE24142_EARLY_THYMIC_PROGENITOR_VS_DN2_THYMOCY All Up- gs_c7_immuno TE_FETAL_UP Regulated 0.004 All Up-Regulated GSE24142_EARLY_THYMIC_PROGENITOR_VS_DN3_THYMOCY All Up- gs_c7_immuno TE_FETAL_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE24142_DN2_VS_DN3_THYMOCYTE_FETAL_UP Regulated 0.013 All Up-Regulated All Up- gs_c7_immuno GSE24142_ADULT_VS_FETAL_EARLY_THYMIC_PROGENITOR_UP Regulated 0.006 -- GSE24634_TREG_VS_TCONV_POST_DAY3_IL4_CONVERSION_D All Up- gs_c7_immuno N Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE24634_TREG_VS_TCONV_POST_DAY5_IL4_CONVERSION_DN Regulated < 1.0E-6 -- GSE24634_TREG_VS_TCONV_POST_DAY7_IL4_CONVERSION_D All Up- gs_c7_immuno N Regulated < 1.0E-6 All Up-Regulated GSE24634_TREG_VS_TCONV_POST_DAY10_IL4_CONVERSION_D All Up- gs_c7_immuno N Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE24634_TEFF_VS_TCONV_DAY3_IN_CULTURE_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE24634_TEFF_VS_TCONV_DAY10_IN_CULTURE_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE24634_IL4_VS_CTRL_TREATED_NAIVE_CD4_TCELL_DAY5_UP Regulated 0.008 -- GSE24634_IL4_VS_CTRL_TREATED_NAIVE_CD4_TCELL_DAY10_ 
Mixed Up- gs_c7_immuno DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE25087_TREG_VS_TCONV_FETUS_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE25087_TREG_VS_TCONV_ADULT_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE26928_EFF_MEMORY_VS_CXCR5_POS_CD4_TCELL_DN Regulated 0.015 -- Mixed Up- gs_c7_immuno GSE2706_2H_VS_8H_R848_STIM_DC_DN Regulated 0.011 -- All Up- gs_c7_immuno GSE27786_BCELL_VS_NKCELL_DN Regulated 0.032 All Up-Regulated

202 All Up- gs_c7_immuno GSE27786_CD4_TCELL_VS_NKCELL_DN Regulated 0.019 All Up-Regulated All Up- gs_c7_immuno GSE27786_NKCELL_VS_NKTCELL_UP Regulated 0.004 All Up-Regulated All Up- gs_c7_immuno GSE29615_CTRL_VS_DAY3_LAIV_IFLU_VACCINE_PBMC_UP Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE29615_DAY3_VS_DAY7_LAIV_FLU_VACCINE_PBMC_DN Regulated 0.019 -- All Up- gs_c7_immuno GSE29617_CTRL_VS_DAY7_TIV_FLU_VACCINE_PBMC_2008_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE29618_BCELL_VS_MONOCYTE_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE29618_BCELL_VS_PDC_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE29618_BCELL_VS_MDC_UP Regulated 0.031 -- All Up- gs_c7_immuno GSE29618_BCELL_VS_MDC_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE29618_MONOCYTE_VS_PDC_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE29618_MONOCYTE_VS_MDC_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE29618_PDC_VS_MDC_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE29618_BCELL_VS_MONOCYTE_DAY7_FLU_VACCINE_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE29618_BCELL_VS_PDC_DAY7_FLU_VACCINE_UP Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE29618_BCELL_VS_MDC_DAY7_FLU_VACCINE_UP Regulated 0.006 -- All Up- gs_c7_immuno GSE29618_BCELL_VS_MDC_DAY7_FLU_VACCINE_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE29618_MONOCYTE_VS_PDC_DAY7_FLU_VACCINE_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE29618_MONOCYTE_VS_MDC_DAY7_FLU_VACCINE_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE29618_PDC_VS_MDC_DAY7_FLU_VACCINE_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE32423_MEMORY_VS_NAIVE_CD8_TCELL_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE32423_MEMORY_VS_NAIVE_CD8_TCELL_IL7_IL4_DN Regulated 0.008 All Up-Regulated All Up- gs_c7_immuno GSE32423_CTRL_VS_IL7_IL4_MEMORY_CD8_TCELL_UP Regulated 0.032 All Up-Regulated All Up- 
gs_c7_immuno GSE32423_IL7_VS_IL7_IL4_MEMORY_CD8_TCELL_UP Regulated 0.006 All Up-Regulated All Up- gs_c7_immuno GSE339_EX_VIVO_VS_IN_CULTURE_CD4POS_DC_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE34205_HEALTHY_VS_RSV_INF_INFANT_PBMC_DN Regulated < 1.0E-6 --

203 GSE36392_TYPE_2_MYELOID_VS_EOSINOPHIL_IL25_TREATED_LU All Up- gs_c7_immuno NG_DN Regulated 0.002 -- GSE36392_TYPE_2_MYELOID_VS_NEUTROPHIL_IL25_TREATED_L All Up- gs_c7_immuno UNG_UP Regulated 0.046 -- GSE36392_EOSINOPHIL_VS_NEUTROPHIL_IL25_TREATED_LUNG_ All Up- gs_c7_immuno UP Regulated 0.008 -- GSE36476_YOUNG_VS_OLD_DONOR_MEMORY_CD4_TCELL_72H_T All Up- gs_c7_immuno SST_ACT_DN Regulated 0.002 -- All Up- gs_c7_immuno GSE37416_CTRL_VS_6H_F_TULARENSIS_LVS_NEUTROPHIL_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE37416_CTRL_VS_12H_F_TULARENSIS_LVS_NEUTROPHIL_UP Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE37416_0H_VS_3H_F_TULARENSIS_LVS_NEUTROPHIL_UP Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE37416_0H_VS_6H_F_TULARENSIS_LVS_NEUTROPHIL_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE37416_0H_VS_12H_F_TULARENSIS_LVS_NEUTROPHIL_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE37416_0H_VS_24H_F_TULARENSIS_LVS_NEUTROPHIL_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE37416_0H_VS_48H_F_TULARENSIS_LVS_NEUTROPHIL_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE37416_12H_VS_24H_F_TULARENSIS_LVS_NEUTROPHIL_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE37416_12H_VS_48H_F_TULARENSIS_LVS_NEUTROPHIL_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE3982_DC_VS_NEUTROPHIL_LPS_STIM_DN Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE3982_EOSINOPHIL_VS_NEUTROPHIL_DN Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE3982_EOSINOPHIL_VS_BCELL_UP Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE3982_EOSINOPHIL_VS_EFF_MEMORY_CD4_TCELL_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE3982_EOSINOPHIL_VS_CENT_MEMORY_CD4_TCELL_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE3982_EOSINOPHIL_VS_NKCELL_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE3982_EOSINOPHIL_VS_TH2_UP Regulated 0.002 -- All Up- gs_c7_immuno GSE3982_MAST_CELL_VS_NEUTROPHIL_DN 
Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE3982_MAST_CELL_VS_BCELL_UP Regulated 0.017 All Up-Regulated Mixed Up- gs_c7_immuno GSE3982_MAST_CELL_VS_BASOPHIL_DN Regulated 0.002 -- All Up- gs_c7_immuno GSE3982_MAST_CELL_VS_TH1_UP Regulated 0.017 -- All Up- gs_c7_immuno GSE3982_MAST_CELL_VS_TH2_UP Regulated 0.006 --

204 All Up- gs_c7_immuno GSE3982_DC_VS_NEUTROPHIL_DN Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE3982_DC_VS_BASOPHIL_DN Regulated 0.027 -- All Up- gs_c7_immuno GSE3982_DC_VS_CENT_MEMORY_CD4_TCELL_UP Regulated 0.010 -- All Up- gs_c7_immuno GSE3982_DC_VS_NKCELL_UP Regulated 0.025 -- All Up- gs_c7_immuno GSE3982_DC_VS_TH2_UP Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE3982_MAC_VS_NKCELL_UP Regulated 0.021 -- All Up- gs_c7_immuno GSE3982_MAC_VS_TH1_UP Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE3982_MAC_VS_TH2_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE3982_NEUTROPHIL_VS_BCELL_UP Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE3982_NEUTROPHIL_VS_BASOPHIL_UP Regulated 0.002 -- All Up- gs_c7_immuno GSE3982_NEUTROPHIL_VS_EFF_MEMORY_CD4_TCELL_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE3982_NEUTROPHIL_VS_CENT_MEMORY_CD4_TCELL_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE3982_NEUTROPHIL_VS_NKCELL_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE3982_NEUTROPHIL_VS_TH1_UP Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE3982_NEUTROPHIL_VS_TH2_UP Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE3982_BCELL_VS_BASOPHIL_DN Regulated 0.004 All Up-Regulated All Up- gs_c7_immuno GSE3982_BASOPHIL_VS_EFF_MEMORY_CD4_TCELL_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE3982_BASOPHIL_VS_CENT_MEMORY_CD4_TCELL_UP Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE3982_BASOPHIL_VS_NKCELL_UP Regulated 0.004 All Up-Regulated All Up- gs_c7_immuno GSE3982_BASOPHIL_VS_TH1_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE3982_BASOPHIL_VS_TH2_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE39820_CTRL_VS_IL1B_IL6_CD4_TCELL_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE39820_CTRL_VS_TGFBETA1_IL6_IL23A_CD4_TCELL_UP Regulated 0.006 All Up-Regulated GSE39820_TGFBETA1_IL6_VS_TGFBETA1_IL6_IL23A_TREATED_CD All Up- gs_c7_immuno 4_TCELL_UP Regulated 0.004 -- 
GSE39820_TGFBETA3_IL6_VS_TGFBETA3_IL6_IL23A_TREATED_ All Up- gs_c7_immuno CD4_TCELL_UP Regulated < 1.0E-6 All Up-Regulated

205 GSE39820_TGFBETA1_VS_TGFBETA3_IN_IL6_TREATED_CD4_TCEL All Up- gs_c7_immuno L_UP Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE6269_HEALTHY_VS_E_COLI_INF_PBMC_DN Regulated 0.006 -- All Up- gs_c7_immuno GSE6269_HEALTHY_VS_STREP_AUREUS_INF_PBMC_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE6269_HEALTHY_VS_STREP_PNEUMO_INF_PBMC_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE6269_E_COLI_VS_STREP_AUREUS_INF_PBMC_DN Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE6269_E_COLI_VS_STREP_PNEUMO_INF_PBMC_DN Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE7460_TREG_VS_TCONV_ACT_UP Regulated 0.002 All Up-Regulated All Up- gs_c7_immuno GSE7460_FOXP3_MUT_VS_WT_ACT_WITH_TGFB_TCONV_DN Regulated 0.010 -- All Up- gs_c7_immuno GSE7764_IL15_NK_CELL_24H_VS_SPLENOCYTE_DN Regulated 0.017 All Up-Regulated All Up- gs_c7_immuno GSE9006_HEALTHY_VS_TYPE_1_DIABETES_PBMC_AT_DX_DN Regulated < 1.0E-6 -- All Up- gs_c7_immuno GSE9006_TYPE_1_VS_TYPE_2_DIABETES_PBMC_AT_DX_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE9037_WT_VS_IRAK4_KO_BMDM_UP Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE9988_ANTI_TREM1_VS_LPS_MONOCYTE_UP Regulated 0.036 -- All Up- gs_c7_immuno GSE9988_ANTI_TREM1_VS_LPS_MONOCYTE_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE9988_ANTI_TREM1_VS_LOW_LPS_MONOCYTE_DN Regulated < 1.0E-6 All Up-Regulated All Up- gs_c7_immuno GSE9988_ANTI_TREM1_VS_CTRL_TREATED_MONOCYTES_DN Regulated 0.010 -- All Up- gs_c7_immuno GSE9988_LPS_VS_LPS_AND_ANTI_TREM1_MONOCYTE_UP Regulated < 1.0E-6 All Up-Regulated GSE9988_LOW_LPS_VS_ANTI_TREM1_AND_LPS_MONOCYTE_U All Up- gs_c7_immuno P Regulated 0.010 All Up-Regulated All Up- gs_c7_immuno GSE9988_LOW_LPS_VS_CTRL_TREATED_MONOCYTE_DN Regulated 0.027 -- GSE9988_ANTI_TREM1_AND_LPS_VS_CTRL_TREATED_MONOC All Up- gs_c7_immuno YTES_UP Regulated 0.027 All Up-Regulated GSE9988_ANTI_TREM1_AND_LPS_VS_CTRL_TREATED_MONOCYT All Up- gs_c7_immuno ES_DN Regulated 0.002 -- Non- 
REACTOME_PYRUVATE_METABOLISM_AND_CITRIC_ACID_TCA_ Directionally gs_c2_curatedsets CYCLE Dysregulated 0.028 -- Non- Directionally gs_c2_curatedsets REACTOME_CITRIC_ACID_CYCLE_TCA_CYCLE Dysregulated 0.018 --

Non-Directionally Dysregulated  gs_c2_curatedsets  SHEPARD_CRUSH_AND_BURN_MUTANT_UP                   0.005     --
Non-Directionally Dysregulated  gs_c2_curatedsets  WELCSH_BRCA1_TARGETS_UP                            0.032     All Up-Regulated
Non-Directionally Dysregulated  gs_c2_curatedsets  HSIAO_HOUSEKEEPING_GENES                           0.046     All Up-Regulated
Non-Directionally Dysregulated  gs_c3_regulatory   V$PEA3_Q6                                          0.002     --
Non-Directionally Dysregulated  gs_c3_regulatory   V$ZF5_01                                           0.009     --
Non-Directionally Dysregulated  gs_c7_immuno       GOLDRATH_NAIVE_VS_MEMORY_CD8_TCELL_DN              0.004     All Up-Regulated
Non-Directionally Dysregulated  gs_c7_immuno       GSE12845_PRE_GC_VS_DARKZONE_GC_TONSIL_BCELL_DN     0.004     --
Non-Directionally Dysregulated  gs_c7_immuno       GSE17721_LPS_VS_POLYIC_4H_BMDM_DN                  0.027     --
Non-Directionally Dysregulated  gs_c7_immuno       GSE17721_POLYIC_VS_PAM3CSK4_16H_BMDM_DN            < 1.0E-6  --
Non-Directionally Dysregulated  gs_c7_immuno       GSE17721_POLYIC_VS_PAM3CSK4_24H_BMDM_DN            0.004     --
Non-Directionally Dysregulated  gs_c7_immuno       GSE17721_POLYIC_VS_GARDIQUIMOD_16H_BMDM_DN         < 1.0E-6  --
Non-Directionally Dysregulated  gs_c7_immuno       GSE17721_LPS_VS_GARDIQUIMOD_12H_BMDM_UP            0.044     --
Non-Directionally Dysregulated  gs_c7_immuno       GSE17721_0.5H_VS_12H_CPG_BMDM_UP                   0.048     --
Non-Directionally Dysregulated  gs_c7_immuno       GSE17721_0.5H_VS_24H_CPG_BMDM_UP                   < 1.0E-6  --
Non-Directionally Dysregulated  gs_c7_immuno       GSE22886_IGM_MEMORY_BCELL_VS_BLOOD_PLASMA_CELL_DN  < 1.0E-6  --
Non-Directionally Dysregulated  gs_c7_immuno       GSE2706_2H_VS_8H_R848_AND_LPS_STIM_DC_UP           0.031     --

207 Non- Directionally gs_c7_immuno GSE27786_CD8_TCELL_VS_ERYTHROBLAST_UP Dysregulated 0.015 -- Non- Directionally gs_c7_immuno GSE360_HIGH_VS_LOW_DOSE_B_MALAYI_MAC_UP Dysregulated 0.032 -- Non- Directionally gs_c7_immuno GSE8384_CTRL_VS_B_ABORTUS_4H_MAC_CELL_LINE_DN Dysregulated 0.008 -- Mixed Down- gs_h_hallmarksets HALLMARK_DNA_REPAIR Regulated 0.045 -- Mixed Down- gs_h_hallmarksets HALLMARK_MYC_TARGETS_V1 Regulated 0.005 All Up-Regulated All Down- gs_h_hallmarksets HALLMARK_MYC_TARGETS_V2 Regulated 0.001 -- All Down- gs_h_hallmarksets HALLMARK_OXIDATIVE_PHOSPHORYLATION Regulated 0.000 -- All Down- gs_c1_chrregions chr19p13 Regulated < 1.0E-6 -- All Down- gs_c1_chrregions chr3p21 Regulated 0.007 -- All Down- gs_c1_chrregions chr9q34 Regulated < 1.0E-6 -- All Down- gs_c1_chrregions chr11q13 Regulated 0.009 -- All Down- gs_c1_chrregions chr16p13 Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets KEGG_OXIDATIVE_PHOSPHORYLATION Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets KEGG_RIBOSOME Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets KEGG_PARKINSONS_DISEASE Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets REACTOME_TRANSLATION Regulated < 1.0E-6 -- REACTOME_TCA_CYCLE_AND_RESPIRATORY_ELECTRON_TRAN All Down- gs_c2_curatedsets SPORT Regulated < 1.0E-6 -- REACTOME_SRP_DEPENDENT_COTRANSLATIONAL_PROTEIN_TA All Down- gs_c2_curatedsets RGETING_TO_MEMBRANE Regulated 0.005 -- All Down- gs_c2_curatedsets REACTOME_PEPTIDE_CHAIN_ELONGATION Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets REACTOME_METABOLISM_OF_PROTEINS Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets REACTOME_3_UTR_MEDIATED_TRANSLATIONAL_REGULATION Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets REACTOME_METABOLISM_OF_MRNA Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets REACTOME_METABOLISM_OF_RNA Regulated < 1.0E-6 --

208 All Down- gs_c2_curatedsets REACTOME_RESPIRATORY_ELECTRON_TRANSPORT Regulated 0.018 -- All Down- gs_c2_curatedsets REACTOME_INFLUENZA_LIFE_CYCLE Regulated < 1.0E-6 -- REACTOME_INFLUENZA_VIRAL_RNA_TRANSCRIPTION_AND_REP All Down- gs_c2_curatedsets LICATION Regulated < 1.0E-6 -- REACTOME_RESPIRATORY_ELECTRON_TRANSPORT_ATP_SYNTH ESIS_BY_CHEMIOSMOTIC_COUPLING_AND_HEAT_PRODUCTION_ All Down- gs_c2_curatedsets BY_UNCOUPLING_PROTEINS_ Regulated < 1.0E-6 -- REACTOME_FORMATION_OF_ATP_BY_CHEMIOSMOTIC_COUPLIN All Down- gs_c2_curatedsets G Regulated < 1.0E-6 -- REACTOME_NONSENSE_MEDIATED_DECAY_ENHANCED_BY_THE All Down- gs_c2_curatedsets _EXON_JUNCTION_COMPLEX Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets GINESTIER_BREAST_CANCER_ZNF217_AMPLIFIED_DN Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets HAHTOLA_SEZARY_SYNDROM_DN Regulated 0.041 -- Mixed Down- gs_c2_curatedsets DACOSTA_UV_RESPONSE_VIA_ERCC3_UP Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets DAIRKEE_TERT_TARGETS_UP Regulated < 1.0E-6 -- Mixed Down- gs_c2_curatedsets SPIELMAN_LYMPHOBLAST_EUROPEAN_VS_ASIAN_UP Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets STARK_PREFRONTAL_CORTEX_22Q11_DELETION_DN Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets NIKOLSKY_BREAST_CANCER_16P13_AMPLICON Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets MOOTHA_VOXPHOS Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets PENG_RAPAMYCIN_RESPONSE_DN Regulated 0.041 -- Mixed Down- gs_c2_curatedsets PENG_GLUTAMINE_DEPRIVATION_DN Regulated < 1.0E-6 -- Mixed Down- gs_c2_curatedsets IVANOVA_HEMATOPOIESIS_EARLY_PROGENITOR Regulated 0.023 -- All Down- gs_c2_curatedsets CUI_TCF21_TARGETS_2_UP Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets WONG_MITOCHONDRIA_GENE_MODULE Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets ACEVEDO_NORMAL_TISSUE_ADJACENT_TO_LIVER_TUMOR_DN Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets CHNG_MULTIPLE_MYELOMA_HYPERPLOID_UP Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets MOOTHA_HUMAN_MITODB_6_2002 Regulated 0.032 
-- Mixed Down- gs_c2_curatedsets MOOTHA_PGC Regulated 0.046 -- Mixed Down- gs_c2_curatedsets MOOTHA_MITOCHONDRIA Regulated < 1.0E-6 --

209 All Down- gs_c2_curatedsets YAO_TEMPORAL_RESPONSE_TO_PROGESTERONE_CLUSTER_13 Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets YAO_TEMPORAL_RESPONSE_TO_PROGESTERONE_CLUSTER_17 Regulated 0.014 -- All Down- gs_c2_curatedsets KIM_ALL_DISORDERS_DURATION_CORR_DN Regulated < 1.0E-6 -- All Down- gs_c2_curatedsets LU_EZH2_TARGETS_UP Regulated < 1.0E-6 -- Mixed Down- gs_c2_curatedsets LI_DCP2_BOUND_MRNA Regulated < 1.0E-6 -- Mixed Down- gs_c3_regulatory V$SF1_Q6 Regulated 0.030 -- All Down- gs_c5_GOterms ORGANELLE_INNER_MEMBRANE Regulated 0.011 -- Mixed Down- gs_c5_GOterms MITOCHONDRIAL_PART Regulated 0.046 -- Mixed Down- gs_c5_GOterms RIBONUCLEOPROTEIN_COMPLEX Regulated 0.010 -- All Down- gs_c5_GOterms MITOCHONDRIAL_MEMBRANE Regulated 0.045 -- Mixed Down- gs_c5_GOterms ORGANELLE_MEMBRANE Regulated 0.046 -- All Down- gs_c5_GOterms MITOCHONDRIAL_INNER_MEMBRANE Regulated 0.003 -- All Down- gs_c5_GOterms MITOCHONDRIAL_MEMBRANE_PART Regulated 0.003 -- Mixed Down- gs_c5_GOterms TRANSLATION Regulated 0.006 -- All Down- gs_c5_GOterms STRUCTURAL_CONSTITUENT_OF_RIBOSOME Regulated < 1.0E-6 -- Mixed Down- gs_c6_oncogenic EIF4E_UP Regulated 0.032 -- All Down- gs_c7_immuno GSE10325_CD4_TCELL_VS_MYELOID_UP Regulated < 1.0E-6 -- Mixed Down- gs_c7_immuno GSE10325_LUPUS_CD4_TCELL_VS_LUPUS_BCELL_UP Regulated < 1.0E-6 -- All Down- gs_c7_immuno GSE10325_LUPUS_CD4_TCELL_VS_LUPUS_MYELOID_UP Regulated 0.008 -- All Down- gs_c7_immuno GSE11057_NAIVE_CD4_VS_PBMC_CD4_TCELL_UP Regulated 0.008 -- All Down- gs_c7_immuno GSE11057_CD4_CENT_MEM_VS_PBMC_UP Regulated < 1.0E-6 -- All Down- gs_c7_immuno GSE11057_PBMC_VS_MEM_CD4_TCELL_DN Regulated 0.011 -- Mixed Down- gs_c7_immuno GSE11864_CSF1_VS_CSF1_IFNG_IN_MAC_DN Regulated 0.006 -- Mixed Down- gs_c7_immuno GSE12845_IGD_NEG_BLOOD_VS_NAIVE_TONSIL_BCELL_UP Regulated < 1.0E-6 -- Mixed Down- gs_c7_immuno GSE12845_IGD_NEG_BLOOD_VS_PRE_GC_TONSIL_BCELL_UP Regulated < 1.0E-6 --

210 GSE12845_IGD_NEG_BLOOD_VS_DARKZONE_GC_TONSIL_BCELL_ Mixed Down- gs_c7_immuno UP Regulated 0.013 -- All Down- gs_c7_immuno GSE14000_TRANSLATED_RNA_VS_MRNA_DC_DN Regulated < 1.0E-6 -- Mixed Down- gs_c7_immuno GSE1460_CD4_THYMOCYTE_VS_THYMIC_STROMAL_CELL_UP Regulated 0.006 -- Mixed Down- gs_c7_immuno GSE15930_NAIVE_VS_48H_IN_VITRO_STIM_CD8_TCELL_DN Regulated 0.011 -- Mixed Down- gs_c7_immuno GSE17721_CTRL_VS_LPS_8H_BMDM_UP Regulated 0.011 -- Mixed Down- gs_c7_immuno GSE17721_PAM3CSK4_VS_GADIQUIMOD_8H_BMDM_UP Regulated 0.011 -- All Down- gs_c7_immuno GSE17974_0H_VS_4H_IN_VITRO_ACT_CD4_TCELL_DN Regulated < 1.0E-6 -- Mixed Down- gs_c7_immuno GSE17974_0H_VS_6H_IN_VITRO_ACT_CD4_TCELL_DN Regulated 0.023 -- Mixed Down- gs_c7_immuno GSE17974_0H_VS_12H_IN_VITRO_ACT_CD4_TCELL_DN Regulated < 1.0E-6 -- GSE17974_CTRL_VS_ACT_IL4_AND_ANTI_IL12_48H_CD4_TCELL_D All Down- gs_c7_immuno N Regulated 0.027 -- Mixed Down- gs_c7_immuno GSE19825_NAIVE_VS_DAY3_EFF_CD8_TCELL_DN Regulated 0.029 -- Mixed Down- gs_c7_immuno GSE22886_UNSTIM_VS_STIM_MEMORY_TCELL_DN Regulated 0.025 -- All Down- gs_c7_immuno GSE22886_NAIVE_TCELL_VS_NKCELL_UP Regulated 0.036 -- All Down- gs_c7_immuno GSE22886_TCELL_VS_BCELL_NAIVE_UP Regulated 0.042 -- Mixed Down- gs_c7_immuno GSE22886_CD8_TCELL_VS_BCELL_NAIVE_UP Regulated < 1.0E-6 -- Mixed Down- gs_c7_immuno GSE22886_CD4_TCELL_VS_BCELL_NAIVE_UP Regulated < 1.0E-6 -- Mixed Down- gs_c7_immuno GSE22886_NEUTROPHIL_VS_DC_DN Regulated 0.011 -- All Down- gs_c7_immuno GSE22886_NAIVE_TCELL_VS_NEUTROPHIL_UP Regulated < 1.0E-6 -- All Down- gs_c7_immuno GSE22886_NAIVE_TCELL_VS_DC_UP Regulated 0.006 -- Mixed Down- gs_c7_immuno GSE22886_NAIVE_TCELL_VS_MONOCYTE_UP Regulated 0.011 -- All Down- gs_c7_immuno GSE22886_NAIVE_CD8_TCELL_VS_DC_UP Regulated 0.008 -- All Down- gs_c7_immuno GSE22886_NAIVE_CD8_TCELL_VS_MONOCYTE_UP Regulated 0.002 -- Mixed Down- gs_c7_immuno GSE22886_NAIVE_CD4_TCELL_VS_NEUTROPHIL_UP Regulated 0.010 -- Mixed Down- gs_c7_immuno GSE22886_NAIVE_CD4_TCELL_VS_DC_UP 
Regulated < 1.0E-6 -- All Down- gs_c7_immuno GSE22886_NAIVE_CD4_TCELL_VS_MONOCYTE_UP Regulated 0.004 --

All Down-Regulated    gs_c7_immuno  GSE24634_TREG_VS_TCONV_POST_DAY3_IL4_CONVERSION_UP                < 1.0E-6  --
Mixed Down-Regulated  gs_c7_immuno  GSE27786_CD8_TCELL_VS_NEUTROPHIL_UP                               0.006     --
Mixed Down-Regulated  gs_c7_immuno  GSE29617_CTRL_VS_DAY7_TIV_FLU_VACCINE_PBMC_2008_DN                < 1.0E-6  --
All Down-Regulated    gs_c7_immuno  GSE32423_IL7_VS_IL7_IL4_MEMORY_CD8_TCELL_DN                       0.019     --
All Down-Regulated    gs_c7_immuno  GSE34205_HEALTHY_VS_RSV_INF_INFANT_PBMC_UP                        0.006     --
All Down-Regulated    gs_c7_immuno  GSE360_L_DONOVANI_VS_T_GONDII_MAC_DN                              0.002     --
Mixed Down-Regulated  gs_c7_immuno  GSE360_L_MAJOR_VS_B_MALAYI_HIGH_DOSE_MAC_DN                       0.046     --
Mixed Down-Regulated  gs_c7_immuno  GSE36392_EOSINOPHIL_VS_NEUTROPHIL_IL25_TREATED_LUNG_DN            0.021     --
All Down-Regulated    gs_c7_immuno  GSE3982_EOSINOPHIL_VS_BCELL_DN                                    0.040     --
All Down-Regulated    gs_c7_immuno  GSE3982_EOSINOPHIL_VS_CENT_MEMORY_CD4_TCELL_DN                    0.025     --
Mixed Down-Regulated  gs_c7_immuno  GSE3982_EOSINOPHIL_VS_TH1_DN                                      0.010     --
All Down-Regulated    gs_c7_immuno  GSE3982_MAC_VS_TH2_DN                                             < 1.0E-6  --
Mixed Down-Regulated  gs_c7_immuno  GSE3982_NEUTROPHIL_VS_NKCELL_DN                                   0.004     --
All Down-Regulated    gs_c7_immuno  GSE3982_NEUTROPHIL_VS_TH1_DN                                      0.034     --
All Down-Regulated    gs_c7_immuno  GSE39820_TGFBETA1_IL6_VS_TGFBETA1_IL6_IL23A_TREATED_CD4_TCELL_DN  0.006     --
All Down-Regulated    gs_c7_immuno  GSE6269_HEALTHY_VS_E_COLI_INF_PBMC_UP                             0.019     --
All Down-Regulated    gs_c7_immuno  GSE6269_HEALTHY_VS_STREP_PNEUMO_INF_PBMC_UP                       < 1.0E-6  --
All Down-Regulated    gs_c7_immuno  GSE9006_HEALTHY_VS_TYPE_1_DIABETES_PBMC_4MONTH_POST_DX_DN         0.006     --
All Down-Regulated    gs_c7_immuno  GSE9006_HEALTHY_VS_TYPE_2_DIABETES_PBMC_AT_DX_UP                  < 1.0E-6  --
All Down-Regulated    gs_c7_immuno  GSE9006_TYPE_1_VS_TYPE_2_DIABETES_PBMC_AT_DX_UP                   < 1.0E-6  --
Mixed Down-Regulated  gs_c7_immuno  GSE9650_EXHAUSTED_VS_MEMORY_CD8_TCELL_DN                          0.023     --

*Rows are sorted broadly into up-regulated, non-directionally dysregulated, and down-regulated gene sets; secondarily, the original order of annotations was preserved to keep related annotations close to each other. Gene sets in bold denote those also identified as dysregulated within the brain analysis.

TABLE 8. PERMUTATION TESTS OF ENRICHMENT FOR EQTLS AMONG DIFFERENTIALLY EXPRESSED GENES DEFINED AT A RELATIVELY CONSERVATIVE THRESHOLD FROM SZ BRAIN AND BLOOD MEGA-ANALYSES.

                                   NCBI GTEx eQTL Results
Gene List        Number of Genes   Frontal Cortex   Lymphoblastoid Cells
Brain q < 0.1    92                12 (0.85)        3 (0.16)
Blood q < 0.1    2238              607 (< 0.001)    50 (0.007)

Abbreviations: eQTL, expression quantitative trait loci; GTEx, Genotype-Tissue Expression Project.
Values represent the number of overlapping genes, with permuted p-values in parentheses.
eQTL reference data were pruned to a set of unique genes to eliminate the possibility of re-counting.
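The permutation procedure behind Table 8 is not spelled out in this section. A minimal sketch of one common form follows, assuming the null distribution is built by drawing random gene lists of the same size from a background set. The function name `permutation_overlap_test`, its parameters, the toy gene names, and the add-one p-value estimate are illustrative assumptions rather than the procedure actually used here.

```python
import random


def permutation_overlap_test(gene_list, reference_genes, background,
                             n_perm=1000, seed=0):
    """Return (observed overlap, permuted p-value).

    gene_list       -- differentially expressed genes (hypothetical input)
    reference_genes -- genes with an eQTL in the reference tissue
    background      -- all genes that could have been drawn (assumed universe)
    """
    rng = random.Random(seed)
    reference = set(reference_genes)
    universe = sorted(set(background))
    query = set(gene_list)
    observed = len(query & reference)

    # Count random lists of the same size that overlap at least as much.
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(universe, len(query))
        if sum(gene in reference for gene in sample) >= observed:
            hits += 1

    # Add-one estimate so the p-value is never exactly zero (an assumption).
    return observed, (hits + 1) / (n_perm + 1)


# Toy data only: 1000 background genes, the first 100 carry an eQTL.
background = [f"g{i}" for i in range(1000)]
eqtl_genes = background[:100]
overlap, p_value = permutation_overlap_test(background[:50], eqtl_genes, background)
print(overlap, p_value)
```

Pruning the reference to unique genes, as the table notes describe, corresponds to the `set(...)` conversions in this sketch.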

TABLE 9. DYSREGULATED GENES FROM MEGA-ANALYSES (FDR P < 0.1) OVERLAPPING WITH GENES PROXIMAL TO THE 108 GENOME-WIDE SIGNIFICANT AND LINKAGE DISEQUILIBRIUM-INDEPENDENT LOCI.

Dysregulated Gene List   "108 Loci" Genes
Blood                    DPYD, SFXN2, GIGYF2, DGKZ, ACTR5, FANCL, VRK2, L3MBTL2, RANGAP1, STAB1, HIRIP3, TMEM219, PCCB, PJA1, SGSM2, NCAN, PBX4, PLEKHO1, CLCN3, MED19, TMX2, ZDHHC5, ETF1, BCL11B, GRIN2A, PSMD6, DDX28, EDC4, NUTF2, PSMB10, DRG2, SLC38A7, CD46, PTGIS, RRAS, TMCO6
Brain                    CLU

Genes up-regulated in SZ cases are shown in bold.


TABLE 10. SZ-ASSOCIATED BLOOD NETWORK MODULES OVER-REPRESENTATION OF IMMUNE CELL-TYPE SIGNATURES.

Module Label     Immune Cell Types    Number of Overlapping Genes (# of Genes in Module)   p-values   BH p-values
steelblue        NK-CD56              34 (74)                                              2.25E-67   4.51E-67
yellow           Granulocyte-CD66b    90 (492)                                             3.07E-37   3.07E-36
grey60           Granulocyte-CD66b    50 (216)                                             8.27E-26   3.31E-25
darkolivegreen   Granulocyte-CD66b    16 (34)                                              1.45E-14   4.35E-14
yellow           B-CD19               16 (492)                                             1.73E-05   8.64E-05
steelblue        NK                   1 (74)                                               0.0042     0.0042
turquoise        B-CD19               30 (1850)                                            0.0021     0.0300

Abbreviations: Benjamini-Hochberg corrected p-values (BH p-values).
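For readers unfamiliar with the Benjamini-Hochberg adjustment reported in the last column, the following is a minimal sketch of the standard step-up procedure. The function name `benjamini_hochberg` and the toy p-values are illustrative assumptions. How tests were grouped into families for Table 10 is not stated in this section, so the sketch is not expected to reproduce the tabulated values.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values only, not taken from the table.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```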


TABLE 11. OVERLAP OF GENES AMONG SZ-ASSOCIATED CO-EXPRESSION NETWORK MODULES IDENTIFIED IN BRAIN AND BLOOD.

Blood Module   Blood k   Brain Module   Brain k   Number of Overlapping Genes   p-Value    BH
blue           654       green          595       63                            6.16E-14   1.29E-12
cyan           225       green          595       25                            1.94E-07   2.04E-06
yellow         492       salmon         174       13                            0.001      8.86E-03
grey60         216       yellow         685       15                            0.022      1.18E-01
turquoise      1850      yellow         685       87                            0.037      1.53E-01
greenyellow    256       yellow         685       16                            0.044      1.53E-01

Results from hypergeometric tests are shown for module pairs with a nominal p < 0.05, together with Benjamini-Hochberg corrected p-values. Abbreviations: k, number of genes in module; BH, Benjamini-Hochberg corrected p-values.
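The hypergeometric test named in the table note can be sketched as follows, assuming a one-sided test for over-representation. The function name `hypergeom_upper_tail` and the toy numbers are illustrative assumptions. The size of the shared gene universe is not given in this section, so the tabulated p-values cannot be recomputed from the table alone.

```python
from math import comb


def hypergeom_upper_tail(overlap, size_a, size_b, universe):
    """P(X >= overlap) for the overlap between two gene sets.

    size_a, size_b -- number of genes in each module (k in Table 11)
    universe       -- number of genes measured in both tissues (assumed input)
    """
    max_overlap = min(size_a, size_b)
    total = comb(universe, size_b)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(size_a, x) * comb(universe - size_a, size_b - x)
        for x in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy example: 10 genes in the universe, modules of 4 and 3 genes, 2 shared.
print(hypergeom_upper_tail(2, 4, 3, 10))
```

Exact integer arithmetic via `math.comb` avoids rounding problems in the tail; for p-values far below the floating-point range the result underflows to zero.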

TABLE 12. PERFORMANCE OF MACHINE-LEARNING CLASSIFIERS USING BLOOD-BASED GENE EXPRESSION DATA.

Classifier      Number of Genes Supplied (k)   Training Matrix (n = 413)   Validation Matrix (n = 185)
Random Forest   20                             0.92                        0.72
Random Forest   60                             0.95                        0.77
Random Forest   150                            0.96                        0.77
Ensemble SVM    20                             0.90                        0.72
Ensemble SVM    60                             0.95                        0.75
Ensemble SVM    150                            0.99                        0.72

Classifier accuracies are reported as area under the curve (AUC) for the receiver operating characteristic plot. Sets of k genes were identified as the top significantly dysregulated genes based on covariate-controlled mixed-effect linear modeling performed in the training matrix.
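As a reminder of what the AUC values in Table 12 measure, the sketch below computes the area under the ROC curve from its rank interpretation: the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control, with ties counted as one half. The function name `roc_auc`, the 0/1 label coding, and the toy scores are illustrative assumptions and are unrelated to the classifiers evaluated here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    labels -- 1 for cases, 0 for controls (assumed coding)
    scores -- classifier scores, higher meaning more case-like
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties count as half a win
    return wins / (len(cases) * len(controls))


# Toy scores only.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The gap between training and validation AUC in the table is the quantity to watch: a value computed on held-out samples is the less optimistic estimate.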


FIGURE 1. EFFECT OF NORMALIZATION ON BRAIN TISSUE GENE EXPRESSION VALUES. A quality-control assessment of the distribution of normalized expression values from schizophrenia cases and controls across brain studies with respect to (A) GC-corrected robust multi-array average expression values with quantile normalization and log2 transformation applied to account for within-study variation, and (B) z-scale transformed values. The latter were used to roughly normalize expression values between studies. Blood studies showed similar distributions under these data import and pre-processing steps.
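The z-scale transformation in Panel B can be illustrated with a short sketch. It assumes that the values for one gene are standardized separately within each study (subtract the study mean, divide by the study sample standard deviation); whether the original transformation was applied per gene, per sample, or otherwise is not stated in this caption. The function name `z_scale_within_study` and the study labels are hypothetical.

```python
from statistics import mean, stdev


def z_scale_within_study(values_by_study):
    """Standardize one gene's expression values separately within each study."""
    scaled = {}
    for study, values in values_by_study.items():
        mu = mean(values)
        sd = stdev(values)  # sample standard deviation (an assumption)
        scaled[study] = [(v - mu) / sd for v in values]
    return scaled


# Toy values: two studies measured on very different scales.
expression = {"study_A": [1.0, 2.0, 3.0], "study_B": [10.0, 20.0, 30.0]}
print(z_scale_within_study(expression))
```

After this step both toy studies share a mean of zero and unit variance, which is the sense in which values become roughly comparable between studies.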


FIGURE 2. CROSS-TISSUE COMPARISON OF SIGNIFICANTLY DYSREGULATED GENES (FDR P < 0.05). (A) Based on the mega-analyses, we identified the most significantly dysregulated genes in brain (n = 92) and blood (n = 2238) at a relatively conservative threshold (FDR p < 0.10). A total of 10 genes were common to both lists; 7 genes were coordinately up-regulated and 1 gene was coordinately down-regulated across tissues. The degree of cross-tissue overlap for each of the displayed intersections was non-significant based on hypergeometric test statistics.


FIGURE 3. CROSS-TISSUE COMPARISON OF SIGNIFICANTLY DYSREGULATED GENE SETS (BONFERRONI P <0.05). The results of the permutation-based gene-set analyses are shown (Panel A), highlighting gene sets that were significantly dysregulated in both tissues. A total of 745 gene sets were dysregulated (out of 9254 tested) based on single-gene test statistics from the brain mega-analysis. A total of 526 gene sets were dysregulated (out of 9256 tested) based on single-gene test statistics from the blood mega-analysis. For the purpose of cross-tissue comparison, gene sets with either an absolute effect (i.e., all genes in the target set) or a mixed effect (i.e., a subset of the genes in the target set) were considered to be directionally dysregulated. Among the dysregulated gene sets, 263 were common to both tissues (Bonferroni-corrected hypergeometric p < 1.7×10^-158), and among these, 255 showed evidence of up-regulation across tissues (p < 2.8×10^-198), while only 4 showed evidence of down-regulation across tissues (p < 4.6×10^-4). Among dysregulated gene sets corresponding to the Molecular Signatures Database’s (Broad Institute) Hallmark category, we assessed whether SZ cases showed significant evidence for heterogeneity using a previously developed approach described in the Supplementary Methods. Within the brain data, we observed significant 2-group clustering of SZ cases based on the expression values corresponding to 5 gene sets showing a main effect of up-regulation in SZ (Panel B); cases are shown such that the individuals belonging to the cluster driving the up-regulation effect are depicted in red. These results suggest that different SZ cases contribute to the observed dysregulation in distinct biological pathways. Within the blood data, we observed significant 2-group clustering for a single gene set which showed a main effect of down-regulation among SZ cases (Panel C).
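The hypergeometric overlap test mentioned in this caption can be sketched as follows. This is an illustrative Python example rather than the original analysis code; the function name `hypergeom_upper_tail` and the toy counts are assumptions, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_upper_tail(population, n_marked, n_drawn, overlap):
    """P(X >= overlap) when n_drawn items are drawn without replacement
    from a population that contains n_marked marked items.

    In a cross-tissue setting, `population` would be the number of gene
    sets tested, `n_marked` the number dysregulated in one tissue,
    `n_drawn` the number dysregulated in the other tissue, and `overlap`
    the number dysregulated in both."""
    total = comb(population, n_drawn)
    max_overlap = min(n_marked, n_drawn)
    favorable = sum(
        comb(n_marked, k) * comb(population - n_marked, n_drawn - k)
        for k in range(overlap, max_overlap + 1)
    )
    return favorable / total


# Hypothetical toy example: 10 gene sets tested, 4 dysregulated in tissue A,
# 3 dysregulated in tissue B, and 2 shared between the two lists.
toy_p = hypergeom_upper_tail(population=10, n_marked=4, n_drawn=3, overlap=2)
print(toy_p)
```

The same function accepts counts of the size reported in the caption, because the binomial coefficients are computed with exact integer arithmetic before the final division.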


FIGURE 4. COMPARISON OF GENE-LEVEL DIFFERENTIAL EXPRESSION STATISTICS ACROSS BRAIN AND BLOOD. (A) The correlation of differential expression t-statistics was calculated per pathway up-regulated in the brain and blood in schizophrenia cases. (B) Distribution of p-values for Pearson’s r with respect to Panel A. (C) Scatterplot comparing differential expression t-values between brain and blood for 199 genes in the TNF alpha signaling via NFkB pathway. (D) Table showing the mean Pearson’s r reflecting the correlation of differential expression gene statistics between brain and blood across pathways that were commonly up-regulated in these tissues. P-values for correlation tests were combined across pathways once using Fisher’s method (follows chi-square distribution) and separately using the Stouffer-Liptak method to correct for non-independent tests (i.e., genes shared between pathways).
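The two p-value combination approaches named in this caption can be illustrated with a short Python sketch. It is a simplified illustration, not the original analysis: the function names are hypothetical, the Stouffer form shown is the basic weighted Z-score version that assumes one-sided p-values, and any adjustment for correlation between tests is not reproduced here.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def fisher_combined(pvalues):
    """Fisher's method: -2 * sum(ln p) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis (independent tests).
    For even degrees of freedom the survival function has a closed form."""
    k = len(pvalues)
    stat = -2.0 * sum(log(p) for p in pvalues)
    half = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, exp(-half) * total


def stouffer_combined(pvalues, weights=None):
    """Stouffer's Z-score method for one-sided p-values, with optional
    weights (the weighted form is often attributed to Liptak)."""
    nd = NormalDist()
    if weights is None:
        weights = [1.0] * len(pvalues)
    z_scores = [nd.inv_cdf(1.0 - p) for p in pvalues]
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    return combined_z, 1.0 - nd.cdf(combined_z)


# Hypothetical p-values from three pathway-level correlation tests.
toy_pvalues = [0.01, 0.20, 0.03]
print(fisher_combined(toy_pvalues))
print(stouffer_combined(toy_pvalues))
```

Fisher's method is sensitive to a single very small p-value, whereas the Z-score approach weights evidence more evenly across tests, which is one reason the two are often reported side by side.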


FIGURE 5. CO-EXPRESSION MODULES NOMINALLY ASSOCIATED WITH SZ IN BRAIN (P < 0.05). Comparison of module eigengene expression values (unadjusted for covariates) between SZ cases and unaffected comparisons within the “green” and “salmon” co-expression modules identified by the WGCNA R package (A and D, respectively), which were nominally associated with SZ based on a linear mixed model (uncorrected p < 0.05). We cross-referenced the set of dysregulated genes identified in the brain mega-analysis (q < .1) with the top 25 genes in each module ranked by intramodular connectivity (overlaps denoted by asterisk *) (B and E). In panels B and E, the top 5 “hub” genes are found in the innermost circle. Modules were biologically characterized by testing for enrichment of brain cell-type signatures (C and F).


FIGURE 6. A CO-EXPRESSION MODULE THAT WAS SIGNIFICANTLY ASSOCIATED WITH SZ IN BLOOD (FDR P < 0.05). Comparison of module eigengene expression values (unadjusted for covariates) between SZ cases and unaffected comparisons within the “darkolivegreen” co-expression module identified by the WGCNA R package (A), which was significantly associated with SZ based on a linear mixed model test (FDR p < 4.4×10^-6). We cross-referenced the set of dysregulated genes identified in the blood mega-analysis (FDR p < .1) with the top 25 genes ranked by intramodular connectivity in this module (overlaps denoted by asterisk *) (B). In panel B, the top 5 “hub” genes are found in the innermost circle. To biologically characterize this module, a pathway-based approach was used to test for significant enrichment of biological annotations mapping to “darkolivegreen” genes (C). In panel C, annotations that surpassed an FDR p < 0.05 (cutoff depicted by vertical dotted line) from hypergeometric tests are shown [represented as -log10(P-value)].


FIGURE 7. ENRICHMENT OF RISK GENES FROM GWAS IN MODULES THAT ARE DIFFERENTIALLY EXPRESSED IN SZ CASES. Permutation-based tests were used to calculate nominal p-values of significance for “gene set scores” denoting the cumulative association of modules’ gene sets with SZ. We used GWAS summary statistics (extended MHC region on Chr 6, 25–35 Mb, excluded) to determine the SNP with the lowest p-value of association with SZ for each gene used to construct the brain and blood networks, quantified running-sum statistics of GWAS association after regressing out the effects of covariates, and compared these statistics to “gene set scores” re-calculated after permuting gene labels 1,000 times. Nominal p-values were adjusted using the Benjamini-Hochberg method and a significance cutoff of p < 0.05 was set (denoted in the figure by a horizontal dotted line). Y-axis: -log10(Benjamini-Hochberg p-values; B-H); x-axis: arbitrary labels of SZ-associated modules identified in brain and blood co-expression networks (tissues separated by vertical red line).
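The general logic of a label-permutation test followed by Benjamini-Hochberg adjustment can be sketched in Python as below. This is a simplified stand-in, not the procedure used for the figure: the gene set score here is a plain mean of per-gene scores rather than a running-sum statistic, no covariates are regressed out, and all function names and toy values are hypothetical.

```python
import random


def permutation_pvalue(gene_scores, module_genes, n_perm=1000, seed=0):
    """Empirical p-value for a module's mean score. Permuting gene labels is
    equivalent to drawing random gene sets of the same size as the module."""
    rng = random.Random(seed)
    all_genes = list(gene_scores)
    size = len(module_genes)
    observed = sum(gene_scores[g] for g in module_genes) / size
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, size)
        if sum(gene_scores[g] for g in sample) / size >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene association scores (e.g., -log10 of the best SNP p-value).
toy_scores = {f"G{i}": float(i) for i in range(20)}
toy_module = ["G17", "G18", "G19"]
toy_perm_p = permutation_pvalue(toy_scores, toy_module)
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(toy_perm_p, toy_adjusted)
```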


FIGURE 8. PRESERVATION OF GENE CO-EXPRESSION NETWORKS ACROSS TISSUES AND DIAGNOSTIC GROUPS. Weighted gene co-expression network analysis (WGCNA) was used to identify modules in control brain and blood datasets and to test preservation across control and SZ datasets. The numbers of genes that constituted the networks were based on those genes common within each pair-wise comparison; thus, different numbers of genes were loaded into the reference networks for Panels (A) and (C). Zsummary > 10 was considered the threshold for declaring a module highly preserved. Strong evidence of preservation (Zsummary > 10) was identified in 24 out of 26 modules detected in the brain samples of non-psychotic (NC) comparison subjects relative to SZ cases (88.4%) (Panel A) and in 32 out of 32 (100%) modules detected in the blood samples of NC subjects relative to SZ cases (Panel B). Module preservation analysis of only NC subjects determined that 13 out of 30 (43.3%) modules detected in the brain were strongly preserved in blood (Panel C).

BIBLIOGRAPHY

Abbas, A. R., Wolslegel, K., Seshasayee, D., Modrusan, Z., & Clark, H. F. (2009). Deconvolution of blood microarray data identifies cellular activation patterns in systemic lupus erythematosus. PloS One, 4(7), e6098. http://doi.org/10.1371/journal.pone.0006098

Aberg, K., Saetre, P., Lindholm, E., Ekholm, B., Pettersson, U., Adolfsson, R., & Jazin, E. (2006). Human QKI, a new candidate gene for schizophrenia involved in myelination. American Journal of Medical Genetics. Part B, Neuropsychiatric Genetics: The Official Publication of the International Society of Psychiatric Genetics, 141B(1), 84–90. http://doi.org/10.1002/ajmg.b.30243

Arion, D., Corradi, J. P., Tang, S., Datta, D., Boothe, F., He, A., … Lewis, D. A. (2015). Distinctive transcriptome alterations of prefrontal pyramidal neurons in schizophrenia and schizoaffective disorder. Molecular Psychiatry, (October 2014), 1–9. http://doi.org/10.1038/mp.2014.171

Arnedo, J., Svrakic, D. M., Del Val, C., Romero-Zaliz, R., Hernández-Cuervo, H., Fanous, A. H., … Zwir, I. (2014). Uncovering the Hidden Risk Architecture of the Schizophrenias: Confirmation in Three Independent Genome-Wide Association Studies. The American Journal of Psychiatry, 172(2), 139–53. http://doi.org/10.1176/appi.ajp.2014.14040435

Bergon, A., Belzeaux, R., Comte, M., Pelletier, F., Hervé, M., Gardiner, E. J., … Ibrahim, E. C. (2015). CX3CR1 is dysregulated in blood and brain from schizophrenia patients. Schizophrenia Research. http://doi.org/10.1016/j.schres.2015.08.010

Cotter, D., Mackay, D., Chana, G., Beasley, C., Landau, S., & Everall, I. P. (2002). Reduced neuronal size and glial cell density in area 9 of the dorsolateral prefrontal cortex in subjects with major depressive disorder. Cerebral Cortex (New York, N.Y.: 1991), 12(4), 386–94.

Dean, B. (2011). Understanding the role of inflammatory-related pathways in the pathophysiology and treatment of psychiatric disorders: evidence from human peripheral studies and CNS studies. The International Journal of Neuropsychopharmacology / Official Scientific Journal of the Collegium Internationale Neuropsychopharmacologicum (CINP), 14(7), 997–1012. http://doi.org/10.1017/S1461145710001410

Devor, E. J., & Waziri, R. (1993). A familial/genetic study of plasma serine and glycine concentrations. Biological Psychiatry, 34(4), 221–5. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/8399818

Dougherty, J. D., Schmidt, E. F., Nakajima, M., & Heintz, N. (2010). Analytical approaches to RNA profiling data for the identification of genes enriched in specific cells. Nucleic Acids Research, 38(13), 4218–30. http://doi.org/10.1093/nar/gkq130

Duncan, L. E., Holmans, P. A., Lee, P. H., O’Dushlaine, C. T., Kirby, A. W., Smoller, J. W., … Cohen, B. M. (2014). Pathway analyses implicate glial cells in schizophrenia. PloS One, 9(2), e89441. http://doi.org/10.1371/journal.pone.0089441

Durinck, S., Spellman, P. T., Birney, E., & Huber, W. (2009). Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nature Protocols, 4(8), 1184–91. http://doi.org/10.1038/nprot.2009.97


Eaton, W. W., Byrne, M., Ewald, H., Mors, O., Chen, C.-Y., Agerbo, E., & Mortensen, P. B. (2006). Association of schizophrenia and autoimmune diseases: linkage of Danish national registers. The American Journal of Psychiatry, 163(3), 521–8. http://doi.org/10.1176/appi.ajp.163.3.521

Ellman, L. M., Deicken, R. F., Vinogradov, S., Kremen, W. S., Poole, J. H., Kern, D. M., … Brown, A. S. (2010). Structural brain alterations in schizophrenia following fetal exposure to the inflammatory cytokine interleukin-8. Schizophrenia Research, 121(1–3), 46–54. http://doi.org/10.1016/j.schres.2010.05.014

Fillman, S. G., Cloonan, N., Catts, V. S., Miller, L. C., Wong, J., McCrossin, T., … Weickert, C. S. (2013). Increased inflammatory markers identified in the dorsolateral prefrontal cortex of individuals with schizophrenia. Molecular Psychiatry, 18(2), 206–14. http://doi.org/10.1038/mp.2012.110

Fineberg, A. M., & Ellman, L. M. (2013). Inflammatory cytokines and neurological and neurocognitive alterations in the course of schizophrenia. Biological Psychiatry, 73(10), 951–66. http://doi.org/10.1016/j.biopsych.2013.01.001

Fung, S. J., Joshi, D., Fillman, S. G., & Weickert, C. S. (2014). High white matter neuron density with elevated cortical cytokine expression in schizophrenia. Biological Psychiatry, 75(4), e5-7. http://doi.org/10.1016/j.biopsych.2013.05.031

Gautier, L., Cope, L., Bolstad, B. M., & Irizarry, R. A. (2004). affy--analysis of Affymetrix GeneChip data at the probe level. Bioinformatics (Oxford, England), 20(3), 307–15. http://doi.org/10.1093/bioinformatics/btg405

Glatt, S. J., Tsuang, M. T., Winn, M., Chandler, S. D., Collins, M., Lopez, L., … Courchesne, E. (2012). Blood-based gene expression signatures of infants and toddlers with autism. Journal of the American Academy of Child and Adolescent Psychiatry, 51(9), 934–44.e2. http://doi.org/10.1016/j.jaac.2012.07.007

Glatt, S. J., Tylee, D. S., Chandler, S. D., Pazol, J., Nievergelt, C. M., Woelk, C. H., … Tsuang, M. T. (2013). Blood-based gene-expression predictors of PTSD risk and resilience among deployed marines: a pilot study. American Journal of Medical Genetics. Part B, Neuropsychiatric Genetics: The Official Publication of the International Society of Psychiatric Genetics, 162B(4), 313–26. http://doi.org/10.1002/ajmg.b.32167

Hannon, E., Lunnon, K., Schalkwyk, L., & Mill, J. (2015). Interindividual methylomic variation across blood, cortex, and cerebellum: Implications for epigenetic studies of neurological and neuropsychiatric phenotypes. Epigenetics, 10(11), 1024–1032. http://doi.org/10.1080/15592294.2015.1100786

Huang, D. W., Sherman, B. T., & Lempicki, R. A. (2009). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols, 4(1), 44–57. http://doi.org/10.1038/nprot.2008.211

Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U., & Speed, T. P. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics (Oxford, England), 4(2), 249–264. http://doi.org/10.1093/biostatistics/4.2.249

Kumarasinghe, N., Beveridge, N. J., Gardiner, E., Scott, R. J., Yasawardene, S., Perera, A., … Tooney, P. A. (2013). Gene expression profiling in treatment-naive schizophrenia patients identifies abnormalities in biological pathways involving AKT1 that are corrected by antipsychotic medication. The International Journal of Neuropsychopharmacology / Official Scientific Journal of the Collegium Internationale Neuropsychopharmacologicum (CINP), 16(February), 1483–1503. http://doi.org/10.1017/S1461145713000035

Langfelder, P., Luo, R., Oldham, M. C., & Horvath, S. (2011). Is my network module preserved and reproducible? PLoS Computational Biology, 7(1), e1001057. http://doi.org/10.1371/journal.pcbi.1001057

Leung, Y. Y., Kuksa, P. P., Amlie-Wolf, A., Valladares, O., Ungar, L. H., Kannan, S., … Wang, L. S. (2016). DASHR: Database of Small human noncoding RNAs. Nucleic Acids Research, 44(D1), D216–D222. http://doi.org/10.1093/nar/gkv1188

Lottaz, C., Toedling, J., & Spang, R. (2007). Annotation-based distance measures for patient subgroup discovery in clinical microarray studies. Bioinformatics, 23(17), 2256–2264. http://doi.org/10.1093/bioinformatics/btm322

Maino, K., Gruber, R., Riedel, M., Seitz, N., Schwarz, M., & Müller, N. (2007). T- and B-lymphocytes in patients with schizophrenia in acute psychotic episode and the course of the treatment. Psychiatry Research, 152(2–3), 173–80. http://doi.org/10.1016/j.psychres.2006.06.004

Mamdani, F., Berlim, M. T., Beaulieu, M.-M., Labbe, A., Merette, C., & Turecki, G. (2011). Gene expression biomarkers of response to citalopram treatment in major depressive disorder. Translational Psychiatry, 1, e13. http://doi.org/10.1038/tp.2011.12

Middleton, F. A., Mirnics, K., Pierri, J. N., Lewis, D. A., & Levitt, P. (2002). Gene expression profiling reveals alterations of specific metabolic pathways in schizophrenia. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 22(7), 2718–29. http://doi.org/20026209

Miller, B. J., Buckley, P., Seabolt, W., Mellor, A., & Kirkpatrick, B. (2011). Meta-analysis of cytokine alterations in schizophrenia: clinical status and antipsychotic effects. Biological Psychiatry, 70(7), 663–71. http://doi.org/10.1016/j.biopsych.2011.04.013

Mirnics, K., Levitt, P., & Lewis, D. A. (2006). Critical appraisal of DNA microarrays in psychiatric genomics. Biological Psychiatry, 60(2), 163–76. http://doi.org/10.1016/j.biopsych.2006.02.003

Mistry, M., Gillis, J., & Pavlidis, P. (2013). Genome-wide expression profiling of schizophrenia using a large combined cohort. Molecular Psychiatry, 18(2), 215–25. http://doi.org/10.1038/mp.2011.172

Mistry, M., Gillis, J., & Pavlidis, P. (2013). Meta-analysis of gene coexpression networks in the post-mortem prefrontal cortex of patients with schizophrenia and unaffected controls. BMC Neuroscience, 14, 105. http://doi.org/10.1186/1471-2202-14-105

Mistry, M., & Pavlidis, P. (2010). A cross-laboratory comparison of expression profiling data from normal human postmortem brain. Neuroscience, 167(2), 384–95. http://doi.org/10.1016/j.neuroscience.2010.01.016

Müller, N., & Schwarz, M. J. (2010). Immune System and Schizophrenia. Current Immunology Reviews, 6(3), 213–220.

Pérez-Santiago, J., Diez-Alarcia, R., Callado, L. F., Zhang, J. X., Chana, G., White, C. H., … Woelk, C. H. (2012). A combined analysis of microarray gene expression studies of the human prefrontal cortex identifies genes implicated in schizophrenia. Journal of Psychiatric Research, 46(11), 1464–74. http://doi.org/10.1016/j.jpsychires.2012.08.005

Potvin, S., Stip, E., Sepehry, A. A., Gendron, A., Bah, R., & Kouassi, E. (2008). Inflammatory cytokine alterations in schizophrenia: a systematic quantitative review. Biological Psychiatry, 63(8), 801–8. http://doi.org/10.1016/j.biopsych.2007.09.024

Rao, J. S., Kim, H.-W., Harry, G. J., Rapoport, S. I., & Reese, E. A. (2013). Increased neuroinflammatory and arachidonic acid cascade markers, and reduced synaptic proteins, in the postmortem frontal cortex from schizophrenia patients. Schizophrenia Research, 147(1), 24–31. http://doi.org/10.1016/j.schres.2013.02.017

Ripke, S., Neale, B. M., Corvin, A., Walters, J. T. R., Farh, K.-H., Holmans, P. A., … O’Donovan, M. C. (2014). Biological insights from 108 schizophrenia-associated genetic loci. Nature, 511(7510), 421–7. http://doi.org/10.1038/nature13595

Roussos, P., Katsel, P., Davis, K. L., Giakoumaki, S. G., Lencz, T., Malhotra, A. K., … Haroutunian, V. (2013). Convergent findings for abnormalities of the NF-κB signaling pathway in schizophrenia. Neuropsychopharmacology: Official Publication of the American College of Neuropsychopharmacology, 38(3), 533–9. http://doi.org/10.1038/npp.2012.215

Sanders, A. R., Göring, H. H. H., Duan, J., Drigalenko, E. I., Moy, W., Freda, J., … Gejman, P. V. (2013). Transcriptome study of differential expression in schizophrenia. Human Molecular Genetics, 22(24), 5001–14. http://doi.org/10.1093/hmg/ddt350

Schwieler, L., Larsson, M. K., Skogh, E., Kegel, M. E., Orhan, F., Abdelmoaty, S., … Engberg, G. (2015). Increased levels of IL-6 in the cerebrospinal fluid of patients with chronic schizophrenia - significance for activation of the kynurenine pathway. Journal of Psychiatry & Neuroscience: JPN, 40(2), 126–33. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/25455350

Seifuddin, F., Pirooznia, M., Judy, J. T., Goes, F. S., Potash, J. B., & Zandi, P. P. (2013). Systematic review of genome-wide gene expression studies of bipolar disorder. BMC Psychiatry, 13, 213. http://doi.org/10.1186/1471-244X-13-213

Sekar, A., Bialas, A. R., de Rivera, H., Davis, A., Hammond, T. R., Kamitaki, N., … McCarroll, S. A. (2016). Schizophrenia risk from complex variation of complement component 4. Nature, advance on. http://doi.org/10.1038/nature16549

Steiner, J., Jacobs, R., Panteli, B., Brauner, M., Schiltz, K., Bahn, S., … Bogerts, B. (2010). Acute schizophrenia is accompanied by reduced T cell and increased B cell immunity. European Archives of Psychiatry and Clinical Neuroscience, 260(7), 509–18. http://doi.org/10.1007/s00406-010-0098-x

Storey, J. D. (2003). The positive false discovery rate: a Bayesian interpretation and the q-value. The Annals of Statistics, 31(6), 2013–2035.

Sugino, H., Futamura, T., Mitsumoto, Y., Maeda, K., & Marunaka, Y. (2009). Atypical antipsychotics suppress production of proinflammatory cytokines and up-regulate interleukin-10 in lipopolysaccharide-treated mice. Progress in Neuro-Psychopharmacology & Biological Psychiatry, 33(2), 303–7. http://doi.org/10.1016/j.pnpbp.2008.12.006

Tsuang, M. T., & Faraone, S. V. (1995). The case for heterogeneity in the etiology of schizophrenia. Schizophrenia Research, 17(2), 161–75. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/8562491


Tsuang, M. T., Nossova, N., Yager, T., Tsuang, M.-M., Guo, S.-C., Shyu, K. G., … Liew, C. C. (2005). Assessing the validity of blood-based gene expression profiles for the classification of schizophrenia and bipolar disorder: a preliminary report. American Journal of Medical Genetics. Part B, Neuropsychiatric Genetics: The Official Publication of the International Society of Psychiatric Genetics, 133B, 1–5. http://doi.org/10.1002/ajmg.b.30161

Tylee, D. S., Chandler, S. D., Nievergelt, C. M., Liu, X., Pazol, J., Woelk, C. H., … Tsuang, M. T. (2015). Blood-based gene-expression biomarkers of post-traumatic stress disorder among deployed marines: A pilot study. Psychoneuroendocrinology, 51, 472–94. http://doi.org/10.1016/j.psyneuen.2014.09.024

Tylee, D. S., Kawaguchi, D. M., & Glatt, S. J. (2013). On the outside, looking in: a review and evaluation of the comparability of blood and brain “-omes”. American Journal of Medical Genetics. Part B, Neuropsychiatric Genetics: The Official Publication of the International Society of Psychiatric Genetics, 162B(7), 595–603. http://doi.org/10.1002/ajmg.b.32150

Vaquerizas, J. M., Kummerfeld, S. K., Teichmann, S. A., & Luscombe, N. M. (2009). A census of human transcription factors: function, expression and evolution. Nature Reviews. Genetics, 10(4), 252–263. http://doi.org/10.1038/nrg2538

Väremo, L., Nielsen, J., & Nookaew, I. (2013). Enriching the gene set analysis of genome-wide data by incorporating directionality of gene expression and combining statistical hypotheses and methods. Nucleic Acids Research, 41(8), 4378–91. http://doi.org/10.1093/nar/gkt111

Venkatasubramanian, G., & Debnath, M. (2013). The TRIPS (Toll-like receptors in immuno-inflammatory pathogenesis) Hypothesis: a novel postulate to understand schizophrenia. Progress in Neuro-Psychopharmacology and Biological Psychiatry, 44, 301–311. http://doi.org/10.1016/j.pnpbp.2013.04.001

Watkins, N. A., Gusnanto, A., de Bono, B., De, S., Miranda-Saavedra, D., Hardie, D. L., … Ouwehand, W. H. (2009). A HaemAtlas: characterizing gene expression in differentiated human blood cells. Blood, 113(19), e1-9. http://doi.org/10.1182/blood-2008-06-162958


TRANSCRIPTOMIC ABNORMALITIES IN BIPOLAR DISORDER AND DISCRIMINATION OF THE MAJOR PSYCHOSES

Authors: Jonathan L. Hess and Stephen J. Glatt

ABSTRACT

Bipolar disorder (BD) is a complex and highly heritable psychiatric disorder with a world-wide prevalence near 1%. There is converging evidence from genetic, molecular, and clinical studies of BD linking numerous neurotransmitter and immune signaling abnormalities with the disorder. These abnormalities have been described previously in schizophrenia (SZ), which has close genetic and molecular ties to BD. The purpose of this investigation was to uncover generalizable biomarkers for BD using a cross-study analysis of six microarray data sets comprising peripheral blood transcriptomes for patients with a DSM-diagnosis of BD (n = 77) and unaffected comparison subjects (n = 81); furthermore, we sought to compare transcriptomic signatures between BD and SZ (obtained from prior studies). We performed a combined-subject differential expression mega-analysis with regression models applied per gene, each adjusted for clinical factors and study heterogeneity. In addition, we surveyed gene sets and gene networks for convergent differential expression in BD. Three genes were associated with BD at a level of significance that withstood rigorous statistical correction. In addition, differential expression of gene sets in BD was found; functions of these gene sets included innate immunity, energy production in mitochondrial complexes, and metabolism of RNA and proteins, among other biological pathways previously related to BD or psychosis. Lastly, we deployed a machine-learning pipeline, which discriminated BD patients from unaffected comparison subjects and SZ patients with moderately high accuracy. This study provides strong evidence in support of candidate biomarkers for BD and introduces potentially generalizable blood-based classifiers for the major psychoses.

INTRODUCTION

Bipolar disorder (BD) is a highly heritable mental illness characterized by extreme shifts in mood (e.g., mania and depression) that range from mild to severe, abnormal behavior and sleep habits, and cognitive impairments (McGuffin et al., 2003; Harvey et al., 2009; Latalova et al., 2011). A neuro-maturation hypothesis of BD has been described in the literature based on genetic and molecular evidence that genes involved in the regulation of neurodevelopment are associated with the disorder (Madison et al., 2015; Sklar et al., 2011). Systematic comparisons have been made between BD and schizophrenia (SZ), which share numerous symptoms (e.g., psychosis) and pathological abnormalities, to determine the relative contribution of genes and environment to shared illness susceptibility between these disorders (jointly called “major psychoses”). In brief, evidence has shown that BD and SZ share a common genetic background, but a larger number of SZ-specific risk genes have been found to play a more important role earlier in brain development (Murray et al., 2004). Ultimately, there is no clear understanding of the molecular mechanisms that give rise to BD.

To address this issue, it has been common practice to assay RNA from brain tissue and compare molecular profiles between large groups of BD cases and unaffected subjects. Several studies have reported correlations between BD diagnosis and mRNA expression in postmortem tissue, which is an important step forward in describing molecular markers of BD (Seifuddin et al., 2013a, 2012; Ryan et al., 2006; Chen et al., 2013b; Elashoff et al., 2007). In practice, however, it is extremely difficult to separate associations from postmortem brain transcriptomic studies into markers directly related to the illness and markers of cryptic, unaccounted-for confounding factors.

Gradually, interest has shifted from studies of postmortem brain to peripheral blood as a potential source of biological markers, and this is true across a variety of mental illnesses and neurodegenerative disorders. Many of the same mRNAs and proteins expressed in brain are also expressed in blood (Tylee et al., 2013). Blood may therefore be as important a resource as brain tissue for identifying genes and biological pathways associated with BD. The evidence from nine microarray studies of BD is mixed, suggesting that differences in sample sizes, analytical strategies, and unique characteristics of each cohort lead to differences in findings. In these studies, dysregulated mRNAs were found among genes involved in apoptosis and cell survival, inflammation, cell proliferation, differentiation, and protein turnover (Beech et al., 2014; Matigian et al., 2007; Middleton et al., 2005; Beech et al., 2010; Bousman et al., 2010; Clelland et al., 2013; Padmos et al., 2008; Tsuang et al., 2005; Savitz et al., 2013).

The question of what genes and biological pathways are associated with BD motivated our present study, which used a combined collection of peripheral blood gene-expression data derived from six previous microarray studies of BD cases and unrelated, unaffected comparison subjects. We provide an in-depth portrait of BD’s transcriptome in the blood and assess its similarity to SZ’s transcriptome. This study expands on the work of large-scale genomics studies of these disorders, provides best-estimates of gene and gene-set expression differences in BD, and lays the groundwork for future mechanistic studies into the biological substrates that distinguish BD from SZ.

METHODS AND MATERIALS

STUDY DESIGN

We sought to provide robust estimates for differential gene and gene-network expression in BD using a combined-sample mega-analysis approach based on linear mixed models. We applied careful adjustments to gene expression data to control for confounding factors that could potentially drive differences in blood gene expression levels between groups. We performed this analysis twice, handling confounding in a different way each time: in the first effort, we specified linear models with known clinical covariates as predictors and measured the effect of diagnosis on gene expression levels; in the second effort, we adjusted gene expression intensities using empirically defined surrogate factors and then evaluated the effect of diagnosis on gene intensities. Results at the levels of genes and gene sets were compared across these approaches to qualitatively assess the statistical power of each analysis and to prioritize findings that were robust to the choice of statistical corrections. Furthermore, we addressed the separate question of the degree to which BD and SZ are similar with respect to transcriptomic associations by comparing the present results to our prior blood-based mega-analysis of SZ cases and unaffected comparison subjects, also run on microarray platforms. Lastly, we demonstrated the performance of several machine-learning algorithms tasked with distinguishing BD cases from SZ cases and unaffected comparison subjects based on blood-based gene expression intensities alone.

DESCRIPTION OF INCLUDED MICROARRAY STUDIES, DATA IMPORT, AND QUALITY CONTROL

A literature search was performed using PubMed and Scopus for articles containing the keywords “bipolar disorder”, “microarray”, “gene expression”, and “blood”, coupled with a query of Gene Expression Omnibus (GEO). We excluded studies that did not meet the following criteria: (1) recruited affected cases with a confirmed diagnosis of bipolar disorder according to DSM-IV or DSM-IV-TR, (2) ascertained affected cases and unrelated, unaffected comparison subjects, and (3) provided raw microarray data from whole blood or circulating immune cell isolates. A total of nine studies were initially found, six of which were retained. Table 1 outlines the demographics for the included studies. Transcriptome-wide data and clinical covariates for a total of 77 cases affected with bipolar disorder and 81 unaffected comparison subjects were acquired for this analysis. Gene expression had been measured on either Affymetrix (k = 4) or Illumina (k = 2) microarrays.

We uniformly pre-processed the raw microarray data using a pipeline that we previously published (Hess et al., 2016). In brief, probe-level Affymetrix data were subjected to GC-corrected robust multi-array averaging (RMA), quantile normalization, and log2 transformation. Pre-processing steps for data derived from Illumina platforms were restricted to quantile normalization and log2 transformation, as these data were received with background corrections applied by Illumina GenomeStudio. Probe-level data were summarized to whole-gene expression values by assigning the median intensity of probes to a gene. This allowed us to combine gene expression values across the whole transcriptome between Affymetrix and Illumina platforms.
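The core pre-processing steps described above (quantile normalization, log2 transformation, and median summarization of probes to genes) can be illustrated with a minimal Python sketch. It is not the published pipeline: background correction and GC-corrected RMA are omitted, ties are not averaged during normalization, and the function names, probe-to-gene mapping, and intensities are hypothetical.

```python
from math import log2
from statistics import mean, median


def quantile_normalize(samples):
    """Force every sample (a list of probe intensities in a shared probe
    order) onto the same distribution: the mean of the sorted intensities
    across samples at each rank."""
    n_probes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    reference = [mean(s[r] for s in sorted_samples) for r in range(n_probes)]
    normalized = []
    for s in samples:
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


def summarize_to_genes(sample, probe_to_gene):
    """Collapse probe-level values to one value per gene (median of probes)."""
    by_gene = {}
    for value, gene in zip(sample, probe_to_gene):
        by_gene.setdefault(gene, []).append(value)
    return {gene: median(values) for gene, values in by_gene.items()}


# Hypothetical toy data: two samples, three probes mapping to two genes.
toy_samples = [[4.0, 16.0, 64.0], [8.0, 2.0, 32.0]]
toy_probe_to_gene = ["GENE_A", "GENE_A", "GENE_B"]

toy_normalized = quantile_normalize(toy_samples)
toy_logged = [[log2(v) for v in s] for s in toy_normalized]
toy_gene_level = [summarize_to_genes(s, toy_probe_to_gene) for s in toy_logged]
print(toy_gene_level)
```

Summarizing at the gene level is what makes values from different array designs comparable, since probe identifiers are platform-specific whereas gene identifiers are shared.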

Prior to combining studies, we scaled the expression of each gene to a mean of 0 and unit variance. This scaling procedure helped to normalize data to a single distribution that can be combined across studies and analyzed as a single batch. The first two principal components were extracted from the gene expression matrix to identify potential outliers on a per-study basis (>4 standard deviations from ellipsoid with respect to first and second components). Genes that were missing in more than three studies were discarded. Individual-level gene expression distributions were visually inspected to verify that all samples were similarly distributed within and between studies after normalization (Figure 1). Genes were then normalized to mean 0 and variance of 1 across all samples to minimize between-study variation (Figure 2). The final step of quality control was comparing predicted levels of circulating immune cell types using a data-driven algorithm distributed as an R package called CellMix (Gaujoux and Seoighe, 2013). The approach is referred to as “gene expression deconvolution” and essentially compares expression levels of cell-type-specific markers from a reference data set with known cell-type quantities to gene expression profiles from a target data set to predict cell abundances. The rationale for this approach was to identify potential clinical covariates to include in differential expression analyses, should groups differ strongly in a subset of immune cell types. Predicted quantities for 11 circulating immune cell types under different activation states were compared between BD cases and unaffected comparison subjects using Welch’s t-test (Table 2).
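Two of the calculations in this paragraph, per-gene scaling to mean 0 and unit variance and Welch's t-test for a group difference, can be sketched in Python as follows. This is an illustrative example with hypothetical function names and toy values; it returns the Welch t-statistic and Welch-Satterthwaite degrees of freedom only, since deriving the p-value requires a t-distribution that the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev, variance


def zscale(values):
    """Scale one gene's expression values to mean 0 and unit variance."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def welch_t(group_a, group_b):
    """Welch's unequal-variance t-statistic and its approximate
    (Welch-Satterthwaite) degrees of freedom."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical expression values for one gene within a single study.
toy_expression = [5.1, 6.3, 4.8, 7.0, 5.9]
toy_scaled = zscale(toy_expression)

# Hypothetical predicted proportions of one immune cell type in two groups.
toy_cases = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_controls = [2.0, 4.0, 6.0, 8.0, 10.0]
toy_t, toy_df = welch_t(toy_cases, toy_controls)
print(toy_scaled, toy_t, toy_df)
```

Welch's test is a natural choice here because predicted cell-type proportions need not have equal variance in cases and comparison subjects.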


LINEAR MIXED MODELS USING CLINICAL COVARIATES (STAGE 1)

In Stage 1, normalized expression levels of 20,540 genes were compared across BD cases and unaffected comparison subjects using a linear mixed effect model. A full description of linear mixed models is available in Bates et al. (2014). Diagnosis (coded as “BD” or “CT”) was set as the primary factor of interest in the linear model, with gene-level expression set as the outcome variable. Differential expression measurements across groups were calculated while correcting for the effects of technical and biological variation through specification of known clinical and technical covariates. The covariates that we conditioned each regression on included: age, gender, medication status, and circulating immune cell types collected for RNA extraction. A unique identifier for each study was set as a random effect predictor to correct for between-study variation. After specifying linear mixed models, a step-wise backward elimination followed to achieve a best estimate of the effect of diagnosis on each gene’s expression intensity. This elimination procedure iteratively re-specified models using progressively fewer fixed effect predictors and assessed their statistical significance.

Clinical covariates with a significance at p > 0.1 were declared non-significant and discarded, with the exception of the random effect variable being retained. Due to variations in array design and clinical measurements collected across studies, missing data points were present for a subset of samples in the combined mega-analysis data set.

This constrained the number of genes that could be successfully modeled using our statistical design that controlled for clinical covariates to 14,942 genes. We checked our top BD-associated genes against three additional methods and looked for genes that were consistently associated regardless of the methodological approach to pooling results. Two additional linear mixed models were performed after correcting for hidden confounding using surrogate variable analysis (SVA), which is implemented in the R package sva. The first run of sva was performed iteratively to detect hidden confounding one study at a time (Stage 2A), yielding 20,534 genes for analysis. The second run of sva was performed on the combined-subject data matrix simultaneously, which reduced the matrix to 5,331 genes after removal of missing data points (Stage 2B). Lastly, a fixed-effect inverse-variance meta-analysis was performed. Results from all statistical tests were visualized using forest plots for the top BD-associated genes.
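The fixed-effect inverse-variance meta-analysis mentioned above pools per-study estimates by weighting each one by the reciprocal of its squared standard error. The sketch below shows the standard formula in Python; it is an illustration rather than the analysis code (the study used R), and the function name and example values are hypothetical.

```python
from statistics import NormalDist

def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted pooled estimate across studies.
    Returns (pooled beta, pooled SE, z statistic, two-sided p-value)."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    z = pooled / pooled_se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return pooled, pooled_se, z, p

# Two hypothetical studies reporting an effect of diagnosis on one gene.
pooled_beta, pooled_se, z_stat, p_value = fixed_effect_meta([0.2, 0.4], [0.1, 0.2])
```

Because the weights depend only on the standard errors, the more precise study dominates the pooled estimate, which is why the pooled value here sits closer to 0.2 than to 0.4.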

SYSTEMATIC COMPARISON OF DIFFERENTIAL EXPRESSION STATISTICS ACROSS REGRESSION APPROACHES

The motivation for repeated mega-analyses using different approaches for addressing confounding variation in the data was to identify a subset of differentially expressed genes in BD that are robust to choice of data normalization procedure. In addition, we sought to compare beta coefficients and standard errors generated with linear mixed models from mega-analyses with meta-analysis of linear regressions across each study to describe the conservativeness of each model among the most differentially expressed genes. The package metafor in the R statistical environment was used to calculate an inverse-variance fixed-effect meta-analysis of linear regressions across studies for each of the top k genes associated with BD from Stage 1 mega-analysis (FDR adjusted p < 0.1). Forest plots were generated depicting the beta coefficients and standard errors generated for: (1) linear regression analyses per study (for linear regressions that could be successfully modeled), (2) the pooled beta estimate from meta-analysis, (3) Stage 1 mega-analysis, and (4) Stage 2A and Stage 2B mega-analyses (for available genes). The degree of similarity between cross-study regression models (meta- and mega-analyses) was determined by calculating Pearson’s correlation coefficient between beta coefficients for the top k genes.

PERMUTATION-BASED GENE SET ENRICHMENT

Biological knowledge, in the form of annotated gene sets, was incorporated into a permutation-based gene set analysis to identify concordantly dysregulated pathways in BD. These gene sets were curated from Gene Ontology, Kyoto Encyclopedia of Genes and Genomes, Reactome, PubMed, cytogenetic maps, and experimentally-derived pathways from microarray studies, and compiled in a single web-server hosted by the Molecular Signatures Database (version 5). A permutation-based enrichment analysis was carried out using the piano package in R. In brief, this method inspects pathways for consensus differential expression patterns by combining gene-level summary statistics (t-values) from linear mixed models into pathway-level scores (mean t-score), then comparing observed scores to thousands of bootstrapped sets of genes to determine statistical significance. Four separate directional tests were carried out on each pathway according to direction of differential expression, along with one additional non-directional test (absolute t-values used) of enrichment of pathway-level associations with BD.

The four directional tests applied to pathways included: (1) genes concordantly up-regulated in BD, (2) genes concordantly down-regulated in BD, (3) a net up-regulated pathway containing a mixture of up- and down-regulated genes in BD, and (4) a net down-regulated pathway containing a mixture of up- and down-regulated genes in BD. A total of 10,000 rounds of gene-label permutation provided an empirical distribution of pathway-level statistics from which statistical significance was determined for observed pathway-level scores. Empirical p-values were calculated by counting the number of permuted pathways that yielded association scores at least as extreme as the observed scores, and dividing that count by 10,000. These empirical p-values were adjusted using the Benjamini-Hochberg procedure to control for multiple testing.
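The gene-label permutation and multiple-testing steps described above can be sketched as follows. This is a simplified stand-in for what the piano package does, not its implementation: it covers only a one-directional (up-regulation) test on the mean t-value, and the function names and toy data are hypothetical.

```python
import random
from statistics import mean

def permutation_p(t_values, gene_set, n_perm=10000, seed=0):
    """Empirical p-value for the mean t-value of a gene set, obtained by
    drawing random gene sets of the same size from all tested genes."""
    rng = random.Random(seed)
    all_genes = list(t_values)
    members = [g for g in gene_set if g in t_values]
    observed = mean(t_values[g] for g in members)
    hits = 0
    for _ in range(n_perm):
        sampled = rng.sample(all_genes, len(members))
        if mean(t_values[g] for g in sampled) >= observed:
            hits += 1  # permuted score at least as extreme as the observed score
    return observed, hits / n_perm

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy data: 20 genes with increasing t-values; the set holds the top three.
toy_t = {f"g{i}": float(i) for i in range(20)}
score, emp_p = permutation_p(toy_t, {"g17", "g18", "g19"}, n_perm=2000, seed=1)
```

With a finite number of permutations the smallest reportable empirical p-value is bounded by the permutation count, so a value of 0 should be read as "below 1/n_perm" rather than as exactly zero.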

GENE CO-EXPRESSION NETWORKS AND PRESERVATION ACROSS BD CASES AND UNAFFECTED COMPARISONS

We sought to test for differential expression of gene networks in BD patients relative to unaffected comparison subjects as a complementary approach to our gene-level mega-analyses. The strategy we used was an unsupervised hierarchical clustering of genes based on correlation patterns followed by gene cluster expression level derivation using singular value decomposition (SVD). The R package Weighted Gene Co-expression Network Analysis (WGCNA) was used for this analysis. The one-step function blockwiseModules was employed to carry out network construction, which was tuned using a set of conventional parameters for generating an unsigned network, wherein the magnitude of a pair-wise correlation defines co-expression agnostic to the sign of the correlation. The parameters defined for network derivation were: power = 6, deepSplit = 2, minModuleSize = 30, minCoreKME = 0.5, minCoreKMESize = 10, mergeCutHeight = 0.25, and detectCutHeight = 0.995.
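To make the meaning of an unsigned network with power = 6 concrete, the sketch below computes the adjacency between two genes as the absolute Pearson correlation raised to the soft-thresholding power. It illustrates only this one definition and not the full blockwiseModules procedure; the function names and toy vectors are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def unsigned_adjacency(x, y, power=6):
    """Unsigned co-expression adjacency: |correlation| ** power.
    The sign of the correlation is discarded, so strong negative and strong
    positive correlations are treated as equally strong connections."""
    return abs(pearson(x, y)) ** power

gene_a = [1.0, 2.0, 3.0]
gene_b = [1.0, 3.0, 2.0]   # correlation with gene_a is 0.5
gene_c = [3.0, 2.0, 1.0]   # correlation with gene_a is -1
adj_ab = unsigned_adjacency(gene_a, gene_b)
adj_ac = unsigned_adjacency(gene_a, gene_c)
```

Raising to the sixth power shrinks a moderate correlation of 0.5 to roughly 0.016 while leaving a perfect correlation at 1, which is how soft thresholding suppresses weak connections without a hard cutoff.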

After identifying highly co-expressed gene clusters, which are called “modules” and arbitrarily named after colors, SVD was automatically performed on the gene expression intensities of each module. The first principal component from the SVD solution captures the gene expression profile of each module, which is referred to as the “eigengene”. Differential expression of gene modules across BD patients and unaffected comparison subjects was tested using eigengenes as the response variable in a linear mixed model. Age, gender, medication status, peripheral blood cell types used for RNA extraction, and study ID (random effect) were used as covariates in the linear mixed model. We applied the Benjamini-Hochberg procedure to adjust for testing multiple modules.
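A module eigengene, as described above, is the leading singular vector of the module's expression matrix across samples. The sketch below approximates it with power iteration in plain Python, under the assumption that each gene's values have already been scaled; it is not the WGCNA implementation, and the function name and toy matrix are hypothetical. The orientation step (aligning the sign with average expression) is a convention chosen here for interpretability.

```python
import math

def module_eigengene(expr, n_iter=100):
    """Approximate the first right singular vector (one value per sample)
    of a genes-by-samples matrix using power iteration."""
    n_genes, n_samples = len(expr), len(expr[0])
    v = list(expr[0])  # start from the first gene's profile (assumed non-constant)
    for _ in range(n_iter):
        # u = X v (one value per gene), then v = X^T u (one value per sample)
        u = [sum(expr[i][j] * v[j] for j in range(n_samples)) for i in range(n_genes)]
        v = [sum(expr[i][j] * u[i] for i in range(n_genes)) for j in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Orient the eigengene so that it correlates positively with mean expression.
    avg = [sum(expr[i][j] for i in range(n_genes)) / n_genes for j in range(n_samples)]
    if sum(a * b for a, b in zip(avg, v)) < 0:
        v = [-x for x in v]
    return v

# Toy module: three genes sharing the same pattern across three samples.
toy_module = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.5, 0.0, 1.5]]
eigengene = module_eigengene(toy_module)
```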

In a separate analysis, we constructed gene networks derived from unaffected comparisons (reference network) and BD patients (test network) using the blockwiseModules steps described above, then employed the WGCNA network preservation analysis with the function modulePreservation. This procedure determines the degree of similarity between a “reference” and “test” network based on several measures. Module preservation is quantified and expressed as a z-score per module.

Modules with a z-score < 2 are considered poorly preserved, 2 < z-score < 10 are considered moderately preserved, and z-score > 10 are highly preserved. The biology of the module with the lowest rank (i.e., very poorly preserved in BD patients) was evaluated using a gene set enrichment analysis (hypergeometric tests).

TESTING FOR ENRICHMENT OF GWAS SIGNALS ACROSS DIFFERENTIALLY EXPRESSED GENES IN BD

We sought to evaluate convergence between transcriptomic and genetic signatures of BD. We approached this question using a permutation-based GWAS enrichment method, which calculates GWAS associations for a differentially expressed gene set and compares them to those of bootstrapped gene sets. The process starts with identifying the largest association peak obtained from GWAS within a gene window, which we varied three times in accordance with conventional definitions of gene windows and lines of eQTL evidence (Veyrieras et al., 2008; Zhang et al., 2014). We obtained the Psychiatric Genomics Consortium’s most recent and publicly available summary statistics for BD GWAS (Sklar et al., 2011), and a separate GWAS of BD and SZ cases designed to distinguish BD-specific variants (Ruderfer et al., 2013), which were used to identify association peaks in genes. Association peaks were assigned per gene (–log10[p-value] to z-score conversion), which were adjusted for the number of SNPs per gene using linear regression to avoid inflating the value of genes with a high density of SNPs. Associations were summarized across gene sets using Stouffer’s combination method. We determined the significance of a gene set GWAS association through a permutation-based approach.

Genes were randomly sampled without replacement from the background (all genes with expression intensities, Stage 1) 1,000 times, and the process of deriving a GWAS association signal was repeated. The number of bootstrapped gene sets with GWAS association scores as extreme as or greater than that of the differentially expressed genes was divided by the number of permutations to yield an empirical p-value. The whole process was performed a total of three times to evaluate different association peaks assigned to genes based on different gene windows. We present two p-values for each test (Stouffer’s and permutation-based).
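The two p-values described above (Stouffer's and permutation-based) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: gene-level p-values are converted to one-sided z-scores, the regression adjustment for SNP density is omitted, and all function names and toy values are hypothetical.

```python
import math
import random
from statistics import NormalDist

def p_to_z(p):
    """Convert a p-value to a one-sided z-score (smaller p gives larger z)."""
    return -NormalDist().inv_cdf(p)

def stouffer(z_scores):
    """Stouffer's combined z: sum of z-scores divided by sqrt(number of scores)."""
    return sum(z_scores) / math.sqrt(len(z_scores))

def gwas_enrichment(gene_z, de_genes, n_perm=1000, seed=0):
    """Combined z for the differentially expressed set, its parametric p-value,
    and an empirical p-value from same-sized random background gene sets."""
    rng = random.Random(seed)
    background = list(gene_z)
    observed = stouffer([gene_z[g] for g in de_genes])
    parametric_p = 1.0 - NormalDist().cdf(observed)
    hits = 0
    for _ in range(n_perm):
        sampled = rng.sample(background, len(de_genes))  # without replacement
        if stouffer([gene_z[g] for g in sampled]) >= observed:
            hits += 1
    return observed, parametric_p, hits / n_perm

# Toy background of 30 genes; the "differentially expressed" set holds the top three.
toy_z = {f"g{i}": i / 10 for i in range(30)}
obs_z, stouffer_p, empirical_p = gwas_enrichment(toy_z, ["g27", "g28", "g29"])
```

Reporting both values is informative because the parametric p-value assumes independent, standard-normal gene scores, whereas the empirical p-value is calibrated against the actual background distribution.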

CROSS-REFERENCING RESULTS WITH PREVIOUS MEGA-ANALYSES OF SCHIZOPHRENIA

We compared differential expression summary statistics for gene-level results obtained from brain- and blood-based analyses of SZ and BD. Gene symbols and differential expression statistics (i.e., regression coefficients, standard errors, and p-values) were obtained from three studies: (1) the present analysis of BD, (2) our previous transcriptome-wide analyses of SZ (brain: SZ = , controls = ; blood: SZ = , controls = ), and (3) a transcriptome-wide mega-analysis of BD from a separate group that analyzed microarray data sets from six studies (overlapping) of frontal cortex (BD = 65, controls = 73), and separately, two studies of hippocampus (BD = 28, controls = 31) (Seifuddin et al., 2013a).

CLASSIFIER OF BD AND UNAFFECTED COMPARISON SUBJECTS

Our combined set of six microarray studies was used to develop a machine learning classifier that could accurately distinguish BD from unaffected comparison subjects based on blood-based gene expression measures. We are aware that machine learning classifiers are highly sensitive to picking up technical artefacts and noise, which can disrupt performance and confound biologically driven variation. Our approach was to limit the amount of noise that the classifier models would be trained on by first splitting our combined matrix into a set of four studies (BD = 39, unaffected comparisons = 34) and a set of two studies (BD = 38, unaffected comparisons = 47), which correspond to samples profiled with Affymetrix or Illumina microarrays, respectively. After stratifying the data by array type, we fit a linear regression per gene in the Affymetrix stratum to rank genes according to their differential expression measures between the two diagnostic groups. We covaried for age, sex, and source of RNA (i.e., monocytes, leukocytes, or peripheral blood mononuclear cells) to ensure that the main effect of diagnosis on gene expression was not confounded by spurious relationships. The top-ranked genes (top 2, 17, 32, 47, 62, 92, or 100) were brought forward to build classifier models. We split the Affymetrix stratum into a set of 25 BD and 25 unaffected comparison subjects to train a model that identifies patterns in the data that best discriminate the two diagnostic classes. The predictive performance of each model was evaluated by looking at the number of correctly versus incorrectly classified subjects in the Illumina stratum (for which diagnosis was known) across various cut points of the data, which were summarized as area under the receiver operating characteristic curve (AUCROC). This is a standard metric for evaluating a classifier’s performance and is valuable in situations of class imbalance. An AUCROC was obtained for each number of modeled genes across 100-fold Monte Carlo re-samplings (i.e., sampling the training set and re-fitting a model), then summarized to obtain a mean and confidence interval of AUCROC. We repeated this analysis using the reciprocal process – fitting classifiers using samples profiled on Illumina arrays then testing the performance of models in Affymetrix samples – to investigate whether changing microarray platform makes a difference to the generalizability of a classifier.
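The evaluation metric described above can be illustrated with a short Python sketch that computes AUCROC from classifier scores using its rank-based definition, then summarizes values across re-samplings. The confidence interval uses a normal approximation, which is an assumption made here and may differ from how the reported intervals were derived; function names and values are hypothetical.

```python
import math
from statistics import mean, stdev

def auc_roc(scores, labels):
    """AUCROC as the probability that a randomly chosen case (label 1)
    receives a higher score than a randomly chosen comparison subject
    (label 0); ties count as one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def summarize_auc(auc_values, z=1.96):
    """Mean AUCROC across re-samplings with a normal-approximation 95% CI."""
    m = mean(auc_values)
    se = stdev(auc_values) / math.sqrt(len(auc_values))
    return m, m - z * se, m + z * se

# Hypothetical classifier scores for four held-out subjects.
example_auc = auc_roc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
# Hypothetical AUCROC values from three re-samplings.
mean_auc, ci_low, ci_high = summarize_auc([0.80, 0.82, 0.84])
```

Because AUCROC depends only on how cases rank relative to comparison subjects, it is unaffected by the proportion of each class in the test set, which is the property that makes it useful under class imbalance.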

Three machine learning algorithms were examined using this pipeline: (1) linear kernel support vector machines (SVMs), (2) random forest, and (3) naïve Bayes. Random forest classifiers were generated with 10, 20, 50, or 100 trees with the algorithm provided by the randomForest package in R. We used the package caret to fit linear kernel SVMs and naïve Bayes classifiers. We included a step to find an optimal value for the cost parameter to use in a single SVM model using the trainControl function in caret, wherein models with different values for cost (range: 1e-04 – 1) were repeatedly trained and tested through 10-fold Monte Carlo cross-validation and ranked by their performance.


CLASSIFIER OF SZ AND BD

We performed gene-based machine learning classification of SZ and BD for samples profiled on Affymetrix microarray chips (SZ = 50, BD = 42). We approached this by splitting the data set containing 92 samples with no missing gene expression values (k = 6,389 genes) into a training set in which machine learning models were fitted, and a validation set comprised of 14 BD and 12 SZ subjects from a single microarray study and Affymetrix array edition [i.e., Affymetrix Human Genome U133 Plus 2.0, (Tsuang et al., 2005)] in which the performance of machine learning classifiers was assessed. In the training set, a linear regression framework was used to rank the 6,389 genes by the significance of their differential expression measures between SZ and BD. We covaried for the main effects of: (1) sex, and (2) source of RNA from blood samples (i.e., lymphocytes, monocytes, or peripheral blood mononuclear cells). We selected the top 2, 17, 32, 47, 62, 92, or 100 genes to fit classifiers in a set of 24 BD and 24 SZ cases sampled from the training set. Through 100-fold Monte Carlo cross-validation, we assessed the performance of each classifier in correctly predicting diagnostic labels for the remaining set of 11 BD and 38 cases in the training sample (not part of model fit). Further, we looked at the performance of each model in the withheld samples from the Tsuang et al. (2005) study. Performance of models was evaluated using AUCROC summarized over 100-fold Monte Carlo re-samplings. We used three machine learning classifiers in this process: random forest, linear kernel support vector machines (SVMs), and naïve Bayes, using the same setup for hyper-parameters (i.e., random forest trees and cost) as in the previous section.


RESULTS

LINEAR MIXED MODELS IDENTIFY GENES ASSOCIATED WITH BIPOLAR DISORDER

From our Stage 1 analysis of 14,942 genes, 1,645 genes were nominally associated with BD at an uncorrected p-value < .05, which showed a near 50:50 split in the number of up- and down-regulated genes (two-tailed sign test p-value < .41). The differences in gene expression exhibited in BD cases relative to unaffected comparisons were modest across nominally significant genes; log2-fold changes ranged from –0.771 to +0.714, with a median |log2 fold-change| = 0.316. The same values expressed as beta coefficients equated to a range of –1.499 to +1.559, and a median |beta| = 0.752. Three genes showed highly significant differences in expression between BD cases and unaffected comparison subjects at a FDR-adjusted significance of q-value < .05 (FBXL8, F-Box and Leucine-Rich Repeat Protein 8, log2 fold-change = 0.714; AK4, Adenylate Kinase 4, log2 fold-change = 0.137; ARID4B, AT Rich Interactive Domain 4B, RBP1-Like, log2 fold-change = –0.199). FBXL8 and ARID4B associations remained significant after multiplying their p-values by the total number of genes tested in Stage 1 (i.e., Bonferroni-adjusted p < .05). We provide identities and differential expression statistics (beta, t-statistics, d.f., p-values) for genes associated with BD at a FDR-adjusted p-value < 0.1 (n = 41) in Table 3. A full table of differential expression statistics can be made available upon request to the corresponding author.

In our Stage 2A analysis, ARID4B ranked near the top of the list of differentially expressed genes (p-value < 3.36e-05, FDR-adjusted p-value < .097), suggesting that this gene is robustly associated with BD. Our Stage 2B analysis also identified ARID4B near the top of the list of differentially expressed genes (p-value < 3.63e-05, FDR-adjusted p-value < 0.016). The ARID4B association with BD was replicated in a fixed-effect meta-analysis (Figure 3). This suggests that the association of ARID4B with BD is robust to choice of cross-study regression method. Moreover, transcriptome-wide effect sizes generated across Stage 1, Stage 2A and Stage 2B were significantly correlated (Table 4), suggesting that surrogate variable-adjusted data analyzed with linear mixed models yield highly similar differential expression measurements relative to covariate-adjusted models. Although effect sizes were heterogeneous across studies for the three genes highly significant from Stage 1, per-study directions of effect were consistent with respect to the cross-study pooled estimates for ARID4B; this is a pattern that was also observed for FBXL8 (Figure 3).

GENE SET ENRICHMENT ANALYSIS

Gene set enrichment analysis was performed on differential expression statistics obtained from the Stage 2A analysis due to its wider coverage of the transcriptome relative to Stage 1. We identified 230 gene sets significantly associated with BD (FDR-adjusted p-value < .05, Table 5). The ratio of down-regulated to up-regulated gene sets associated with BD was significantly greater than expected by chance (two-tailed sign test p < 2.2e-16). This suggests that, on average, biological pathways associated with BD exhibit lower expression levels relative to unaffected comparison subjects. Pathways showing lower expression levels in BD patients included the following, among others: (1) metabolism and splicing of RNA, (2) metabolism and secretion of proteins, (3) genes involved in mitochondrial structure and electron transport chain, (4) genes encoding canonical protein targets for MYC, (5) signaling through mTOR complex, (6) unfolded protein responses, and (7) guidance cues for nonsense mediated decay. Gene sets that were significantly up-regulated in BD captured several pathways involved in cellular responses to inflammation (TNFα signaling via NFκB, IFNγ signaling, IL2-STAT5 signaling chain) and management of reactive oxygen species.
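The two-tailed sign test used above to compare counts of down- and up-regulated gene sets is an exact binomial test against a 50:50 split. A minimal Python sketch follows; the function name and the example counts are hypothetical and are not the counts from this analysis.

```python
from math import comb

def sign_test_two_tailed(n_up, n_down):
    """Exact two-tailed sign test: probability, under a 50:50 null, of a
    split at least as lopsided as the one observed."""
    n = n_up + n_down
    k = min(n_up, n_down)
    # One-tailed probability of observing k or fewer in the smaller category.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

balanced = sign_test_two_tailed(5, 5)     # an even split gives p = 1
lopsided = sign_test_two_tailed(10, 0)    # all ten in one direction
```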

GENE CO-EXPRESSION NETWORKS

We evaluated the main effect of BD on expression levels of gene networks assembled using an unsupervised clustering approach on combined-subject data from Stage 1. A total of 51 tightly co-expressed gene modules were identified through network analysis. A linear mixed model was used to evaluate differential expression of each gene module in BD patients relative to unaffected comparisons. One gene module called “grey60” showed a modest association with the main effect of diagnosis (uncorrected p < 0.019), but did not survive false-discovery rate multiple testing correction. The grey60 module was up-regulated in BD, and at the top of the list of gene sets enriched in this module were genes involved in heme and porphyrin metabolism (Bonferroni-adjusted p-value < .05, Table 6). Genes that were ranked among the top 5 in the grey60 module according to connectivity strength (i.e., summation of pair-wise correlations) were SELENBP1, FAM104A, GMPR, CA1, and OR2W3 (Figure 6A).

After evaluating the main effect of diagnosis on gene network expression levels, we split the Stage 1 expression data into two sets, one for each diagnostic group, and constructed two gene networks comprising 60 and 34 modules for BD patients and unaffected comparisons, respectively. We then employed a preservation analysis as per instructions and tutorials available at the WGCNA web-page (Langfelder et al., 2011) to determine whether each module was preserved across groups. Out of 34 reference modules, 13 showed moderate preservation (2 < z-score < 10) and 20 showed very strong evidence of preservation in unaffected comparison subjects (z-score > 10, Figure 6B). The last remaining module in the reference network was not significantly preserved across BD patients and unaffected comparison subjects (z-score = 0.77), suggesting that network structures for this lowest ranking module were disrupted in BD patients. We examined the top-most enriched gene sets identified in the lowest ranking module according to evidence of preservation to determine its corresponding biology. The “tan” module (z-score = 0.77) showed a significant enrichment of E2F targets and gene sets involved in the unwinding of DNA.

DIFFERENTIALLY EXPRESSED GENES IN BD WITH GWAS ASSOCIATIONS

No differentially expressed genes (FDR q-value < 0.1) identified in our Stage 1 mega-analysis exhibited genome-wide significance in association with BD. The strongest GWAS association signal we identified was for ANKRD10 (GWAS p < 1e-04). Our permutation-based GWAS enrichment approach did not detect a statistical enrichment of GWAS signals among differentially expressed genes (data not shown); however, these results are subject to change with inclusion of the next larger wave of GWAS data from the Psychiatric Genomics Consortium.

We then examined three pathways that are strongly enriched with GWAS signals for SZ, BD, and major depression (Network and Pathway Analysis Subgroup of Psychiatric Genomics Consortium, 2015) to see if our differentially expressed genes for SZ and BD are significantly over-represented in these pathways (Hess et al., 2016; Seifuddin et al., 2013b). Our findings are summarized in Table 7. As expected, genes that are differentially expressed in SZ and BD are significantly over-represented among immune, histone, and synaptic pathways. Synaptic genes were significantly enriched among top genes from all four mega-analyses (differential expression q-value < 0.1). Differentially expressed genes identified in our previous blood-based SZ mega-analysis were highly enriched in the three pathways.

GENES CONCORDANTLY AND DISCORDANTLY DYSREGULATED ACROSS SZ AND BD

Transcriptomic signatures that are common to SZ and BD have yet to be elucidated. To fill this gap, we cross-referenced results from our present and prior mega-analyses of ex vivo peripheral blood (BD and SZ) and postmortem brain (SZ only), in addition to a previously published postmortem brain mega-analysis of BD (Seifuddin et al., 2013a). We first examined the correlation of differential expression statistics for genes that showed nominally significant dysregulation in SZ and BD (p-value < .05) in the frontal cortex. Differential expression similarities between SZ and BD in the brain are summarized in Figure 4A – 4B. We found 47 genes that are dysregulated in the frontal cortex of SZ and BD, which show strongly correlated differential expression values (r = 0.776, Pearson’s p-value = 1.5e-10). The biology of these genes has relevance to SZ and BD pathophysiology: brain development (BAG3, SOX9, and EMX2), regulation of synaptic transmission (CRHBP, CRH, NMU, PNOC, and SLC1A3), apoptosis (TP53BP2 and BAG3), regulation of growth/metal-ion binding (MTE1, MT1H, MTX1, MT2A), and cell adhesion (EMNC, FERMT2, ITGAM, PALLD). Expanding on these findings, there are 18 cross-disorder genes that are nominally dysregulated across brain regions (i.e., frontal cortex in SZ and hippocampus in BD), which are moderately strongly correlated based on differential expression values (r = 0.541, Pearson’s p-value = 0.02, Figure 4C – 4D).

Four of the 18 genes dysregulated in the frontal cortex of SZ and hippocampus of BD (KCNS3, MT2A, NMU, PNOC) were also found on the list of 47 differentially expressed genes in the frontal cortex of SZ and BD, including: (1) the neuropeptide encoding gene NMU, which interacts with G-protein coupled receptors in nerve cells and relates to mast cell-mediated inflammatory regulation (Moriyama et al., 2005), and the neuropeptide encoding gene PNOC, which is a marker for cortical interneurons (Zeng et al., 2012), (2) the protein-coding gene MT2A (Metallothionein 2A), which is related to interferon gamma signaling and protects against oxidative stress-mediated cell death (Reinecke et al., 2006), and (3) the potassium voltage-gated channel subunit gene (KCNS3), which has reported associations with SZ and BD from previous GWAS (Goes et al., 2015; Greenwood et al., 2012). One report showed a 23% reduction of KCNS3 mRNA expression in parvalbumin-positive neurons in the prefrontal cortex of SZ-affected individuals, which may underlie altered gamma-oscillation and cognitive impairments (Georgiev et al., 2014). The gene S100B was found to be coordinately up-regulated in both SZ and BD, and appeared on the list of 18 genes that were dysregulated in frontal cortex of SZ and hippocampus of BD. S100B encodes a calcium-binding peptide that is released by astrocytes and elevated in response to brain damage/neurodegeneration (Brozzi et al., 2009; Rothermundt et al., 2003). Taken together, these findings from transcriptome-wide studies provide insights into shared molecular substrates of SZ and BD, which also help us to interpret recent studies that reported polygenic overlap between these disorders (Purcell et al., 2009; Bulik-Sullivan et al., 2015; Power et al., 2015).

We then compared differential expression measures for nominally significant genes (p-value < 0.05) from our present study and our previous transcriptome-wide analysis of SZ using a combined set of peripheral blood data sets (Hess et al., 2016). A total of 352 genes showed evidence of dysregulation in SZ and BD within blood; remarkably, one gene that encodes a mitochondrial matrix protein (AK4) survived corrections for multiple testing within-trait for SZ (q-value = 0.018) and BD (q-value = 3.6e-03). Contrary to what we observed from cross-referencing differentially expressed genes from the brain, genes were largely discordantly dysregulated in the blood of SZ and BD subjects (r = -0.29, Pearson’s p-value = 3.2e-08, Figure 5). It is unclear at this point which mechanism(s) underlie the discordant gene expression profiles in the blood. However, discordant gene expression profiles in blood suggest that we can work towards developing a disorder-specific classifier of SZ and BD.

BLOOD-BASED MICROARRAY CLASSIFIERS OF BD, SZ, AND UNAFFECTED COMPARISON SUBJECTS

Our classification analysis of BD and unaffected comparison subjects (summarized in Table 8) revealed that classification accuracies are drastically different between models trained on samples profiled on Affymetrix versus Illumina microarrays (mean difference in AUCROC ~ 0.20). These differences potentially emerged due to a difference in power (20 more samples in Affymetrix training set), technical variation between platforms, or cryptic variation that impinges on classifier performance in the hold-out sample. Only two genes (WDR13 and TCTA) were common to the two lists of top 100 genes ranked by differential expression between BD and unaffected comparison subjects identified on either platform. Models that were trained on Affymetrix samples generalized better when classifying Illumina samples than models trained on Illumina samples did when classifying Affymetrix samples. Linear kernel SVMs showed the best performance overall compared to random forests and naïve Bayes, with eight configurations differing by number of genes showing a mean AUCROC > 0.70, and the top performing model showing a mean AUCROC = 0.82 (95% CI = 0.818 – 0.822).

These observations informed our classifier analysis of BD and SZ subjects. Our approach was to fit machine learning models in a combined data set of seven studies (all Affymetrix based) and test the performance of models using samples from a single study and Affymetrix array edition. Contrary to our earlier findings, linear kernel SVMs exhibited the poorest performance overall with a mean AUCROC of ~0.5 for all numbers of genes fitted (Figure 7). Naïve Bayes and random forest models showed similar performance overall; however, the best model identified was a naïve Bayes classifier trained on 77 genes (mean AUCROC = 0.849, 95% CI = 0.837 – 0.859).

DISCUSSION

We have employed a systematic comparison of blood-based gene expression profiles between BD patients and unaffected comparison subjects. In this study, we identified three genes strongly dysregulated (FBXL8, ARID4B, and AK4) in BD and one gene co-expression network nominally associated with the disorder. Interestingly, close to 99% of gene modules identified in unaffected comparison subjects showed moderate to high preservation in BD patients, suggesting a majority of gene network structures (i.e., network density and connectivity patterns) are relatively unperturbed in the periphery. This is perhaps a generalizable characteristic for psychosis as evidenced in an earlier study that compared gene networks between SZ cases and unaffected comparison subjects, which reported a high degree of preservation between these groups (Chen et al., 2013a). We confirmed that associations for the top three differentially expressed genes identified in our primary mega-analysis were robust to effects of confounding variables (clinical factors and surrogate variables defined) and choice of cross-study regression approaches (i.e., agreement between mega- and meta-analysis). Our finding that FBXL8 is up-regulated in the peripheral blood of BD patients relative to unaffected subjects is consistent with a previous postmortem brain gene expression study of orbitofrontal cortex (Ryan et al., 2006). The roles of AK4 and ARID4B in BD have not been characterized; however, ARID4B was found to be differentially expressed in the blood of males with autism spectrum disorder (Kong et al., 2012), suggesting that this gene may have a role in multiple psychiatric disorders.

We expanded gene-level findings using a biological gene set enrichment approach, which identified several pathways associated with BD: inflammation, protein and RNA metabolism, and mitochondrial-related functions surrounding energy production. Our microarray mega-analysis provides strongly supportive evidence that blood-based biomarkers for BD may reside in these pathways. Inflammatory signatures of BD are well-recognized and continue to be investigated for their diagnostic potential (Becking et al., 2015; Munkholm et al., 2015; Haenisch et al., 2016; Fillman et al., 2014). Increased auto-inflammatory signaling coupled with decreased mitochondrial-related functions has the potential to disturb neurobiological pathways, which has strong implications for BD (Patel and Frey, 2015; Theoharides et al., 2011). Our study has not established causal links between dysregulated transcription and BD, but does shine a light on robust biological correlates of the disorder.

We identified a set of candidate genes with evidence of differential dysregulation in BD and SZ by cross-referencing top results produced in mega-analyses from our laboratory. This post hoc comparison of results at the gene level indicated that overlapping signatures of BD and SZ generally show opposing directions of effect, suggesting that pathways related to these disorders might be discordantly altered. We examined biological annotations of those genes that were concordantly dysregulated across BD and SZ; the annotation most frequently represented by these genes was the hallmark pathway for the complement system, a fundamental component of innate immunity (ARG1, AZU1, CR1, LCN2, LTF, MMP8, NCF4, PBX2, and CKAP4). The complement system is currently of prominent interest in psychiatry following the discovery of a premature synaptic pruning mechanism involving the C4 risk gene for SZ (Sekar et al., 2016). It is possible that complement system dysregulation is a cross-disorder pathophysiological mechanism of psychosis. Further exploration of the genetic and molecular connections that lead to disturbances in immune, metabolic, and cell survival pathways is warranted, along with deeper refinement of disorder-specific and cross-disorder genes associated with BD and SZ. Deconvolution of SZ and BD is possible at the genetic level (Ruderfer et al., 2013), and as evidenced in our classification analysis, these disorders can be discriminated according to differences in transcriptomic signatures. This supporting evidence lays the groundwork for identification of biological substrates that are specifically altered in SZ and BD, which may eventually be adopted as a means for differential diagnosis in psychiatric clinics, or lead to the identification of treatable targets that better mediate symptoms of these disorders.

TABLE 1. DEMOGRAPHICS OF THE SIX MICROARRAY STUDIES THAT WERE INCLUDED IN THE MEGA-ANALYSIS.

Study ID | Array platform | BD (n) | CT (n) | Caucasian (n) | Other (n) | % Male | % Medicated | Cell type | Age, BD | Age, CT | Age (P-value) | Sex (P-value) | Race (P-value)
Beech et al. (2010) | Illumina Human-6v2 | 20 | 15 | 23 | 12 | 0.31 | 0.31 | Whole blood | 38.35 | 28.93 | 0.01 | 1.00 | 1.00
Bousman et al. (2010) | Affymetrix Human Exon 1.0ST | 9 | 8 | 12 | 5 | 0.71 | 0 | PBL | 42.33 | 45.00 | 0.46 | 0.88 | 0.88
Clelland et al. (2013) | Affymetrix U1332P | 21 | 14 | 0 | 0 | 0 | 0 | PBL | 40.10 | 31.50 | 0.02 | NA | NA
Padmos et al. (2008) | Affymetrix U95Av2 | 5 | 6 | 0 | 0 | 0.45 | 0 | Monocytes | 25.96 | 21.83 | 0.50 | 1.00 | NA
Savitz et al. (2013) | Illumina HumanHT-12 V4.0 | 8 | 24 | 0 | 0 | 0 | 0 | Monocytes | 38.00 | 34.04 | 0.37 | NA | NA
Tsuang et al. (2005) | Affymetrix U1332P | 14 | 14 | 0 | 28 | 0.48 | 0.33 | PBL | 42.29 | 38.07 | 0.48 | 0.44 | NA
Total | | 77 | 81 | | | | | | 38.6 | 33.5 | .004 | 1 | 0.76

Entries with “NA” denote missing or invariant data. Abbreviations: Bipolar disorder (BD), control subjects (CT), peripheral blood leukocytes (PBL).

TABLE 2. ESTIMATED PROPORTION OF PERIPHERALLY CIRCULATING IMMUNE CELLS IN BIPOLAR DISORDER CASES AND UNAFFECTED SUBJECTS. No credible difference in predicted cell type abundances was observed between BD cases and unaffected comparisons. The BD and CT columns report group mean coefficients (predicted cell abundance).

Circulating immune cell type | BD | CT | t-score | p-value | BH p-value*
IgM Memory B cells | 0.05 | 0.07 | -2.12 | 0.04 | 0.32
Activated Monocytes | 0.19 | 0.18 | 1.91 | 0.06 | 0.32
T-helper lymphocytes | 0.02 | 0.01 | 1.55 | 0.12 | 0.45
Plasma cells | 0.00 | 0.00 | -1.00 | 0.32 | 0.74
Activated Cytotoxic T-cells | 0.08 | 0.08 | 0.97 | 0.34 | 0.74
Activated Dendritic cells | 0.11 | 0.11 | 0.56 | 0.58 | 0.95
Neutrophils | 0.16 | 0.15 | 0.39 | 0.69 | 0.95
IgG Memory B cells | 0.32 | 0.32 | 0.30 | 0.77 | 0.95
Natural Killer cells | 0.02 | 0.02 | -0.28 | 0.78 | 0.95
Dendritic cells | 0.05 | 0.05 | -0.12 | 0.90 | 0.99
Activated T-helper lymphocytes | 0.00 | 0.00 | -0.01 | 0.99 | 0.99

*Benjamini-Hochberg adjusted p-values are presented in ascending order. Abbreviations: Bipolar disorder (BD), control subjects (CT), Benjamini-Hochberg adjustment (BH).
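
The Benjamini-Hochberg adjustment reported in Table 2 can be illustrated with a minimal sketch. The function name `bh_adjust` is hypothetical, and the inputs are the rounded p-values printed in the table, so the output only approximates the reported BH column (which was presumably computed from unrounded p-values).

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjustment; results are returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Rounded p-values copied from Table 2 (11 cell types).
table2_p = [0.04, 0.06, 0.12, 0.32, 0.34, 0.58, 0.69, 0.77, 0.78, 0.90, 0.99]
table2_bh = bh_adjust(table2_p)
print([round(q, 2) for q in table2_bh])
```

Because the smallest adjusted value remains far above .05, this is in line with the table's conclusion that no cell type differs credibly between groups.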

TABLE 3. MOST DIFFERENTIALLY EXPRESSED GENES (N = 41) IN BIPOLAR DISORDER BASED ON STAGE 1 MEGA-ANALYSIS (COVERAGE = 14,942 GENES; FDRP < 0.1). Genes appearing in bold survived correction for total number of tests performed (Bonferroni p < .05).

Gene symbol | Beta | SE | Df* | t-value | p-value | FDR q-value | Bonferroni p
FBXL8 | 1.388 | 0.259 | 72 | 5.365 | 2.37E-07 | 3.53E-03 | 0.004
ARID4B | -0.392 | 0.305 | 82 | -1.285 | 3.07E-06 | 1.84E-02 | 0.046
AK4 | 0.646 | 0.384 | 46 | 1.685 | 3.69E-06 | 1.84E-02 | 0.055
ATG4D | 1.245 | 0.305 | 58 | 4.076 | 1.42E-05 | 5.32E-02 | 0.213
OGT | -0.769 | 0.321 | 81 | -2.397 | 3.33E-05 | 7.79E-02 | 0.498
GADD45B | 0.395 | 0.346 | 82 | 1.142 | 3.49E-05 | 7.79E-02 | 0.521
RBM25 | -1.335 | 0.348 | 72 | -3.832 | 3.83E-05 | 7.79E-02 | 0.573
CHSY1 | -0.764 | 0.33 | 82 | -2.312 | 5.87E-05 | 7.79E-02 | 0.878
RBM22 | -0.41 | 0.352 | 72 | -1.164 | 6.28E-05 | 7.79E-02 | 0.939
ZMAT2 | 1.35 | 0.353 | 58 | 3.825 | 6.75E-05 | 7.79E-02 | 1.000
NT5C2 | -0.73 | 0.328 | 82 | -2.225 | 7.82E-05 | 7.79E-02 | 1.000
PTK2 | -0.584 | 0.32 | 82 | -1.826 | 7.51E-05 | 7.79E-02 | 1.000
PTPRU | -0.969 | 0.231 | 82 | -4.195 | 6.87E-05 | 7.79E-02 | 1.000
ZNF529 | -0.498 | 0.342 | 82 | -1.456 | 7.60E-05 | 7.79E-02 | 1.000
FAM9B | -1.5 | 0.334 | 37 | -4.487 | 6.80E-05 | 7.79E-02 | 1.000
RPUSD1 | 1.211 | 0.398 | 58 | 3.044 | 9.02E-05 | 8.42E-02 | 1.000
PDPR | 1.324 | 0.322 | 72 | 4.114 | 1.02E-04 | 8.46E-02 | 1.000
SH3YL1 | -0.888 | 0.323 | 80 | -2.749 | 9.88E-05 | 8.46E-02 | 1.000
ACP5 | 0.748 | 0.333 | 82 | 2.248 | 1.37E-04 | 8.76E-02 | 1.000
AHCYL2 | 1.502 | 0.363 | 48 | 4.134 | 1.42E-04 | 8.76E-02 | 1.000
AVP | 1.309 | 0.314 | 51 | 4.165 | 1.21E-04 | 8.76E-02 | 1.000
OPN1LW | 1.359 | 0.326 | 51 | 4.173 | 1.18E-04 | 8.76E-02 | 1.000
RANBP17 | 0.987 | 0.35 | 72 | 2.823 | 1.49E-04 | 8.76E-02 | 1.000
TMEM41B | -0.393 | 0.314 | 80 | -1.254 | 1.52E-04 | 8.76E-02 | 1.000
FSTL1 | -0.432 | 0.416 | 51 | -1.038 | 1.51E-04 | 8.76E-02 | 1.000
IGSF9B | 1.368 | 0.32 | 37 | 4.278 | 1.28E-04 | 8.76E-02 | 1.000
ZDHHC12 | 0.724 | 0.3 | 58 | 2.413 | 1.72E-04 | 8.85E-02 | 1.000
ZNF564 | 1.158 | 0.326 | 58 | 3.558 | 1.67E-04 | 8.85E-02 | 1.000
ANKRD10 | -1.114 | 0.387 | 72 | -2.881 | 1.63E-04 | 8.85E-02 | 1.000
SMEK2 | -0.939 | 0.349 | 71 | -2.687 | 2.01E-04 | 8.92E-02 | 1.000
TMEM101 | 0.729 | 0.286 | 72 | 2.548 | 1.96E-04 | 8.92E-02 | 1.000
IGHG1 | -0.607 | 0.56 | 48 | -1.083 | 1.89E-04 | 8.92E-02 | 1.000
SOCS3 | 0.644 | 0.332 | 82 | 1.942 | 2.03E-04 | 8.92E-02 | 1.000
TXN | 0.63 | 0.344 | 82 | 1.829 | 1.85E-04 | 8.92E-02 | 1.000
BCL7A | -1.056 | 0.316 | 82 | -3.34 | 2.24E-04 | 9.39E-02 | 1.000
ZNF284 | -1.179 | 0.301 | 57 | -3.921 | 2.38E-04 | 9.39E-02 | 1.000
PPARA | 0.801 | 0.257 | 82 | 3.112 | 2.34E-04 | 9.39E-02 | 1.000
SHC3 | -1.361 | 0.352 | 72 | -3.868 | 2.39E-04 | 9.39E-02 | 1.000
SLTM | -0.541 | 0.389 | 72 | -1.389 | 2.57E-04 | 9.85E-02 | 1.000
ZC3H12A | 0.752 | 0.409 | 72 | 1.838 | 2.67E-04 | 9.96E-02 | 1.000
MAN1C1 | -0.479 | 0.277 | 80 | -1.728 | 2.73E-04 | 9.96E-02 | 1.000
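
As a worked check on the Bonferroni column of Table 3, the sketch below multiplies each nominal p-value by the number of genes tested (14,942, from the table caption) and caps the result at 1. The helper name `bonferroni` is hypothetical; the p-values are the rounded values printed in the table, so later digits may differ slightly from the reported column.

```python
N_TESTS = 14942  # gene coverage stated in the Table 3 caption


def bonferroni(p, n_tests=N_TESTS):
    """Bonferroni-adjusted p-value: nominal p times number of tests, capped at 1."""
    return min(1.0, p * n_tests)


# Nominal p-values for the three top-ranked genes, copied from Table 3.
top_genes = {"FBXL8": 2.37e-07, "ARID4B": 3.07e-06, "AK4": 3.69e-06}
for gene, p in top_genes.items():
    print(gene, round(bonferroni(p), 3))
```

Under this calculation FBXL8 and ARID4B fall below .05, while AK4 sits just above it, matching the values in the table.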

TABLE 4. COMPARISON OF DIFFERENTIAL EXPRESSION ESTIMATES ACROSS STATISTICAL MODELS. Beta coefficients derived from four cross-study regression approaches per gene were quantitatively compared using Pearson’s correlation test. Beta coefficients generated using combined-subject mega-analyses were in strong agreement across Stage 1, Stage 2A and Stage 2B.

Model | Mega-analysis (Stage 1) | SVA per study (Stage 2A) | SVA on full matrix (Stage 2B) | Meta-analysis
Mega-analysis (Stage 1) | - | - | - | -
SVA per study (Stage 2A) | 0.863 (3.399e-05) | - | - | -
SVA on full matrix (Stage 2B) | 0.923 (5.28e-07) | 0.963 (8.13e-09) | - | -
Meta-analysis | 0.15 (0.589) | 0.075 (0.791) | 0.06 (0.832) | -

P-values for Pearson’s r coefficients are shown in parentheses.
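
The pairwise comparison summarized in Table 4 rests on Pearson's r between two vectors of per-gene beta coefficients. A minimal sketch follows; `pearson_r` is a hypothetical helper, the Stage 1 betas are the first five from Table 3, and the second vector is invented purely for illustration (it is not the Stage 2A output).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Stage 1 betas for FBXL8, ARID4B, AK4, ATG4D, OGT (from Table 3).
stage1_betas = [1.388, -0.392, 0.646, 1.245, -0.769]
# Hypothetical betas from a second model, for illustration only.
other_model_betas = [1.20, -0.45, 0.50, 1.10, -0.60]

r = pearson_r(stage1_betas, other_model_betas)
print(round(r, 3))
```

A value of r near 1 indicates that two models rank and scale the gene effects similarly; the p-values in parentheses in Table 4 would additionally require a t-distribution test on r, which is omitted here.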

TABLE 5. PERMUTATION-BASED GENE SET ENRICHMENT ANALYSIS IMPLICATED 230 BIOLOGICALLY-ANNOTATED GENES IN BIPOLAR DISORDER. Differential gene expression statistics used for this permutation-based test were derived from Stage 2A mega-analysis with a coverage of 20,534 genes from the transcriptome.

Database | Gene set | p-value | FDRp | Direction (BD)
BioProc | RESPONSE_TO_EXTERNAL_STIMULUS | < 0.0001 | < 0.0001 | All up-regulated
BioProc | INFLAMMATORY_RESPONSE | < 0.0001 | < 0.0001 | All up-regulated
BioProc | PROTEIN_KINASE_CASCADE | < 0.0001 | < 0.0001 | All up-regulated
BioProc | CELLULAR_COMPONENT_ASSEMBLY | < 0.0001 | < 0.0001 | All down-regulated
BioProc | MACROMOLECULAR_COMPLEX_ASSEMBLY | < 0.0001 | < 0.0001 | All down-regulated
BioProc | NUCLEOBASENUCLEOSIDENUCLEOTIDE_AND_NUCLEIC_ACID_METABOLIC_PROCESS | < 0.0001 | < 0.0001 | All down-regulated
BioProc | RNA_METABOLIC_PROCESS | < 0.0001 | < 0.0001 | All down-regulated
BioProc | RIBONUCLEOTIDE_METABOLIC_PROCESS | < 0.0001 | < 0.0001 | All down-regulated
BioProc | RNA_PROCESSING | < 0.0001 | < 0.0001 | All down-regulated
BioProc | PROTEIN_RNA_COMPLEX_ASSEMBLY | < 0.0001 | < 0.0001 | All down-regulated
BioProc | RIBONUCLEOPROTEIN_COMPLEX_BIOGENESIS_AND_ASSEMBLY | < 0.0001 | < 0.0001 | All down-regulated
BioProc | CELLULAR_MACROMOLECULE_METABOLIC_PROCESS | < 0.0001 | < 0.0001 | Mixed up-regulated
BioProc | PROTEIN_METABOLIC_PROCESS | < 0.0001 | < 0.0001 | Mixed up-regulated
BioProc | CELLULAR_PROTEIN_METABOLIC_PROCESS | < 0.0001 | < 0.0001 | Mixed up-regulated
BioProc | PROTEIN_MODIFICATION_PROCESS | < 0.0001 | < 0.0001 | Mixed up-regulated
BioProc | BIOPOLYMER_MODIFICATION | < 0.0001 | < 0.0001 | Mixed up-regulated
BioProc | RESPONSE_TO_WOUNDING | < 0.0001 | < 0.0001 | Mixed up-regulated
BioProc | RNA_METABOLIC_PROCESS | < 0.0001 | < 0.0001 | Mixed down-regulated
BioProc | RNA_PROCESSING | < 0.0001 | < 0.0001 | Mixed down-regulated
BioProc | CELLULAR_MACROMOLECULE_METABOLIC_PROCESS | < 0.0001 | < 0.0001 | Non-directional
BioProc | PROTEIN_METABOLIC_PROCESS | < 0.0001 | < 0.0001 | Non-directional
BioProc | CELLULAR_PROTEIN_METABOLIC_PROCESS | < 0.0001 | < 0.0001 | Non-directional
CellCompartment | EXTRACELLULAR_SPACE | < 0.0001 | < 0.0001 | All up-regulated
CellCompartment | MACROMOLECULAR_COMPLEX | < 0.0001 | < 0.0001 | All down-regulated
CellCompartment | NUCLEAR_PART | < 0.0001 | < 0.0001 | All down-regulated
CellCompartment | NUCLEUS | < 0.0001 | < 0.0001 | All down-regulated

CellCompartment | ORGANELLE_PART | < 0.0001 | < 0.0001 | All down-regulated
CellCompartment | INTRACELLULAR_ORGANELLE_PART | < 0.0001 | < 0.0001 | All down-regulated
CellCompartment | MITOCHONDRIAL_INNER_MEMBRANE | < 0.0001 | < 0.0001 | All down-regulated
CellCompartment | MEMBRANE_ENCLOSED_LUMEN | < 0.0001 | < 0.0001 | All down-regulated
CellCompartment | NUCLEAR_LUMEN | < 0.0001 | < 0.0001 | All down-regulated
CellCompartment | ORGANELLE_LUMEN | < 0.0001 | < 0.0001 | All down-regulated
CellCompartment | RIBONUCLEOPROTEIN_COMPLEX | < 0.0001 | < 0.0001 | All down-regulated
CellCompartment | MITOCHONDRIAL_RESPIRATORY_CHAIN | < 0.0001 | < 0.0001 | All down-regulated
CellCompartment | NUCLEAR_PART | < 0.0001 | < 0.0001 | Mixed down-regulated
CellCompartment | NUCLEUS | < 0.0001 | < 0.0001 | Mixed down-regulated
CellCompartment | ORGANELLE_PART | < 0.0001 | < 0.0001 | Mixed down-regulated
CellCompartment | INTRACELLULAR_ORGANELLE_PART | < 0.0001 | < 0.0001 | Mixed down-regulated
CellCompartment | CYTOPLASMIC_PART | < 0.0001 | < 0.0001 | Mixed down-regulated
CellCompartment | RIBONUCLEOPROTEIN_COMPLEX | < 0.0001 | < 0.0001 | Mixed down-regulated
CellCompartment | CYTOPLASM | < 0.0001 | < 0.0001 | Non-directional
CellCompartment | CYTOPLASMIC_PART | < 0.0001 | < 0.0001 | Non-directional
CellCompartment | GOLGI_APPARATUS | < 0.0001 | < 0.0001 | Non-directional
Hallmark | HALLMARK_INFLAMMATORY_RESPONSE | < 0.0001 | < 0.0001 | All up-regulated
Hallmark | HALLMARK_TNFA_SIGNALING_VIA_NFKB | < 0.0001 | < 0.0001 | All up-regulated
Hallmark | HALLMARK_OXIDATIVE_PHOSPHORYLATION | < 0.0001 | < 0.0001 | All down-regulated
Hallmark | HALLMARK_MYC_TARGETS_V1 | < 0.0001 | < 0.0001 | All down-regulated
Hallmark | HALLMARK_TNFA_SIGNALING_VIA_NFKB | < 0.0001 | < 0.0001 | Mixed up-regulated
Hallmark | HALLMARK_MYC_TARGETS_V1 | < 0.0001 | < 0.0001 | Mixed down-regulated
Hallmark | HALLMARK_UNFOLDED_PROTEIN_RESPONSE | < 0.0001 | < 0.0001 | Mixed down-regulated
Hallmark | HALLMARK_MYC_TARGETS_V1 | < 0.0001 | < 0.0001 | Non-directional
KEGG | KEGG_O_GLYCAN_BIOSYNTHESIS | < 0.0001 | < 0.0001 | All up-regulated
KEGG | KEGG_SPLICEOSOME | < 0.0001 | < 0.0001 | All down-regulated
KEGG | KEGG_RIBOSOME | < 0.0001 | < 0.0001 | All down-regulated
KEGG | KEGG_SPLICEOSOME | < 0.0001 | < 0.0001 | Mixed down-regulated
KEGG | KEGG_SPLICEOSOME | < 0.0001 | < 0.0001 | Non-directional
MolFunc | RNA_BINDING | < 0.0001 | < 0.0001 | All down-regulated

MolFunc | TRANSLATION_FACTOR_ACTIVITY_NUCLEIC_ACID_BINDING | < 0.0001 | < 0.0001 | All down-regulated
MolFunc | STRUCTURAL_CONSTITUENT_OF_RIBOSOME | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_MRNA_PROCESSING | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_PROCESSING_OF_CAPPED_INTRON_CONTAINING_PRE_MRNA | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_INFLUENZA_LIFE_CYCLE | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_METABOLISM_OF_RNA | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_ACTIVATION_OF_CHAPERONE_GENES_BY_XBP1S | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_METABOLISM_OF_PROTEINS | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_PREFOLDIN_MEDIATED_TRANSFER_OF_SUBSTRATE_TO_CCT_TRIC | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_METABOLISM_OF_MRNA | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_MRNA_SPLICING | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_NONSENSE_MEDIATED_DECAY_ENHANCED_BY_THE_EXON_JUNCTION_COMPLEX | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_FORMATION_OF_TUBULIN_FOLDING_INTERMEDIATES_BY_CCT_TRIC | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_SRP_DEPENDENT_COTRANSLATIONAL_PROTEIN_TARGETING_TO_MEMBRANE | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_TRANSLATION | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_INFLUENZA_VIRAL_RNA_TRANSCRIPTION_AND_REPLICATION | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_PEPTIDE_CHAIN_ELONGATION | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_3_UTR_MEDIATED_TRANSLATIONAL_REGULATION | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_FORMATION_OF_THE_TERNARY_COMPLEX_AND_SUBSEQUENTLY_THE_43S_COMPLEX | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_ACTIVATION_OF_THE_MRNA_UPON_BINDING_OF_THE_CAP_BINDING_COMPLEX_AND_EIFS_AND_SUBSEQUENT_BINDING_TO_43S | < 0.0001 | < 0.0001 | All down-regulated
Reactome | REACTOME_MRNA_PROCESSING | < 0.0001 | < 0.0001 | Mixed down-regulated
Reactome | REACTOME_PROCESSING_OF_CAPPED_INTRON_CONTAINING_PRE_MRNA | < 0.0001 | < 0.0001 | Mixed down-regulated
Reactome | REACTOME_METABOLISM_OF_RNA | < 0.0001 | < 0.0001 | Mixed down-regulated
Reactome | REACTOME_METABOLISM_OF_PROTEINS | < 0.0001 | < 0.0001 | Mixed down-regulated
Reactome | REACTOME_METABOLISM_OF_MRNA | < 0.0001 | < 0.0001 | Mixed down-regulated
Reactome | REACTOME_MRNA_SPLICING | < 0.0001 | < 0.0001 | Mixed down-regulated
Reactome | REACTOME_NONSENSE_MEDIATED_DECAY_ENHANCED_BY_THE_EXON_JUNCTION_COMPLEX | < 0.0001 | < 0.0001 | Mixed down-regulated
Reactome | REACTOME_SRP_DEPENDENT_COTRANSLATIONAL_PROTEIN_TARGETING_TO_MEMBRANE | < 0.0001 | < 0.0001 | Mixed down-regulated
Reactome | REACTOME_TRANSLATION | < 0.0001 | < 0.0001 | Mixed down-regulated

Reactome | REACTOME_PEPTIDE_CHAIN_ELONGATION | < 0.0001 | < 0.0001 | Mixed down-regulated
Reactome | REACTOME_3_UTR_MEDIATED_TRANSLATIONAL_REGULATION | < 0.0001 | < 0.0001 | Mixed down-regulated
Reactome | REACTOME_FORMATION_OF_THE_TERNARY_COMPLEX_AND_SUBSEQUENTLY_THE_43S_COMPLEX | < 0.0001 | < 0.0001 | Mixed down-regulated
Reactome | REACTOME_ACTIVATION_OF_THE_MRNA_UPON_BINDING_OF_THE_CAP_BINDING_COMPLEX_AND_EIFS_AND_SUBSEQUENT_BINDING_TO_43S | < 0.0001 | < 0.0001 | Mixed down-regulated
Reactome | REACTOME_METABOLISM_OF_PROTEINS | < 0.0001 | < 0.0001 | Non-directional
Reactome | REACTOME_MRNA_SPLICING | < 0.0001 | < 0.0001 | Non-directional
Reactome | REACTOME_TRANSLATION | < 0.0001 | < 0.0001 | Non-directional
CellCompartment | ORGANELLE_MEMBRANE | 1.00E-04 | 0.0011095 | All down-regulated
CellCompartment | INTRACELLULAR_NON_MEMBRANE_BOUND_ORGANELLE | 1.00E-04 | 0.0011095 | All down-regulated
CellCompartment | NON_MEMBRANE_BOUND_ORGANELLE | 1.00E-04 | 0.0011095 | All down-regulated
CellCompartment | ORGANELLE_INNER_MEMBRANE | 1.00E-04 | 0.0011095 | All down-regulated
CellCompartment | VESICLE_MEMBRANE | 1.00E-04 | 0.0011095 | All down-regulated
CellCompartment | CYTOPLASMIC_VESICLE_PART | 1.00E-04 | 0.0011095 | All down-regulated
CellCompartment | CYTOPLASMIC_VESICLE_MEMBRANE | 1.00E-04 | 0.0011095 | All down-regulated
CellCompartment | NUCLEOLUS | 1.00E-04 | 0.0011095 | All down-regulated
CellCompartment | NUCLEOPLASM | 1.00E-04 | 0.0011095 | All down-regulated
CellCompartment | MITOCHONDRIAL_MEMBRANE_PART | 1.00E-04 | 0.0011095 | All down-regulated
Hallmark | HALLMARK_E2F_TARGETS | 1.00E-04 | 0.00125 | All down-regulated
Hallmark | HALLMARK_UNFOLDED_PROTEIN_RESPONSE | 1.00E-04 | 0.00125 | All down-regulated
Hallmark | HALLMARK_INTERFERON_GAMMA_RESPONSE | 1.00E-04 | 0.0016667 | All up-regulated
Hallmark | HALLMARK_MYC_TARGETS_V2 | 3.00E-04 | 0.003 | All down-regulated
CellCompartment | NUCLEOPLASM_PART | 3.00E-04 | 0.0031773 | All down-regulated
Reactome | REACTOME_HIV_LIFE_CYCLE | 1.00E-04 | 0.0032095 | All down-regulated
Reactome | REACTOME_CELL_CYCLE | 1.00E-04 | 0.0032095 | All down-regulated
Reactome | REACTOME_MRNA_SPLICING_MINOR_PATHWAY | 1.00E-04 | 0.0032095 | All down-regulated
Reactome | REACTOME_MRNA_SPLICING_MINOR_PATHWAY | 1.00E-04 | 0.0044933 | Mixed down-regulated
Reactome | REACTOME_INFLUENZA_VIRAL_RNA_TRANSCRIPTION_AND_REPLICATION | 1.00E-04 | 0.0044933 | Mixed down-regulated
CellCompartment | RIBONUCLEOPROTEIN_COMPLEX | 1.00E-04 | 0.00466 | Non-directional
CellCompartment | SPLICEOSOME | 1.00E-04 | 0.00466 | Non-directional
CellCompartment | MEMBRANE_ENCLOSED_LUMEN | 2.00E-04 | 0.0051778 | Mixed down-regulated

CellCompartment | ORGANELLE_LUMEN | 2.00E-04 | 0.0051778 | Mixed down-regulated
CellCompartment | SPLICEOSOME | 2.00E-04 | 0.0051778 | Mixed down-regulated
Hallmark | HALLMARK_G2M_CHECKPOINT | 7.00E-04 | 0.0058333 | All down-regulated
CellCompartment | MITOCHONDRIAL_PART | 6.00E-04 | 0.0060783 | All down-regulated
Reactome | REACTOME_SIGNALING_BY_HIPPO | 2.00E-04 | 0.0061273 | All down-regulated
CellCompartment | DNA_DIRECTED_RNA_POLYMERASEII_HOLOENZYME | 7.00E-04 | 0.006524 | All down-regulated
CellCompartment | SMALL_NUCLEAR_RIBONUCLEOPROTEIN_COMPLEX | 7.00E-04 | 0.006524 | All down-regulated
CellCompartment | PROTEIN_COMPLEX | 8.00E-04 | 0.0071692 | All down-regulated
Hallmark | HALLMARK_HEME_METABOLISM | 7.00E-04 | 0.00875 | All up-regulated
BioProc | CHROMATIN_REMODELING | 1.00E-04 | 0.0091667 | All down-regulated
KEGG | KEGG_RIBOSOME | 1.00E-04 | 0.0093 | Mixed down-regulated
MolFunc | TRANSLATION_REGULATOR_ACTIVITY | 1.00E-04 | 0.0099 | All down-regulated
CellCompartment | SPLICEOSOME | 0.0012 | 0.010356 | All down-regulated
CellCompartment | MITOCHONDRIAL_MEMBRANE | 0.0014 | 0.01165 | All down-regulated
Hallmark | HALLMARK_OXIDATIVE_PHOSPHORYLATION | 7.00E-04 | 0.011667 | Mixed down-regulated
Reactome | REACTOME_CELL_CYCLE_MITOTIC | 4.00E-04 | 0.011722 | All down-regulated
Reactome | REACTOME_DIABETES_PATHWAYS | 3.00E-04 | 0.011894 | Mixed down-regulated
Reactome | REACTOME_CLEAVAGE_OF_GROWING_TRANSCRIPT_IN_THE_TERMINATION_REGION_ | 3.00E-04 | 0.011894 | Mixed down-regulated
CellCompartment | EUKARYOTIC_TRANSLATION_INITIATION_FACTOR_3_COMPLEX | 0.0016 | 0.012427 | All down-regulated
CellCompartment | U12_DEPENDENT_SPLICEOSOME | 0.0016 | 0.012427 | All down-regulated
CellCompartment | NUCLEAR_LUMEN | 6.00E-04 | 0.012709 | Mixed down-regulated
CellCompartment | SMALL_NUCLEAR_RIBONUCLEOPROTEIN_COMPLEX | 6.00E-04 | 0.012709 | Mixed down-regulated
Reactome | REACTOME_POST_CHAPERONIN_TUBULIN_FOLDING_PATHWAY | 5.00E-04 | 0.014042 | All down-regulated
CellCompartment | COATED_VESICLE_MEMBRANE | 0.0019 | 0.014281 | All down-regulated
CellCompartment | MITOCHONDRIAL_ENVELOPE | 0.002 | 0.014562 | All down-regulated
Hallmark | HALLMARK_IL2_STAT5_SIGNALING | 6.00E-04 | 0.015 | Mixed up-regulated
CellCompartment | EXTRACELLULAR_REGION | 2.00E-04 | 0.015533 | All up-regulated
CellCompartment | EXTRACELLULAR_REGION_PART | 2.00E-04 | 0.015533 | All up-regulated
Reactome | REACTOME_PLATELET_ACTIVATION_SIGNALING_AND_AGGREGATION | 6.00E-04 | 0.015554 | All down-regulated
Reactome | REACTOME_TRANSCRIPTION_COUPLED_NER_TC_NER | 6.00E-04 | 0.015554 | All down-regulated

BioProc | REGULATION_OF_TRANSCRIPTION_FROM_RNA_POLYMERASE_II_PROMOTER | 2.00E-04 | 0.0165 | All down-regulated
BioProc | NUCLEOTIDE_METABOLIC_PROCESS | 3.00E-04 | 0.017679 | All down-regulated
BioProc | NUCLEOBASENUCLEOSIDE_AND_NUCLEOTIDE_METABOLIC_PROCESS | 3.00E-04 | 0.017679 | All down-regulated
BioProc | TRANSLATIONAL_INITIATION | 3.00E-04 | 0.017679 | All down-regulated
BioProc | ANDROGEN_RECEPTOR_SIGNALING_PATHWAY | 3.00E-04 | 0.017679 | All down-regulated
CellCompartment | VESICLE_COAT | 0.0028 | 0.01977 | All down-regulated
MolFunc | RNA_BINDING | 1.00E-04 | 0.0198 | Mixed down-regulated
MolFunc | STRUCTURAL_CONSTITUENT_OF_RIBOSOME | 1.00E-04 | 0.0198 | Mixed down-regulated
Reactome | REACTOME_DNA_REPAIR | 8.00E-04 | 0.01997 | All down-regulated
Hallmark | HALLMARK_PROTEIN_SECRETION | 0.0016 | 0.02 | Mixed down-regulated
CellCompartment | CHROMOSOME | 0.003 | 0.020559 | All down-regulated
BioProc | CELL_CELL_SIGNALING | 1.00E-04 | 0.020625 | All up-regulated
Hallmark | HALLMARK_DNA_REPAIR | 0.0029 | 0.020714 | All down-regulated
BioProc | REGULATION_OF_CELLULAR_COMPONENT_ORGANIZATION_AND_BIOGENESIS | 4.00E-04 | 0.022 | All down-regulated
Reactome | REACTOME_UNFOLDED_PROTEIN_RESPONSE | 6.00E-04 | 0.022467 | Mixed down-regulated
Reactome | REACTOME_RESPONSE_TO_ELEVATED_PLATELET_CYTOSOLIC_CA2_ | 0.001 | 0.023241 | All down-regulated
Reactome | REACTOME_PROTEIN_FOLDING | 0.001 | 0.023241 | All down-regulated
CellCompartment | CHROMOSOMAL_PART | 0.0036 | 0.0233 | All down-regulated
CellCompartment | INTEGRATOR_COMPLEX | 0.0035 | 0.0233 | All down-regulated
CellCompartment | MACROMOLECULAR_COMPLEX | 0.0013 | 0.0233 | Mixed down-regulated
CellCompartment | CYTOPLASM | 0.0013 | 0.0233 | Mixed down-regulated
Reactome | REACTOME_HEMOSTASIS | 0.0011 | 0.023916 | All down-regulated
Reactome | REACTOME_CYTOSOLIC_TRNA_AMINOACYLATION | 0.0011 | 0.023916 | All down-regulated
Reactome | REACTOME_INFLUENZA_LIFE_CYCLE | 7.00E-04 | 0.024832 | Mixed down-regulated
Reactome | REACTOME_ASPARAGINE_N_LINKED_GLYCOSYLATION | 0.0012 | 0.025275 | All down-regulated
Hallmark | HALLMARK_PROTEIN_SECRETION | 0.0012 | 0.02625 | Non-directional
Hallmark | HALLMARK_IL2_STAT5_SIGNALING | 0.0021 | 0.02625 | Non-directional
Hallmark | HALLMARK_OXIDATIVE_PHOSPHORYLATION | 0.0016 | 0.02625 | Non-directional
MolFunc | TRANSLATION_INITIATION_FACTOR_ACTIVITY | 4.00E-04 | 0.0264 | All down-regulated
MolFunc | RNA_POLYMERASE_II_TRANSCRIPTION_MEDIATOR_ACTIVITY | 4.00E-04 | 0.0264 | All down-regulated

Reactome | REACTOME_LATE_PHASE_OF_HIV_LIFE_CYCLE | 0.0013 | 0.026552 | All down-regulated
Reactome | REACTOME_TRANSCRIPTION | 8.00E-04 | 0.02696 | Mixed down-regulated
CellCompartment | NUCLEUS | 7.00E-04 | 0.027183 | Non-directional
BioProc | NUCLEOBASENUCLEOSIDENUCLEOTIDE_AND_NUCLEIC_ACID_METABOLIC_PROCESS | 1.00E-04 | 0.0275 | Mixed down-regulated
CellCompartment | MITOCHONDRION | 0.0017 | 0.028293 | Mixed down-regulated
Reactome | REACTOME_HEMOSTASIS | 9.00E-04 | 0.028886 | Mixed down-regulated
CellCompartment | ENDOPLASMIC_RETICULUM | 0.0047 | 0.029597 | All down-regulated
KEGG | KEGG_PURINE_METABOLISM | 5.00E-04 | 0.031 | All down-regulated
CellCompartment | GOLGI_APPARATUS | 0.002 | 0.031067 | Mixed down-regulated
BioProc | BEHAVIOR | 2.00E-04 | 0.033 | All up-regulated
Reactome | REACTOME_RECYCLING_PATHWAY_OF_L1 | 0.0011 | 0.0337 | Mixed down-regulated
Hallmark | HALLMARK_MTORC1_SIGNALING | 0.0034 | 0.034 | Mixed down-regulated
Hallmark | HALLMARK_IL2_STAT5_SIGNALING | 0.0054 | 0.035 | All up-regulated
Hallmark | HALLMARK_REACTIVE_OXIGEN_SPECIES_PATHWAY | 0.0038 | 0.035 | All up-regulated
Hallmark | HALLMARK_HYPOXIA | 0.0056 | 0.035 | All up-regulated
Hallmark | HALLMARK_INTERFERON_ALPHA_RESPONSE | 0.0045 | 0.035 | All up-regulated
BioProc | RESPONSE_TO_STRESS | 3.00E-04 | 0.035357 | Mixed up-regulated
BioProc | INTRACELLULAR_RECEPTOR_MEDIATED_SIGNALING_PATHWAY | 8.00E-04 | 0.036667 | All down-regulated
BioProc | PROTEIN_DNA_COMPLEX_ASSEMBLY | 8.00E-04 | 0.036667 | All down-regulated
BioProc | RNA_SPLICING | 8.00E-04 | 0.036667 | All down-regulated
CellCompartment | ENDOMEMBRANE_SYSTEM | 0.0061 | 0.037403 | All down-regulated
Reactome | REACTOME_UNFOLDED_PROTEIN_RESPONSE | 0.002 | 0.038514 | All down-regulated
Reactome | REACTOME_REPAIR_SYNTHESIS_FOR_GAP_FILLING_BY_DNA_POL_IN_TC_NER | 0.002 | 0.038514 | All down-regulated
BioProc | TRANSCRIPTION_INITIATION_FROM_RNA_POLYMERASE_II_PROMOTER | 9.00E-04 | 0.039079 | All down-regulated
MolFunc | RIBONUCLEOPROTEIN_BINDING | 1.00E-04 | 0.0396 | Mixed up-regulated
CellCompartment | INTRACELLULAR_ORGANELLE_PART | 0.0012 | 0.039943 | Non-directional
Reactome | REACTOME_NUCLEOTIDE_EXCISION_REPAIR | 0.0022 | 0.041189 | All down-regulated
BioProc | ORGANELLE_ORGANIZATION_AND_BIOGENESIS | 0.001 | 0.04125 | All down-regulated
BioProc | BIOPOLYMER_METABOLIC_PROCESS | 2.00E-04 | 0.04125 | Non-directional
CellCompartment | EUKARYOTIC_TRANSLATION_INITIATION_FACTOR_3_COMPLEX | 0.0029 | 0.042231 | Mixed down-regulated

Reactome | REACTOME_HIV_INFECTION | 0.0024 | 0.042568 | All down-regulated
Reactome | REACTOME_RESPIRATORY_ELECTRON_TRANSPORT_ATP_SYNTHESIS_BY_CHEMIOSMOTIC_COUPLING_AND_HEAT_PRODUCTION_BY_UNCOUPLING_PROTEINS_ | 0.0024 | 0.042568 | All down-regulated
Reactome | REACTOME_DCC_MEDIATED_ATTRACTIVE_SIGNALING | 0.0025 | 0.043205 | All down-regulated
Hallmark | HALLMARK_BILE_ACID_METABOLISM | 0.0078 | 0.043333 | All up-regulated
CellCompartment | ORGANELLE_PART | 0.0017 | 0.044011 | Non-directional
CellCompartment | VACUOLE | 0.0016 | 0.044011 | Non-directional
CellCompartment | MEMBRANE_COAT | 0.008 | 0.044381 | All down-regulated
CellCompartment | COATED_MEMBRANE | 0.008 | 0.044381 | All down-regulated
CellCompartment | RIBOSOME | 0.0076 | 0.044381 | All down-regulated
CellCompartment | ENDOPLASMIC_RETICULUM_LUMEN | 0.0079 | 0.044381 | All down-regulated
CellCompartment | NUCLEOLAR_PART | 0.0083 | 0.044974 | All down-regulated
CellCompartment | ENDOPLASMIC_RETICULUM | 0.0033 | 0.045229 | Mixed down-regulated
Reactome | REACTOME_RNA_POL_II_TRANSCRIPTION | 0.0027 | 0.045495 | All down-regulated
Reactome | REACTOME_G2_M_CHECKPOINTS | 0.0028 | 0.046029 | All down-regulated
CellCompartment | NUCLEAR_BODY | 0.0088 | 0.0466 | All down-regulated
CellCompartment | LYTIC_VACUOLE | 0.0023 | 0.048718 | Non-directional
CellCompartment | LYSOSOME | 0.0023 | 0.048718 | Non-directional
CellCompartment | CYTOPLASMIC_VESICLE_PART | 0.0042 | 0.04893 | Mixed down-regulated
CellCompartment | CYTOPLASMIC_VESICLE_MEMBRANE | 0.0042 | 0.04893 | Mixed down-regulated
CellCompartment | MITOCHONDRIAL_RESPIRATORY_CHAIN | 0.004 | 0.04893 | Mixed down-regulated
BioProc | DEFENSE_RESPONSE | 6.00E-04 | 0.0495 | Mixed up-regulated
BioProc | INFLAMMATORY_RESPONSE | 5.00E-04 | 0.0495 | Mixed up-regulated
BioProc | POLYSACCHARIDE_METABOLIC_PROCESS | 6.00E-04 | 0.0495 | Mixed up-regulated
Reactome | REACTOME_RESPONSE_TO_ELEVATED_PLATELET_CYTOSOLIC_CA2_ | 0.0017 | 0.049817 | Mixed down-regulated

TABLE 6. GENE SETS AND BIOLOGICAL ANNOTATIONS SIGNIFICANTLY ENRICHED IN “GREY60” CO-EXPRESSION MODULE ASSOCIATED WITH BIPOLAR DISORDER. A gene set enrichment strategy was employed based on hypergeometric tests. Gene sets were curated from multiple sources and compiled in one location (Molecular Signatures Database version 5).

Database source | Biological annotation | Fold enrichment | p-value | BH p-value*
Hallmark | HALLMARK_HEME_METABOLISM | 30.124 | 1.12E-66 | 3.60E-65
Immunologic | GSE34205_RSV_VS_FLU_INF_INFANT_PBMC_UP | 32.917 | 3.68E-57 | 5.91E-54
Immunologic | GSE34205_HEALTHY_VS_RSV_INF_INFANT_PBMC_DN | 23.19 | 1.28E-41 | 1.03E-38
Immunologic | GSE6269_FLU_VS_STREP_AUREUS_INF_PBMC_DN | 9.689 | 4.34E-11 | 2.32E-08
Immunologic | GSE6269_FLU_VS_STREP_PNEUMO_INF_PBMC_DN | 7.189 | 3.99E-07 | 0.00016
Reactome | REACTOME_METABOLISM_OF_PORPHYRINS | 31.559 | 6.03E-06 | 0.001399
Immunologic | GSE37416_CTRL_VS_12H_F_TULARENSIS_LVS_NEUTROPHIL_DN | 5.953 | 2.15E-05 | 0.006909
Immunologic | GSE27786_CD4_TCELL_VS_NKCELL_DN | 5.492 | 4.06E-05 | 0.010857
Immunologic | GSE6269_HEALTHY_VS_STREP_PNEUMO_INF_PBMC_DN | 5.26 | 0.000148 | 0.033898
KEGG | KEGG_PORPHYRIN_AND_CHLOROPHYLL_METABOLISM | 11.046 | 0.000459 | 0.034391

*Gene sets are ranked in ascending order of Benjamini-Hochberg adjusted p-values. Gene sets were declared significantly enriched at a BH p-value < .05.
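
The hypergeometric enrichment test named in the Table 6 caption can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the function name `hypergeom_enrichment` and all four input counts are hypothetical, since the module size, gene set sizes, and background size are not given in this section.

```python
from math import comb


def hypergeom_enrichment(universe, set_size, module_size, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap), and fold enrichment.

    universe:    number of background genes
    set_size:    genes in the annotated gene set (within the universe)
    module_size: genes in the co-expression module
    overlap:     genes shared by the module and the gene set
    """
    upper = min(set_size, module_size)
    tail = sum(
        comb(set_size, i) * comb(universe - set_size, module_size - i)
        for i in range(overlap, upper + 1)
    )
    p_value = tail / comb(universe, module_size)
    # Observed overlap fraction relative to the fraction expected by chance.
    fold = (overlap / module_size) / (set_size / universe)
    return p_value, fold


# Hypothetical counts, for illustration only.
p_value, fold = hypergeom_enrichment(universe=20000, set_size=200,
                                     module_size=100, overlap=10)
print(p_value, fold)
```

With these illustrative counts, one overlapping gene would be expected by chance, so ten shared genes correspond to a roughly ten-fold enrichment.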

TABLE 7. OVERLAP BETWEEN TRANSCRIPTOME- AND GENOME-WIDE ASSOCIATION EVIDENCE LINKING IMMUNE, HISTONE, AND SYNAPTIC GENES WITH SZ AND BD.

Gene set* | SZ Brain1 | SZ Blood2 | BD Brain3 | BD Blood4
Immune (419) | ABCA1 ATP6V0E1 CADM1 PDLIM5 (p = 3.61e-10) | 30 (p = 3.65e-29) | IL17RB (p = 0.44) | -
Histone (125) | - | 18 (p = 9.04e-18) | - | -
Synapse (360) | ATP6V0E1 KRAS MAP3K5 NTRK2 (p = 3.61e-10) | 62 (p = 1.14e-59) | ETS2 ITGB2 LPIN1 PPAT PTPRE (p = 0.017) | PPARA SHC3 (p = 7.34e-06)

*Gene sets obtained from the Psychiatric Genomics Consortium pathway-analysis study of SZ, major depression, and BD (Network and Pathway Analysis Subgroup of Psychiatric Genomics Consortium, 2015)
1Previous study (genes q < 0.10, k genes = 2,238): (Hess et al., 2016)
2Previous study (genes q < 0.10, k genes = 92): (Hess et al., 2016)
3Previous study (genes q < 0.10, k genes = 56): (Seifuddin et al., 2013b)
4Present study (q < 0.10, k genes = 41)

TABLE 8. POTENTIAL OF BLOOD-BASED RNA EXPRESSION PROFILES FOR DISORDER-SPECIFIC CLASSIFIERS OF THE MAJOR PSYCHOSES. Performances of blood-based gene expression classifiers of BD with models trained on samples profiled on Affymetrix arrays then tested in a hold-out sample profiled on Illumina arrays, and vice versa.

Classifier | Genes in classifier | Affymetrix-trained: Mean AUC | 95% Low | 95% High | Illumina-trained: Mean AUC | 95% Low | 95% High | AUC difference | Mean difference
Linear SVM | 2 | 0.5623 | 0.5616 | 0.563 | 0.4403 | 0.4395 | 0.4412 | 0.122 | "
Linear SVM | 17 | 0.7222 | 0.7195 | 0.7249 | 0.5646 | 0.5569 | 0.5724 | 0.158 | "
Linear SVM | 32 | 0.709 | 0.7046 | 0.7133 | 0.4809 | 0.4715 | 0.4903 | 0.228 | "
Linear SVM | 47 | 0.7679 | 0.7656 | 0.7702 | 0.4734 | 0.4652 | 0.4816 | 0.295 | "
Linear SVM | 62 | 0.7847 | 0.7831 | 0.7863 | 0.5221 | 0.5133 | 0.531 | 0.263 | "
Linear SVM | 77 | 0.7983 | 0.7967 | 0.7999 | 0.5131 | 0.5051 | 0.5211 | 0.285 | "
Linear SVM | 92 | 0.805 | 0.8031 | 0.8069 | 0.5359 | 0.5282 | 0.5436 | 0.269 | "
Linear SVM | 100 | 0.8201 | 0.8183 | 0.8218 | 0.5303 | 0.5232 | 0.5374 | 0.290 | 0.213
Naïve Bayes | 2 | 0.5621 | 0.5618 | 0.5625 | 0.4437 | 0.4414 | 0.446 | 0.118 | "
Naïve Bayes | 17 | 0.6588 | 0.655 | 0.6627 | 0.5116 | 0.5087 | 0.5144 | 0.147 | "
Naïve Bayes | 32 | 0.7239 | 0.7201 | 0.7276 | 0.5043 | 0.4993 | 0.5093 | 0.220 | "
Naïve Bayes | 47 | 0.7184 | 0.716 | 0.7208 | 0.463 | 0.4591 | 0.4668 | 0.255 | "
Naïve Bayes | 62 | 0.7223 | 0.7204 | 0.7243 | 0.4998 | 0.4968 | 0.5028 | 0.223 | "
Naïve Bayes | 77 | 0.6815 | 0.6788 | 0.6842 | 0.4865 | 0.4813 | 0.4917 | 0.195 | "
Naïve Bayes | 92 | 0.6952 | 0.6924 | 0.698 | 0.4905 | 0.4866 | 0.4943 | 0.205 | "
Naïve Bayes | 100 | 0.6869 | 0.6835 | 0.6903 | 0.4941 | 0.4903 | 0.4979 | 0.193 | 0.194
Random Forest (50 trees) | 2 | 0.5305 | 0.5167 | 0.5442 | 0.4798 | 0.4725 | 0.4871 | 0.051 | "
Random Forest (50 trees) | 17 | 0.7089 | 0.6975 | 0.7204 | 0.5424 | 0.5257 | 0.5591 | 0.167 | "
Random Forest (50 trees) | 32 | 0.7337 | 0.7243 | 0.7431 | 0.4789 | 0.4646 | 0.4932 | 0.255 | "
Random Forest (50 trees) | 47 | 0.7636 | 0.754 | 0.7731 | 0.5166 | 0.4992 | 0.534 | 0.247 | "
Random Forest (50 trees) | 62 | 0.7343 | 0.7231 | 0.7455 | 0.5073 | 0.4964 | 0.5182 | 0.227 | "
Random Forest (50 trees) | 77 | 0.7237 | 0.7145 | 0.7328 | 0.4991 | 0.4852 | 0.5131 | 0.225 | "
Random Forest (50 trees) | 92 | 0.72 | 0.7103 | 0.7297 | 0.5085 | 0.491 | 0.526 | 0.212 | "
Random Forest (50 trees) | 100 | 0.7872 | 0.7778 | 0.7966 | 0.5165 | 0.5039 | 0.5291 | 0.271 | 0.207

Affymetrix samples (total/training set): BD = 49/31, unaffected comparisons = 42/31
Illumina samples (total/training set): BD = 28/21, unaffected comparisons = 39/21
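
To make the AUC columns of Table 8 concrete, the sketch below computes the area under the ROC curve directly from classifier scores using the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen case scores higher than a randomly chosen control. The function name `auc` and the example scores are hypothetical; the final lines reproduce one "AUC difference" entry from the table.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical classifier scores in a hold-out sample.
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(round(example_auc, 3))

# AUC difference for the 100-gene linear SVM, using the two mean AUCs in Table 8.
svm_100_difference = round(0.8201 - 0.5303, 3)
print(svm_100_difference)
```

An AUC near 0.5, as seen for most models trained on Illumina arrays, corresponds to chance-level ranking of cases over controls.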

FIGURE 1. BOX-AND-WHISKER PLOTS OF GENE EXPRESSION DISTRIBUTION ACROSS 158 SAMPLES INCLUDED IN THE MEGA-ANALYSIS. Median gene expression intensities are denoted by the black horizontal line inside each box. Whiskers extend to no more than 1.5 times the interquartile range. Outlier genes are denoted as points that extend beyond the whiskers. The top panel depicts the expression distribution (log2 scale) across studies in the absence of batch effect controls. The bottom panel shows the effect of having standardized each gene’s expression to the z-scale, which removes between-study variation.
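
The z-scale standardization described in this caption can be sketched for a single gene as follows. The sketch assumes that standardization is applied within each study separately (which is what would place studies on a common scale); the function name `zscale_within_study`, the study labels, and the expression values are hypothetical.

```python
from statistics import mean, stdev


def zscale_within_study(values_by_study):
    """Standardize one gene's expression to mean 0 and SD 1 within each study."""
    scaled = {}
    for study, values in values_by_study.items():
        mu = mean(values)
        sd = stdev(values)  # sample standard deviation
        scaled[study] = [(v - mu) / sd for v in values]
    return scaled


# Hypothetical log2 intensities for one gene measured on two platforms.
raw = {"study_A": [8.0, 9.0, 10.0], "study_B": [2.0, 3.0, 4.0]}
scaled = zscale_within_study(raw)
print(scaled)
```

Although the two hypothetical studies differ by six log2 units on the raw scale, their standardized values coincide, which is the between-study shift the bottom panel is described as removing.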

FIGURE 2. NORMALIZATION OF MICROARRAY INTENSITY FILES PER STUDY TO ADJUST FOR TECHNICAL SOURCES OF WITHIN- AND BETWEEN-STUDY VARIATION. (A) GC-RMA, quantile normalization, and log2 scaling of data applied per study do not remove sources of variation that exist between studies. (B) Applying normalization steps in conjunction with z-scaling of data per feature does, however, reduce variation across all samples; thus, studies no longer segregate in two-dimensional space.

FIGURE 3. FOREST PLOTS COMPARING THE ESTIMATES OF DIFFERENTIAL EXPRESSION BETWEEN BIPOLAR DISORDER CASES AND UNAFFECTED COMPARISON SUBJECTS. Differential expression statistics were compared between four statistical models: (1) combined-subject mega-analysis (Stage 1), (2) combined-subject mega-analysis after removing latent sources of variation identified within each study iteratively by the SVA algorithm (Stage 2A), (3) combined-subject mega-analysis after removing latent sources of variation identified on the entire data set with the SVA algorithm (Stage 2B), and (4) inverse-variance weighted fixed-effect meta-analysis. For the meta-analysis, we pooled regression results produced per study after measuring the change in gene expression between bipolar disorder cases and unaffected comparison subjects with linear models (gene ~ diagnosis + age). The fixed-effect meta-analysis was conducted using the metafor package in R. ARID4B showed the best agreement across the four cross-study pooled summary statistics. The four regressions were compared across the top 41 differentially expressed genes (based on Stage 1 results; genes with a significance at q-value < 0.1).
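
The inverse-variance weighted fixed-effect pooling named in this caption can be sketched in a few lines. The study itself used the metafor package in R; the Python function below (`fixed_effect_meta`, a hypothetical name) only illustrates the standard formula, and the per-study betas and standard errors are invented for the example.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect pooled estimate, its SE, and z-score."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Hypothetical per-study estimates for one gene (gene ~ diagnosis + age).
pooled_beta, pooled_se, z_score = fixed_effect_meta(
    betas=[0.5, 0.3, 0.4], standard_errors=[0.2, 0.1, 0.2]
)
print(round(pooled_beta, 3), round(pooled_se, 3), round(z_score, 2))
```

Studies with smaller standard errors receive larger weights, so the pooled estimate in this example is pulled toward the second study's beta.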

FIGURE 4. CONCORDANCE OF DIFFERENTIALLY EXPRESSED GENES IN BD AND SZ FROM BRAIN TRANSCRIPTOME-WIDE MEGA-ANALYSES (PRESENT STUDY AND SEIFUDDIN ET AL. 2013). (A) Correlation of differential expression statistics (regression coefficients) for 47 genes dysregulated in the frontal cortex of SZ and BD individuals. (B) HGNC symbols and names of the 47 genes that are differentially expressed in the frontal cortex of SZ and BD individuals. (C) Correlation of differential expression statistics for 18 genes differentially expressed in the frontal cortex of SZ individuals with a corresponding differential expression p-value < .05 in the hippocampus for BD individuals. (D) HGNC symbols and names of the 18 genes that are differentially expressed in the frontal cortex of SZ individuals and hippocampus of BD individuals.

FIGURE 5. DISCORDANCE OF DIFFERENTIALLY EXPRESSED GENES IN BD AND SZ FROM BLOOD-BASED TRANSCRIPTOME-WIDE MEGA-ANALYSES (PRESENT STUDY AND HESS ET AL. 2016). (A) Correlation of differential expression statistics (regression coefficients) for 352 genes dysregulated in the blood of SZ and BD individuals. (B) HGNC symbols and names of the top 30 differentially expressed genes (ranked by BD p-value).

FIGURE 6. GENE CO-EXPRESSION NETWORK ASSOCIATED WITH BD. (A) A gene network was constructed using the combined-sample blood transcriptome data set across six microarray studies. Gene expression data for cases and controls were included in the gene network assembly, then the effect of diagnosis was measured through linear mixed models with gene module expression (i.e., eigengenes) set as the outcome. The “grey60” module showed a modest degree of association with bipolar disorder (uncorrected p < .019). Genes that exhibited the strongest collective pair-wise correlation to all other genes in the module are shown in the corresponding diagram. Lines connecting any two genes were drawn if their pair-wise correlation exceeded r = 0.7. (B) Approximately 58.9% of gene co-expression networks exhibited strongly preserved network structures across BD cases and unaffected comparison subjects (Z-summary > 10). A total of 34 gene modules were identified in a network analysis of unaffected comparison subjects only; separately, 60 gene modules were detected in BD cases only. Gene modules with Z-summary scores greater than 2 but less than 10 are moderately preserved across groups; gene modules with scores less than 2 are not preserved. The Z-summary score is a composite of multiple measures of network similarity, including module membership (overlapping genes), pair-wise co-expression strengths, and connectivity patterns of genes. Networks were permuted 100 times to ensure reliable estimates of preservation across groups.
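
The preservation thresholds quoted in panel B amount to a simple three-way rule. The sketch below, in Python, applies that rule to a handful of made-up Z-summary scores; the module names and values are hypothetical, and the treatment of scores falling exactly on 2 or 10 is an assumption because the caption does not specify it.

```python
def preservation_category(z_summary):
    """Map a Z-summary score to the categories used in the caption."""
    if z_summary > 10:
        return "strongly preserved"
    if z_summary >= 2:  # boundary handling at exactly 2 and 10 is an assumption
        return "moderately preserved"
    return "not preserved"

def summarize(z_by_module):
    """Return the fraction of modules that fall into each category."""
    counts = {}
    for z in z_by_module.values():
        category = preservation_category(z)
        counts[category] = counts.get(category, 0) + 1
    total = len(z_by_module)
    return {category: n / total for category, n in counts.items()}

# Hypothetical Z-summary scores keyed by module colour label.
example = {"grey60": 4.2, "turquoise": 31.5, "blue": 18.0, "salmon": 1.1}
fractions = summarize(example)
```

A summary of this kind is what a statement such as "approximately 58.9% of networks were strongly preserved" reports, once the Z-summary scores themselves have been estimated by permutation.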

FIGURE 7. MACHINE LEARNING CLASSIFIERS OF THE MAJOR PSYCHOSES. Performance of three machine learning algorithms (linear SVM, Naïve Bayes, Random Forest) in correctly predicting diagnostic labels for 12 SZ and 14 BD individuals from one microarray study (Tsuang et al., 2005). Displayed in each plot is the area under the receiver operating characteristic (AUCROC) curve summarized over 100-fold Monte Carlo cross-validations (mean +/- 95% confidence interval). The Naïve Bayes classifier based on 77 genes showed the best accuracy of all models tested (mean AUCROC = 0.849, 95% CI = 0.837 – 0.859). Models were fit on gene expression profiles using data from Affymetrix arrays.
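
The evaluation scheme summarized in this caption (repeated random train/test splits, an AUC per split, then a mean and 95% confidence interval) can be sketched as follows. This Python example is illustrative only: it substitutes a simple nearest-centroid scorer for the SVM, Naïve Bayes, and Random Forest models, uses simulated data, stratifies the splits, and uses a normal approximation for the interval, all of which are assumptions rather than details taken from the analysis.

```python
import random
from statistics import mean, stdev

def auc(scores, labels):
    """AUC as the probability that a class-1 sample outscores a class-0 sample."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid_scorer(train_X, train_y):
    """Stand-in classifier (an assumption): larger scores mean 'closer to class 1'."""
    def centroid(label):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        return [mean(col) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)
    def dist2(x, c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return lambda x: dist2(x, c0) - dist2(x, c1)

def monte_carlo_cv(X, y, n_splits=100, test_frac=0.3, seed=0):
    """Repeat random stratified splits; return mean AUC and a normal-approx 95% CI."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_splits):
        test = set()
        for label in (0, 1):  # stratify so both classes appear in every test set
            idx = [i for i, lab in enumerate(y) if lab == label]
            k = max(1, round(test_frac * len(idx)))
            test.update(rng.sample(idx, k))
        train = [i for i in range(len(y)) if i not in test]
        score = centroid_scorer([X[i] for i in train], [y[i] for i in train])
        held_out = sorted(test)
        aucs.append(auc([score(X[i]) for i in held_out], [y[i] for i in held_out]))
    m = mean(aucs)
    half_width = 1.96 * stdev(aucs) / len(aucs) ** 0.5
    return m, (m - half_width, m + half_width)

# Simulated two-gene profiles for 12 class-0 and 14 class-1 samples (not real data).
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(12)] + \
    [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(14)]
y = [0] * 12 + [1] * 14
mean_auc, ci = monte_carlo_cv(X, y)
```

Because each split refits the model on a different subset, the spread of the per-split AUCs gives a sense of how sensitive performance is to which individuals are held out, which matters with samples as small as 26.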

BIBLIOGRAPHY

Bates, D, Mächler, M, Bolker, B, Walker, S. 2014. Fitting Linear Mixed-Effects Models

using lme4. arXiv:1406.5823v1: 51.

Becking, K, Haarman, BCM, van der Lek, RFR, Grosse, L, Nolen, WA, Claes, S,

Drexhage, HA, Schoevers, RA. 2015. Inflammatory monocyte gene expression: trait

or state marker in bipolar disorder? Int. J. Bipolar Disord. 3: 20.

Beech, RD, Leffert, JJ, Lin, A, Sylvia, LG, Umlauf, S, Mane, S, Zhao, H, Bowden, C,

Calabrese, JR, Friedman, ES, Ketter, TA, Iosifescu, D V, Reilly-Harrington, NA,

Ostacher, M, Thase, ME, Nierenberg, A. 2014. Gene-expression differences in

peripheral blood between lithium responders and non-responders in the Lithium

Treatment-Moderate dose Use Study (LiTMUS). Pharmacogenomics J. 14: 182–91.

Beech, RD, Lowthert, L, Leffert, JJ, Mason, PN, Taylor, MM, Umlauf, S, Lin, A, Lee,

JY, Maloney, K, Muralidharan, A, Lorberg, B, Zhao, H, Newton, SS, Mane, S,

Epperson, CN, Sinha, R, Blumberg, H, Bhagwagar, Z. 2010. Increased peripheral

blood expression of electron transport chain genes in bipolar depression. Bipolar

Disord. 12: 813–24.

Bousman, CA, Chana, G, Glatt, SJ, Chandler, SD, Lucero, GR, Tatro, E, May, T, Lohr,

JB, Kremen, WS, Tsuang, MT, Everall, IP. 2010. Preliminary evidence of ubiquitin

proteasome system dysregulation in schizophrenia and bipolar disorder: convergent

pathway analysis findings from two independent samples. Am. J. Med. Genet. B.

Neuropsychiatr. Genet. 153B: 494–502.

Brozzi, F, Arcuri, C, Giambanco, I, Donato, R. 2009. S100B protein regulates astrocyte

shape and migration via interaction with Src kinase: Implications for astrocyte

development, activation, and tumor growth. J. Biol. Chem. 284: 8797–8811.

Bulik-Sullivan, B, Finucane, HK, Anttila, V, Gusev, A, Day, FR, ReproGen Consortium,

Psychiatric Genomics Consortium, Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3, Perry, JRB,

Patterson, N, Robinson, E, Daly, MJ, Price, AL, Neale, BM. 2015. An Atlas of

Genetic Correlations across Human Diseases and Traits. bioRxiv: 1–44.

Chen, C, Cheng, L, Grennan, K, Pibiri, F, Zhang, C, Badner, JA, Gershon, ES, Liu, C.

2013a. Two gene co-expression modules differentiate psychotics and controls. Mol.

Psychiatry 18: 1308–14.

Chen, H, Wang, N, Zhao, X, Ross, CA, O’Shea, KS, McInnis, MG. 2013b. Gene

expression alterations in bipolar disorder postmortem brains. Bipolar Disord. 15:

177–87.

Clelland, CL, Read, LL, Panek, LJ, Nadrich, RH, Bancroft, C, Clelland, JD. 2013.

Utilization of never-medicated bipolar disorder patients towards development and

validation of a peripheral biomarker profile. PLoS One 8: e69082.

Elashoff, M, Higgs, BW, Yolken, RH, Knable, MB, Weis, S, Webster, MJ, Barci, BM,

Torrey, EF. 2007. Meta-analysis of 12 genomic studies in bipolar disorder. J. Mol.

Neurosci. 31: 221–43.

Fillman, SG, Sinclair, D, Fung, SJ, Webster, MJ, Shannon Weickert, C. 2014. Markers of

inflammation and stress distinguish subsets of individuals with schizophrenia and

bipolar disorder. Transl. Psychiatry 4: e365.

Gaujoux, R, Seoighe, C. 2013. CellMix: a comprehensive toolbox for gene expression

deconvolution. Bioinformatics 29: 2211–2.

Georgiev, D, Arion, D, Enwright, JF, Kikuchi, M, Minabe, Y, Corradi, JP, Lewis, DA,

Hashimoto, T. 2014. Lower gene expression for KCNS3 potassium channel subunit

in parvalbumin-containing neurons in the prefrontal cortex in schizophrenia. Am. J.

Psychiatry 171: 62–71.

Goes, FS, McGrath, J, Avramopoulos, D, Wolyniec, P, Pirooznia, M, Ruczinski, I,

Nestadt, G, Kenny, EE, Vacic, V, Peters, I, Lencz, T, Darvasi, A, Mulle, JG,

Warren, ST, Pulver, AE. 2015. Genome-wide association study of schizophrenia in

Ashkenazi Jews. Am. J. Med. Genet. Part B Neuropsychiatr. Genet. 168: 649–659.

Greenwood, TA, Akiskal, HS, Akiskal, KK, Kelsoe, JR. 2012. Genome-wide association

study of temperament in bipolar disorder reveals significant associations with three

novel loci. Biol. Psychiatry 72: 303–310.

Haenisch, F, Cooper, JD, Reif, A, Kittel-Schneider, S, Steiner, J, Leweke, FM,

Rothermundt, M, van Beveren, NJM, Crespo-Facorro, B, Niebuhr, DW, Cowan,

DN, Weber, NS, Yolken, RH, Penninx, BWJH, Bahn, S. 2016. Towards a blood-

based diagnostic panel for bipolar disorder. Brain. Behav. Immun. 52: 49–57.

Harvey, AG, Talbot, LS, Gershon, A. 2009. Sleep disturbance in bipolar disorder across

the lifespan. Clin. Psychol. Sci. Pract. 16: 256–277.

Hess, JL, Tylee, DS, Barve, R, de Jong, S, Ophoff, RA, Kumarasinghe, N, Tooney, P,

Schall, U, Gardiner, E, Beveridge, NJ, Scott, RJ, Yasawardene, S, Perera, A,

Mendis, J, Carr, V, Kelly, B, Cairns, M, Tsuang, MT, Glatt, SJ. 2016.

Transcriptome-wide mega-analyses reveal joint dysregulation of immunologic genes

and transcription regulators in brain and blood in schizophrenia. Schizophr. Res.

176: 114–124.

Kong, SW, Collins, CD, Shimizu-Motohashi, Y, Holm, IA, Campbell, MG, Lee, IH,

Brewster, SJ, Hanson, E, Harris, HK, Lowe, KR, Saada, A, Mora, A, Madison, K,

Hundley, R, Egan, J, McCarthy, J, Eran, A, Galdzicki, M, Rappaport, L, Kunkel,

LM, Kohane, IS. 2012. Characteristics and predictive value of blood transcriptome

signature in males with autism spectrum disorders. PLoS One 7: e49475.

Langfelder, P, Luo, R, Oldham, MC, Horvath, S. 2011. Tutorials on module preservation.

Latalova, K, Prasko, J, Diveky, T, Velartova, H. 2011. Cognitive impairment in bipolar

disorder. Biomed. Pap. Med. Fac. Univ. Palacky. Olomouc. Czech. Repub. 155: 19–

26.

Madison, JM, Zhou, F, Nigam, A, Hussain, A, Barker, DD, Nehme, R, van der Ven, K,

Hsu, J, Wolf, P, Fleishman, M, O’Dushlaine, C, Rose, S, Chambert, K, Lau, FH,

Ahfeldt, T, Rueckert, EH, Sheridan, SD, Fass, DM, Nemesh, J, Mullen, TE,

Daheron, L, McCarroll, S, Sklar, P, Perlis, RH, Haggarty, SJ. 2015. Characterization

of bipolar disorder patient-specific induced pluripotent stem cells from a family

reveals neurodevelopmental and mRNA expression abnormalities. Mol. Psychiatry

20: 703–17.

Matigian, N, Windus, L, Smith, H, Filippich, C, Pantelis, C, McGrath, J, Mowry, B,

Hayward, N. 2007. Expression profiling in monozygotic twins discordant for bipolar

disorder reveals dysregulation of the WNT signalling pathway. Mol. Psychiatry 12:

815–25.

McGuffin, P, Rijsdijk, F, Andrew, M, Sham, P, Katz, R, Cardno, A. 2003. The

heritability of bipolar affective disorder and the genetic relationship to unipolar

depression. Arch. Gen. Psychiatry 60: 497–502.

Middleton, FA, Pato, CN, Gentile, KL, McGann, L, Brown, AM, Trauzzi, M, Diab, H,

Morley, CP, Medeiros, H, Macedo, A, Azevedo, MH, Pato, MT. 2005. Gene

expression analysis of peripheral blood leukocytes from discordant sib-pairs with

schizophrenia and bipolar disorder reveals points of convergence between genetic

and functional genomic approaches. Am. J. Med. Genet. B. Neuropsychiatr. Genet.

136B: 12–25.

Moriyama, M, Sato, T, Inoue, H, Fukuyama, S, Teranishi, H, Kangawa, K, Kano, T,

Yoshimura, A, Kojima, M. 2005. The neuropeptide neuromedin U promotes

inflammation by direct activation of mast cells. J. Exp. Med. 202: 217–24.

Munkholm, K, Peijs, L, Vinberg, M, Kessing, L V. 2015. A composite peripheral blood

gene expression measure as a potential diagnostic biomarker in bipolar disorder.

Transl. Psychiatry 5: e614.

Murray, RM, Sham, P, Van Os, J, Zanelli, J, Cannon, M, McDonald, C. 2004. A

developmental model for similarities and dissimilarities between schizophrenia and

bipolar disorder. Schizophr. Res. 71: 405–416.

Network and Pathway Analysis Subgroup of the Psychiatric Genomics Consortium. 2015.

Psychiatric genome-wide association study analyses implicate

neuronal, immune and histone pathways. Nat. Neurosci. 18: 199–209.

Padmos, RC, Hillegers, MHJ, Knijff, EM, Vonk, R, Bouvy, A, Staal, FJT, de Ridder, D,

Kupka, RW, Nolen, WA, Drexhage, HA. 2008. A discriminating messenger RNA

signature for bipolar disorder formed by an aberrant expression of inflammatory

genes in monocytes. Arch. Gen. Psychiatry 65: 395–407.

Patel, JP, Frey, BN. 2015. Disruption in the Blood-Brain Barrier: The Missing Link

between Brain and Body Inflammation in Bipolar Disorder? Neural Plast. 2015:

708306.

Power, RA, Steinberg, S, Bjornsdottir, G, Rietveld, CA, Abdellaoui, A, Nivard, MM,

Johannesson, M, Galesloot, TE, Hottenga, JJ, Willemsen, G, Cesarini, D, Benjamin,

DJ, Magnusson, PKE, Ullén, F, Tiemeier, H, Hofman, A, van Rooij, FJA, Walters,

GB, Sigurdsson, E, Thorgeirsson, TE, Ingason, A, Helgason, A, Kong, A,

Kiemeney, LA, Koellinger, P, Boomsma, DI, Gudbjartsson, D, Stefansson, H,

Stefansson, K. 2015. Polygenic risk scores for schizophrenia and bipolar disorder

predict creativity. Nat. Neurosci. 18: 953–5.

Purcell, SM, Wray, NR, Stone, JL, Visscher, PM, O’Donovan, MC, Sullivan, PF, Sklar,

P. 2009. Common polygenic variation contributes to risk of schizophrenia and

bipolar disorder. Nature 460: 748–752.

Reinecke, F, Levanets, O, Olivier, Y, Louw, R, Semete, B, Grobler, A, Hidalgo, J,

Smeitink, J, Olckers, A, Van der Westhuizen, FH. 2006. Metallothionein isoform 2A

expression is inducible and protects against ROS-mediated cell death in rotenone-

treated HeLa cells. Biochem. J. 395: 405–15.

Rothermundt, M, Peters, M, Prehn, JHM, Arolt, V. 2003. S100B in brain damage and

neurodegeneration. Microsc. Res. Tech. 60: 614–632.

Ruderfer, DM, Fanous, AH, Ripke, S, McQuillin, A, Amdur, RL, Gejman, P V,

O’Donovan, MC, Andreassen, OA, Djurovic, S, Hultman, CM, Kelsoe, JR, Jamain,

S, Landén, M, Leboyer, M, Nimgaonkar, V, Nurnberger, J, Smoller, JW, Craddock,

N, Corvin, A, Sullivan, PF, Holmans, P, Sklar, P, Kendler, KS. 2013. Polygenic

dissection of diagnosis and clinical dimensions of bipolar disorder and

schizophrenia. Mol. Psychiatry.

Ryan, MM, Lockstone, HE, Huffaker, SJ, Wayland, MT, Webster, MJ, Bahn, S. 2006.

Gene expression analysis of bipolar disorder reveals downregulation of the ubiquitin

cycle and alterations in synaptic genes. Mol. Psychiatry 11: 965–78.

Savitz, J, Frank, MB, Victor, T, Bebak, M, Marino, JH, Bellgowan, PSF, McKinney, BA,

Bodurka, J, Kent Teague, T, Drevets, WC. 2013. Inflammation and neurological

disease-related genes are differentially expressed in depressed patients with mood

disorders and correlate with morphometric and functional imaging abnormalities.

Brain. Behav. Immun. 31: 161–71.

Seifuddin, F, Mahon, PB, Judy, J, Pirooznia, M, Jancic, D, Taylor, J, Goes, FS, Potash,

JB, Zandi, PP. 2012. Meta-analysis of genetic association studies on bipolar

disorder. Am. J. Med. Genet. B. Neuropsychiatr. Genet. 159B: 508–18.

Seifuddin, F, Pirooznia, M, Judy, JT, Goes, FS, Potash, JB, Zandi, PP. 2013a. Systematic

review of genome-wide gene expression studies of bipolar disorder. BMC

Psychiatry 13: 213.

Seifuddin, F, Pirooznia, M, Judy, JT, Goes, FS, Potash, JB, Zandi, PP. 2013b. Systematic

review of genome-wide gene expression studies of bipolar disorder. BMC

Psychiatry 13: 213.

Sekar, A, Bialas, A, de Rivera, H, Davis, H, Hammond, T, Kamitaki, N, Tooley, K,

Presumey, J, Baum, M, Van Doren, V, Genovese, G, Rose, S, Handsaker, R,

Schizophrenia Working Group of the Psychiatric Genomics Consortium, Daly, MJ,

Carroll, MC, Stevens, B, McCarroll, SA. 2016. Schizophrenia risk from complex

variation of complement component 4. Nature: 177–83.

Sklar, P, Ripke, S, Scott, LJ, Andreassen, OA, Cichon, S, Craddock, N, Edenberg, HJ,

Psychiatric GWAS Consortium Bipolar Disorder Working Group. 2011. Large-scale

genome-wide association analysis of bipolar disorder identifies a new susceptibility

locus near ODZ4. Nat. Genet. 43: 977–983.

Theoharides, TC, Zhang, B, Conti, P. 2011. Decreased Mitochondrial Function and

Increased Brain Inflammation in Bipolar Disorder and Other Neuropsychiatric

Diseases. J. Clin. Psychopharmacol. 31: 685–687.

Tsuang, MT, Nossova, N, Yager, T, Tsuang, M-M, Guo, S-C, Shyu, KG, Glatt, SJ, Liew,

CC. 2005. Assessing the validity of blood-based gene expression profiles for the

classification of schizophrenia and bipolar disorder: a preliminary report. Am. J.

Med. Genet. B. Neuropsychiatr. Genet. 133B: 1–5.

Tylee, DS, Kawaguchi, DM, Glatt, SJ. 2013. On the outside, looking in: a review and

evaluation of the comparability of blood and brain “-omes”. Am. J. Med. Genet. B.

Neuropsychiatr. Genet. 162B: 595–603.

Veyrieras, J-B, Kudaravalli, S, Kim, SY, Dermitzakis, ET, Gilad, Y, Stephens, M,

Pritchard, JK. 2008. High-resolution mapping of expression-QTLs yields insight

into human gene regulation. PLoS Genet. 4: e1000214.

Zeng, H, Shen, EH, Hohmann, JG, Oh, SW, Bernard, A, Royall, JJ, Glattfelder, KJ,

Sunkin, SM, Morris, JA, Guillozet-Bongaarts, AL, Smith, KA, Ebbert, AJ,

Swanson, B, Kuan, L, Page, DT, Overly, CC, Lein, ES, Hawrylycz, MJ, Hof, PR,

Hyde, TM, Kleinman, JE, Jones, AR. 2012. Large-scale cellular-resolution gene

profiling in human neocortex reveals species-specific molecular signatures. Cell

149: 483–96.

Zhang, X, Gierman, HJ, Levy, D, Plump, A, Dobrin, R, Goring, HH, Curran, JE,

Johnson, MP, Blangero, J, Kim, SK, O’Donnell, CJ, Emilsson, V, Johnson, AD.

2014. Synthesis of 53 tissue and cell line expression QTL datasets reveals master eQTLs. BMC Genomics 15: 532.

DISCUSSION AND FINAL REMARKS

HIGHLIGHTS AND MAIN FINDINGS

I presented two review papers and three studies in this dissertation that followed a common goal: better understanding the role of genes, molecules, and biological pathways in SZ and BD. After two decades of genome- and transcriptome-wide studies, we have only begun to scratch the surface of the complex nature of these disorders, but this work has afforded us invaluable clues. A fundamental finding that emerged is that common polygenic variation associated with risk for SZ and BD plays a role in the regulation of gene expression. Building on this, SZ and BD share a significant amount of polygenic variation, which implies that there are common molecular substrates and overlapping pathophysiology. In other words, the boundary between SZ and BD appears to be blurred.

This poses an interesting problem that has given rise to clinically relevant lines of research. Genetic and biological substrates that are common to SZ and BD might be used as objective indicators of mental illness and potentially inform us of drug targets that can better treat these disorders. Moreover, identifying genetic and molecular differences between SZ and BD holds promise for pinpointing disorder-specific biomarkers, which may offer insight into pathophysiological trajectories specific to each disorder.

I sought to accomplish three main goals with this dissertation: (1) establish a pipeline to decode the functional relationships of risk variants for SZ and BD, (2) identify genes and gene networks that are abnormally expressed in SZ and BD and infer relationships with GWAS evidence, and (3) demonstrate how machine learning techniques can be used to classify SZ and BD based on their gene expression profiles. This culminated in the development of two major pipelines.

GWAS signals can be vast and complicated, and will often yield more questions than answers. The first pipeline that I developed in this thesis is now featured as a software tool called FLEET, which can help to catalyze future research. I used this tool to decipher functional relationships of risk-conferring variants associated with SZ and BD.

My analysis of GWASs for SZ and BD with FLEET uncovered several interesting associations, including: (1) gene sets related to histone modification, energy/metabolism, neurotransmitter regulation, and the synapse, (2) histone marks involved in transcriptional activation and alternative splicing that are highly enriched with risk variants in the brain and immune system, and (3) transcription factor and RNA-binding protein target sites that are enriched with risk variants. This tool provides a basis for interpreting the underlying biology of GWAS signals for complex disorders beyond psychiatry, and offers a foothold for investigators to prioritize risk genes for follow-up analyses. I uncovered pathways and annotations that were shared between, and unique to, SZ and BD, which helps with interpreting evidence of genetic overlap between these disorders, including risk variants that are jointly associated. This work can inform future hypothesis-driven epistasis tests, pathway-based polygenic scoring of disorder-specific variants, and experimental follow-up analyses to decipher mechanisms that regulate expression of SZ and BD risk genes.

In light of this evidence, I sought to identify gene networks in the transcriptome that are dysregulated in SZ and BD, and then to synthesize these networks with genomic evidence. There are several small transcriptome-wide studies of SZ and BD in the literature that suffer from drawbacks, namely inconsistency between findings. I developed a pipeline to uniformly pre-process and analyze microarray data acquired from multiple laboratories to obtain a best estimate of gene expression dysregulation at the level of single genes and gene networks. I applied this pipeline to study the transcriptomes of SZ and BD, which yielded convergent findings of altered neuronal development and neurotransmission, immune signaling cascades, and energy/metabolism.

I demonstrated that gene networks in the brain are recapitulated in the blood and, moreover, that the transcriptome signatures of SZ and BD are highly concordant in brain tissue but exhibit opposite relationships in the blood. This led to the creation of a machine learning classifier trained on blood-based gene expression data that showed high accuracy and promising generalizability. This work is setting the stage for poly-omic studies, which will hopefully link together genomic, epigenomic, and molecular milieus into a model that best explains risk for SZ and BD and points us toward better drug targets.

TOWARDS POLY-OMIC PREDICTORS OF SZ AND BD

Early detection and intervention are essential to reducing the burden of mental illness. Painstaking efforts have been put forth to identify diagnostic/prognostic biomarkers of psychiatric disorders, yet progress is hindered by our limited etiological knowledge and complications that technologies have yet to resolve. Conceptualizations of SZ and BD have evolved considerably over the past century. There are aspects of pathophysiology that remain elusive; however, research has helped identify new candidate genes, molecules, and circuits associated with SZ and BD and has shaped our perspective of etiology. One of the driving forces behind the shift was technological advancement in molecular genetics, which enabled investigators to study effects of candidate genes on susceptibility in cohorts and then tease apart the underlying biology and physiological responses related to risk genes in vitro and in vivo. Whole-genome genotyping arrays and widespread adoption of GWAS also contributed substantially to the shift in knowledge. GWAS made it possible to test for risk-associated differences in allele frequency across all genotyped variants simultaneously. This became an essential tool in psychiatry for two main reasons: (1) serving as confirmation that the phenotypes of SZ and BD have a relationship to biology, and (2) providing a means to discover new genes and variants that confer risk for SZ and BD. These same principles applied to transcriptome-wide studies, which were able to provide detailed information about transcripts and pathways related to SZ and BD, and also to studies that surveyed the epigenome for changes in the accessibility of DNA that might relate to risk.
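
The allele-frequency comparison at the heart of a GWAS can be illustrated for a single variant. The sketch below, in Python, runs a basic allelic chi-square test on a 2x2 table of allele counts; it is a simplified stand-in for the regression-based tests with covariates that large consortium studies typically use, and the counts are invented.

```python
import math

def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """1-df chi-square test comparing allele counts in cases versus controls."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [sum(r) for r in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p, odds_ratio

# Hypothetical allele counts at one variant (50 cases and 50 controls, two alleles each).
chi2, p, odds_ratio = allelic_chi_square(60, 40, 40, 60)
```

Repeating such a test across every genotyped variant is what makes the multiple-testing burden so heavy; the conventional genome-wide significance threshold of p < 5 x 10^-8 reflects this, and the toy result here would fall far short of it.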

Collectively, evidence points to there being a cumulative backdrop of genetic and epigenetic vulnerabilities that can trigger a pathophysiological cascade that eventually precipitates mental illness; however, there is no single factor, or small set of factors, that will definitively trigger pathophysiology. Evidence from genome-wide analyses demonstrated that common polygenic variation plays an important role in SZ and BD liability (Purcell et al. 2009), but common variation itself only accounts for a fraction of risk. This highlights the important concept of “missing heritability”, which can be demonstrated by summing up the risk-conferring effects of all risk genes in the genome and finding that this polygenic load fails to capture the total twin-estimated heritability for SZ and BD (Power et al. 2015; Ripke et al. 2014a; Ruderfer et al. 2016; Zhu et al. 2015). Some have suspected that missing heritability can be recovered by accounting for rare variants that contribute stronger effects relative to common variation; however, evidence from whole-exome sequencing studies suggests that the burden of rare variation explains < 1% of total risk (Genovese et al. 2016; Marshall et al. 2016; Purcell et al. 2014). Researchers have also looked into gene-gene interactions as a potentially significant source of heritability (Ripke et al. 2014a), yet definitive evidence that epistasis is a major driver of SZ or BD risk is still lacking. This does not exclude the possibility that epistasis is an important component of etiology; however, genome-wide significant epistatic interactions have yet to emerge from the largest data sets available. Gene-environment interactions may help close the gap in missing heritability. Data from population-level registers afford insight into gene-environment interactions on risk for mental illness at a broad level, and there is promising evidence emerging that genetic predisposition interacts with early environmental exposures (e.g., prenatal infections, low birth weight, early stressful life events) to modify risk (Lichtenstein et al. 2009; Uher 2014). Specific genes that interact with environmental factors need to be examined through systematic whole-genome analyses, requiring whole-genome genotyping on rigorously phenotyped samples. Longitudinal studies such as the Swedish Schizophrenia Study and the Autism Birth Cohort have laid the groundwork for this endeavor in psychiatry (Ruderfer et al. 2016; Stoltenberg et al. 2010).
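
The polygenic load referred to in the discussion of missing heritability is, in its simplest form, a weighted sum of risk-allele counts. The Python sketch below shows that calculation for one individual; the variant identifiers, allele counts, and effect sizes are all hypothetical, and practical scoring adds steps (such as pruning correlated variants and choosing a p-value threshold) that are omitted here.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele counts for one individual.

    dosages: dict mapping variant ID -> number of risk alleles carried (0, 1, or 2)
    weights: dict mapping variant ID -> per-allele effect size
             (for example, a log odds ratio estimated in a GWAS)
    Variants missing from either input are skipped.
    """
    shared = dosages.keys() & weights.keys()
    return sum(dosages[v] * weights[v] for v in shared)

# Hypothetical variants and effect sizes, for illustration only.
dosages = {"variant_A": 2, "variant_B": 1, "variant_C": 0}
weights = {"variant_A": 0.05, "variant_B": 0.10, "variant_C": -0.02}
score = polygenic_score(dosages, weights)
```

Comparing the variance in liability explained by scores of this kind against twin-based heritability estimates is one way the shortfall described above becomes visible.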

Missing heritability is a complex issue to solve. Beyond these particular study designs, large-scale poly-omics investigations are starting to blossom and will usher the field into a new generation that can better understand the multifaceted nature of SZ and BD etiology. Projects such as the CommonMind Consortium and PsychENCODE are leading efforts to integrate data from the genome, epigenome, transcriptome, and proteome from large cohorts in order to identify credible causal relationships between risk factors and psychiatric disorders. This is a critical element that has been lacking from recent studies but may help explain missing heritability. Transcriptomic studies of data sets assembled by these consortia have recently appeared for SZ (Fromer et al. 2016) and autism (Wu et al. 2016). These offer roadmaps of transcriptomic variation (coding and non-coding) for future mechanistic work. Some of this work has offered causal mechanisms, such as dysregulation in the SZ-associated genes FURIN, TSNARE1, and CNTN4 leading to neurodevelopmental abnormalities in zebrafish. These data sets have yet to realize their full potential. We are on the cusp of learning how genetic variation, coding and non-coding RNAs, splice isoforms, DNA methylation patterns, epigenomic landscapes, and proteins are related en masse to influence pathophysiology for mental illness.

Complementing these efforts is on-going work by the Enhancing Neuro Imaging Genetics through Meta-Analysis (ENIGMA) Network, which has been bringing together neuro-imaging data from thousands of individuals to identify anatomical and physiological abnormalities in the brain related to psychiatric disorders. The ENIGMA Network is also collecting genomic data on these subjects to bridge the gap between risk factors in the genome and endophenotypes for mental illness. This is a particularly powerful resource for the future of poly-omic inquiry, and provides a platform for causal inference and the identification of biological sub-types of psychiatric disorders.

Etiological heterogeneity (i.e., affected individuals possessing different genetic backgrounds or other vulnerabilities) poses a significant obstacle to classifiers for psychiatric disorders. There are sophisticated machine learning techniques that could be used to address this challenge. One group applied a machine learning technique based on non-negative matrix factorization and unsupervised bi-clustering to GWAS data as a way to extract clinical sub-types of SZ (Arnedo et al. 2014). This technique was later applied to neuro-imaging data by the same group to recover clusters of patients with similar symptomatology and brain abnormalities (Arnedo et al. 2015). These studies provide tantalizing links and may help to steer future research for SZ and BD. The Research Domain Criteria (RDoC) framework is a complementary perspective that may utilize data-driven methods to extract objective biological indicators of mental health and illness, such as reclassifying aspects of mental illnesses into disruptions of reward system processing and then studying the genes that predispose to these abnormalities (Hess, Kawaguchi, et al. 2016; Morris and Cuthbert 2012). This introduces a new way to study mental illness beyond rigid psychiatric nosology and dichotomous relationships. Although psychiatric diagnoses have purpose and validity, RDoC offers a strategic plan for us to study how genetic and biological variation plays a role in the continuum of brain functioning, behavior, cognition, and mood. This might help us to identify genes, pathways, cells, and circuits that are fundamental for “normal” dimensions of mental health, which might, in turn, lead to a better understanding of psychiatric etiology.

This dissertation may serve as a basis for future RDoC studies that seek to deconstruct SZ and BD into endophenotypes to acquire genetic and molecular signatures that more closely relate to abnormalities in brain function or plasticity. However, a limitation of my work is that it was constrained by DSM diagnoses as opposed to RDoC’s dimensional constructs of mental health. Work that I have not been able to include in this dissertation but am actively pursuing is the identification of gene and splicing networks in the brain whose expression is associated with genetic burden for SZ and BD. Mental illness can wax and wane through a person’s lifetime. However, DNA is mostly static; thus, variation in gene expression that is related to genetic risk may be more informative than diagnosis-centric differential expression analyses. Furthermore, integrating genome- and transcriptome-wide data into a single study such as this affords the opportunity for rigorous quality control (e.g., inferring ancestry from DNA variation as opposed to self-reported data, or uncovering cryptically related subjects), inferring causal relationships between genetic variation and gene expression, and “imputing” effects of environment (e.g., DNA methylation marks) based on prior knowledge and predictive models trained on reference data sets (Rawlik, Rowlatt, and Tenesa 2016). Inclusion of epigenomic data in poly-omic analyses also provides a basis to infer missing or unreported covariates that should be accounted for to avoid spurious signals from a gene expression analysis, such as a subject’s age, smoking habits, or source of RNA (or cell stratification) (Wu et al. 2016).

PERSON-CENTERED APPROACHES IN THE FUTURE OF PSYCHIATRY

Mega-biobanks are springing up all over the globe to accumulate hundreds of thousands to millions of genomes and clinical data to study complex diseases. Some of these efforts are being led by the Million Veteran Program, UK Biobank, 23andMe, and the Resilience Project (Chen et al. 2016; Gaziano et al. 2016; Ge et al. 2016). These are invaluable resources for the post-GWAS era of psychiatry. It is expected that most of the common genetic variation that underlies SZ, BD, and other complex neuropsychiatric disorders will be discovered in the coming years of GWASs. Animal models and induced pluripotent stem cells will be principal tools for researchers to study the effects of risk genes on neurodevelopmental and neurobehavioral-cognitive phenotypes and underlying physiological changes, in addition to rational drug design. However, a common downside to psychiatric GWASs is that they have not been person- or patient-population-centered. GWAS can become patient-centered by rationally using phenotypic and clinical data to stratify patients into relatively homogeneous groups, which has been put into practice in genetic studies of clozapine-induced agranulocytosis, a rare but potentially fatal side effect (Goldstein et al. 2014), and in a GWAS of response to lithium treatment in BD (Hou et al. 2016). Psychiatry can benefit from mega-biobanks through studies that identify rare mutations with theragnostic value. Also, mega-biobanks are a gateway for phenome-wide association studies (PheWASs) for identification of pleiotropic genes and investigation of the genetic (dis)similarity of phenotypes. PheWAS can help to inform the drug design process, such as by predicting ectopic effects of altered gene function or by identifying candidates for drug repositioning. Knowledge gained from studies of gene networks and functional relationships of variants, including the work in this dissertation, will be critically important to translating genetic and molecular signatures of mental illness into theragnostic biomarkers and treatment strategies.

FINAL REMARKS

The pathophysiology of SZ and BD is still murky, but we have learned a great deal about their genetic structure and molecular profiles from genome- and transcriptome-wide studies. Furthermore, we have a better understanding of abnormal genes and pathways that cut across disorder boundaries. Recurring evidence strongly implicates immune-related genes in risk for SZ and BD. The connection between the immune system and the brain has been of long-standing interest in the field of psychiatry. The major histocompatibility complex (MHC) region bears one of the strongest risk signals for SZ and BD (Cross Disorder Group of the Psychiatric Genomics Consortium 2013; Ripke et al. 2014b; Ruderfer et al. 2013). Our knowledge of the interplay between immune system genes and the brain is nascent. Evidence of familial linkage, comorbidity, and genetic correlations suggests that there may be overlapping etiologies between autoimmune and psychiatric disorders (Benros et al. 2013; Rege and Hodgkinson 2013; Tylee et al. 2016). It might be that abnormalities in the immunological milieu affect neurodevelopment, neurotransmission, plasticity, and cellular resilience. A recent study discovered a structural variant in the MHC region that alters expression of the gene C4 and significantly increases risk for SZ (Sekar et al. 2016). C4 plays an important role in innate immunity by marking foreign pathogens for removal (Carroll 2008), and was shown to affect synaptic pruning (Sekar et al. 2016).

Dysregulation of immune signaling cascades was one of the most prominent signatures of SZ and BD in my transcriptomic studies (Hess, Tylee, et al. 2016), suggesting that immune disruptions are fundamental to pathophysiology. Several obstacles must be overcome before we can interpret these “omics” findings: (1) causal links between genetic variation and gene expression have yet to be fully elucidated, (2) we do not yet know how altered transcript levels affect the translation of proteins, protein activity, the physiology of brain cells and circuits, or higher-order brain functions, and (3) our transcriptomic analyses to date have not been able to resolve gene expression irregularities at the level of individual cell populations, as opposed to bulk homogenized tissue. It is feasible that we can overcome these obstacles and, in turn, discover new pathways toward susceptibility for psychiatric disorders. Our conceptualization of SZ and BD has changed immensely over the past century due to tour-de-force efforts to find genetic and molecular variation related to these disorders. I envision future poly-omics studies making unprecedented contributions to our knowledge of pathophysiology and the genome, and shifting psychiatric nosology toward a focus on objective indicators of mental illness.

BIBLIOGRAPHY

Arnedo, Javier et al. 2014. “Uncovering the Hidden Risk Architecture of the

Schizophrenias: Confirmation in Three Independent Genome-Wide Association

Studies.” The American Journal of Psychiatry 172(2): 139–53.

http://www.ncbi.nlm.nih.gov/pubmed/25219520 (December 9, 2014).

Arnedo, Javier et al. 2015. “Decomposition of Brain Diffusion Imaging Data Uncovers

Latent Schizophrenias with Distinct Patterns of White Matter Anisotropy.”

NeuroImage 120: 43–54. http://www.ncbi.nlm.nih.gov/pubmed/26151103 (April 15,

2017).

Benros, Michael E. et al. 2013. “Autoimmune Diseases and Severe Infections as Risk

Factors for Mood Disorders: A Nationwide Study.” JAMA Psychiatry 70(8): 812–

20.

Carroll, Michael C. 2008. “Complement and Humoral Immunity.” Vaccine 26(SUPPL.

8).

Chen, Rong et al. 2016. “Analysis of 589,306 Genomes Identifies Individuals Resilient to

Severe Mendelian Childhood Diseases.” Nature Biotechnology 34(5): 531–38.

http://www.nature.com/doifinder/10.1038/nbt.3514 (November 23, 2016).

Cross Disorder Group of the Psychiatric Genomics Consortium. 2013. “Identification of

Risk Loci with Shared Effects on Five Major Psychiatric Disorders: A Genome-

Wide Analysis.” Lancet 381(9875): 1371–79.

http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3714010&tool=pmcentrez&rendertype=abstract.

Fromer, Menachem et al. 2016. “Gene Expression Elucidates Functional Impact of

Polygenic Risk for Schizophrenia.” Nature Neuroscience 19(11): 1442–53.

http://www.nature.com/doifinder/10.1038/nn.4399 (February 23, 2017).

Gaziano, John Michael et al. 2016. “Million Veteran Program: A Mega-Biobank to Study

Genetic Influences on Health and Disease.” Journal of Clinical Epidemiology 70:

214–23. http://www.sciencedirect.com/science/article/pii/S0895435615004448

(April 15, 2017).

Ge, Tian et al. 2016. “Phenome-Wide Heritability Analysis of the UK Biobank.” bioRxiv:

070177. http://biorxiv.org/lookup/doi/10.1101/070177.

Genovese, Giulio et al. 2016. “Increased Burden of Ultra-Rare Protein-Altering Variants

among 4,877 Individuals with Schizophrenia.” Nature Neuroscience 19(11): 1433–

41. http://www.nature.com/doifinder/10.1038/nn.4402.

Goldstein, Jacqueline I et al. 2014. “Clozapine-Induced Agranulocytosis Is Associated

with Rare HLA-DQB1 and HLA-B Alleles.” Nature Communications 5: 4757.

http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=4155508&tool=pmcentrez&rendertype=abstract.

Hess, Jonathan L, Daniel M Kawaguchi, et al. 2016. “The Influence of Genes on

‘Positive Valence Systems’ Constructs: A Systematic Review.”

American Journal of Medical Genetics Part B: Neuropsychiatric Genetics 171B(1): 92–

110. http://doi.wiley.com/10.1002/ajmg.b.32382 (December 12, 2016).

Hess, Jonathan L., Daniel S. Tylee, et al. 2016. “Transcriptome-Wide Mega-Analyses

Reveal Joint Dysregulation of Immunologic Genes and Transcription Regulators in

Brain and Blood in Schizophrenia.” Schizophrenia Research 176(2–3): 114–24.

http://www.ncbi.nlm.nih.gov/pubmed/27450777 (December 12, 2016).

Hou, Liping et al. 2016. “Genetic Variants Associated with Response to Lithium

Treatment in Bipolar Disorder: A Genome-Wide Association Study.” The Lancet

387(10023): 1085–93. http://www.ncbi.nlm.nih.gov/pubmed/26806518 (April 15,

2017).

Lichtenstein, Paul et al. 2009. “Common Genetic Determinants of Schizophrenia and

Bipolar Disorder in Swedish Families: A Population-Based Study.” Lancet

373(9659): 234–39. http://www.ncbi.nlm.nih.gov/pubmed/19150704.

Marshall, Christian R et al. 2016. “Contribution of Copy Number Variants to

Schizophrenia from a Genome-Wide Study of 41,321 Subjects.” Nature Genetics

49(1): 27–35. http://www.nature.com/doifinder/10.1038/ng.3725 (December 30,

2016).

Morris, Sarah E., and Bruce N. Cuthbert. 2012. “Research Domain Criteria: Cognitive

Systems, Neural Circuits, and Dimensions of Behavior.” Dialogues in Clinical

Neuroscience 14(1): 29–37.

Power, Robert A et al. 2015. “Polygenic Risk Scores for Schizophrenia and Bipolar

Disorder Predict Creativity.” Nature Neuroscience 18(7): 953–55.

http://www.ncbi.nlm.nih.gov/pubmed/26053403 (June 8, 2015).

Purcell, S M et al. 2009. “Common Polygenic Variation Contributes to Risk of

Schizophrenia and Bipolar Disorder.” Nature 460(7256): 748–52.

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=19571811.

Purcell, Shaun M et al. 2014. “A Polygenic Burden of Rare Disruptive Mutations in

Schizophrenia.” Nature 506(7487): 185–90.

http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=4136494&tool=pmcentrez&rendertype=abstract.

Rawlik, Konrad, Amy Rowlatt, and Albert Tenesa. 2016. “Imputation of DNA

Methylation Levels in the Brain Implicates a Risk Factor for Parkinson’s

Disease.” Genetics 204(2): 771–81.

Rege, Sanil, and Suzanne J Hodgkinson. 2013. “Immune Dysregulation and

Autoimmunity in Bipolar Disorder: Synthesis of the Evidence and Its Clinical

Application.” The Australian and New Zealand Journal of Psychiatry 47(12): 1136–

51. http://www.ncbi.nlm.nih.gov/pubmed/23908311.

Ripke, Stephan et al. 2014a. “Biological Insights from 108 Schizophrenia-Associated

Genetic Loci.” Nature 511(7510): 421–27.

http://www.ncbi.nlm.nih.gov/pubmed/25056061 (July 22, 2014).

Ruderfer, D M et al. 2013. “Polygenic Dissection of Diagnosis and Clinical Dimensions

of Bipolar Disorder and Schizophrenia.” Molecular Psychiatry 19(9): 1017–24.

http://dx.doi.org/10.1038/mp.2013.138;

http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=4033708&tool=pmcentrez&rendertype=abstract.

Sekar, A et al. 2016. “Schizophrenia Risk from Complex Variation of Complement

Component 4.” Nature 530(7589): 177–83.

Stoltenberg, C et al. 2010. “The Autism Birth Cohort: A Paradigm for Gene-

Environment-Timing Research.” Molecular Psychiatry 15(7): 676–80.

http://www.ncbi.nlm.nih.gov/pubmed/20571529.

Tylee, Daniel S. et al. 2016. “Genetic Correlations among Brain-Behavioral and

Immune-Related Phenotypes Based on Genome-Wide Association Data.” bioRxiv:

070730. http://biorxiv.org/lookup/doi/10.1101/070730.

Uher, Rudolf. 2014. “Gene-Environment Interactions in Severe Mental Illness.”

Frontiers in Psychiatry 5(MAY).

Wu, Chong et al. 2016. “Imputation of Missing Covariate Values in Epigenome-Wide

Analysis of DNA Methylation Data.” Epigenetics 11(2): 132–39.

Wu, Ye E, Neelroop N Parikshak, T Grant Belgard, and Daniel H Geschwind. 2016.

“Genome-Wide, Integrative Analysis Implicates microRNA Dysregulation in

Autism Spectrum Disorder.” Nature Neuroscience 19(11): 1463–76.

http://www.nature.com/doifinder/10.1038/nn.4373 (February 23, 2017).

Zhu, Zhihong et al. 2015. “Dominance Genetic Variation Contributes Little to the

Missing Heritability for Human Complex Traits.” American Journal of Human

Genetics 96(3): 377–85.
