
META-ANALYSES OF EXPRESSION PROFILING DATA IN THE POSTMORTEM HUMAN BRAIN

by

Meeta Mistry

B.Sc., McMaster University, 2005

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF

THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES

(Bioinformatics)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

July 2012

© Meeta Mistry, 2012

Abstract

Schizophrenia is a severe psychiatric illness for which the precise etiology remains unknown. Studies using postmortem human brain have become increasingly important in schizophrenia research, providing an opportunity to directly investigate diseased brain tissue. Gene expression profiling technologies have been used by a number of groups to explore the postmortem human brain and to identify genes whose expression changes correlate with schizophrenia. While this has been a valuable means of generating hypotheses, there is a general lack of consensus in the findings across studies. Expression profiling of postmortem human brain tissue is difficult because various factors can confound the data. The first aim of this thesis was to use control postmortem human cortex to identify expression changes associated with several such factors, specifically age, sex, brain pH and postmortem interval. I conducted a meta-analysis across the control arm of eleven microarray datasets (representing over 400 subjects) and identified a signature of genes associated with each factor. These genes provide critical information for identifying problematic genes when investigating postmortem human brain in schizophrenia and other neuropsychiatric illnesses. The second aim of this thesis was to evaluate gene expression patterns in the prefrontal cortex associated with schizophrenia using two methods of analysis: differential expression and coexpression. Seven schizophrenia microarray studies of prefrontal cortex were combined for a total of 153 subjects with schizophrenia and 153 healthy controls. Meta-analysis was conducted with careful consideration for the effects of covariates, revealing a robust list of 98 differentially expressed ‘schizophrenia genes’. Using the same seven schizophrenia datasets, coexpression networks were generated for control and schizophrenia cohorts within each dataset and then combined across studies using a rank aggregation approach. Topological properties of our ‘schizophrenia genes’ were evaluated in the context of each network, highlighting differences in the correlation structure of these genes between the control and schizophrenia brain. Together these results converge on a general conclusion: integrating postmortem human brain expression profiling data improves statistical power and is particularly useful for detecting subtle yet consistent expression changes associated with schizophrenia.


Preface

Together with my supervisor, Paul Pavlidis, I was responsible for the identification and design of the research program described in this thesis. I was the primary author of every chapter and of the corresponding publications. My supervisor, Paul Pavlidis, contributed study design, supervision, concepts, text and editorial suggestions for all chapters.

A version of Chapter 2 has been published (Mistry M, Pavlidis P (2010). A cross-laboratory comparison of expression profiling data from normal postmortem human brain. Neuroscience 167(2):384-95. doi:10.1016/j.neuroscience.2010.01.016).

A version of Chapter 3 has been published (Mistry M, Gillis J, Pavlidis P (2012). Genome-wide expression profiling of schizophrenia using a large combined cohort. Molecular Psychiatry. doi:10.1038/mp.2011.172). Jesse Gillis contributed to Chapter 3 and is a co-author of the corresponding publication. Specifically, Jesse contributed network analysis, interpretation and editorial suggestions for Chapter 3.

For Chapter 4, Jesse Gillis was responsible for the construction of the rank-aggregated coexpression matrices. Jesse also contributed significantly to this chapter by advising on subsequent analyses and the interpretation of results, and by providing guidance and editorial suggestions.


Table of Contents

Abstract

Preface

Table of Contents

List of Tables

List of Figures

List of Abbreviations and Gene Definitions

Acknowledgements

Chapter 1: Introduction

1.1 Thesis Overview

1.2 Neuropsychiatric Illness

1.3 Schizophrenia

1.3.1 Theories of Pathophysiology

1.3.2 Genetic and Environmental Factors

1.3.3 Insights from Human Brain Studies

1.4 Postmortem Human Brain Tissue

1.4.1 Tissue Heterogeneity

1.4.2 Tissue Quality

1.4.3 Clinical Quality

1.4.4 Demographic Data

1.5 Gene Expression Profiling

1.5.1 Profiling Technologies

1.5.2 RNA Quality

1.5.3 Limitations of Microarrays

1.5.4 Differential Expression

1.5.5 Coexpression

1.5.6 Network Analysis

1.6 Meta-analysis

1.6.1 Meta-analysis of Differential Expression

1.6.2 Meta-analysis of Coexpression Networks

1.7 Thesis Chapters Summary

Chapter 2: Meta-analysis of the normal human postmortem brain

2.1 Introduction

2.2 Methods

2.2.1 Data Collection

2.2.2 Regression Analysis

2.2.3 Meta-analysis of Differential Expression

2.2.4 Validation Analysis

2.3 Results

2.4 Discussion

Chapter 3: Genome-wide expression profiling of schizophrenia using a large combined cohort

3.1 Introduction

3.2 Methods

3.2.1 Data Collection

3.2.2 Data Pre-processing

3.2.3 Data Quality Control

3.2.4 Statistical Modeling

3.2.5 Literature-derived Signatures

3.2.6 Enrichment Analysis

3.2.7 Network Analysis

3.3 Results

3.4 Discussion

Chapter 4: Gene coexpression network analysis of schizophrenia

4.1 Introduction

4.2 Methods

4.2.1 Data Processing and Quality Control

4.2.2 Gene Coexpression Networks

4.2.3 Random Coexpression Networks

4.2.4 Network Properties

4.2.5 Schizophrenia Meta-signature Network Analysis

4.2.7 Network Clustering

4.2.8 Enrichment Analysis

4.3 Results

4.4 Discussion

Chapter 5: Conclusion

5.1 Summary of Major Findings

5.2 Contribution to Field of Study

5.3 Strengths and Limitations

5.4 Interpretation of Findings

5.5 Potential Applications and Future Directions

References

Appendix

Appendix A: ‘Core’ meta-signature lists for age, brain pH, PMI and sex

Age Down-regulated

List of Tables

Table 1: DSM-IV-TR Diagnostic criteria for schizophrenia

Table 2: Candidate genes in schizophrenia

Table 3: Human postmortem brain datasets included in control brain meta-analysis

Table 4: Sample characteristics for control human postmortem brain datasets

Table 5: Significant genes (q<0.01) identified within each individual dataset

Table 6: Top meta-signature genes for age, pH, sex and PMI

Table 7: Comparison of meta-signature profiles against validation gene sets

Table 8: Schizophrenia candidate gene analysis

Table 9: Rank correlations between sample information

Table 10: Evaluating gene overlap between meta-signatures

Table 11: Schizophrenia datasets

Table 12: Summary of demographic variables across combined cohort

Table 13: Probe model selection across schizophrenia signatures

Table 14: Schizophrenia meta-signatures

Table 15: Comparison of meta-signatures with findings from original studies

Table 16: Evaluating meta-signatures against brain-specific gene coexpression modules

Table 17: Whole network properties of the control and schizophrenia brain networks

Table 18: Schizophrenia gene set network properties

Table 19: Gene Ontology enrichment of modules identified by WGCNA-based clustering

Table 20: Gene Ontology enrichment of modules identified by MCODE clustering

List of Figures

Figure 1: A simple schematic of mesolimbic and mesocortical circuitry

Figure 2: Distribution of dataset p-values across meta-signature q-values

Figure 3: Distribution of dataset p-values for individual genes: a magnified view

Figure 4: Top genes down-regulated with age

Figure 5: GO enrichment analysis

Figure 6: Investigating the relationship between age and brain pH

Figure 7: Example of consistent expression changes for a gene across data sets

Figure 8: Expression changes in the ‘core signatures’

Figure 9: Connectivity distribution of control and schizophrenia networks

Figure 10: Shared edges between networks

Figure 11: Comparison to random network distributions

Figure 12: Comparison of gene set properties to functional GO groups

Figure 13: Jackknifed network measures

Figure 14: Network representation of within gene set interactions for schizophrenia meta-signature genes

Figure 15: Comparison of modules between networks (WGCNA)

Figure 16: Enrichment of cell type markers in WGCNA modules

Figure 17: Enrichment of cell type markers in MCODE modules

Figure 18: Cluster comparison between WGCNA and MCODE clustering algorithms

List of Abbreviations and Gene Definitions

ABCA1 ATP-binding cassette, sub-family A (ABC1), member 1
AIC Akaike information criterion
AMPA α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid
ANOVA analysis of variance
ANP32A acidic nuclear phosphoprotein 32 family, member A
APBA2 amyloid beta (A4) precursor protein-binding, family A, member 2
APOD apolipoprotein D
ATP5C1 ATP synthase, H+ transporting, mitochondrial F1 complex, gamma polypeptide 1
AUC area under the curve
BA47 Brodmann area 47
BA9 Brodmann area 9
BAZ1A bromodomain adjacent to zinc finger domain, 1A
BBX bobby sox homolog (Drosophila)
BDNF brain-derived neurotrophic factor
CBFA2T2 core-binding factor, runt domain, alpha subunit 2; translocated to, 2
cDNA complementary DNA
CNS central nervous system
CNV copy number variation
COMT catechol-O-methyltransferase
COPS7B COP9 constitutive photomorphogenic homolog subunit 7B
COQ4 coenzyme Q4 homolog (S. cerevisiae)
CRHR1 corticotropin releasing hormone receptor 1
CRYM crystallin, mu
CYP26B1 cytochrome P450, family 26, subfamily B, polypeptide 1
DAOA D-amino acid oxidase activator
DCAF8 DDB1 and CUL4 associated factor 8
DISC1 disrupted in schizophrenia 1
DLGAP1 disks large-associated protein 1
DNA deoxyribonucleic acid
DSM-IV diagnostic and statistical manual of mental disorders version 4
DTI diffusion tensor imaging
DTNBP1 dystrobrevin binding protein 1
EIF2C3 eukaryotic translation initiation factor 2C, 3
EIF3E eukaryotic translation initiation factor 3, subunit E
eQTL expression quantitative trait loci
ERBB3 v-erb-b2 erythroblastic leukemia viral oncogene homolog 3
FBXO9 F-box protein 9
FDR false discovery rate
FEM fixed effects model
fMRI functional magnetic resonance imaging
GABA gamma-aminobutyric acid
GABBR1 gamma-aminobutyric acid (GABA) B receptor, 1
GABRG2 gamma-aminobutyric acid (GABA) A receptor, gamma 2
GAD65 glutamate decarboxylase 2 (pancreatic islets and brain, 65kDa)
GAD67 glutamate decarboxylase 1 (brain, 67kDa)
GAPDH glyceraldehyde-3-phosphate dehydrogenase
GBA guilt-by-association
gcRMA robust multiarray averaging with GC-content background correction
GEO Gene Expression Omnibus
GFAP glial fibrillary acidic protein
GNAL guanine nucleotide binding protein (G protein), alpha activating activity polypeptide, olfactory type
GNB1L guanine nucleotide binding protein (G protein), beta polypeptide 1-like
GNB5 guanine nucleotide binding protein (G protein), beta 5
GO gene ontology
GPCR G-protein-coupled receptor
GSR gene score resampling
GTPase enzyme that hydrolyzes guanosine triphosphate
hiPSCs human induced pluripotent stem cells
KCNK1 potassium channel, subfamily K, member 1
LCM laser capture microdissection
LC-MS liquid chromatography-tandem mass spectrometry
LPL lipoprotein lipase
MAG myelin associated glycoprotein
MAPK1 mitogen-activated protein kinase 1
MAQC microarray quality control
MAS5.0 microarray analysis suite 5
MEM mixed effects model
MHC major histocompatibility complex
MPSS massively parallel signature sequencing
MRI magnetic resonance imaging
mRNA messenger ribonucleic acid
NGS next-generation sequencing
NMDA N-methyl-D-aspartate
NOVA1 neuro-oncological ventral antigen 1
NRG1 neuregulin 1
NSF N-ethylmaleimide-sensitive factor
OLIG2 oligodendrocyte lineage transcription factor 2
OPCML opioid binding protein/cell adhesion molecule-like
OPN3 opsin 3
ORA over-representation analysis
PAIP2B poly(A) binding protein interacting protein 2B
PCP phencyclidine
PCSK2 proprotein convertase subtilisin/kexin type 2
PFC prefrontal cortex
PKP4 plakophilin 4
PLOD2 procollagen-lysine, 2-oxoglutarate 5-dioxygenase 2
PLP1 proteolipid protein 1
PMI postmortem interval
PPI protein-protein interaction
RGS12 regulator of G-protein signaling 12
RGS17 regulator of G-protein signaling 17
RGS4 regulator of G-protein signaling 4
RGS6 regulator of G-protein signaling 6
RGS7 regulator of G-protein signaling 7
RIN RNA integrity number
RMA robust multiarray averaging
RNA ribonucleic acid
RNS reactive nitrogen species
ROC receiver operating characteristic
ROS reactive oxygen species
rRNA ribosomal ribonucleic acid
RT-PCR reverse transcriptase PCR
SAGE serial analysis of gene expression
SAM significance analysis of microarrays
SLC25A12 solute carrier family 25 (mitochondrial carrier, Aralar), member 12
SLC25A15 solute carrier family 25 (mitochondrial carrier; ornithine transporter) member 15
SMG1 smg-1 homolog, phosphatidylinositol 3-kinase-related kinase (C. elegans)
sMRI structural magnetic resonance imaging
SMRI Stanley Medical Research Institute
SNN stannin
SNP single nucleotide polymorphism
SYN2 synapsin II
SYNJ1 synaptojanin 1
TACC2 transforming, acidic coiled-coil containing protein 2
TOM topological overlap matrix
USP19 ubiquitin specific peptidase 19
VCFS velo-cardio-facial syndrome
VTA ventral tegmental area
WGCNA weighted gene coexpression network analysis
WNK1 WNK lysine deficient protein kinase 1
XIST X (inactive)-specific transcript (non-protein coding)
ZDHHC8 zinc finger, DHHC-type containing 8

Acknowledgements

I would like to thank my graduate supervisor, Paul Pavlidis, whose kindness, support and guidance throughout my graduate studies have been invaluable to me. With his encouragement, patience and great efforts to explain things clearly, he has helped make this a wonderful learning experience for me. Thanks also to my thesis committee members, Robert Holt, Clare Beasley and Wyeth Wasserman, whose support and advice have successfully guided me through my graduate career.

I am greatly indebted to the groups and institutions that made their data available, including Karoly Mirnics (Vanderbilt), Vahram Haroutunian (Mt Sinai), the Stanley Medical Research Institute (SMRI) and the Harvard Brain bank. This thesis would not have been possible without their generosity. I would also like to thank Nicole Berchtold (University of California, Irvine), Mehmet Somel (Max Planck Institute for Evolutionary Anthropology), Alice Chen-Plotkin (Center for Neurodegenerative Disease Research) and Elizabeth Thomas (Scripps Research Institute) for providing additional information on their data sets.

I would particularly like to acknowledge Jesse Gillis for his seasoned advice and guidance on much of the work in Chapter 4. His insightful thoughts on my work and his ability to explain network theory have been incredibly helpful and a large contribution towards the completion of this thesis. Thank you to all my fellow labmates and friends at CHiBi for helping me through the process. The support and encouragement of many fellow graduate students and friends has been indispensable; special thanks to Kelsey Hamer, Leon French, Thomas Sierocinski, Audrey Houillier, Olena Morozova, Shabnam Tavassoli, Katayoon Kasaian, Misha Kanji, Shreena Desai, Audrey Cherryl Mogan, Warren Cheung, Simon Chan, Luke McCarthy, Carri-Lyn Mead, Ben Vandervalk, Reena Grewal and Anna Johnson. I am also very grateful to the Canadian Institutes of Health Research and the MIND Foundation of BC for awarding me funding throughout my graduate career.

Finally, heartfelt thanks are owed to my family for their continued understanding and support throughout the process. I would like to thank my sister and her partner, Sonal and Neil Ghosh, my brothers, Mandip Mistry and Bhavik Mistry, and most of all my parents for their love and support throughout everything I do in my life.

Chapter 1: Introduction

Like any other biological organ system, the function of the human brain is ultimately determined by the function of the genome, in a complex interplay with the organism’s environment and life history. It has therefore long been recognized that normal and disease processes in the brain reflect processes that are either under direct control of the genome or influenced by genetic variability. As molecular neuroscience has matured, we have been able to move from the high-level structural organization of the brain to detailed maps of genetic influences, and the power to investigate genetic and genomic mechanisms underlying the health and disease of the human brain has increased dramatically. Gene expression profiling with microarrays has frequently been applied to the postmortem human brain [1, 2]. These studies enable detailed investigation of gene expression patterns related to human brain circuitry and aid our understanding of the precise spatiotemporal regulation of the brain’s transcriptome [3, 4]. A better understanding of normal human brain processes is critical for elucidating the pathophysiology and etiology of schizophrenia, among other major mental illnesses. In this thesis, I sought to explore gene expression changes in the normal and diseased human brain by leveraging the large amount of available microarray data. Combining datasets across studies enabled more powerful statistical analyses and yielded findings not observable in single-dataset studies. In the diseased human brain, I focused on expression alterations associated with a diagnosis of schizophrenia.

Schizophrenia is a complex psychiatric disorder that is highly heritable [5, 6]. Existing gene expression studies of schizophrenia present a mixture of divergent and concordant findings. While microarrays provide a valuable means of generating hypotheses, challenges persist in identifying truly reliable gene expression changes associated with psychiatric illnesses. Because the neuropathology being sought is not obvious, we must control for factors that may be confounded with the disease in order to distinguish signal from noise. In my analysis of the normal human brain, I sought to evaluate expression changes associated with such factors.

In this chapter, I introduce the background for my research, which combines two challenging topics: (1) the analysis of expression profiling data in the face of substantial sources of noise and small signals, and (2) the continuing pursuit to uncover the etiological components of schizophrenia. This introduction reviews the challenges of expression profiling in postmortem human brain tissue, the statistical methods for integrating profiling studies, and past and current research on the pathophysiology of schizophrenia.

1.1 Thesis Overview

In this thesis, I apply statistical methods for integrating microarray data across studies, with a particular focus on human postmortem brain tissue. The literature contains numerous studies using postmortem brain tissue, and the number of studies that attempt to integrate these data is also rising [7-13]. I argue that integrating data across studies increases statistical power and facilitates the identification of more robust changes in expression. The first part of this thesis (Chapter 2) focuses on the normal human cortex, examining gene expression patterns associated with several potentially confounding factors across eleven independent microarray studies. Expression changes associated with age, sex, postmortem interval and brain pH are evaluated within each dataset and then combined across datasets using meta-analytical methods. I aim to identify lists of genes significantly associated with each factor; these are of interest in their own right, and careful consideration of these effects will also help researchers interpret future postmortem brain studies.

The second part of this thesis (Chapter 3) focuses on evaluating gene expression changes in the prefrontal cortex associated with schizophrenia. My goal was to find statistically significant ‘schizophrenia genes’ that show consistent patterns across seven independent microarray studies. Expression changes associated with schizophrenia are subtle and often masked by expression changes due to extraneous factors. I hypothesized that by combining data across studies, I could increase statistical power to identify small changes associated with the illness. I also incorporate results from the normal brain analysis to ensure proper control of factors that can be confounded with the disease effect.

While the first two projects of this thesis are concerned with differential expression, in the last part (Chapter 4) I turn the focus to coexpression. Using the same seven schizophrenia datasets as in Chapter 3, I perform a meta-analysis of gene coexpression. Separate networks are generated for the control and schizophrenia brain, and the network properties of each are evaluated. The differentially expressed ‘schizophrenia genes’ obtained through meta-analysis in Chapter 3 are also interrogated in the context of each network. I aim to identify characteristic network properties of the ‘schizophrenia genes’ and evaluate how they differ from other functional gene groups. I hypothesize that evaluating the interactions of these genes in the context of each network will reveal previously unknown relationships between them and possibly highlight functional differences between the control and schizophrenia human brain.
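The two integration strategies described above, combining per-dataset results for each gene and combining coexpression across studies, can be illustrated with a minimal Python sketch. It assumes NumPy and SciPy are available and is not the implementation used in this thesis; the methods actually used are given in the Methods sections of the relevant chapters. Fisher's method is shown as one common way to combine independent p-values for a gene. A simple sum of within-dataset correlation ranks stands in for rank aggregation of coexpression. The function names fisher_combine and rank_aggregate_coexpression are hypothetical.

```python
import numpy as np
from scipy import stats


def fisher_combine(pvalues):
    """Combine one gene's p-values from independent datasets (Fisher's method).

    Under the null hypothesis, -2 * sum(ln p_i) follows a chi-squared
    distribution with 2k degrees of freedom, where k is the number of datasets.
    """
    p = np.asarray(pvalues, dtype=float)
    statistic = -2.0 * np.sum(np.log(p))
    return stats.chi2.sf(statistic, df=2 * p.size)


def rank_aggregate_coexpression(datasets):
    """Aggregate gene-gene coexpression across datasets by summing ranks.

    Each dataset is a (genes x samples) array covering the same genes in the
    same row order. Within each dataset the Pearson correlations of all gene
    pairs are ranked (higher correlation -> higher rank). The ranks are summed
    across datasets and re-ranked to give one symmetric aggregate matrix.
    """
    n_genes = datasets[0].shape[0]
    upper = np.triu_indices(n_genes, k=1)  # each unordered gene pair once
    rank_sum = np.zeros(len(upper[0]))
    for expr in datasets:
        corr = np.corrcoef(expr)  # rows are genes, so this is genes x genes
        rank_sum += stats.rankdata(corr[upper])
    aggregate = np.zeros((n_genes, n_genes))
    aggregate[upper] = stats.rankdata(rank_sum)
    return aggregate + aggregate.T  # mirror to lower triangle; diagonal stays 0


# Example: three hypothetical per-dataset p-values for a single gene.
print(fisher_combine([0.01, 0.02, 0.03]))
```

Working with ranks rather than raw correlation values puts datasets from different platforms and sample sizes on a common scale, at the cost of discarding the magnitude of each correlation. Fisher's method assumes the datasets are independent, which holds for separate cohorts but not for studies that share subjects.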

1.2 Neuropsychiatric Illness

Neuropsychiatric illnesses are complex brain disorders that arise from a combination of genetic and environmental influences. Examples include schizophrenia, bipolar disorder, major depressive disorder and autism spectrum disorder. The development of new treatment approaches for psychiatric illnesses is grounded in a fundamental understanding of disease etiology and pathophysiology, which has long been sought. Over the past decade a vast amount of research has been conducted in this area, but despite these efforts, psychiatry has seen little success compared to other areas of medicine [14]. The difficulty in isolating the exact causes of these disorders stems from a number of challenges. A first obstacle arises at the stage of diagnosis, due to the phenotypic heterogeneity of the affected population: there is often a lack of definitive symptoms that consistently manifest in all affected subjects. In general, disease is investigated as a binary trait, where people are considered to be with or without disease, but this simple binary system does not apply well to psychiatric illnesses, which can consist of a spectrum of phenotypes. A second challenge is that psychiatric disorders are generally not single-gene disorders and do not show a simple Mendelian pattern of inheritance [15]. They implicate a large number of genes, suggesting multiple affected pathways that result in selective failures of normal brain function. Finally, the brain is a highly complex organ whose proper function depends on the coordinated activity of different cell types across several brain regions, and we are still at an early stage of understanding normal brain function and the inner workings of neuronal circuits. Together, these challenges are critical barriers to better diagnosis and treatment of psychiatric illness. By addressing these issues we can gain a deeper understanding of the causal pathways and make progress with these debilitating disorders.

1.3 Schizophrenia

Schizophrenia is a complex psychiatric illness that affects about one percent of the population worldwide [16]. The disorder presents with a combination of signs and symptoms that vary across affected patients, but is predominantly defined by signs of psychosis. Diagnosis of schizophrenia is based on a set of criteria specified in the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR) [17]. These criteria are also listed in Table 1 of this thesis. Five categories of clinical symptoms are assessed, and the patient must show at least two of them during a 1-month period. Further, these symptoms must be accompanied by disturbance in work, interpersonal relations or self-care lasting at least 6 months. Symptoms are typically clustered into three domains: positive, negative and cognitive. Positive symptoms are those that appear to be in excess of the individual’s normal functioning. Common examples, and perhaps the most dramatic clinical aspects, include paranoid delusions and auditory hallucinations. In contrast, negative symptoms reflect the absence of functions that are present in the normal individual, for example lack of emotion or motivation, and social isolation. Cognitive symptoms are identified not from observation of clinical symptoms but through clinical neuropsychology, in which standardized tests (e.g. the Wisconsin Card Sorting Test, the N-back test) are used to assess the level of function [18]. Studies using such approaches have shown that affected patients have deficits in a variety of cognitive domains, including executive function, attention, working memory and language [19]. The onset of psychosis typically occurs in young adulthood (18-25 years of age) and is preceded by a slow, gradual development of signs earlier in life, known as the prodromal phase. In the prodromal phase, patients experience non-specific behavioural changes, mostly negative symptoms, for example social withdrawal or sudden outbursts of anger. The current literature indicates sex differences in the symptomatology and prognosis of schizophrenia, although results are not always consistent [20]. For example, schizophrenia tends to occur 3-4 years later in life in women than in men, and women also tend to have milder forms of the disease in their younger years than their male counterparts. Further, epidemiological studies have found higher incidence rates in males than in females [21].

1.3.1 Theories of Pathophysiology

Several neurotransmitter systems have been found to exhibit dysfunction in the brains of schizophrenia patients [22]. Here, I discuss the role of three neurotransmitters, dopamine, glutamate and gamma-aminobutyric acid (GABA), and their involvement in schizophrenia pathophysiology.

Dopamine is a catecholamine neurotransmitter with many functions in the brain, including important roles in behavior and cognition. Historically, dopamine receptors were divided into two major subtypes, referred to as D1 and D2, but it is now recognized that additional receptors, D3, D4 and D5, also exist [23]. The first formulation of the ‘dopamine hypothesis’ proposed that schizophrenia is associated with an excess of dopaminergic activity. This hypothesis evolved from the antipsychotic effects observed when the D2 receptor antagonist chlorpromazine (as part of its early clinical testing as an antihistamine) was administered to patients in a French mental hospital. Furthermore, dopamine-releasing drugs such as amphetamines were observed to induce psychosis [24]. Given the predominant localization of D2 receptors in subcortical regions of the brain, much of the ensuing research focused on the mesolimbic dopamine pathway, composed of dopaminergic neurons in the ventral tegmental area (VTA) and their projections to the nucleus accumbens, regions of the hippocampus, and the mesial components of the frontal, anterior cingulate and entorhinal cortices (Figure 1). Because schizophrenia was conceptualized as a disorder of increased dopamine transmission, its treatment remained unchanged for many decades. First-generation or ‘typical’ antipsychotics such as chlorpromazine and haloperidol remained the first choice of treatment, despite their unpleasant side effects.

A reformulation of the ‘dopamine hypothesis’ eventually emerged, implicating the D1 receptor and the mesocortical dopamine pathway (Figure 1), in which the neocortex (mainly the prefrontal cortex (PFC)) receives dense dopaminergic innervation from the VTA. D1 receptors are localized in the PFC and have been associated with cognitive dysfunction, especially in working memory [25, 26]. Moreover, brain imaging studies show abnormal D1 receptor density in the frontal cortex of persons with schizophrenia [27, 28]. These lines of evidence suggest that hypoactivity of dopamine transmission in the mesocortical pathway is associated with the negative and cognitive symptoms of schizophrenia, whereas hyperactivity of dopamine transmission in the mesolimbic pathway is associated with the positive symptoms. A link between the two hypotheses has also been proposed, in which abnormal mesolimbic dopaminergic function in schizophrenia is secondary to a functionally compromised mesocortical system [29].

Another hypothesis suggests that hypofunction of the glutamatergic system may be involved in the pathogenesis of schizophrenia. Glutamate is the primary excitatory neurotransmitter in the mammalian brain and is thought to be used by 40% of all synapses [30]. Glutamate mediates its actions via four different types of post-synaptic receptors: N-methyl-D-aspartate (NMDA), alpha-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA), kainate, and metabotropic receptors. Of these receptor types, NMDA has received the most attention. Low doses of dissociative anesthetics such as phencyclidine (PCP) or ketamine, when administered to healthy individuals, were observed to elicit both positive symptoms (e.g. paranoia) and negative symptoms (e.g. blunted affect) [31], and to induce schizophrenia-like cognitive effects (e.g. attention and memory problems) [32]. It was later discovered that these compounds act by blocking the NMDA receptor [33]. More recently, there is evidence that glutamatergic drugs can be helpful in the treatment of schizophrenia. Agents that enhance NMDA receptor function by acting on its modulatory site (e.g. sarcosine) have been reported to reduce some symptoms in patients with schizophrenia [34]. Similarly, a selective agonist of metabotropic receptors has been shown to reverse PCP effects in animal models and to be effective in treating some positive and negative symptoms [35].

Abnormalities of the GABA system are also implicated in schizophrenia. GABA is the major inhibitory neurotransmitter in the brain, and is synthesized from its precursor L-glutamate by the enzyme glutamate decarboxylase (GAD). Benes et al. reported a deficit of GABA-ergic (GABA-producing) interneurons in the prefrontal and cingulate cortices of schizophrenia patients [36], consistent with a great deal of work from other postmortem brain studies, as reviewed in [37]. Moreover, messenger RNA (mRNA) expression of the major GABA-synthesizing enzyme GAD67 was also found to be decreased in the prefrontal cortex of subjects with schizophrenia relative to controls [38, 39].

While each of these hypotheses has generally been considered independently, a more recently proposed paradigm suggests that they may be intertwined [40]. Anomalies in one neurotransmitter system often lead to dysfunction in another. For example, NMDA antagonists are also potent activators of dopamine release; they cause marked psychotic symptoms in healthy humans and exacerbate symptoms in patients with schizophrenia [41]. Thus, dopaminergic dysfunction in schizophrenia may be secondary to an underlying glutamatergic dysfunction. GABA-ergic interneurons were found to be more sensitive to NMDA receptor inhibition than pyramidal neurons (an abundant cell type in cortical structures) [30], suggesting that hypofunction of the NMDA receptor could result in a subtle loss of GABA-ergic inhibition and interfere with localized processing. GABA also modulates the dopaminergic mesocortical system, whose disturbance could theoretically create dopamine dysregulation. Alterations of other neurotransmitter systems (e.g. serotonergic, cholinergic, and opioid) also provide evidence of synaptic involvement in the pathophysiology of schizophrenia; however, the case for their involvement is not as strong. Additional support for a multiple neurotransmitter hypothesis comes from the increased efficacy of clozapine and other second-generation neuroleptics, also referred to as ‘atypical’ antipsychotics, which act by antagonizing a wide variety of receptors (e.g. most dopamine receptors, norepinephrine receptors, and many cholinergic and serotonergic receptors) [42].

1.3.2 Genetic and Environmental Factors

Schizophrenia is characterized by a genetic component that works together with various environmental influences to bring about the onset of symptoms. There is a substantial genetic contribution, demonstrated by high heritability estimates (up to 80%) and > 50% concordance in monozygotic twins [43-45]. However, unlike monogenic diseases, where a single mutation may cause the disease phenotype, schizophrenia involves the influence of multiple genes. To date there are no definitive universal genetic markers associated with schizophrenia, despite numerous efforts focused on their identification. Candidate-gene and genome-wide association studies (GWAS) have been conducted in attempts to identify single nucleotide polymorphisms (SNPs) at genetic loci underlying susceptibility to schizophrenia. A list of common genes identified from genetic studies of schizophrenia is provided in Table 2 of this chapter. The evaluation of genetic influences on schizophrenia was for many years guided primarily by the ‘common disease-common allele’ model, which states that common diseases are caused by multiple common alleles, each with a relatively modest effect, that together increase risk [46]. In contrast, the ‘common disease-rare allele’ model asserts that disease is caused by multiple variants that are highly penetrant, individually rare, and even specific to single cases or families [47]. Although the two are competing theories, it is thought that both could hold for a single susceptibility gene in a complex disease such as schizophrenia.

One of the most replicated candidate-gene findings in schizophrenia is the dysbindin (DTNBP1) gene, first reported by Straub et al. [48] from association mapping across the linkage region on 6p22.3. Numerous other association studies followed, providing evidence in favour of the DTNBP1 gene; however, the specific risk alleles identified were inconsistent [49-51]. The dysbindin protein is involved in many functions, but of particular interest is its involvement in vesicle trafficking and its potential relevance to schizophrenia [52]. Another strong candidate gene is neuregulin 1 (NRG1), with suggestive linkage of schizophrenia to 8p reported not only in the original Icelandic population [53] but also in a number of other populations (see review [54]). The NRG1 gene gives rise to at least fifteen isoforms that encode different proteins carrying out a range of functions in the brain. As with DTNBP1, the way in which altered function of this gene would lead to schizophrenia is unclear.

In addition to SNPs, other types of genetic abnormality have been explored with respect to their association with schizophrenia. One example is the DISC1 gene, also known as Disrupted-in-Schizophrenia 1. This gene was identified from a balanced chromosomal translocation, t(1;11)(q42.1;q14.3), found to co-segregate with schizophrenia and other psychiatric disorders in a large Scottish pedigree [55, 56]. The translocation was found to disrupt two genes on chromosome 1: DISC1 and DISC2. Association studies that followed sought to identify DISC gene polymorphisms in other populations. Positive findings were reported in a large Finnish sample [57] and in a US sample that also included subjects with bipolar disorder and schizoaffective disorder [58]. Another prominent example is a rare hemizygous microdeletion at chromosome 22q11.2, also referred to as velo-cardio-facial syndrome (VCFS) [59]. The VCFS phenotype is complex, including multiple congenital abnormalities affecting several tissues and organs, with at least a 20-fold excess risk of developing psychosis [60]. The 3-Mb deletion region contains more than 45 genes, of which roughly 30 are usually lost in patients with VCFS, and no ‘critical region’ of the deletion has been identified. VCFS represents an unequivocal genetic ‘subtype’ of schizophrenia, but the extent to which its underlying mechanism generalizes to schizophrenia remains unknown. Non-deletion variants (polymorphisms) of individual genes within the 22q11 region have also been studied, as they may make a larger contribution to schizophrenia susceptibility in the wider population. Examples of such genes include G-protein subunit beta-like protein (GNB1L) [61], zinc finger DHHC domain containing protein 8 (ZDHHC8) [62], and catechol-O-methyltransferase (COMT) [63]. More recently, the availability of whole-genome scanning methods has made it possible to interrogate genomic structural variants in schizophrenia on a larger scale, enabling us to move beyond the classic examples of DISC1 and 22q11. There is now convincing evidence for association between schizophrenia and a number of specific rare copy-number variants (CNVs) [64].

The last decade of genetic studies in schizophrenia has suggested many candidate genes, but a persistent problem is the failure to replicate findings across studies. Some studies may be underpowered to reliably detect genes of very small effect size. To address the issue of sample size, Stefansson et al. [65] carried out a comprehensive meta-analysis of genome-wide SNP data from several large independent studies of schizophrenia. Following up the most significant signals, they identified seven replicable associations, including associations spanning a large number of genes in the major histocompatibility complex (MHC) region. Using slightly less stringent criteria on the same cohorts, they identified two additional common variants [66], one of which was later confirmed in a Han Chinese population [67]. These findings suggest that it is possible to detect significant genetic effects associated with schizophrenia, and that moving towards even larger sample sizes will be highly beneficial.
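To illustrate how pooling independent studies increases power, the sketch below combines per-study effect estimates for a single SNP using a fixed-effect, inverse-variance weighted model. This is a common meta-analytic approach chosen here for illustration; it is not necessarily the exact procedure used in [65], and the effect sizes and standard errors in the usage example are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis for one SNP.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   per-study standard errors of those estimates
    Returns (pooled estimate, pooled standard error, z-score, two-sided p-value).
    """
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, z, p


if __name__ == "__main__":
    # Hypothetical: three studies, each with log OR 0.10 and SE 0.05.
    single = fixed_effect_meta([0.10], [0.05])
    pooled = fixed_effect_meta([0.10, 0.10, 0.10], [0.05, 0.05, 0.05])
    print("single study z = %.2f, p = %.3g" % (single[2], single[3]))
    print("pooled       z = %.2f, p = %.3g" % (pooled[2], pooled[3]))
```

With identical hypothetical effects, each study alone gives z = 2 (p ≈ 0.046), whereas the pooled z grows by a factor of √3 to about 3.46 (p ≈ 5 × 10⁻⁴), because the pooled standard error shrinks as more studies contribute.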

While there is strong evidence for a genetic contribution, a large body of evidence also indicates a role for environmental risk factors in the etiology of schizophrenia. Examples of proposed environmental factors include stress, drug use, history of trauma, head injury, low socioeconomic class, and a number of prenatal factors such as late winter/early spring birth, maternal infections in the second trimester, and obstetric complications [68]. These factors vary in the relative risk they confer, and the mechanism by which each influences schizophrenia is not clear. Further support for a role of environmental insults comes from the possibility that some of the identified genetic candidates mediate environmental effects. One study proposes that, by virtue of their multifunctionality, some susceptibility genes are pertinent to normal cell function but also code for proteins involved in the life cycles of pathogens [69]. If such genes are implicated in the life cycles of pathogens (which are environmental risk factors), polymorphisms in these genes may well affect the virulence of the pathogen. Consistent with the importance of prenatal risk factors, studies have also revealed that a number of schizophrenia susceptibility genes have key roles during neurodevelopment (e.g. DISC1, NRG1) [70, 71].

1.3.3 Insights from Human Brain Studies

Abnormalities in a number of different brain regions have been reported in schizophrenia, but no single region of pathology has been identified [72]. In the literature, reports of pathologic change in the hippocampus and the PFC predominate. Here, I detail only findings in the PFC, as it is the brain region of greatest relevance to the work presented in my dissertation. The PFC is a neocortical region of the brain serving specific cognitive functions, including selective attention, working memory and behavioural inhibition [73]. These functions are critically impaired in patients with schizophrenia. Moreover, the PFC is one of the last cortical regions to develop structurally and functionally, with grey matter reduction (indicating synaptic pruning) and increasing myelination continuing into early adulthood [74]. Thus, early adulthood may be a critical period of vulnerability of the PFC to psychosis. An early conceptualization of the neurodevelopmental model of schizophrenia was proposed by Weinberger [29], suggesting that the behavioural outcome of psychosis in late adolescence results from an early insult that occurs long before onset. This model posits that the insult is a congenital, static brain ‘lesion’ in the PFC. The effects of the lesion remain silent during much of development because the region is not yet functionally mature; only in late adolescence, when key maturational events take place in the PFC, does the clinical impact become apparent. An alternative theory proposed by Feinberg, termed the ‘late neurodevelopmental model’, suggests that the onset of psychosis in late adolescence is due to an abnormality in the normal maturational processes of the cerebral cortex during this period, independent of any ‘lesion’ [75]. Patients with schizophrenia may undergo excessive synaptic pruning, though the exact mechanism by which this occurs remains unclear [76].

Modern brain imaging techniques, most recently magnetic resonance imaging (MRI), provide non-invasive approaches to observing structural brain changes in schizophrenia. There is a large body of research in this area, covered in depth by various literature reviews [77, 78]. Here, I highlight patterns from these findings that relate to the neurodevelopmental model. Longitudinal MRI studies have consistently shown progressive increases in lateral ventricle volume and reductions in grey matter volume in schizophrenia [78]. Progression in adult-onset schizophrenia has been evaluated either by imaging high-risk prodromal patients and assessing changes after the first psychotic episode (over a 12-month period) [79], or by comparing patients in the early stages of schizophrenia with those in chronic stages (over a 4-year period) and evaluating change in the context of the clinical severity of the illness [80]. Although the normal trajectory of brain development involves loss of grey matter, the progressive loss observed in schizophrenia patients is significantly greater than the decrease in healthy controls. This suggests an ‘exaggeration’ of normal brain development, consistent with existing theories of excess synaptic pruning in schizophrenia [76].

Traditional MRI can also be used to examine white matter, though it is limited to a gross overview. A relatively new imaging modality, diffusion tensor imaging (DTI), allows a more detailed analysis of the integrity of the white matter tracts in the brain. DTI findings in schizophrenia are not entirely consistent, but the overall picture is one of reduced white matter integrity [81]. The most marked differences in white matter tracts, observed in both chronic and first-episode patients, are found in the cingulate and the frontal lobes [81]. White matter tracts are the basis for large-scale connectivity between functionally related but anatomically disparate regions of the brain; thus these findings make a case for disrupted neural connectivity in the pathophysiology of schizophrenia. DTI findings have also been reported in the context of clinical correlates and cognitive functioning related to schizophrenia. For example, reduced white matter integrity in the anterior cingulum has been found to correlate with deficits on the WCST [82]. Behavioural deficits in schizophrenia have also been evaluated by integrating cognitive paradigms with functional MRI (fMRI), an imaging modality that measures hemodynamic changes related to cerebral blood flow as a proxy for neural activity. A recent meta-analysis of 41 functional neuroimaging studies reported consistently decreased activity in the dorsolateral PFC and the anterior cingulate cortex during discrete PFC-dependent executive function tasks [83], consistent with hypofrontality in schizophrenia.

Studies of the postmortem human brain have been beneficial in elucidating features of schizophrenia that remain beyond the reach of neuroimaging. In general, schizophrenia lacks major identifiable neuropathological lesions and was once described as the ‘graveyard of neuropathology’ [84]. More recently, however, microscopic features of pathologic change have been identified in various regions of the limbic cortex and the neocortex; the focus here is on the latter. There have been reports of reduced dendritic spine density [85, 86] and smaller cell bodies [87] of prefrontal cortical pyramidal neurons in schizophrenia. Also, concordant with MRI findings, grey matter volume reductions have been reported [88], though these are thought to reflect reduced neuropil rather than loss of neurons [89]. A wide range of methods and techniques have been applied to postmortem tissue to investigate alterations at the cellular level. Protein-based schizophrenia studies using postmortem tissue [90] range from standard Western blot analyses and two-dimensional gel electrophoresis to newer methods such as liquid chromatography-tandem mass spectrometry (LC-MS/MS). High-throughput approaches such as microarrays and RNA-Seq have made it possible to measure molecular features such as gene expression changes associated with schizophrenia, and the study of epigenetics provides a window into the regulation of those transcriptional changes. Several studies have reported alterations in DNA cytosine methylation and histone methylation at specific genes and promoters in the postmortem brain of subjects with schizophrenia, often in conjunction with changes in the levels of the corresponding RNAs [91, 92]. There is evidence for epigenetic modifications influencing the GABAergic system (e.g. GAD1 [93] and RELN [94, 95]), the dopaminergic system (e.g. COMT [96]), and myelination-related processes, based on the methylation of a transcription factor important for myelination and oligodendrocyte function (SOX10 [97]).

Transcriptome analysis of the postmortem human brain has been of substantial interest in schizophrenia research, as it holds the promise of identifying a signature of genes representing pathology at the molecular level. Measurements of gene expression reflect the transcriptional activity of a cell at a given point in time. Changes observed between normal and diseased brain tissue (or the cells that comprise that tissue) capture maladaptation of cell function in the diseased environment as a signature of over- and under-expressed RNAs. However, as with genetic studies of schizophrenia, few findings from gene expression analysis have been reliably replicated. Here, I summarize some of the major findings pertaining to the PFC; recent reviews such as [98] and [99] should be consulted for a more comprehensive overview. The first microarray analysis of postmortem human brain in schizophrenia was conducted by Mirnics et al. (2000) [98], investigating expression patterns in the PFC.

Overall, genes involved in presynaptic function were shown to have lower expression in schizophrenia, including N-ethylmaleimide-sensitive factor (NSF), synapsin II and synaptojanin 1. Further support for the involvement of presynaptic genes was provided by later studies of the hippocampus [100] and the entorhinal cortex [101]. Also related to dysregulation at the synapse, a recent analysis using two large cohorts found consistent expression changes in gene sets associated with synaptic vesicle recycling, neurotransmitter release and cytoskeleton dynamics [102]. In line with the neurotransmitter hypotheses of schizophrenia, Mirnics et al. also identified altered expression of transcripts involved in GABA-ergic and glutamatergic neurotransmission. The decreased expression of GAD1 mRNA was of particular interest, and was later confirmed in another microarray study [103] and by other techniques [38, 104, 105]. Given the importance of myelination in brain maturation and its implication in schizophrenia, the reduced expression of oligodendrocyte transcripts found by Hakak et al. (2001) [106] was particularly relevant. Decreased expression of myelination-related genes such as myelin-associated glycoprotein (MAG) and oligodendrocyte lineage transcription factor 2 (OLIG2) has also been reported in neocortical regions and other brain areas [107]. While a large majority of studies have reported reduced expression of transcripts, there are also findings of up-regulated genes. Several studies, using mostly PFC samples, have reported increased expression of immune and stress-response genes in patients with schizophrenia [99, 108, 109].

Despite all of these studies, there is no signature of genes that is reliably found across all studies. This lack of consensus can be attributed to several limitations. First, many of these studies suffer from low statistical power. Sample sizes are typically small (N ≈ 40 or less), as postmortem brain tissue is a limited resource. Second, schizophrenia is associated with relatively small changes in gene expression, and discriminating real differences from experimental noise is difficult. Detecting small effects in small samples has led researchers to use lax criteria for identifying targets, at the risk of substantial false positive rates. Finally, differences between studies also arise from the nature of postmortem work, a topic covered in detail in the next section (Section 1.4). For my thesis, I have combined schizophrenia expression profiling data across studies, increasing the statistical power to detect subtle but consistent changes in gene expression. I hope that the identification of a robust schizophrenia signature of genes will aid in forming further hypotheses on the complex etiology of schizophrenia.
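The combined effect of small samples and small expression changes on power can be made concrete with a rough calculation. The sketch below uses a normal approximation to a two-sided, two-sample comparison for a single gene. The standardized effect size (d = 0.5), the Bonferroni correction over a hypothetical 20,000 genes, and the group sizes are illustrative assumptions rather than values from the studies cited above; real microarray analyses typically use t-statistics and false discovery rate control instead.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha):
    """Approximate power of a two-sided, two-sample test (normal approximation).

    effect_size: standardized mean difference between groups (Cohen's d)
    n_per_group: number of subjects in each of the two groups
    alpha:       per-test significance threshold
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Expected value of the test statistic under the alternative hypothesis.
    noncentrality = effect_size * (n_per_group / 2.0) ** 0.5
    # Ignores the negligible probability of rejecting in the wrong direction.
    return std_normal.cdf(noncentrality - z_crit)


if __name__ == "__main__":
    genome_wide_alpha = 0.05 / 20000  # hypothetical Bonferroni correction
    for n in (20, 153):
        print(n, round(approx_power(0.5, n, genome_wide_alpha), 3))
```

Under these assumptions, 20 subjects per group (N ≈ 40) gives well under 1% power per gene, whereas 153 per group gives roughly 35-40%, which illustrates why combining studies helps.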

1.4 Postmortem Human Brain Tissue

The use of postmortem human brain tissue is crucial for understanding the pathological processes of psychiatric illness. While animal models can mimic certain aspects of human pathology, we will never be able to fully recapitulate the disorder in an animal. Moreover, major psychiatric illnesses are disorders of the brain, so direct examination at the source can better reveal details of the mechanisms underlying disease. However, postmortem human brain tissue poses several challenges to researchers [110]. Unlike animal models, whose genetic makeup and environment can be controlled, human tissue comes from relatively uncontrolled sources. Variability of pre- and postmortem conditions among individuals can influence the quality of tissue and consequently patterns of gene expression. In this section, I discuss some of the factors that can affect the integrity of conclusions drawn from gene expression profiling studies of postmortem human brain, as this is a focus of my thesis. It should be noted, though, that these factors affect all postmortem brain studies regardless of the technique being applied.


1.4.1 Tissue Heterogeneity

The human brain is made up of two major types of cells, neurons and glia, with the latter accounting for the majority [111]. Neurons are cells involved in processing and transmitting information through the brain via electrical and chemical signaling. Neurons exist in different shapes and sizes and can be classified by their morphology and function. Glia also have multiple subtypes (e.g. oligodendrocytes and astrocytes), which are involved in a wide variety of functions providing structural, metabolic, and trophic support to neurons. More recently, glia have also been identified as active participants in synaptic transmission, modulating the information flow between neurons [111]. It is the coordinated activity of, and cross-talk between, these different cells that allows the brain to function properly.

Given the intricate organization of the brain, it is difficult to obtain samples of a homogeneous cell type. A common approach to postmortem brain tissue dissection is to obtain tissue blocks using a cryostat: the brain is frozen and sliced at variable thickness, and selected brain areas are then punched out of the slices. These samples comprise a heterogeneous collection of cell types, each characterized by a distinct molecular composition. Moreover, each cell type could react differently to perturbations such as disease. A large fold change in the expression of a particular gene can be diluted considerably if the cell type in which the change occurs represents only a fraction of the overall population of cells being studied. Furthermore, a relevant expression change in one direction (e.g. up-regulation) in one cell type can be masked by a change in the opposite direction (e.g. down-regulation) in other cell types in the same sample. To reduce the effects of tissue heterogeneity on gene expression, some recent studies of schizophrenia [112-114] have employed Laser Capture Microdissection (LCM) [115], a technique that allows the investigation of specific cell types within tissue sections.
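The dilution and masking effects described above can be illustrated with a simple linear mixing model, in which the bulk signal for a gene is the sum of its expression in each cell type, weighted by that cell type's contribution to the sample. This is a minimal sketch under that assumption; the two cell types, their proportions and the fold changes are hypothetical.

```python
def bulk_fold_change(proportions, control_expr, fold_changes):
    """Disease/control fold change seen in bulk tissue under linear mixing.

    proportions:  fraction of the sample contributed by each cell type (sums to 1)
    control_expr: expression of the gene in each cell type in control tissue
    fold_changes: disease/control fold change of the gene within each cell type
    """
    control = sum(p * e for p, e in zip(proportions, control_expr))
    disease = sum(p * e * f
                  for p, e, f in zip(proportions, control_expr, fold_changes))
    return disease / control


if __name__ == "__main__":
    # Dilution: gene expressed equally in neurons (30%) and glia (70%),
    # doubled only in neurons.
    print(bulk_fold_change([0.3, 0.7], [1.0, 1.0], [2.0, 1.0]))
    # Masking: 1.5-fold up in one cell type, 0.5-fold in the other.
    print(bulk_fold_change([0.5, 0.5], [1.0, 1.0], [1.5, 0.5]))
```

In the first case the 2-fold change within neurons appears as only about a 1.3-fold change in the bulk sample; in the second, the opposing changes cancel and the bulk fold change is 1.0.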

1.4.2 Tissue Quality

Because of the concern that tissue degrades after death, it is routine to assess tissue integrity using parameters such as brain pH, postmortem interval and neuropathological assessment. Brain pH is often used as a surrogate measure for RNA quality. It has been well demonstrated that brain pH positively correlates with mRNA preservation in postmortem tissue [116-118], and as such some reports have suggested using a pH cutoff (excluding samples with pH below 5.9-6.0) to select samples for further analyses [119]. However, postmortem brain is a limited resource, and removal of samples is often considered a last resort.

Postmortem interval (PMI) is defined as the time elapsed between death and the collection of samples from the deceased. As the PMI increases, RNA degradation in the sample is thought to become more likely, although in human studies PMI has not yet been shown to have a clear relationship with RNA quality [120]. This inconsistency may be partly due to the fact that PMI is sometimes an estimated value, particularly when the time of death is uncertain. In addition to these measures, tissue samples typically undergo a neuropathological examination to rule out abnormalities indicative of other brain diseases (e.g. Alzheimer’s disease) that may mimic the clinical features of psychiatric disorders. Neuropathological assessment can also characterize brain abnormalities that might interfere with downstream interpretation. Protocols for this assessment generally include gross and microscopic examination, but can vary across brain banks [121].

1.4.3 Clinical Quality

Another consideration for postmortem brain studies is the clinical state of the subjects before death. An accurate diagnosis depends on multiple sources of clinical information, which are not always readily available. Such sources include psychiatric records coupled with medical records, interviews with medical professionals, interviews with family members, and semi-structured diagnostic assessment tools.

Toxicology reports of medication use and of illicit drug and alcohol abuse are another important component of clinical quality. These substances exert their effects by binding receptors in the brain and could therefore interfere with the dependent measures of interest in postmortem studies. Such screening is not always conducted, and its results are therefore not always reported. In the case of medication analysis, costs can be prohibitive for antipsychotics, which are not part of routine assays [121]. Moreover, tests conducted at the time of death give data that are not necessarily representative of lifetime usage. The effects of antipsychotics are typically handled in one of several ways, each listed here with its associated limitation. 1) Lifetime antipsychotic exposure can be estimated from medical records of prescriptions and incorporated into analyses, but such estimates are not necessarily accurate, as there are often problems with patient adherence. 2) Studies aim to use subjects who were ‘off’ medication for a period before death [122], although being off medication for a certain period cannot reverse the potential changes induced by medication received before the ‘off’ phase. 3) The effects of medication are evaluated in an animal model (usually rodents), in which medication is administered for a given number of weeks [123]. Expression changes found are then cross-referenced with gene lists obtained from the postmortem brain study to identify genes altered secondary to a medication effect. The limitation here is that treatment in the animal model is of short duration and is not directly comparable to treatment courses in humans.

Cause of death is also a clinical measure that can profoundly affect the integrity of brain tissue. It is less of a concern for subjects who died suddenly outside a medical care setting (e.g. in an automobile accident) than for individuals who died following a prolonged illness that involved medical interventions (e.g. cancer). Prolonged agonal states have been shown to yield lower tissue pH, decreased expression of genes involved in energy metabolism and proteolytic activities, and increased expression of stress response genes and transcription factors [124]. Agonal state has been clearly associated with pH and RNA integrity [125], and a rating system based on agonal duration has even been developed [126].

1.4.4 Demographic Data

Gene expression will inevitably vary from one individual to another in ways that are not related to neuropsychiatric status. Subjects used in postmortem studies have diverse genetic backgrounds and different lifestyles, and have been exposed to various environmental influences. Thus, there are a number of extraneous factors that cannot be controlled for, no matter how well thought out the experimental design. However, two demographic factors that are commonly measured and considered are age and sex. These factors have been found to be associated with large expression changes in the brain [127, 128], and thus can mask or masquerade as the disease effect if not properly controlled for. It is routine to match samples across conditions on these factors, which can result in the removal of samples from the cohort. Alternatively, one can reduce their effects by adjusting for these factors during statistical analysis. Other demographic factors that have more recently been incorporated into postmortem brain studies include the race/ethnicity of the subjects and measures of lifetime smoking.
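Adjusting for age and sex during analysis is often done by including them as covariates in a per-gene linear model. The sketch below fits expression ~ diagnosis + age + sex by ordinary least squares with NumPy; it assumes a purely additive model with no interaction terms, and the synthetic data are hypothetical, constructed so that the patient group is older than the control group.

```python
import numpy as np


def diagnosis_effect(expr, diagnosis, age, sex):
    """Estimate the diagnosis effect on one gene's expression, adjusted for age and sex.

    Fits expr = b0 + b1*diagnosis + b2*age + b3*sex by ordinary least squares.
    diagnosis and sex are 0/1 indicators; age is in years.
    Returns b1, the covariate-adjusted diagnosis coefficient.
    """
    design = np.column_stack([np.ones(len(expr)), diagnosis, age, sex])
    coef, *_ = np.linalg.lstsq(design, np.asarray(expr, dtype=float), rcond=None)
    return coef[1]


if __name__ == "__main__":
    diagnosis = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    age = np.array([30, 40, 50, 60, 50, 60, 70, 80])  # patients are older
    sex = np.array([0, 1, 0, 1, 0, 1, 0, 1])
    # Noise-free synthetic expression: the simulated diagnosis effect is 0.3.
    expr = 5.0 + 0.3 * diagnosis + 0.02 * age + 0.5 * sex
    naive = expr[diagnosis == 1].mean() - expr[diagnosis == 0].mean()
    print("naive difference:", round(naive, 3))
    print("adjusted effect:", round(diagnosis_effect(expr, diagnosis, age, sex), 3))
```

Here the naive difference in group means (0.7) mixes the diagnosis effect with the age difference between groups, while the adjusted coefficient recovers the simulated effect of 0.3.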

1.5 Gene Expression Profiling

Genome-wide RNA expression profiling enables a large-scale approach to identifying molecular changes associated with a given condition. The RNA expression profile of a sample represents a static view of global gene expression, which can be compared against profiles from a different condition (diseased vs. control, treated vs. non-treated, etc.). The differences observed can be primary or secondary. Primary changes occur in response to sequence-level changes (e.g. SNPs or mutations in regulatory or coding regions) or environmental changes. Secondary expression changes are transcript-level changes that are a downstream consequence of the primary genetic and environmental factors. Thus, gene expression data can reveal unanticipated biological relationships from which we can then generate hypotheses.

Expression profiling is frequently applied to the study of the human central nervous system [1, 2, 129]. With the accumulation of expression data on human brain tissue and its potential impact on psychiatric research, it is important to evaluate the current state of the available technologies.

1.5.1 Profiling Technologies

Expression profiling technologies can be classified into one of two categories: 1) sequencing-based approaches or 2) hybridization-based approaches. The focus of this thesis is on the latter; however, both are briefly discussed here. Serial analysis of gene expression (SAGE) [130] was the first reported sequencing-based high-throughput method for expression profiling, followed by massively parallel signature sequencing (MPSS) [131]. These methods work by generating a short sequence tag for each transcript. The tag is a short stretch of nucleotides adjacent to the 3’-most site of a specific restriction enzyme in the transcript. Gene expression is then measured by counting these tags. MPSS can generate a larger number of signatures in a single run, providing better coverage than SAGE. Though both of these methods have several advantages, neither was as widely adopted as microarrays at the time of my research. DNA microarrays are based on probes that are immobilized in an ordered two-dimensional pattern on substrates such as nylon membranes or glass slides. Probes are usually designed to be specific for an organism and can cover the whole genome. The amount of hybridization between the immobilized probes and the mRNA sequences in the sample, measured as signal intensity, indicates the level of gene expression.
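The tag-counting principle behind SAGE can be sketched on toy transcript sequences. This simplified illustration assumes NlaIII (recognition site CATG) as the anchoring enzyme and 10-nt tags taken immediately 3’ of the 3’-most site, as in the classic SAGE protocol; it operates on transcript strings directly and skips the laboratory steps (cDNA synthesis, tagging-enzyme cleavage, concatemer sequencing). The sequences are hypothetical.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # NlaIII recognition site (assumed anchoring enzyme)
TAG_LENGTH = 10       # classic SAGE tag length (assumed)


def sage_tag(transcript):
    """Return the tag immediately 3' of the 3'-most anchoring site, or None."""
    site = transcript.rfind(ANCHOR_SITE)
    if site == -1:
        return None  # no anchoring site: transcript yields no tag
    start = site + len(ANCHOR_SITE)
    tag = transcript[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


def count_tags(transcripts):
    """Count tags across transcripts; a tag's count estimates its transcript's abundance."""
    tags = (sage_tag(t) for t in transcripts)
    return Counter(tag for tag in tags if tag is not None)


if __name__ == "__main__":
    gene_a = "AAACATG" + "ACGTACGTAC" + "TTTT"
    gene_b = "CATGCCCCCCCCCCGGCATG" + "TTTTTTTTTT" + "AA"
    print(count_tags([gene_a, gene_b, gene_a, "GGGG"]))
```

Because only the 3’-most site is used, each transcript contributes exactly one tag, so tag counts are proportional to transcript abundance; gene_b shows that an upstream CATG is ignored.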

1.5.2 RNA Quality

It is critical to obtain samples with a high level of RNA quality, irrespective of the technology being used.

While RNA in brain is relatively stable, RNA quality does vary and must be controlled for [132, 133]. A classic measure of RNA quality is the 28S/18S ratio, in which the ribosomal RNAs are quantified and 28S is expected to be twice that of 18S. However, this measure has been questioned as unrepresentative for two reasons. First, because rRNA and mRNA differ structurally, they are also likely to differ in in situ stability. Second, the approach is subjective, as it relies on human interpretation of gel images, and it is therefore not comparable from one lab to another. Another measure uses the signal from the 3’ and 5’ ends of a housekeeping gene transcript (e.g. GAPDH) and takes the ratio of the two expression levels.

Generally, with this measure a ratio close to one indicates good integrity of the sample. A more recently developed and now more commonly used measure is the RNA integrity number (RIN), generated using a software tool [134]. The RIN value is calculated using the entire electrophoretic trace of the RNA sample, including the presence or absence of degradation products.
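The two ratio-based measures reduce to simple arithmetic on quantified signals, as in the sketch below. The example values and the max_ratio cut-off used to flag degradation are illustrative assumptions, not thresholds used in this thesis. RIN is not reproduced, because it is computed from the full electrophoretic trace by the software tool cited above.

```python
def rrna_ratio(area_28s, area_18s):
    """28S/18S ratio from quantified ribosomal RNA peaks (expected to be ~2)."""
    return area_28s / area_18s


def three_to_five_ratio(signal_3p, signal_5p):
    """3'/5' signal ratio for a housekeeping transcript such as GAPDH (~1 if intact)."""
    return signal_3p / signal_5p


def flag_degraded(signal_3p, signal_5p, max_ratio=3.0):
    """Flag a sample whose 3'/5' ratio exceeds an illustrative cut-off.

    When labelling is primed from the poly-A tail, degradation lowers the
    5' signal relative to the 3' signal, pushing the ratio above one.
    max_ratio=3.0 is an assumed value for illustration only.
    """
    return three_to_five_ratio(signal_3p, signal_5p) > max_ratio


print(rrna_ratio(2000.0, 1000.0))     # 2.0
print(flag_degraded(1100.0, 1000.0))  # False
print(flag_degraded(4000.0, 1000.0))  # True
```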

1.5.3 Limitations of Microarrays

Microarrays have become a popular tool and come in different flavours, including arrays with probes representing only coding regions, exons or SNPs, as well as custom arrays. However, they do have their limitations. First, the effectiveness of a microarray is directly affected by the quality of genome annotations. Without a priori knowledge of the genome, analysis can be severely hampered. Second, non-specific cross-hybridization can occur between similar sequences, although there have been efforts to design probes less prone to cross-hybridization [135] and tools to infer the extent of cross-hybridization in the resulting data [136]. Quality control methods are applied to ensure removal of outlier assays that contribute to noise in the data. To assess data quality, various methods have been proposed by the chip manufacturers and have become standard protocol for microarray technology. Examples include a consistent scaling factor (related to the overall intensity of the chip), the use of internal and external spiked-in controls, and inter-sample correlation analysis. These and other measures allow us to gauge detection level and sensitivity after hybridization on the chip.
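Inter-sample correlation analysis can be sketched as follows. Each array is scored by its mean correlation with every other array, and arrays scoring far below the rest are flagged. The median/MAD rule, the n_mads and min_mad parameters, and the function names are illustrative assumptions, not a manufacturer's protocol.

```python
import numpy as np


def mean_intersample_correlation(expr):
    """Mean Pearson correlation of each array with all other arrays.

    expr: genes x arrays matrix of log-scale intensities.
    """
    corr = np.corrcoef(expr, rowvar=False)  # arrays x arrays
    n = corr.shape[0]
    # Subtract the self-correlation (1.0) on the diagonal before averaging.
    return (corr.sum(axis=1) - 1.0) / (n - 1)


def flag_outlier_arrays(expr, n_mads=3.0, min_mad=0.01):
    """Flag arrays scoring more than n_mads MADs below the median score.

    min_mad is an assumed floor so that near-identical arrays are not
    flagged over negligible differences.
    """
    scores = mean_intersample_correlation(expr)
    median = np.median(scores)
    mad = max(np.median(np.abs(scores - median)), min_mad)
    return scores < median - n_mads * mad
```

A flagged array is a candidate for closer inspection; the rule only says that it agrees poorly with the other arrays in the same experiment.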

The multiplicity of different platforms for measuring expression presents further challenges in comparing results across labs. Each platform is built on the same general principle but differs from the others in its manufacturing strategy (e.g. in situ synthesis vs. deposition), probe length (e.g. short oligonucleotides vs. long cDNA probes), probe labeling (radioactivity, fluorochrome incorporation, etc.), sequence representation (i.e. which genes are assayed), and array hybridization strategy (e.g. one-colour vs. two-colour). Array hybridization is a sensitive procedure involving reagents and hardware whose quality can vary between labs. Differences in lab conditions and protocols for sample preparation can also contribute to the variation observed between datasets, creating potential ‘lab effects’. The MicroArray Quality Control (MAQC) project [137] was initiated to address concerns about technical differences arising from platform variation, the reliability of cross-laboratory microarray studies, and other performance and data analysis issues. The MAQC consortium showed that when comparable methods were applied, the genes identified as differentially expressed showed a high level of both intra-platform consistency and inter-platform concordance [137]. There is also concern regarding differences in the methods used for data processing, such as image segmentation, signal intensity measurement (accounting for background signal), probeset summarization, and normalization of data (within the array or across all arrays used in an experiment). For each of these steps there are multiple algorithms that can be applied (for example, MAS5.0 present/absent calls [138], RMA [139], gcRMA [140]). Different published studies use different combinations of methods, which may contribute to different outcomes. Another concern is non-biological variation caused by ‘batch effects’. Practical considerations limit the number of samples that can be hybridized at one time, so samples from the same experiment can be run several days or weeks apart. Arrays run on the same day may share preparation conditions, so arrays run in different batches (on different days) are not directly comparable. It is necessary to detect and control for these effects [141].
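Two of the processing steps mentioned above can be illustrated in a few lines of NumPy. Quantile normalization (one component of RMA) forces every array to share the same intensity distribution, and per-batch mean-centering shows the basic idea behind batch adjustment. Both functions are simplified, hypothetical sketches: the quantile step breaks ties by order of appearance, and mean-centering is far cruder than dedicated batch-correction methods such as ComBat. Centering also removes real biology if batch is confounded with the condition of interest.

```python
import numpy as np


def quantile_normalize(expr):
    """Give every array (column) the same empirical intensity distribution.

    expr: genes x arrays matrix. Ties are broken by order of appearance,
    a simplification of implementations that average tied ranks.
    """
    expr = np.asarray(expr, dtype=float)
    order = np.argsort(expr, axis=0)
    # Reference distribution: mean of the sorted values across arrays.
    reference = np.sort(expr, axis=0).mean(axis=1)
    normalized = np.empty_like(expr)
    for j in range(expr.shape[1]):
        # The k-th smallest value in array j receives the k-th reference value.
        normalized[order[:, j], j] = reference
    return normalized


def center_batches(expr, batches):
    """Remove per-batch mean shifts for each gene, keeping the overall mean."""
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    grand_mean = expr.mean(axis=1, keepdims=True)
    adjusted = expr.copy()
    for b in np.unique(batches):
        cols = batches == b
        batch_mean = expr[:, cols].mean(axis=1, keepdims=True)
        adjusted[:, cols] = expr[:, cols] - batch_mean + grand_mean
    return adjusted
```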

A possible fix for some of the drawbacks of microarrays is a move towards next-generation sequencing (NGS) based approaches. One commonly used example is RNA-Seq, in which cDNA is sequenced using NGS methods and the resulting short sequence reads are mapped onto a reference genome. The sample is sequenced directly and is therefore not dependent on user-defined probe sequences, removing issues of cross-hybridization and the bias introduced by probe design. Moreover, quantification of signal is based on counting sequence reads rather than on relative measures between samples. Notably, NGS approaches also come with their own limitations (e.g. the need for effective rRNA removal and for appropriate data analysis tools).
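The counting-based quantification can be sketched as below, assuming that reads have already been aligned and each read assigned to at most one gene (alignment and assignment are not shown). Counts are then scaled to counts per million (CPM) assigned reads so that libraries of different sizes can be compared. CPM is one simple scaling chosen for illustration, and the gene labels in the example are arbitrary.

```python
from collections import Counter


def count_reads(assignments):
    """Tally reads per gene; None marks a read not assigned to any gene."""
    return Counter(gene for gene in assignments if gene is not None)


def counts_per_million(counts):
    """Scale raw counts by library size (total assigned reads) to CPM."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


reads = ["GAPDH"] * 6 + ["DRD2"] * 3 + [None] + ["COMT"]
print(counts_per_million(count_reads(reads)))
# {'GAPDH': 600000.0, 'DRD2': 300000.0, 'COMT': 100000.0}
```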

1.5.4 Differential Expression

Differential expression refers to the identification of meaningful changes in levels of gene expression across two or more conditions. In the simplest case, we would look at one gene across two different conditions (e.g. control versus disease). This type of analysis is usually performed using a two-sample t-test or a Wilcoxon rank-sum test. A more complex scenario would involve evaluating expression of a gene across multiple factors, each having different levels (e.g. drug dosage and case/control status).

This type of analysis would usually require more sophisticated statistical approaches, for example linear modeling using an analysis of variance (ANOVA) to identify the significance of change associated with each factor. High-throughput genome-wide datasets, such as microarrays, contain thousands of genes on which the statistical test must be applied. A major issue that arises when testing multiple genes is inflation of the false-positive rate; this is called the “multiple testing” problem. To deal with this problem, p-values must be adjusted upwards to compensate and reduce the number of false positives. Different methods for multiple testing correction exist, including Bonferroni correction, Benjamini-Hochberg [142] and q-value [143]. Microarray studies of schizophrenia and other psychiatric illnesses in the current literature have generally not applied multiple testing correction to their datasets, as it often results in small, uninformative gene lists. These studies instead use extended gene lists with less stringent criteria to allow for a more inclusive understanding of the pathways affected in the pathology.
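The per-gene test and the multiple-testing adjustment can be sketched in plain Python. welch_t computes Welch's two-sample t statistic for one gene (a p-value would then come from the t distribution, for example via scipy.stats.ttest_ind with equal_var=False), and the two adjustment functions implement the Bonferroni and Benjamini-Hochberg procedures on a list of p-values. The function names are hypothetical.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's two-sample t statistic for one gene (unequal variances)."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / se


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

For these four p-values Bonferroni gives roughly [0.04, 0.16, 0.12, 0.8], whereas Benjamini-Hochberg gives roughly [0.04, 0.053, 0.053, 0.2], which illustrates why FDR-based correction retains more genes than family-wise control.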

1.5.5 Coexpression

Coexpression refers to genes that have correlated expression patterns across a set of samples. Similarity is measured by the Pearson correlation coefficient (or some other metric) across all possible gene pairs within a microarray dataset, generating a symmetric matrix of correlation values. The associations established from coexpression analysis are often represented as a network. A gene coexpression network is a graph in which nodes represent genes and edges between nodes indicate that the two genes are coexpressed. Unlike some other biological networks, whose edges represent well-defined biological interactions, the edges in a coexpression network reflect the correlation structure of the data. A connection between two genes should therefore not be taken to imply a physical interaction between them.
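The symmetric correlation matrix can be computed by standardizing each gene's profile and taking scaled dot products, as in the sketch below. It assumes a genes x samples matrix with no constant rows (a gene with zero variance would cause a division by zero); np.corrcoef returns the same matrix and is the more practical choice.

```python
import numpy as np


def coexpression_matrix(expr):
    """Pearson correlation between every pair of genes (rows of expr).

    Each profile is converted to z-scores (population standard deviation),
    so r for a pair is the mean product of their z-scores across samples.
    """
    expr = np.asarray(expr, dtype=float)
    centred = expr - expr.mean(axis=1, keepdims=True)
    z = centred / expr.std(axis=1, keepdims=True)
    return z @ z.T / expr.shape[1]


expr = np.array([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]])
print(np.round(coexpression_matrix(expr), 3))
```

In the example, the second gene is a scaled copy of the first (r = 1) and the third is its reverse (r = -1).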

The network is specified by an adjacency matrix, with values of the matrix corresponding to edge weights.

Network construction requires an important decision about which correlations are relevant enough to constitute a connection between two genes [144]. For unweighted networks, hard thresholding is applied, which involves discarding edges (gene pairs) below a given threshold. The threshold can be selected based on the actual similarity values or on rank-transformed similarities [145], and it can be chosen arbitrarily or by controlling for the statistical significance of similarities [146, 147].

For gene pairs that meet the threshold, correlation values are converted to a value of 1 and all other pairs are assigned zero, resulting in a sparse binary matrix. Alternatively, a weighted network can be generated by raising correlation values to a power β ≥ 1, a procedure referred to as soft-thresholding [148]. Every gene pair thus retains a value, but high correlations are emphasized at the expense of low correlations. A common next step is to extract gene relationships from the coexpression network to uncover gene function. Genes with highly similar expression are thought to be functionally related, a concept that has been termed guilt-by-association (GBA) [149]. By evaluating subtle but coordinated changes in expression across multiple genes in the network, we can extract highly connected clusters and characterize them as functional groups based on over-representation of pathway-specific genes. Moreover, we may be able to assign putative functions to uncharacterized genes that associate with these groups.
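Hard and soft thresholding, together with a simple guilt-by-association score, can be sketched with NumPy. Two assumptions are made here: soft thresholding is applied to absolute correlations (an unsigned network), and GBA is operationalized as neighbour voting, i.e. the fraction of a gene's connection weight that falls on genes already annotated to a function. Self-connections are set to zero. The threshold, β and the function names are illustrative.

```python
import numpy as np


def hard_threshold(corr, threshold):
    """Unweighted network: 1 where |r| >= threshold, 0 otherwise."""
    adj = (np.abs(corr) >= threshold).astype(int)
    np.fill_diagonal(adj, 0)  # no self-connections
    return adj


def soft_threshold(corr, beta=6):
    """Weighted network: |r| ** beta keeps every pair but shrinks weak ones."""
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj


def neighbour_vote(adj, annotated):
    """Share of each gene's connection weight that links to annotated genes.

    annotated: 0/1 vector marking genes already known to have the function.
    Genes without any connections receive a score of 0.
    """
    annotated = np.asarray(annotated, dtype=float)
    degree = adj.sum(axis=1).astype(float)
    votes = adj @ annotated
    return np.divide(votes, degree, out=np.zeros_like(degree), where=degree > 0)


corr = np.array([[1.0, 0.9, 0.2], [0.9, 1.0, -0.85], [0.2, -0.85, 1.0]])
adj = hard_threshold(corr, 0.8)
print(neighbour_vote(adj, [1, 0, 0]))
```

With β = 6, a correlation of 0.9 keeps a weight of about 0.53 while a correlation of 0.5 drops to about 0.016, which is the emphasis on high correlations described above.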

1.5.6 Network Analysis

A network representation of gene coexpression data makes it amenable to mathematical analysis. We can then apply tools of graph theory to characterize various structural properties of the network. Here I discuss only those properties that are most relevant to my thesis. An important measure of a node in a network is how many other nodes it is directly connected to, also known as the node degree. If the distribution of node degrees in a network follows a power law, the network is defined as ‘scale-free’ [150].

Many quantities in nature follow a unimodal distribution, whereby there is a characteristic scale that is embodied by the mean and the single mode. The significance of the scale-free topology is that there is no characteristic node degree; most nodes in the network are sparsely connected, while a few are highly connected ‘hubs’. The hubs in a network represent particularly important nodes, dominating the structure of networks in which they are present. In a biological network, they often reflect genes that are involved in multiple processes and are crucial to the functioning of the cell. Thus, the alteration of hub genes can have more severe effects than changes to lower degree nodes. The presence of high degree nodes can also impose ‘small-world’ connectivity on the network, whereby most nodes in the graph can be reached from any other through a small number of steps. Since hubs have links to an unusually large number of nodes in the network, they create shorter paths between any two nodes. Small-world networks tend to contain cliques or highly clustered sub-networks, and are characterized by short path lengths between nodes and high cluster coefficients [151]. Shortest path length is defined as the minimal number of edges that need to be traversed to travel from one node to another, typically computed using Dijkstra’s algorithm [152]. The cluster coefficient describes the degree to which the neighbours of a node tend to cluster together. Values range from 0 (indicating none of the neighbours are connected to each other) to 1 (indicating all neighbours of a node are also connected to each other). Together, these measures give us an idea of the overall organization of the network. A number of studies have analyzed these topological properties in gene coexpression networks and have shown that they exhibit ‘small-world’ and ‘scale-free’ properties [153-155], as do many other biological networks (e.g. protein-protein interaction (PPI) networks). However, recent literature suggests that these degree distributions may be heavy-tailed without necessarily being a good fit to a ‘scale-free’ power law [156]. Furthermore, dissection of networks into smaller substructures can reveal greater insight into biological function. These sub-networks, also called communities or modules, represent groups of nodes that are more densely connected to each other than to nodes outside the group. Searching for modules within a network is a difficult task; although methods are available, none is both efficient and guaranteed to find the optimal set of modules. A number of different methods have been proposed for identifying sub-networks; some incorporate expression data with network topology [157, 158] and others rely strictly on network structure and node properties [148, 159, 160]. Once sub-networks are identified, we can assess them for enrichment of genes attributed to a specific characteristic, for example, biological function or specific pathways.
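The node-level measures described above can be sketched for an unweighted network stored as a symmetric 0/1 adjacency matrix. When every edge has weight 1, breadth-first search finds the same shortest paths as Dijkstra’s algorithm, so BFS is used here for brevity; a weighted network would need Dijkstra proper. The small example graph and the function names are hypothetical.

```python
from collections import deque

import numpy as np


def degrees(adj):
    """Node degree: number of direct connections of each node."""
    return adj.sum(axis=1)


def cluster_coefficient(adj, node):
    """Fraction of possible links among a node's neighbours that are present."""
    neighbours = np.flatnonzero(adj[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    # Each link appears twice in the symmetric neighbour sub-matrix.
    links = adj[np.ix_(neighbours, neighbours)].sum() / 2
    return links / (k * (k - 1) / 2)


def shortest_path_length(adj, source, target):
    """Minimal number of edges from source to target (None if unreachable)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in np.flatnonzero(adj[u]):
            v = int(v)
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None


# A triangle (0, 1, 2) with a tail 2-3-4.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
adj = np.zeros((5, 5), dtype=int)
for u, v in edges:
    adj[u, v] = adj[v, u] = 1
print(degrees(adj))                     # [2 2 3 2 1]
print(shortest_path_length(adj, 0, 4))  # 3
```

Node 2 acts as a small hub here: it has the highest degree, and its cluster coefficient is 1/3 because only one of the three possible links among its neighbours (0-1) exists.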

In this thesis, I exploit both frameworks for evaluating gene expression. In Chapters 2 and 3 the focus is on evaluating gene expression changes in the normal and schizophrenia brain by use of differential expression analyses; the focus then turns to coexpression and network analysis in Chapter 4. While each of these analyses stands alone in its findings, a key feature of my thesis is the integration of the two frameworks, taking the expression changes observed through differential expression and evaluating them in the context of functional modules derived from the coexpression networks.

1.6 Meta-analysis

Meta-analysis provides an integrative data analysis method, enabling us to extract more value from a collection of individual studies. It is commonplace in a single-dataset study to use the microarray as a screening tool and then validate a few differentially expressed genes of interest using techniques such as reverse-transcription PCR (RT-PCR). By conducting a meta-analysis of multiple datasets, we can essentially validate and statistically assess all of the positive results simultaneously to yield a significant gene set. Data are combined across studies and evaluated together to yield a more precise estimate of effect. The overwhelming accumulation of available transcriptomic data (particularly microarrays) in the last decade [161-163] has resulted in a corresponding spike in the number of meta-analyses being conducted across these data [164]. Statistical methods can be applied to combine information across datasets to increase power and find results with increased sensitivity. There are many proposed approaches to meta-analysis, each depending on the type of data being used and the biological purpose. In this section I focus on methods specific to my thesis and describe the current literature for meta-analysis of differential expression and meta-analysis of coexpression.

1.6.1 Meta-analysis of Differential Expression

There are two general approaches to meta-analysis that are commonly used for cross-study microarray data comparisons: ‘relative’ and ‘absolute’ [165]. A ‘relative’ meta-analysis involves the aggregation or comparison of per-gene result values across multiple studies to estimate an overall summary statistic of the effect. The gene result value reflects the relationship between a gene and the phenotype(s) of interest within the dataset. One example of this is Fisher’s inverse chi-squared method [166], in which p-values representing the significance of differential expression for each gene are combined across studies to generate a list of summary statistics. A variation on Fisher’s method has been proposed, in which p-values are weighted and only those that meet a specified threshold are considered in the computation [167]. Another example is integrative correlation analysis, demonstrated by Parmigiani et al. [168] in a study using various histological types of lung cancer samples. This method is based on the notion that consistency of correlation across datasets should reflect overall consistency of datasets. The ‘rank product’ approach proposed by Breitling et al. [169] provides a non-parametric method whereby fold change values are computed for all genes and converted to ranks within each dataset. Ranks are then aggregated to produce an overall score for each gene across datasets. Choi et al. [170] describe an approach to combine estimated effect sizes across datasets. The effect size, a standardized index measuring the magnitude of effect between case and control, is computed per gene within a dataset. Effect sizes are then combined across datasets using either a fixed effects model (FEM) or a random effects model (REM), the latter additionally modeling inter-study variation. Other ‘relative’ approaches for meta-analysis of microarray data are described in more detail in [165].
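Three of the ‘relative’ combination rules described above reduce to a few lines each. The sketch below implements Fisher’s method (using the closed-form upper tail of a chi-squared distribution with an even number of degrees of freedom), the rank product as a geometric mean of per-study ranks, and an inverse-variance fixed-effect combination of effect sizes. These are minimal illustrations: the permutation step that Breitling et al. use to assess rank-product significance, the weighted variant of Fisher’s method, and the between-study variance term of a REM are not shown, and the function names are hypothetical.

```python
import math


def fisher_combined_p(pvalues):
    """Fisher's method: X = -2 * sum(ln p) is chi-squared with 2k df under H0.

    With even degrees of freedom the upper tail has a closed form,
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is x / 2
    term = total = 1.0
    for i in range(1, k):
        term *= half_x / i  # (x/2)**i / i!
        total += term
    return math.exp(-half_x) * total


def rank_product(fold_changes_per_study):
    """Geometric mean of each gene's fold-change rank across studies.

    fold_changes_per_study: list of {gene: fold change} dicts, one per study,
    each containing the same genes. Rank 1 = largest fold change.
    """
    log_rank_sums = dict.fromkeys(fold_changes_per_study[0], 0.0)
    for study in fold_changes_per_study:
        ranked = sorted(study, key=study.get, reverse=True)
        for rank, gene in enumerate(ranked, start=1):
            log_rank_sums[gene] += math.log(rank)
    n = len(fold_changes_per_study)
    return {gene: math.exp(s / n) for gene, s in log_rank_sums.items()}


def fixed_effect_combine(effects, variances):
    """Inverse-variance weighted mean effect size and its variance."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    return pooled, 1.0 / sum(weights)
```

For example, two studies that each give p = 0.05 for a gene combine under Fisher’s method to about 0.017, stronger evidence than either study alone.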

The ‘absolute’ approaches to meta-analysis involve combining the raw or transformed data from multiple studies. Multiple datasets are thus considered as a single merged dataset for further analyses, and traditional microarray methods for differential expression can then be applied to it. With a large combined sample size, the statistical tests performed on the merged data gain power to find expression changes with increased reliability. Moreover, when sample-specific covariates need to be considered, these approaches are more effective than the ‘relative’ class of methods. An example of an ‘absolute’ approach in the literature is demonstrated by Dawany and Tozeren [171], in which the significance analysis of microarrays (SAM) test [172] was applied as the differential expression measure to a merged dataset of normal and cancer tissue types. Other examples include a two-stage ANOVA model approach [173] and a linear mixed effects model (MEM) [174, 175]. In Chapter 3 of this thesis I apply FEM and MEM to my data; I therefore briefly outline each in the paragraph below.

Ideally one wants a statistical model that explains the data well, with a minimum number of parameters and assumptions. Often the disease effect will vary as a function of covariates such as age and sex, and a proper synthesis requires one to understand how the disease effect varies as a function of these variables. In an FEM, observed gene expression values are modeled in terms of covariates that are treated as non-random quantities. These variables are termed ‘fixed effects’; they influence only the mean of the expression data, as they are sampled from a defined set of quantities. In an MEM, some of the variables are treated as fixed effects, while others are treated as if they arise from random causes drawn from a larger population. These random effects often have uninformative factor levels, and there is no need to estimate the means of a small subset of factor levels. Therefore, for ‘random effects’ we estimate their contribution to the variance around the true mean value of expression. Deciding how to model explanatory variables can be tricky, and we discuss some of the challenges we encountered in Chapter 3.
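A fixed effects model of this kind can be fitted gene by gene with ordinary least squares on the merged data. The sketch below is an illustration under assumptions, not the exact model of Chapter 3: it regresses one gene’s expression on disease status, age, sex and a dummy-coded study indicator, and returns the disease coefficient. In an MEM, study (or another grouping factor) would instead enter as a random effect, which requires a mixed-model fitter such as MixedLM in statsmodels or lmer in the R package lme4; that fit is not shown, and the function name is hypothetical.

```python
import numpy as np


def disease_effect_fem(expression, disease, age, sex, study):
    """OLS fit of expression ~ disease + age + sex + study for a single gene.

    All terms are fixed effects. Study is dummy coded with the first level
    (in sorted order) as the reference. Returns the disease coefficient.
    """
    study = np.asarray(study)
    n = len(study)
    columns = [np.ones(n), disease, age, sex]
    columns += [(study == level).astype(float) for level in np.unique(study)[1:]]
    design = np.column_stack(columns).astype(float)
    coef, *_ = np.linalg.lstsq(design, np.asarray(expression, dtype=float), rcond=None)
    return coef[1]


# Noise-free toy data with a true disease effect of 0.7 and a study shift of 1.5.
disease = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)
age = np.array([30, 45, 50, 62, 38, 55, 41, 70], dtype=float)
sex = np.array([0, 0, 1, 1, 0, 1, 1, 0], dtype=float)
study = np.array(["s1"] * 4 + ["s2"] * 4)
expression = 2 + 0.7 * disease + 0.01 * age - 0.3 * sex + 1.5 * (study == "s2")
print(round(disease_effect_fem(expression, disease, age, sex, study), 6))  # 0.7
```

Including the study indicator lets the disease effect be estimated within studies, so a constant offset between labs is not mistaken for a disease effect.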

There are good motivations to apply meta-analysis of differential expression to psychiatric studies. Sample sizes of typical postmortem brain microarray datasets are fairly small, as brain tissue is generally hard to obtain. Additionally, the disease effect is small, making it difficult to distinguish biological signal from noise. Combining data across studies allows for greater statistical power to more reliably estimate an average effect or highlight subtle variation not easily evaluated in single-dataset studies.

1.6.2 Meta-analysis of Coexpression Networks

Gene coexpression networks represent relationships between genes based on a matrix of pairwise correlations between genes in the dataset. Because microarray data are noisy, there has been much interest in the reproducibility of coexpression patterns between microarray datasets. Lee et al. [147] demonstrated that patterns of coexpression that can be confirmed across multiple studies are more likely to be functionally relevant. Thus, it seems only natural to extend coexpression networks from the single dataset level to the meta-analysis scenario. One approach to combining coexpression evidence across studies involves vote counting: the reliability of gene pairs is assessed by confirmations across multiple datasets, and statistical significance is estimated based on randomized networks [147, 176]. Other studies have adapted meta-analytical approaches originally applied to differential expression analysis for use on coexpression data. For example, Fisher’s method can be applied to correlation coefficient p-values [177], or effect size based methods can be employed by converting correlation values to z-scores, as demonstrated by Choi et al. [178]. Moreover, a number of studies apply meta-analytical approaches by incorporating a priori knowledge of gene sets with some expected functional relationships, for example Gene Ontology (GO) annotations [160], pathway annotations or tissue specificity [179].
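Vote counting and z-score combination can be sketched as follows. vote_count counts, for each gene pair, the datasets in which |r| passes a threshold (the significance assessment against randomized networks is not shown), and combined_correlation pools one pair’s correlations using Fisher’s z-transform, z = arctanh(r), with approximate variance 1/(n - 3) and inverse-variance weights. This is a generic version of the idea, not necessarily the exact procedure of Choi et al.; the threshold and function names are illustrative.

```python
import numpy as np


def vote_count(corr_matrices, threshold=0.5):
    """Number of datasets in which each gene pair has |r| >= threshold."""
    votes = sum((np.abs(c) >= threshold).astype(int) for c in corr_matrices)
    np.fill_diagonal(votes, 0)  # ignore self-pairs
    return votes


def combined_correlation(corrs, sample_sizes):
    """Pool one gene pair's correlations across studies via Fisher's z."""
    z = np.arctanh(np.asarray(corrs, dtype=float))
    weights = np.asarray(sample_sizes, dtype=float) - 3  # 1 / var(z)
    return float(np.tanh(np.sum(weights * z) / np.sum(weights)))
```

Averaging on the z scale rather than averaging r directly accounts for the skewed sampling distribution of correlation coefficients near ±1.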

Another commonly observed protocol is to merge datasets and construct a network as if it were a single study [180, 181]. Datasets are combined at the level of expression data to obtain a merged matrix of samples across different studies using the same platform. Pearson correlation values are computed for all probe pairs, and a network graph of the data is generated based on a user-defined threshold [180]. Ucar et al. [181] apply a rank-based methodology to the resulting coexpression matrix to generate probe-pair reliability scores. A network is then constructed using probe pairs with reliability scores above a threshold determined by a false discovery rate (FDR) cutoff. The method I use in Chapter 4 of this thesis constructs a single network across studies by aggregating data at the level of coexpression matrices. Individual coexpression matrices are noisy, so aggregating data yields a clearer signal and improves the performance of the resulting network [182]. For each dataset, a similarity matrix is computed for each cohort by taking the absolute value of the Pearson correlation between all possible gene pairs. Correlation values are then replaced by ranks, and these similarity rank matrices are aggregated across datasets by taking the mean rank for each gene pair. The aggregated matrix is then thresholded at 0.5% sparsity to obtain network connections. Gillis and Pavlidis [182] showed that this aggregation procedure is a robust method for producing high-quality coexpression networks.
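The aggregation procedure just described can be sketched in NumPy. Two details are assumptions, because they are not specified here: ranks are assigned over all gene pairs of a matrix at once (ties broken arbitrarily), and ‘0.5% sparsity’ is read as keeping the top 0.5% of gene pairs by mean rank (pairs tied at the cutoff are all kept). All datasets must list the same genes in the same row order. The function names are hypothetical, and this is not the code used in Chapter 4.

```python
import numpy as np


def rank_similarity(expr):
    """|Pearson r| for all gene pairs, replaced by ranks (1 = least similar).

    expr: genes x samples matrix for one cohort of one dataset.
    """
    corr = np.abs(np.corrcoef(expr))
    iu = np.triu_indices_from(corr, k=1)  # each gene pair once
    ranks = np.empty(len(iu[0]))
    ranks[np.argsort(corr[iu])] = np.arange(1, len(iu[0]) + 1)
    ranked = np.zeros_like(corr)
    ranked[iu] = ranks
    return ranked + ranked.T  # symmetric, zero diagonal


def aggregate_network(datasets, sparsity=0.005):
    """Mean rank across datasets, then keep the top `sparsity` fraction of pairs."""
    mean_rank = np.mean([rank_similarity(e) for e in datasets], axis=0)
    iu = np.triu_indices_from(mean_rank, k=1)
    n_edges = max(1, int(round(sparsity * len(iu[0]))))
    cutoff = np.sort(mean_rank[iu])[-n_edges]
    adj = (mean_rank >= cutoff).astype(int)
    np.fill_diagonal(adj, 0)
    return adj
```

Ranking before averaging puts every dataset on the same scale, so a dataset whose correlations are uniformly inflated (for example by a strong lab effect) cannot dominate the aggregate.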

An extension of coexpression network analysis is to identify sets of genes for which the coherence of expression profiles is altered between different conditions, or ‘differential coexpression’. This sort of analysis allows us to exploit condition-specific patterns of coexpression. Coexpressed pairs are identified across samples representing the normal state and are compared to patterns observed in the diseased state. Differential coexpression patterns could indicate disruption of a common regulatory mechanism, or dysregulation of a particular cellular process, amongst other things. Coexpression networks have been used in this context in a few postmortem human brain studies to identify disease-mediated changes in network connectivity associated with neuropsychiatric illnesses such as depression [10], schizophrenia [13], and autism spectrum disorder [183].
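At the level of a single gene pair, one common way to test for differential coexpression is to compare the pair’s correlation in the two cohorts after Fisher’s z-transform. The sketch below shows that pairwise test with a normal approximation. It is offered as a generic illustration, not as the method of the studies cited above or of Chapter 4, which compares network properties; the function names are hypothetical.

```python
import math


def diff_coexpression_z(r_control, n_control, r_disease, n_disease):
    """z statistic for the difference between two independent correlations."""
    z_control, z_disease = math.atanh(r_control), math.atanh(r_disease)
    se = math.sqrt(1.0 / (n_control - 3) + 1.0 / (n_disease - 3))
    return (z_control - z_disease) / se


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# A pair correlated at 0.8 in 53 controls but only 0.2 in 53 patients.
z = diff_coexpression_z(0.8, 53, 0.2, 53)
print(round(z, 2), two_sided_p(z))
```

Applied to all gene pairs, such a test would itself need the multiple-testing correction discussed for differential expression.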


Table 1: DSM-IV-TR Diagnostic criteria for schizophrenia

A. Characteristic Symptoms: Two or more of the following, each present for a significant portion of time during a 1-month period (or less if successfully treated): 1) delusions; 2) hallucinations; 3) disorganized speech; 4) grossly disorganized or catatonic behaviour; 5) negative symptoms, i.e., affective flattening, alogia, or avolition (lack of drive).

B. Social/Occupational Dysfunction: For a significant portion of the time since the onset of the disturbance, one or more major areas of functioning such as work, interpersonal relations, or self-care are markedly below the level achieved prior to the onset (or there is a failure to achieve the expected level).

C. Duration: Continuous signs of the disturbance persist for at least 6 months. This 6-month period must include at least one month of symptoms that meet Criterion A (i.e. active phase symptoms) and may include periods of prodromal or residual symptoms. During these prodromal or residual periods, the signs of the disturbance may be manifested by only negative symptoms or two or more symptoms listed in Criterion A present in an attenuated form.

D. Schizoaffective and Mood Disorder exclusion: These disorders can be ruled out because either 1) no Major Depressive, Manic or Mixed Episodes have occurred concurrently with the active-phase symptoms, or 2) if mood episodes have occurred during active-phase symptoms, their total duration has been brief relative to the active and residual periods.

E. Substance/general medical condition exclusion: The disturbance is not due to the direct physiological effects of a substance (e.g., a drug of abuse, a medication) or a general medical condition.

F. Relationship to a Pervasive Developmental Disorder: If there is a history of Autistic Disorder or another Pervasive Developmental Disorder, the additional diagnosis of schizophrenia is made only if prominent delusions and hallucinations are also present for at least a month.

Subtypes of Schizophrenia: 1) Paranoid type 2) Undifferentiated type 3) Disorganized type 4) Residual type 5) Catatonic type


Table 2: Candidate genes in schizophrenia

Gene | Description | Function | Cytogenetic band
NRG1* | neuregulin 1 | Signaling protein with critical roles in growth and development | 8p12
DTNBP1 | dysbindin | Vesicle trafficking | 6p22.3
RGS4 | regulator of G-protein signaling 4 | Signal transduction of GPCR; modulates neurotransmission | 1q23.3
COMT | catechol-O-methyltransferase | Degradation of catecholamine transmitters (e.g. dopamine) | 22q11.21
DISC1 | disrupted in schizophrenia 1 | Neurite outgrowth and cortical development | 1q42.1
AKT1 | v-akt murine thymoma viral oncogene homolog 1 | Protein kinase | 14q32.32
PPP3CC | protein phosphatase 3, catalytic subunit, gamma isozyme | Protein phosphatase involved in the downstream regulation of dopaminergic signal transduction | 8p21.3
DRD2 | dopamine receptor D2 | D2 subtype of dopamine receptor | 11q23
DAOA/G72 | D-amino acid oxidase activator | Activates DAO, which degrades the gliotransmitter D-serine | 13q33.2
NRGN | neurogranin | Post-synaptic protein kinase substrate that binds calmodulin in the absence of calcium | 11q24
PGBD1 | piggyBac transposable element derived 1 | Transposase specifically expressed in the brain | 6p22.1
PRSS16 | protease, serine, 16 (thymus) | Role in alternative antigen presenting pathway | 6p21
PDE4B | phosphodiesterase 4B | Regulation of second messengers | 1p31
TCF4 | transcription factor 4 | Transcription factor with possible role in nervous system development | 18q21.1
DRD4 | dopamine receptor D4 | D4 subtype of dopamine receptor (GPCR) | 11p15.5
NOTCH4 | notch 4 | Controlling cell fate decisions | 6p21.3
TPH1 | tryptophan hydroxylase 1 | Biosynthesis of serotonin | 11p15.3
HTR2A | 5-hydroxytryptamine (serotonin) receptor 2A | Serotonin receptor | 13q14
MDGA1 | MAM domain containing glycosylphosphatidylinositol anchor 1 | Possible brain development role | 6p21
APOE | apolipoprotein E | Catabolism of lipoprotein constituents | 19q13.2

Candidate schizophrenia genes identified from genetic studies, as listed in the top 45 list of the SZGene database. Genes highlighted in grey are of high epidemiological credibility on the basis of amount of evidence, consistency of replication, and protection from bias. Credibility is assigned based on meta-analysis results found at www.szgene.org. *Not in SZGene top list.

Figure 1: A simple schematic of mesolimbic and mesocortical circuitry. This figure has been used with permission from Piomelli, Nature Medicine 2001 [184] to illustrate the mesolimbic and mesocortical dopamine pathways in the brain. The mesocortical pathway can be seen projecting from the ventral tegmental area (VTA) to the prefrontal cortex (PFC). The mesolimbic pathway begins in the VTA and connects to the nucleus accumbens via limbic structures including the hippocampus and amygdala.


1.7 Thesis Chapters Summary

The general aim of this thesis is to identify changes in gene expression in the normal and diseased postmortem human brain by applying a variety of meta-analytical techniques. In each chapter I have conducted a cross-laboratory meta-analysis across a number of independent microarray studies, carefully controlling for sources of variation where possible. Combining studies increases the total sample size and thereby the statistical power to find changes that might not have been considered significant in any single study.

In Chapter 2, I describe gene expression changes associated with age, sex, brain pH and PMI using eleven independent microarray studies of the normal human cortex. Each dataset was first analyzed independently, and the results were combined across studies using Fisher’s method [166]. For each factor a characteristic meta-signature of genes was identified, highlighting specific transcriptional changes that implicate an assortment of critical cellular processes. We found a significant overlap between the meta-signatures and independent gene lists extracted from the literature, but also identified a large proportion of genes found to be significantly changed only through meta-analysis. In addition, many previously proposed schizophrenia candidate genes appear in the meta-signatures, reinforcing the idea that studies must be carefully controlled for interactions between these factors and disease.

Chapter 3 focuses on differential expression patterns in the prefrontal cortex of individuals with schizophrenia compared to unaffected controls. Expression data were combined across seven microarray datasets, forming a final cohort of 153 affected and 153 control individuals. Using an FEM, disease-associated changes were extracted on a probe-by-probe basis with careful control for the factors investigated in Chapter 2. The combined analysis revealed a schizophrenia meta-signature of 39 probes up-regulated in schizophrenia and 86 down-regulated. We observed gene expression changes associated with aspects of neuronal communication, as well as alterations in processes affected as a consequence of changes in synaptic functioning. Some of these genes have been previously identified in expression profiling studies, while others are novel to our analysis. A network analysis using a large protein-protein interaction network predicts previously unidentified functional relationships among the signature genes.

Chapter 4 builds on the findings from Chapter 3, with the major goal of exploring schizophrenia from a network perspective. Coexpression was evaluated using the same seven microarray datasets, with samples split into cohorts of subjects with schizophrenia and unaffected controls. Coexpression matrices for each cohort were then aggregated across datasets using a rank-based approach to generate a network representation of the control and schizophrenic prefrontal cortex. Differences between the two networks at a global level are small, suggesting that the overall coexpression structure of the brain is retained between normal and diseased states. Using the two networks, we analyzed differential coexpression by evaluating network properties of the schizophrenia meta-signature identified in Chapter 3. The meta-signature genes exhibit coexpression network properties not observed for other functional gene groups or other brain-related disease gene groups. Finally, each of the networks was clustered into high-density sub-networks, and we evaluated meta-signature genes in the context of functionally distinct gene complexes.


Chapter 2: Meta-analysis of the normal human postmortem brain¹

2.1 Introduction

Many studies have applied genome-wide expression analysis to human postmortem brain tissue with the aim of identifying changes in gene expression associated with neuropsychiatric disease [185]. Human brain tissue presents a particular challenge for the analysis of gene expression. The variability between individuals and the heterogeneity of the tissue (different cell types) make the detection of small expression changes difficult. It is routine to match samples across conditions and check for confounding effects of sex, age and other factors. However, this is not always easy, as postmortem brain tissue is a limited resource and sample sizes are often small. Another common method of reducing the effects of these factors involves adjustment during data analysis, such as stratification of samples or statistical techniques based on observed covariate distributions in the compared populations. However, many studies are underpowered to detect genes so affected. This greatly complicates the detection of molecular changes associated with neuropsychiatric disorders such as schizophrenia and bipolar disorder [125].

It is therefore important to understand the effects of factors such as age, sex, brain pH and PMI on gene expression in the postmortem brain. This information will allow us to control for confounding sources of variability when seeking disease effects, and provide a means of elucidating biologically interesting patterns due to the factors themselves. A number of studies have examined expression differences associated with age [127, 186], sex [128, 187, 188] and brain pH [124, 189]. Because of small sample sizes and the presence of noise, our knowledge of gene expression changes associated with these factors is likely to be incomplete.

¹ A version of this chapter has been published: Mistry M, Pavlidis P (2010). A cross-laboratory comparison of expression profiling data from normal postmortem human brain. Neuroscience 167(2):384-95. doi:10.1016/j.neuroscience.2010.01.016.


One approach to detecting weak patterns is to use meta-analysis. In a meta-analysis, the results from multiple studies are statistically pooled to provide an overall estimate of significance of an effect. While meta-analysis has been increasingly used in the study of gene expression data [190-192], to our knowledge only a few studies have done so with postmortem human brain data [7-9].

In this chapter, I have conducted a large cross-laboratory meta-analysis of human postmortem brain data by integrating expression data from multiple studies, rather than a simple comparative analysis of published gene lists. The primary focus of this chapter is to examine gene expression changes in the normal human brain with respect to four factors: age, sex, PMI and brain pH. While many studies treat these factors as a nuisance and attempt to limit their range or control for them, we show that considerable variability in gene expression exists due to these factors. The results from this chapter provide new information on gene expression changes attributable to these factors, and will be useful for future postmortem brain expression studies of neuropsychiatric illness.

2.2 Methods

2.2.1 Data Collection

Genome-wide expression datasets were selected on the basis of public availability, inclusion of normal subjects, use of neocortical tissue, and the availability of sample characteristic data. Details on each of the eleven datasets, including the source citation, can be found in Table 3. Sources include the Stanley Medical Research Institute (SMRI), the Harvard Brain Bank, and the Gene Expression Omnibus (GEO).

GEO studies were identified by extensive manual and keyword searches. Of the 12 available SMRI studies, only two were selected, one to represent each of the two SMRI brain collections, as the additional datasets represent repeated runs of samples from the same subjects. Sample characteristics for the normal subjects within each dataset were collected (see Table 4 for a summary). Datasets consisted of single-channel intensity data generated from various Affymetrix platforms, plus one dataset from the Illumina HumanRef-8 BeadArray platform. For 8 of the 11 datasets, we obtained pre-processed data in which the expression levels had been summarized, log-transformed and normalized using the ‘rma’ function in the R/Bioconductor ‘affy’ package [193]. Where possible, we obtained the raw data (.cel files) for the remaining datasets and reprocessed them using the ‘rma’ function. For one study for which the raw data were not available, we used the data in its given format.

2.2.2 Regression Analysis

Gene expression for each probe, in each dataset, was modeled as a function of each of the factors (age, sex, pH and PMI) separately. P-values were computed using one-sided tests, performed independently for each of the two alternative hypotheses (increased or decreased expression). To make a fair comparison across studies, we re-annotated probe sequences for each array and mapped them to the corresponding GenBank gene using the Gemma database (http://www.chibi.ubc.ca/Gemma/). Probes annotated to more than one gene were removed from consideration. Where multiple probes mapped to a single gene, we combined their p-values by retaining only the minimum p-value. Analyses were conducted in R [193], and the code is available at http://www.chibi.ubc.ca/postmortem-brain.
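The probe-level bookkeeping described above can be sketched in Python as follows. This is a minimal illustration, not the chapter's R code: it assumes each probe already has a two-sided regression p-value and a fitted slope from some per-probe linear model, splits these into the two one-sided p-values (valid for a symmetric statistic such as the t-test on a slope), drops probes annotated to more than one gene, and keeps the minimum p-value per gene. The function names and input layout are hypothetical.

```python
from typing import Dict, List, Tuple


def one_sided(p_two_sided: float, slope: float) -> Tuple[float, float]:
    """Split a two-sided p-value from a symmetric test (e.g. a t-test on a
    regression slope) into one-sided p-values for (increase, decrease)."""
    half = p_two_sided / 2.0
    if slope >= 0:
        return half, 1.0 - half
    return 1.0 - half, half


def collapse_to_genes(
    probe_genes: Dict[str, List[str]],
    probe_pvalues: Dict[str, float],
) -> Dict[str, float]:
    """Map probe-level p-values to genes.

    Probes annotated to more than one gene (or to none) are discarded; when
    several probes map to the same gene, only the minimum p-value is kept.
    """
    gene_p: Dict[str, float] = {}
    for probe, p in probe_pvalues.items():
        genes = probe_genes.get(probe, [])
        if len(genes) != 1:  # unannotated or ambiguous probe
            continue
        gene = genes[0]
        gene_p[gene] = min(p, gene_p.get(gene, 1.0))
    return gene_p


# Hypothetical annotation and (two-sided p-value, slope) fits for one factor
annotation = {"p1": ["RGS4"], "p2": ["RGS4"], "p3": ["GFAP", "XIST"]}
fits = {"p1": (0.02, -0.3), "p2": (0.40, -0.1), "p3": (0.001, 0.5)}
down = {probe: one_sided(p, b)[1] for probe, (p, b) in fits.items()}
print(collapse_to_genes(annotation, down))  # {'RGS4': 0.01}
```

Taking the minimum per gene favors genes with many probes; the chapter reports this rule without a correction, and the sketch mirrors it as stated.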

2.2.3 Meta-analysis of Differential Expression

The following meta-analysis was carried out independently for each of the four factors and each hypothesis. We computed a summary statistic S across n studies for each gene t using Fisher’s method [194], which has been used previously in other microarray meta-analyses [195, 196]:

$$S(t) = -2\sum_{i=1}^{n} \ln p_i(t),$$

where $p_i(t)$ is the regression p-value for gene t in the i-th experiment. A gene was included in the analysis only if it was measured in at least three datasets in which the relevant sample characteristic (e.g. age, pH) was reported. A p-value for S(t) is computed by observing that, under the null hypothesis of uniform p-values within each study, S(t) has a χ² distribution with 2n degrees of freedom. The meta-analysis p-values for each signature were processed with the R ‘qvalue’ package to control the false discovery rate, yielding a q-value for each gene [143].
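A minimal Python sketch of the Fisher combination above, under stated assumptions: the inputs are one gene's one-sided p-values for a single factor and direction, studies that did not measure the gene are passed as None, and the χ² upper tail uses the closed form that holds for even degrees of freedom, so no statistics library is needed. The helper names are hypothetical, and the subsequent q-value step (the R ‘qvalue’ package in this chapter) is not reproduced.

```python
import math
from typing import Optional, Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half_x = x / 2.0
    term = 1.0  # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half_x / k  # now (x/2)^k / k!
        total += term
    return math.exp(-half_x) * total


def fisher_combine(
    pvalues: Sequence[Optional[float]], min_studies: int = 3
) -> Optional[Tuple[float, float]]:
    """Fisher's method for one gene: S = -2 * sum(ln p_i) ~ chi-square(2n).

    Studies where the gene was not measured are passed as None and skipped.
    Returns (S, meta p-value), or None if fewer than `min_studies` remain.
    """
    observed = [p for p in pvalues if p is not None]
    if len(observed) < min_studies:
        return None
    s = -2.0 * sum(math.log(p) for p in observed)  # requires 0 < p <= 1
    return s, chi2_sf_even_df(s, 2 * len(observed))


# Hypothetical one-sided p-values for one gene; the 4th study lacks the gene.
print(fisher_combine([0.01, 0.20, 0.03, None]))
```

With a single study the method returns that study's p-value unchanged, which is a quick sanity check on the χ² tail.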


2.2.4 Validation Analysis

We extracted gene lists from the postmortem brain gene expression literature for age [186], sex [187] and brain pH [189]. Each set consisted of a list of probes (Affymetrix probe sets) reported as differentially expressed in the human postmortem brain in the respective study; these were split by direction of change (i.e. up-regulated or down-regulated). Each probe was mapped to its corresponding gene using Gemma, and genes not included in our meta-analysis were removed. Agreement of the meta-signature ranking with the respective validation set was assessed using receiver operating characteristic (ROC) curve analysis. A meta-signature with an area under the ROC curve (AUC) closer to 1.0 indicates that many genes in the validation set are near the top of the respective ranked list, whereas a score closer to 0.5 indicates that the validation gene set is randomly distributed across the ranking.
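The AUC for a ranked gene list can be computed without drawing the curve: it equals the fraction of (validation, non-validation) gene pairs in which the validation gene is ranked higher. The Python sketch below is a minimal illustration under that assumption (distinct genes, no tied ranks); the function name is hypothetical and it is not the code used for this chapter.

```python
from typing import Iterable, Sequence


def ranking_auc(ranked_genes: Sequence[str], validation: Iterable[str]) -> float:
    """AUC of a validation gene set against a ranked gene list (best first).

    Equals the fraction of (validation, non-validation) gene pairs in which
    the validation gene is ranked higher. Validation genes absent from the
    ranking are ignored, mirroring the removal step described above.
    """
    positives = set(validation) & set(ranked_genes)
    n_pos = len(positives)
    n_neg = len(ranked_genes) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both validation and non-validation genes")
    negatives_seen = 0
    pairs_won = 0
    for gene in ranked_genes:
        if gene in positives:
            # every non-validation gene not yet seen is ranked below this one
            pairs_won += n_neg - negatives_seen
        else:
            negatives_seen += 1
    return pairs_won / (n_pos * n_neg)


# Hypothetical ranking (most significant first) and validation set
ranking = ["GFAP", "BCL2", "RGS4", "NEBL"]
print(ranking_auc(ranking, {"GFAP", "NEBL", "XIST"}))  # 0.5
```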

Each meta-signature was further analyzed for functional enrichment of GO terms [197], using the ‘over-representation analysis’ (ORA) method in ErmineJ [198]. ORA takes the genes that meet a specified selection criterion (meta-q < 0.001) and determines whether any gene sets are statistically over-represented among them. Probabilities were computed using the binomial approximation to the hypergeometric distribution and then corrected for multiple testing using the Benjamini-Hochberg procedure.
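The two computations named above, an over-representation p-value from the binomial approximation and the Benjamini-Hochberg correction across terms, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not ErmineJ's implementation: each GO term is reduced to four hypothetical counts, and the binomial tail is summed in log space so that signatures with over a thousand genes do not overflow.

```python
import math
from typing import List, Sequence


def binomial_upper_tail(k: int, n: int, prob: float) -> float:
    """P(X >= k) for X ~ Binomial(n, prob), summed in log space."""
    if k <= 0:
        return 1.0
    if k > n or prob <= 0.0:
        return 0.0
    if prob >= 1.0:
        return 1.0
    log_p, log_q = math.log(prob), math.log1p(-prob)
    total = 0.0
    for j in range(k, n + 1):
        log_choose = math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        total += math.exp(log_choose + j * log_p + (n - j) * log_q)
    return min(total, 1.0)


def ora_pvalue(hits: int, n_selected: int, term_size: int, n_genes: int) -> float:
    """Over-representation p-value for one GO term: `hits` of the `n_selected`
    signature genes are annotated to a term covering `term_size` of the
    `n_genes` genes analyzed (binomial approximation to the hypergeometric)."""
    return binomial_upper_tail(hits, n_selected, term_size / n_genes)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (hits, selected genes, term size, genes analyzed)
terms = {"term_a": (12, 400, 150, 10000), "term_b": (3, 400, 90, 10000)}
raw = [ora_pvalue(*counts) for counts in terms.values()]
print(dict(zip(terms, benjamini_hochberg(raw))))
```

The binomial approximation treats draws as independent; it is close to the exact hypergeometric tail when the number of selected genes is small relative to the number of genes analyzed.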

2.3 Results

We first assessed global levels of gene expression across datasets by assigning a rank to each gene based on its mean expression value within each dataset. Although we observed variation between studies, a clear pattern emerged of genes that were consistently strongly or weakly expressed in the brain, supporting the feasibility of comparing the studies to one another.

To evaluate gene expression changes with respect to four factors (age, sex, brain pH and PMI), we used linear regression within each dataset, for each factor individually. We considered both directions of change (up- and down-regulation) for each factor, yielding up to eight different scenarios for each dataset. Although the focus of this chapter is the meta-analysis across datasets, we briefly summarize here the results from individual studies. The numbers of genes that show evidence of change with each of the factors (q < 0.01) are given in Table 5. Not surprisingly, the datasets with smaller sample sizes showed fewer statistically significant changes associated with the factors.

Overall, the factors associated with the most differential expression were age and brain pH.

To examine changes in gene expression that were consistent across all datasets, or supported by evidence from multiple datasets, we implemented a cross-study meta-analysis approach. The output of this analysis was eight meta-signatures (up and down for each of the four factors). The top ten genes from each meta-signature can be found in Table 6, and full lists (at meta-q < 0.001) can be found in Appendix A. To examine the results, we first extracted the corresponding p-values from each individual dataset and visualized them as (smoothed) plots in the order determined by the meta-signature (Figure 2). We observed that genes with good meta-q-values tended to have good p-values in multiple, but not necessarily all, studies. More detailed results are plotted for some example genes in Figure 3, illustrating that p-values for a given gene can vary across individual datasets. These plots demonstrate that the meta-analysis is capable of identifying significant genes even if they show weak or non-significant effects in some datasets. For example, in Figure 3, for the age genes GFAP and RGS4, we observed weak changes in expression level (up and down, respectively) that are not significant after multiple test correction in the individual studies. On the other hand, we also found genes that show significant effects in most if not all studies (e.g. XIST, Figure 3). In Figure 4, we assembled the top 50 genes down-regulated with age and plotted their expression levels within each dataset, with samples ordered by increasing age. For most of the studies, we observe a gradient across the dataset as gene expression decreases from high to low levels, illustrating that the meta-analysis recovers many genes that show fairly consistent trends across datasets.

While we have presented results from an analysis that treated each factor independently (linear regression), we also performed a meta-analysis that models gene expression on all factors simultaneously in an analysis of covariance. This analysis yielded meta-signatures very similar to those identified when factors were modeled independently, with correlations of q-values ranging from 0.79 to 0.99. We also attempted to model interactions amongst factors, but for some datasets there were insufficient data. The majority of the datasets used in our analysis are small (≤ 30 samples) and lack the power to reliably model gene expression with so many predictors.

We tested the robustness of our meta-signatures using a jackknife procedure. This involved removing one dataset, performing the meta-analysis on the remaining datasets, and then selecting genes at a more lenient significance threshold of meta-q < 0.01. This procedure was repeated for each dataset in turn, and genes found in all rounds were retained as a ‘core’ signature. Each of the ‘core’ signatures encompasses more than half of the genes found in the corresponding meta-signature, with the exception of the pH meta-signatures. The ‘core’ signatures can be found in Appendix A of this thesis.
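The leave-one-dataset-out loop can be sketched in Python as follows. To keep the example short and self-contained, the meta-analysis is passed in as a callable; `toy_meta_q` below is a hypothetical placeholder (the largest p-value across datasets), not Fisher's method followed by q-values as used in this chapter. The names and the dict-per-dataset layout are assumptions.

```python
from typing import Callable, Dict, List, Optional, Set

Dataset = Dict[str, float]  # gene -> p-value for one factor and direction
MetaQ = Callable[[List[Dataset]], Dict[str, float]]  # datasets -> gene -> meta-q


def core_signature(
    datasets: List[Dataset], meta_q: MetaQ, threshold: float = 0.01
) -> Set[str]:
    """Jackknife 'core' signature: rerun the meta-analysis with each dataset
    left out in turn and keep genes with meta-q < threshold in every round."""
    core: Optional[Set[str]] = None
    for held_out in range(len(datasets)):
        remaining = datasets[:held_out] + datasets[held_out + 1:]
        selected = {g for g, q in meta_q(remaining).items() if q < threshold}
        core = selected if core is None else core & selected
    return core if core is not None else set()


def toy_meta_q(datasets: List[Dataset]) -> Dict[str, float]:
    """Placeholder only: the largest p-value across datasets (a deliberately
    simple stand-in, NOT Fisher's method followed by q-values)."""
    genes = set().union(*datasets)
    return {g: max(d.get(g, 1.0) for d in datasets) for g in genes}


studies = [
    {"GFAP": 0.001, "RGS4": 0.002},
    {"GFAP": 0.004, "RGS4": 0.300},
    {"GFAP": 0.002, "RGS4": 0.001},
]
print(core_signature(studies, toy_meta_q))  # {'GFAP'}
```

In the toy run, RGS4 drops out because it fails the threshold whenever the second study is included, which is the kind of single-dataset dependence the jackknife is meant to expose.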

The studies we selected for meta-analysis were, in general, not designed to test the effects of age, sex, pH or PMI; in fact, attempts may have been made to limit the range of these factors (especially PMI and pH). However, even across a small range there is inevitable variability in expression, enough to enable a meaningful meta-analysis. We nevertheless questioned the extent to which our results would agree with more targeted studies, and therefore sought to validate findings from our approach. We identified independent gene lists from the literature for age, sex and brain pH (we could not find a comprehensive validation set for PMI). Each validation gene list was then separated into two groups based on the direction of change, to correspond with the meta-signatures obtained from our meta-analysis. None of these validation gene lists can be considered a true gold standard, but they do help put our results in the context of previous findings.

To quantify the predictive power of our meta-signatures with respect to the corresponding validation sets, we first performed a standard receiver operating characteristic (ROC) analysis. The score reported for each signature and its respective validation set is the area under the ROC curve (AUC), a value between 0 and 1, where 1.0 indicates perfect agreement with the external list and 0.5 reflects a random order. The AUC values for each meta-signature are reported in Table 7. We also tested the effect of using a specific statistical threshold for selecting genes from the meta-signatures, by collecting genes at two significance levels (meta-q < 0.01 and meta-q < 0.001). The overlap with the validation set was significant (p < 0.001, Fisher’s exact test; Table 7) for all signatures. We also found a comparable overlap between each of the ‘core’ signatures and the validation sets.
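The overlap test above can be sketched in Python as a one-sided Fisher's exact test, i.e. the hypergeometric upper tail. This is a minimal illustration under an assumption the text does not state: that the gene universe is all genes tested in the meta-analysis. The function name and the numbers in the usage line are hypothetical.

```python
from math import comb


def overlap_pvalue(overlap: int, n_signature: int, n_validation: int, n_universe: int) -> float:
    """One-sided Fisher's exact test for enrichment (hypergeometric upper tail).

    Probability that a random draw of `n_signature` genes from `n_universe`
    genes shares at least `overlap` genes with a validation list of
    `n_validation` genes. Exact integer arithmetic avoids float overflow.
    """
    max_overlap = min(n_signature, n_validation)
    tail = sum(
        comb(n_validation, k) * comb(n_universe - n_validation, n_signature - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / comb(n_universe, n_signature)


# Hypothetical numbers: 8 of 20 signature genes fall in a 100-gene validation
# list, out of a universe of 5,000 genes tested.
print(overlap_pvalue(8, 20, 100, 5000))
```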

A brain pH validation set of genes was obtained from Vawter et al. [199], determined from the fold change of controls with no agonal factors and high pH (> 6.87) compared with controls with agonal factors and pH below 6.87. The study was carried out on two cortical regions, from which we used the dorsolateral PFC pH-sensitive genes for validation of our brain pH meta-signatures. The ROC analysis for brain pH gave high AUC scores of 0.91 and 0.86 (for up- and down-regulated genes, respectively). Additionally, we obtained reasonably high AUC scores of 0.88 and 0.89 (for up- and down-regulated genes, respectively) using a smaller independent pH gene list from Mexal et al. [189], despite the difference in brain region between the validation study and the meta-analysis. Because pH itself probably covaries across brain regions [189], our results are consistent with the hypothesis that pH-related changes in gene expression are similar across brain regions. The age signatures, on the other hand, exhibited slightly lower AUC scores than those obtained for brain pH. Erraji-Benchekroun et al. [186] used samples from dorsolateral PFC Brodmann area 9 (BA9) and orbitofrontal PFC Brodmann area 47 (BA47) from each subject to evaluate age-related expression differences, showing comparable changes in both brain regions. As such, our validation set consisted of genes showing age-related expression changes collectively within both neocortical brain regions, BA9 and BA47. While many of these genes appear at the top of our ranked lists, some are also dispersed throughout our ranking. Finally, the sex meta-signatures from our analysis also scored high when validated with a set of genes from Galfalvy et al. [187]. Although most of the validation genes appeared at the top of the ranking, it should be noted that this validation set had far fewer genes than the others.

We compared the significant genes (meta-q < 0.001) from each of our meta-signatures with genes known to be associated with schizophrenia. We extracted a list of 34 schizophrenia candidate genes provided in a comprehensive literature review [200] and searched for these genes within each of our meta-signatures. We found that 12 of these genes appeared in at least one of our meta-signatures, although the majority of the overlap was observed with the age and pH meta-signatures (Table 8). The overlap between schizophrenia genes and each of the age meta-signatures was significant at p < 0.01.

To derive a high-level biological interpretation of our meta-signatures, we performed a GO [197] enrichment analysis using ErmineJ [198]. We extracted the ‘top’ GO categories for each of the signatures and compared them with one another. Various ‘biological processes’ were found to be unique to each meta-signature. In Figure 5, we display the top ten categories for each meta-signature, depicting each GO category and its associated p-value (corrected for multiple testing). GO terms for the age and pH meta-signatures reached greater significance than those of the other meta-signatures.

For genes increasing in expression with age, top GO categories included those involved in cell growth and proliferation and in cell-cell interaction, consistent with previous studies [186, 201]. Other processes included the insulin receptor signaling pathway, encompassing a number of genes involved in longevity and aging [202]. The age down-regulated genes presented an enrichment in synaptic and/or receptor activity, with GO categories such as “neuron recognition” (GO:0008038), “neurotransmitter transport” (GO:0006836), “neurotransmitter secretion” (GO:0007269) and “regulation of neurotransmitter levels” (GO:0001505). This finding is concordant with existing aging studies in mouse and human [186, 203]. Similarly, we found an enrichment of genes involved in neuropeptide signaling in the pH up-regulated meta-signature, in addition to genes implicated in metabolism and a different array of pathways (e.g. G-protein signaling). The female and male meta-signatures showed enrichment of different terms, including some sex-specific processes in the female meta-signature such as “female gamete generation” (GO:0007292) and “female pregnancy” (GO:0007565). Functional enrichment analysis of our meta-signatures does not provide direct cellular evidence, but it serves as a useful indication of the biological processes altered by each factor and offers some insight at the molecular level.

Because we analyzed each factor independently, we checked whether the values for each factor were correlated with each other across the 415 samples (Table 9). Age and PMI displayed the highest correlation (0.35), consistent with a positive correlation reported in [187]. Age and brain pH displayed a slight negative correlation of -0.2. Further investigation of these two factors revealed that categorizing subjects into ‘young’ (< 50 years of age) and ‘old’ (≥ 50 years of age) groups resulted in a lower mean pH in the ‘old’ group than in the ‘young’ group. This was the general trend within each dataset, as observed in Figure 6.
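The factor checks above (Spearman rank correlation between subject characteristics, and the ‘young’/‘old’ pH comparison) can be sketched in Python as follows. This is a minimal illustration assuming that subjects with a missing value have already been dropped; significance testing of the correlations is not included, and the function names and data are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks.

    Neither variable may be constant (the denominator would be zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def mean_ph_by_age_group(
    ages: Sequence[float], ph: Sequence[float], cutoff: float = 50.0
) -> Dict[str, float]:
    """Mean pH of 'young' (age < cutoff) and 'old' (age >= cutoff) subjects."""
    young = [p for a, p in zip(ages, ph) if a < cutoff]
    old = [p for a, p in zip(ages, ph) if a >= cutoff]
    nan = float("nan")
    return {"young": mean(young) if young else nan, "old": mean(old) if old else nan}


# Hypothetical subjects from one dataset (age in years, brain pH)
ages = [25, 38, 47, 55, 63, 71]
ph = [6.7, 6.5, 6.6, 6.4, 6.3, 6.5]
print(spearman(ages, ph), mean_ph_by_age_group(ages, ph))
```

Average ranks matter here because categorical or coarsely recorded characteristics (sex, or pH reported to two decimals) produce many ties.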

Given these correlations, we expected that individual genes in some meta-profiles might overlap with other meta-profiles (Table 10). Accordingly, age and brain pH were the pair of factors with the largest number of overlapping genes, with overlaps in opposite directions (genes up-regulated with age overlapping genes down-regulated with pH, and genes down-regulated with age overlapping genes up-regulated with pH). However, these effects were weak, and even weaker in the ‘core’ signatures. We also found that a number of genes up-regulated with PMI were also identified amongst the profiles for brain pH and age in both directions, but we were unable to extract any definite patterns.


Table 3: Human postmortem brain datasets included in control brain meta-analysis

Dataset | Reference | Description | Microarray platform (processing) | Brain region(s) | No. of normal subjects
A. Stanley Chen | n/a | Schizophrenia, Bipolar, depression | HG-U133A/B (RMA) | DLPFC | 13
B. GSE1572 | Lu et al. (2004) | Aging study | HG-U95vA (RMA) | Frontal lobe | 30
C. GSE2164 | Vawter et al. (2004) | Gender differences in expression | HG-U95vA (RMA) | DLPFC | 10
D. GSE3790 | Hodges et al. (2006) | Huntington’s | HG-U133A/B (MAS 5.0) | Frontal cortex | 36
E. GSE11882 | Berchtold et al. (2008) | Gender and aging | HG-U133 Plus 2.0 (GC-RMA) | Superior Frontal Gyrus | 47
F. GSE11512 | Somel et al. (2009) | Transcriptional neoteny | HG-U133 Plus 2.0 (RMA) | Frontal cortex | 15
G. GSE8919 | Myers et al. (2007) | Cortical gene expression | Illumina Sentrix BeadChip (Illumina Software) | Cerebral cortex, temporal lobe, frontal lobe, parietal lobe | 193
H. Stanley Kato | Iwamoto et al. (2005) | Schizophrenia, Bipolar | HG-U133A (RMA) | DLPFC | 34
I. GSE13162 | Chen-Plotkin et al. (2008) | Frontotemporal lobar degeneration | HG-U133A (RMA) | Frontal cortex | 8
J. GSE5390 | Lockstone et al. (2007) | Down Syndrome | HG-U133A (RMA) | DLPFC | 8
K. McLean PFC | n/a | Schizophrenia, Bipolar | HG-U133A (RMA) | PFC | 27


Table 4: Sample characteristics for control human postmortem brain datasets

Dataset | Age range (years) | Male : Female | PMI range (hrs) | pH range
A. Stanley Chen | 25-60 | 9 : 4 | 8-60 | 6.0-6.6
B. GSE1572 | 26-106 | 18 : 12 | 1-21 | n/a
C. GSE2164 | 50-82 | 5 : 5 | 9.8-30.75 | 6.12-6.98
D. GSE3790 | 19-70 | 21 : 9 | n/a | n/a
E. GSE11882 | 20-99 | 23 : 24 | 2-12 | n/a
F. GSE11512 | 16-47 | 10 : 5 | 4-25 | 6.49-6.96
G. GSE8919 | 65-90 | 107 : 86 | 1.17-54 | n/a
H. GSE13162 | 47-92 | 5 : 3 | 3.5-21 | n/a
I. Stanley Kato | 30-60 | 15 : 9 | 9-60 | 6-7.03
J. GSE5390 | 30-60 | 7 : 1 | 32-61 | n/a
K. McLean PFC | 30-80 | 19 : 8 | 7.42-28.83 | n/a

Table 5: Significant genes (q < 0.01) identified within each individual dataset

Dataset | Age down | Age up | Sex female | Sex male | pH down | pH up | PMI down | PMI up
Stanley Chen | 0 | 0 | 2 | 7 | 0 | 0 | 0 | 0
Stanley Kato | 0 | 0 | 1 | 10 | 703 | 0 | 0 | 0
GSE1572 | 162 | 64 | 1 | 6 | n/a | n/a | 0 | 0
GSE2164 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0
GSE8919 | 153 | 133 | 6 | 19 | n/a | n/a | 2 | 278
GSE11512 | 0 | 0 | 3 | 13 | 1 | 0 | 0 | 0
GSE11882 | 428 | 0 | 2 | 14 | n/a | n/a | 1 | 0
GSE13162 | 0 | 0 | 0 | 0 | n/a | n/a | 0 | 0
GSE5390 | 1 | 1 | n/a | n/a | n/a | n/a | 0 | 0
McLean PFC | 0 | 0 | 1 | 5 | 0 | 0 | n/a | n/a
GSE3790 | 0 | 0 | 1 | 13 | n/a | n/a | n/a | n/a
‘Union’ signature | 689 | 198 | 10 | 33 | 704 | 0 | 2 | 278
Meta-signature overlap | 415 | 102 | 6 | 19 | 191 | 0 | 1 | 46

Q-values were calculated from regression p-values for each factor within each dataset. The numbers of genes reported here are significant at q < 0.01. The ‘union’ signature represents the union of unique genes identified for each factor across all datasets. The meta-signature overlap indicates the number of union signature genes overlapping with the corresponding meta-signature.


Table 6: Top meta-signature genes for age, pH, sex and PMI

Age down-regulated | Age up-regulated
OLFM1: olfactomedin 1 | NEBL: nebulette
KCNF1: potassium voltage-gated channel, subfamily F, member 1 | MED12: mediator complex subunit 12
RGS4: regulator of G-protein signaling 4 | BCL2: B-cell CLL/lymphoma 2
PPP3CB: protein phosphatase 3 (formerly 2B), catalytic subunit, beta isoform | GMPR: guanosine monophosphate reductase
ADCY2: adenylate cyclase 2 (brain) | GFAP: glial fibrillary acidic protein
SVOP: SV2 related protein homolog (rat) | ADH1B: alcohol dehydrogenase 1B (class I), beta polypeptide
EFNB3: ephrin-B3 | WWOX: WW domain containing oxidoreductase
ATP2B2: ATPase, Ca++ transporting, plasma membrane 2 | PLEC1: plectin 1, intermediate filament binding protein 500kDa
HPCA: hippocalcin | VCAN: versican
CALB1: calbindin 1, 28kDa | AHCYL1: S-adenosylhomocysteine hydrolase-like 1

pH down-regulated | pH up-regulated
FGF2: fibroblast growth factor 2 (basic) | SLC1A1: solute carrier family 1 (neuronal/epithelial high affinity glutamate transporter, system Xag), member 1
AHCYL1: S-adenosylhomocysteine hydrolase-like 1 | LARGE: like-glycosyltransferase
DTNA: dystrobrevin, alpha | C17orf81: chromosome 17 open reading frame 81
MAPKAPK2: mitogen-activated protein kinase-activated protein kinase 2 | HISPPD2A: histidine acid phosphatase domain containing 2A
TJP1: tight junction protein 1 (zona occludens 1) | PRKCD: protein kinase C, delta
S100A13: S100 calcium binding protein A13 | DLG3: discs, large homolog 3 (Drosophila)
RBBP6: retinoblastoma binding protein 6 | KCNAB1: potassium voltage-gated channel, shaker-related subfamily, beta member 1
GNG12: guanine nucleotide binding protein (G protein), gamma 12 | SLC8A1: solute carrier family 8 (sodium/calcium exchanger), member 1
ANP32E: acidic (leucine-rich) nuclear phosphoprotein 32 family, member E | GABRA5: gamma-aminobutyric acid (GABA) A receptor, alpha 5
BAALC: brain and acute leukemia, cytoplasmic | RIT2: Ras-like without CAAX 2

Female up-regulated | Male up-regulated
XIST: X (inactive)-specific transcript (non-protein coding) | JARID1D: jumonji, AT rich interactive domain 1D
HDHD1A: haloacid dehalogenase-like hydrolase domain containing 1A | USP9Y: ubiquitin specific peptidase 9, Y-linked (fat facets-like, Drosophila)
UTX: ubiquitously transcribed tetratricopeptide repeat, X chromosome | EIF1AY: eukaryotic translation initiation factor 1A, Y-linked
JARID1C: jumonji, AT rich interactive domain 1C | CYorf15B: chromosome Y open reading frame 15B
TSIX: XIST antisense RNA (non-protein coding) | DDX3Y: DEAD (Asp-Glu-Ala-Asp) box polypeptide 3, Y-linked
USP9X: ubiquitin specific peptidase 9, X-linked | UTY: ubiquitously transcribed tetratricopeptide repeat gene, Y-linked
LOC554203: alanyl-tRNA synthetase domain containing 1 pseudogene | RPS4Y1: ribosomal protein S4, Y-linked 1
STS: steroid sulfatase (microsomal), isozyme S | TTTY15: testis-specific transcript, Y-linked 15
ZFX: zinc finger protein, X-linked | CYorf15A: chromosome Y open reading frame 15A
PNPLA4: patatin-like phospholipase domain containing 4 | TMSB4Y: thymosin beta 4, Y-linked

PMI down-regulated | PMI up-regulated
BRD8: bromodomain containing 8 | GOSR2: golgi SNAP receptor complex member 2
RBM5: RNA binding motif protein 5 | CYB5B: cytochrome b5 type B (outer mitochondrial membrane)
PUM2: pumilio homolog 2 (Drosophila) | SNF8: SNF8, ESCRT-II complex subunit, homolog (S. cerevisiae)
ARHGEF7: Rho guanine nucleotide exchange factor (GEF) 7 | GRLF1: glucocorticoid receptor DNA binding factor 1
 | C6orf1: chromosome 6 open reading frame 1
 | MGMT: O-6-methylguanine-DNA methyltransferase
 | MAX: MYC associated factor X
 | ST3GAL2: ST3 beta-galactoside alpha-2,3-sialyltransferase 2
 | PTGER3: prostaglandin E receptor 3 (subtype EP3)
 | EXT1: exostoses (multiple) 1

For each meta-signature we have listed the top ten genes, as ranked by meta-q-value (all at q < 0.001). For each gene we have listed the gene symbol and gene name. Complete meta-signature lists for each factor can be found in Supplementary Table 6 at http://chibi.ubc.ca/~mmistry/.


Table 7: Comparison of meta-signature profiles against validation gene sets

Profile (no. of validation genes) | No. of profile genes (q < 0.001) | Overlap with validation set (q < 0.001) | No. of profile genes (q < 0.01) | Overlap with validation set (q < 0.01) | AUC score
Age (genes from Erraji-Benchekroun et al., 2005)
Up-regulated (268) | 404 | 40 | 1241 | 69 | 0.74
Down-regulated (260) | 1134 | 113 | 2247 | 136 | 0.76
pH (genes from Vawter et al., 2006)
Up-regulated (497) | 25 | 11 | 368 | 131 | 0.91
Down-regulated (294) | 215 | 55 | 1018 | 122 | 0.86
Sex (genes from Galfalvy et al., 2003)
Male (13) | 14 | 7 | 38 | 7 | 0.90
Female (1) | 19 | 1 | 128 | 1 | n/a
PMI
Up-regulated | 75 | n/a | 691 | n/a | n/a
Down-regulated | 4 | n/a | 49 | n/a | n/a

Table 8: Schizophrenia candidate gene analysis

Schizophrenia genes identified in meta-profiles:
Age, down: Opcml, Pldn, Nrg1, Rgs4, Bdnf, Dlg4, Gad67
Age, up: Ntrk2, Ppp1r1b, Erbb3
pH, down: Ntrk2, Slc1a2
pH, up: Rgs4, Gad67
PMI, up: Nrg1


Table 9: Rank correlations between sample information

 | Age | Sex | pH | PMI
Age | | | |
Sex | -0.12** | | |
pH | -0.2* | 0.17 | |
PMI | 0.35*** | -0.2*** | -0.06 |

Spearman rank correlations were computed using sample characteristic information for individual subjects. * indicates a p-value of ≤ 0.05; ** indicates a p-value of ≤ 0.01; *** indicates a p-value of << 0.001.

Table 10: Evaluating gene overlap between meta-signatures

Profile | No. of genes | Age up | Age down | pH up | pH down | PMI up | PMI down | Female | Male
Age up | 404 | | | | | | | |
Age down | 1134 | 2 | | | | | | |
pH up | 25 | 0 | 18 | | | | | |
pH down | 215 | 75 | 1 | 0 | | | | |
PMI up | 75 | 2 | 32 | 0 | 2 | | | |
PMI down | 4 | 1 | 2 | 0 | 0 | 0 | | |
Sex female | 19 | 4 | 1 | 0 | 1 | 0 | 0 | |
Sex male | 14 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |

Using genes for each meta-profile at q < 0.001, we compared them against one another to evaluate the overlap and potential relationships between the factors. We note that the age meta-signature returned genes changing in both directions. This is a consequence of multiple probe-to-gene mappings resulting in the selection of two probes of different specificity for each direction of expression change.


Figure 2: Distribution of dataset p-values across meta-signature q-values. For each dataset used, gene p-values were plotted against the corresponding meta-q-value and a loess fit was computed to generate a smooth curve between points. The fact that most datasets show a rise in p-values correlated with the meta-q-values indicates the contribution of signals of varying strengths to the meta-signatures. The distorted curves for gender are due to the strong effects of a small number of genes with very small meta-q-values (note the difference in scale of D compared to A-C).


Figure 3: Distribution of dataset p-values for individual genes: a magnified view. For selected genes from each of the meta-signatures, we have plotted the log regression p-values from each dataset. Open circles represent the datasets for which the gene was found to be significant after multiple test correction (q < 0.01). The dashed line indicates a per-study p-value significance level of 0.05 for reference.

Figure 4: Top genes down-regulated with age. The top 50 age down-regulated genes were selected based on the meta-analysis q-value ranking. For each gene, the corresponding data from each study were extracted and converted to a heat map. Expression values were normalized across samples within each dataset and ordered by age. Age is plotted at the top of each heat map. Lighter values in the heat maps indicate higher expression. Grey bars indicate missing values. All datasets are at approximately the same horizontal scale except the last, which is compressed to fit on the page.


Figure 5: GO enrichment analysis. For each of the eight meta-signatures, we display the top 10 GO terms identified using a GO over-representation analysis. The y-axis displays the given ‘biological process’ GO term category, while each column on the x-axis represents a meta-signature. The color scale depicts the significance of the term as the negative log10 of the corrected p-value. GO terms were collapsed to the parent term if the parent and child both appeared in the top ten. Grey bars indicate the absence of the term for the analysis.

Figure 6: Investigating the relationship between age and brain pH. Subjects from each dataset were categorized by age into ‘young’ (< 50 years of age) and ‘old’ (≥ 50 years of age) groups and their pH levels were plotted. The general trend was lower pH levels in the ‘old’ group. In A and B, we have plotted values within each dataset. In C, we have plotted the two datasets against each other, as each contains subjects from only one group (GSE11512 = ‘young’; GSE2164 = ‘old’). We see a more pronounced difference between the groups using subjects across all datasets in D.

2.4 Discussion

In this chapter, I have conducted a meta-analysis of gene expression in the human cortex, by examining changes that occur with respect to sex, age, postmortem interval and brain pH. This meta-analysis was made possible by the fact that many gene expression analyses have useful data for each of these factors, even though they were originally considered potential “confounds” to be controlled for. The results from this chapter have at least two potential uses for future studies. First, the results of our meta-analysis provide new information on the effects of each of the factors on gene expression and can be studied further independently or used to bolster support for other studies. Second, the identification of signatures associated with these factors will provide a ‘watch list’ of genes which might be viewed cautiously if they are found to be implicated in neuropsychiatric disease by expression studies. To facilitate the use of these lists in future studies, they are provided at http://www.chibi.ubc.ca/postmortem-brain, with the top ten from each list displayed in Table 6 and significant gene lists (q < 0.001) provided in Appendix A.

There are some limitations to the work presented in this chapter. First, we used a relatively simple meta-analysis method, and acknowledge that there are other techniques that may provide higher sensitivity.

Second, we combined datasets generated using different platforms, which may contribute noise and potentially reduce the power of our meta-analysis. The MAQC project recently initiated a number of studies to specifically address these concerns and, in general, reported high agreement between platforms [137, 204, 205]. A number of other studies have been conducted to this end, showing agreement between platforms [206-208] and high concordance between the top functions identified by each platform [102, 209]. While we acknowledge that small differences between studies remain, we focus only on the consistencies and combine them to extract expression changes more robust than those derived from single-dataset studies. To maximize total sample size, we accepted studies of any neocortical brain region. All of our datasets used samples from the frontal cortex, but one dataset (GSE8919) also included samples from the temporal and parietal cortices. Groups that have studied regional patterns of gene expression in the postmortem human brain have found that cortical regions tend to cluster together, indicating a shared global expression profile [210-212].

Finally, we only considered linear model fits to age, pH and PMI. Future studies can address some of these issues, and also include more data as studies become available.

Comparison of our results to the validation sets strongly supported the relevance of the meta-signatures.

The overlap of ‘top genes’ between the meta-signatures and their respective validation sets, while statistically significant, identified only a subset of the genes in the validation lists. There are several possible explanations for this effect. One is that most of the studies we used in our analysis treated these factors as nuisances to be eliminated, which may have reduced our power to find real changes. For example, more than half of the datasets have no subjects under the age of 30 at the time of death, and most subjects are over 40. In particular, the Kato and Chen data sets, which use samples from the SMRI, have a tightly controlled (narrow) age range. In contrast, the age validation set used a broader age range (13 to 79 years of age) [186]. Additionally, we expect biological variation among sample groups (and therefore studies). Strong signals in any given data set, including the validation sets, may be specific to that study.

That is, none of the validation sets is truly a gold standard. Further examination of the validation genes within each individual dataset supports this notion. The agreement between the meta-analysis and the validation lists is better than the agreement between the validation genes and the results from any single dataset, in which few or none of the validation genes were identified.

In summary, the limited overlap of the meta-signatures with the validation sets may simply be contingent on the data we used, and does not call into question the validation sets or the meta-analysis.

Using a jackknife analysis, we obtained ‘core’ signatures for each of the factors. We found a large proportion of the meta-signature genes to overlap with the ‘core’ signatures, illustrating the ability of our meta-analysis to extract gene profiles robust to influences from individual datasets. The exception was brain pH, for which the ‘core’ signatures consisted of 13 up-regulated genes and only one down-regulated gene. This was due to a large pH effect in the Kato dataset. However, examination of the other data sets revealed that many genes showing large effects in the Kato data set also show trends with pH. Thus, even though the pH meta-signatures are arguably biased by strong effects from the Kato dataset, these genes also show weak signals with pH in the other data sets.


The inverse relationship we observed between our age and brain pH meta-signatures agrees with a previous study [116]. A review of the literature also reveals that results from independent gene expression studies of aging are strikingly similar to results from brain pH profiling studies [8, 124, 186, 201, 213], further supporting this relationship. It has been suggested that the relationship between age and pH is likely a result of the slower modes of death experienced by elderly subjects [116], but this has not yet been fully explored. Previous studies, however, have found brain pH to be a proxy for agonal stress [214]: a longer terminal phase before death results in lower brain pH than is observed after a sudden death. We were unable to obtain cause-of-death information for the majority of our datasets, and thus could not incorporate it into the meta-analysis. It therefore remains an open question whether the inverse relationship arises as hypothesized by Harrison et al. [116], or whether the process of aging itself results in a general decline of brain pH.

To this point we have focused on evaluating the meta-analysis in light of other data sets, but clearly one of the reasons to do a meta-analysis is to integrate information on weak patterns. Indeed, our meta-analysis has confirmed previous findings and also added to them. By assembling the significant genes (q < 0.01) from each individual dataset, we generated a ‘union’ signature for each of the factors (Table 5). A comparison of the union signatures against the corresponding meta-signatures revealed that more than 50% of the genes in each of our meta-signatures are novel (not found in any of the individual studies, and revealed only by the meta-analysis). These novel genes span a broad range of cellular functions, implicating various biological processes in each of the factors. An example is the alteration of the GABA-related transcriptome with age. We found two GABA receptor genes (GABBR and GABRG2) and two glutamate decarboxylase genes (GAD67 and GAD65) to be down-regulated with age. Animal studies have demonstrated that GABA receptors are markedly decreased with age, and there is evidence to suggest that this may play a role in age-related cognitive changes [215, 216]. Reduced inhibitory neurotransmission in the ageing human brain is also supported by a recent study [217] using the Lu et al. [127] dataset, and is observed in the results of our meta-analysis.

Also consistently altered in our age meta-signature are members of the regulator of G-protein signaling (RGS) family of genes. RGS family members are expressed in the brain and periphery. Their gene products play a critical role in signal transduction by negatively regulating G-protein-coupled receptor (GPCR) signaling through their GTPase-accelerating activity. These proteins have been implicated in neuronal function, and many have been identified as vulnerability factors for CNS disorders such as addiction, Parkinson’s disease, schizophrenia and mood disorders [218]. In the brain, they modulate neurotransmission resulting from the activation of metabotropic GPCRs. RGS4, a member of this family, has previously been shown to be down-regulated with age [200]. In our study, we confirm this finding and additionally report four other members of the family (RGS6, RGS7, RGS12 and RGS17) that display an age-related decline in expression. Alterations in the expression of RGS genes present a possible molecular mechanism that could affect neuronal functioning during aging.

One motivation of this chapter was to identify gene expression changes that need to be accounted for when studying potential expression changes in neuropsychiatric disorders such as schizophrenia. This is important because changes in expression due to the factors we studied can be large compared to the reported effects of psychiatric disease [185]. Thus, even a mild age imbalance between groups might cause a change in expression that is potentially larger than the effect of disease. A gene known to change expression with age (for example) must therefore be analyzed very carefully if it is to be considered a candidate marker for disease, because it is difficult to control perfectly for age. We searched our meta-signatures for a list of schizophrenia-associated genes and found that 12 of these genes appear in at least one of our meta-signatures (Table 8). One such example is RGS4, a gene that has been extensively characterized in schizophrenia, both as a susceptibility allele and in expression studies [219]. As noted above, we find RGS4 to be down-regulated with age, confirming previous work [200]. We also identified the receptor-ligand pair ERBB3 (v-erb-b2 erythroblastic leukemia viral oncogene homolog 3) and NRG1 in our age up- and down-regulated meta-signatures, respectively. NRG1 and its receptor ERBB3 participate in key neurodevelopmental processes in the nervous system [220] and have also been implicated in schizophrenia [221]. Evidence for a role of NRG1-ERBB3 in schizophrenia includes reduced ERBB3 expression in human postmortem PFC samples and genetic association evidence linking NRG1 to schizophrenia [221].

Another notable candidate is GAD67, which is down-regulated in our age meta-signature (consistent with some reports in the literature [222, 223]) and up-regulated in our pH meta-signature. The reduction of GAD67 expression in schizophrenia is arguably one of the best-established changes for this disorder [185]. Together, these findings show that several genes implicated in schizophrenia also change in expression with these factors, which adds complexity to their established relationships with the disorder.

Looking specifically at findings from microarray studies of schizophrenia [108, 224-226], we find additional overlap with the results of our meta-analysis. Synaptic machinery transcripts such as SYN2 and SYNJ1, reported as down-regulated in subjects with schizophrenia [98], are also down-regulated with age in our meta-signatures. Other genes reported to be down-regulated in schizophrenia, such as MAPK1, KCNK1 and CRYM, show similar patterns in our data. Careful analysis of such genes will allow us to explore potential interrelationships between these factors and schizophrenia, and to reveal the underlying factors driving the changes in gene expression.

In summary, the results from this chapter show that meta-analysis of postmortem human brain gene expression data is both feasible and informative. We have provided lists of gene expression changes associated with four factors that can potentially have confounding effects in postmortem brain studies.

The identification of disease associated genes amongst our meta-signatures indicates that the transcriptional response of these genes may warrant special consideration when studying disease effects in neuropsychiatric illness.


Chapter 3: Genome-wide expression profiling of schizophrenia using a large combined cohort²

3.1 Introduction

Schizophrenia is a severe psychotic disorder that affects approximately one percent of the population worldwide [16]. Many groups have attempted to identify changes in gene expression in the brains of individuals with schizophrenia, often focusing on the prefrontal cortex [185, 225, 227]. Such studies have identified alterations in genes implicating various molecular processes. Some examples include (but are not limited to) synaptic machinery and mitochondrial-related transcripts [98, 224, 228, 229], immune function genes [108] and a reduction in oligodendrocyte and myelination-related genes [106, 230, 231].

The variety and scope of these processes, found in different subject cohorts, raises the question as to whether there are underlying commonalities in molecular signatures in schizophrenia. Such commonalities are presupposed by most genetic studies, which look for alleles overrepresented in large numbers of schizophrenic individuals [232-234]. It is important to establish if there are any common features of the disease at the molecular level.

The diversity of results in transcriptome studies can be attributed to many sources. Besides differences in the sampled cohorts and disease heterogeneity, discrepancies between transcriptome studies can arise from methodological differences in sample preparation, choice of platform and data analysis. There are also issues that are especially pertinent to the analysis of postmortem human brain tissue. One is the confounding effect of factors such as age, sex and medication. Such factors are often associated with relatively large gene expression changes [235], while psychiatric illnesses such as schizophrenia are associated with small effect sizes. If these factors are not correctly controlled for, they can mask, or masquerade as, expression patterns associated with the disease. Standard practice involves minimizing the effects of such factors either in the experimental design, by sample matching, or by treating these factors as covariates in regression models. It is also increasingly appreciated that technical artifacts such as ‘batch effects’ can result in substantial variability [13, 236-238]. In addition, postmortem brain tissue is a limited resource, leading to small sample sizes with low statistical power. For this reason, most studies have not applied multiple-testing correction, and have validated findings only on the same RNA samples that were used for profiling. All of these issues are likely to contribute to the differences in findings across studies. I hypothesize that a good way to address these problems is to re-analyze and meta-analyze the studies in question, a task I undertake in this chapter.

2 A version of this chapter has been published. (Mistry M, Gillis J, and Pavlidis P (2012). Genome-wide expression profiling of schizophrenia using a large combined cohort. Molecular Psychiatry. doi: 10.1038/mp.2011.172)

Meta-analysis has become an increasingly common way to combine high-throughput genomics studies in neuropsychiatry [7, 13, 233, 239, 240]. Combining datasets across studies increases power and facilitates the identification of gene expression changes that are consistent and reliable, reducing false positives. In a meta-analysis, multiple studies are statistically pooled to provide an overall estimate of the significance of an effect, highlighting important yet subtle variations. While meta-analysis has been applied to gene expression data [171, 192, 241], to our knowledge only a few studies have done so with postmortem human brain data [7-9, 13]. A cross-study analysis of psychosis was conducted across seven datasets using samples from the SMRI postmortem brain collections [7], in which subjects were divided into groups based on the presence or absence of psychotic features. As a result, the control group included subjects with bipolar disorder (without psychotic features) and with depression, in addition to healthy controls. Additionally, the SMRI reports results from a cross-study analysis of schizophrenia datasets in its online genomics database (http://stanleygenomics.org), computing ‘consensus’ fold changes while adjusting for confounding variables. However, the studies used in these analyses draw on samples from the same two brain collections and are therefore not entirely independent. More recently, a comparative analysis was conducted across two independent schizophrenia cohorts: probes were identified as differentially expressed within each study, and the probes common to both studies were reported [102]. Thus, while there have been attempts to meta-analyze schizophrenia expression profiling data, there has not yet been an integration of the primary data from more than two independent microarray studies.

In this chapter I present a cross-study analysis of seven microarray datasets comprising a total of 153 schizophrenia samples and 153 normal controls. We applied a linear modeling approach to control for factors such as age, brain pH and batch effects, and applied multiple-testing corrections to control the false discovery rate. We show that we are able to detect small yet consistent and statistically significant changes. Careful control of extraneous factors using probe-specific statistical modeling allows us to isolate the gene expression changes associated with the disease effect. Our results confirm some previously reported expression changes in schizophrenia and also identify potential new targets that suggest alterations in synaptic function.

3.2 Methods

3.2.1 Data Collection

Genome-wide expression data sets were selected on the basis of microarray platform, use of prefrontal cortex (BA 9, 10 or 46), the availability of information on covariates such as age, and finally the availability of the raw data. Each dataset comprises a cohort of neuropathologically normal subjects and a cohort of schizophrenia subjects, as diagnosed and reported in their respective studies (Table 11).

Sources for data include the SMRI, the Harvard Brain Bank, and the Gene Expression Omnibus (GEO).

GEO studies were identified by extensive manual and keyword searches. While the SMRI has additional data sets, these represent repeated runs of samples from the same subjects, so we selected one dataset to represent each of the two SMRI brain collections. Two additional studies were obtained from the authors [107, 242]. Sample characteristics for the subjects were collected and are summarized in Table 12. Batch information was obtained from the ‘scan date’ stored in the CEL files; chips run on different days were considered different batches. Datasets consisted of single-channel intensity data generated from two Affymetrix platforms, but only the probe sets present on the HG-U133A chip were used for analysis. Probe sets were re-annotated at the sequence level by alignment to the hg19 genome assembly, using methods essentially as previously described [238], and were also cross-referenced with problematic probe lists provided by http://masker.nci.nih.gov/ev/. The final data matrix consisted of expression values for 22,215 probe sets and 306 samples.

3.2.2 Data Pre-processing

The raw data (“CEL”) files from all the datasets were pooled, and expression levels were summarized, log-transformed and normalized with the R Bioconductor [243] ‘affy’ package, using default settings for the RMA algorithm. The data were also processed with four other pre-processing methods to evaluate the robustness of our meta-signatures. We retained standard RMA as the method on which to centre the analysis because RMA has been shown independently to perform well on gold-standard data sets [139, 244, 245]. The four methods are as follows: 1) RMA with quantile normalization applied after summarization; 2) RMA using MAS5-style mean adjustment rather than quantile normalization; 3) RMA using MAS5-style mean adjustment rather than quantile normalization, applied after summarization as is typically done in MAS5; and 4) MAS5. We compared the results from each of these methods to our original meta-signatures using Pearson correlations, and compared the significant probes (q < 0.1) by computing overlaps.

3.2.3 Data Quality Control

Sample outliers were then identified and removed from each dataset based on inter-sample correlation analysis, resulting in the removal of 13 samples (2 of these are the same outliers identified in a previous analysis of SMRI data; http://stanleygenomics.org). Briefly, a sample-by-sample correlation matrix was generated for each dataset by representing each sample as a vector of probe expression values and computing all pairwise Pearson correlations. Outlier samples were identified as those showing correlations below 0.8 with all other samples. These samples were removed from the dataset and excluded from subsequent analyses.
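The correlation-based outlier screen described above can be sketched in a few lines of NumPy. This is a minimal illustration that assumes one dataset at a time, already log-transformed and normalized, stored as a probes-by-samples matrix. The function name `find_correlation_outliers` is hypothetical. Only the 0.8 cutoff and the "below the cutoff with all other samples" rule come from the text.

```python
import numpy as np


def find_correlation_outliers(expr, threshold=0.8):
    """Return column indices of samples whose Pearson correlation with
    every other sample is below `threshold`.

    expr: 2-D array of log-scale expression values, probes in rows and
    samples in columns (one dataset at a time).
    """
    # rowvar=False: columns (samples) are the variables being correlated
    corr = np.corrcoef(expr, rowvar=False)
    n = corr.shape[0]
    # Drop the diagonal (each sample's self-correlation of 1.0)
    off_diag = corr[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    # Flag a sample only if ALL of its correlations fall below the cutoff
    return np.flatnonzero((off_diag < threshold).all(axis=1))


# Synthetic example: five similar samples and one unrelated sample
rng = np.random.default_rng(0)
base = rng.normal(8, 2, size=1000)                          # shared profile
good = base[:, None] + rng.normal(0, 0.3, size=(1000, 5))   # 5 similar samples
bad = rng.normal(8, 2, size=(1000, 1))                      # 1 unrelated sample
expr = np.hstack([good, bad])
print(find_correlation_outliers(expr))  # [5]
```

A sample is flagged only if every one of its correlations falls below the cutoff. As a result, one poor chip in a dataset does not cause the well-behaved samples around it to be flagged as well.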


3.2.4 Statistical Modeling

Gene expression values for each probe set were modeled using a standard FEM framework. We also employed a model selection procedure in which each probe set was modeled using the full model, which includes all five factors, as well as various sub-models (an approach similar to that used previously [245]). For the full model, we treated Disease, Age, Brain pH, Batch date and Study as fixed effects, whose unknown constants are estimated from the data. We generated four sub-models by inclusion/exclusion of selected parameters (Table 13). Each sub-model fit was compared to the full model fit using an ANOVA, in which an F-statistic was computed to assess whether the loss of a parameter resulted in a substantial loss of explanatory power. Model comparison was also repeated using the Akaike Information Criterion (AIC), a measure of the relative goodness of fit between two models. AIC is computed from the likelihood function of the estimated model and incurs a penalty for each parameter included. Results using the AIC method were highly correlated with those obtained using the ANOVA (Pearson correlation of 0.99). For each probe set, the t-statistic for the disease effect was then extracted from the best model fit. P-values were computed using one-sided tests, performed independently for the two alternative null hypotheses (i.e. gene expression does not increase with schizophrenia, and gene expression does not decrease with schizophrenia). The resulting p-values for the up- and down-regulated signatures were further adjusted for multiple testing using the q-value method [143] to control the FDR. We also explored mixed-effect models (MEM), treating Study and/or Batch as a random effect. For each probe, the goodness of fit was compared across the different models using the AIC. For the majority of the probes, the FEM fit resulted in the lowest AIC value, indicating that the FEM provided the best fit to the data.
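The per-probe model comparison can be sketched with NumPy and SciPy. The sketch below fits ordinary least-squares models, compares a sub-model against the full model with a nested F-test and with AIC, extracts one-sided p-values for the disease coefficient, and converts p-values to q-values. It illustrates the ideas under several assumptions and is not the code used in this thesis:

- The caller supplies design matrices, with the intercept in column 0 and the disease indicator in column 1.
- The rule in the hypothetical `select_model` is one plausible reading of "best model fit": keep the smallest sub-model whose F-test p-value exceeds 0.05.
- `storey_qvalues` estimates π0 with a single fixed λ = 0.5. This simplifies the smoothed estimator of the q-value method [143].

```python
import numpy as np
from scipy import stats


def ols(y, X):
    """Least-squares fit; returns coefficients, residual sum of squares, rank."""
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid), int(rank)


def nested_f_test(y, X_full, X_sub):
    """ANOVA F-test: does dropping the extra terms of X_full lose explanatory power?"""
    _, rss_full, p_full = ols(y, X_full)
    _, rss_sub, p_sub = ols(y, X_sub)
    df_num, df_den = p_full - p_sub, len(y) - p_full
    f_stat = ((rss_sub - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, stats.f.sf(f_stat, df_num, df_den)


def gaussian_aic(y, X):
    """AIC of a Gaussian linear model, up to an additive constant."""
    _, rss, p = ols(y, X)
    n = len(y)
    return n * np.log(rss / n) + 2 * (p + 1)  # +1 for the residual variance


def select_model(y, X_full, sub_models, alpha=0.05):
    """Hypothetical rule (assumption): keep the smallest sub-model that does
    not fit significantly worse than the full model, else the full model."""
    kept = [X for X in sub_models if nested_f_test(y, X_full, X)[1] > alpha]
    if not kept:
        return X_full
    return min(kept, key=np.linalg.matrix_rank)


def disease_t_one_sided(y, X, disease_col=1):
    """t-statistic of the disease coefficient and its two one-sided p-values.
    Assumption: column 0 is the intercept, column 1 the disease indicator."""
    beta, rss, p = ols(y, X)
    df = len(y) - p
    cov = (rss / df) * np.linalg.pinv(X.T @ X)
    t = beta[disease_col] / np.sqrt(cov[disease_col, disease_col])
    p_up = stats.t.sf(t, df)     # H0: expression does not increase
    p_down = stats.t.cdf(t, df)  # H0: expression does not decrease
    return t, p_up, p_down


def storey_qvalues(p, lam=0.5):
    """Simplified Storey q-values (assumption: one fixed lambda for pi0)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    pi0 = min(1.0, np.mean(p > lam) / (1 - lam))
    order = np.argsort(p)
    q_sorted = p[order] * pi0 * m / np.arange(1, m + 1)
    # Enforce monotonicity: q_(i) is the minimum of the ratios at ranks >= i
    q_sorted = np.minimum.accumulate(q_sorted[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q
```

In practice, these functions would be applied to each probe set in turn. The p-values for the up- and down-regulated directions would then be adjusted separately, matching the two one-sided nulls described above.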

3.2.5 Literature-derived Signatures

Our signatures were compared to probe lists obtained from the original publication for each of the datasets used in our analysis. As the two SMRI datasets were unpublished, gene lists were compiled from the SMRI online genomics database. For the Mclean dataset, we used the list of ‘significant probes’ as reported in [102]. For the Haroutunian data set, we used probes selected at the ‘low stringency criteria’ described in [107]. Details on each of these gene sets can be found in Table 15 (probes were excluded if they were not on the HG-U133A chip). Additional signatures for comparison were obtained from published schizophrenia expression profiling studies, along with a list of the top 45 candidate schizophrenia genes reported in the SZGene database [232]. Agreement of the meta-signature ranking with each validation gene set was assessed using ROC analysis. An AUC score close to 1.0 indicates that many genes in the validation set are near the top of the ranked meta-signature. A score close to 0.5 indicates that the validation genes are randomly distributed across the ranking.
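The AUC of a ranked meta-signature against a validation set can be computed directly from ranks. It equals the Mann-Whitney probability that a randomly chosen validation gene is ranked above a randomly chosen non-validation gene. The sketch below makes three assumptions:

- The meta-signature is a list of gene identifiers ordered best-first.
- Validation genes absent from the ranking are ignored.
- The helper name `ranking_auc` and the gene names in the example are hypothetical.

```python
import numpy as np


def ranking_auc(ranked_genes, validation_set):
    """ROC AUC of a best-first gene ranking for recovering a validation set.

    Equals the probability that a randomly chosen validation gene is ranked
    above a randomly chosen non-validation gene (Mann-Whitney form).
    Validation genes missing from the ranking are ignored.
    """
    is_pos = np.array([g in validation_set for g in ranked_genes], dtype=bool)
    n_pos, n_neg = int(is_pos.sum()), int((~is_pos).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ranking must contain both validation and other genes")
    # Number of non-validation genes ranked strictly below each position
    neg_below = n_neg - np.cumsum(~is_pos)
    return float(neg_below[is_pos].sum() / (n_pos * n_neg))


ranking = ["G1", "G2", "G3", "G4"]        # hypothetical best-first ranking
print(ranking_auc(ranking, {"G1", "G3"}))  # 0.75
```

Computing the AUC from ranks avoids choosing any significance threshold on the meta-signature. This matches the intent of scoring the whole ranked list rather than only its significant probes.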

3.2.6 Enrichment Analysis

To characterize each meta-signature, we looked for enrichment of GO terms [197] using the gene score re-sampling (GSR) method in ErmineJ [198, 246]. We evaluated CNS cell type enrichment by cross-referencing the genes in each meta-signature with published lists of neuron, oligodendrocyte and astrocyte marker genes [247]. We also evaluated each meta-signature against modules of coexpressed genes in the human brain as reported in [12]. We obtained the module membership data for the cortex dataset (CTX), which consists of 67 samples representing four cortical areas analyzed using the U133A array. The CTX gene coexpression network comprises a total of 19 modules, and we compared the probes from each of our meta-signatures against each module by computing overlaps. The significance of each overlap was corrected for multiple testing using the Benjamini-Hochberg method [142].
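The module-overlap test can be sketched as a one-sided hypergeometric test per module, followed by Benjamini-Hochberg adjustment across all modules tested. This sketch assumes that the background ("universe") is the set of probes assayed in both analyses. The text does not state which background was used, so treat that choice as an assumption. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import hypergeom


def overlap_pvalue(signature, module, universe):
    """One-sided hypergeometric p-value for the overlap of two probe sets,
    restricted to a common background `universe` (a set of probe IDs).
    Assumption: the universe is the set of probes assayed in both analyses."""
    sig = set(signature) & universe
    mod = set(module) & universe
    k = len(sig & mod)
    # P(X >= k), X ~ Hypergeometric(M=|universe|, n=|module|, N=|signature|)
    return float(hypergeom.sf(k - 1, len(universe), len(mod), len(sig)))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    adj = p[order] * m / np.arange(1, m + 1)
    adj = np.minimum.accumulate(adj[::-1])[::-1]  # make monotone non-decreasing
    out = np.empty(m)
    out[order] = np.minimum(adj, 1.0)
    return out
```

For one meta-signature, the raw p-values from `overlap_pvalue` against each of the 19 CTX modules would be collected into a list and passed to `benjamini_hochberg` together.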

3.2.7 Network Analysis

We evaluated the path-length and node degree (number of associations) properties of the meta-signature genes in a large human PPI network obtained by aggregating data from multiple sources [248-253]. The network contains 100,623 unique interactions among 11,697 genes. Path lengths in the network were measured using Dijkstra’s algorithm [152]. Statistical significance was assessed against an empirical null distribution obtained by randomly sampling 10,000 gene sets of similar size and node degree.
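The path-length test can be sketched in plain Python. Because the PPI network is unweighted, Dijkstra's algorithm reduces to breadth-first search, and the sketch uses BFS. The following are assumptions rather than details taken from the text:

- The network is an adjacency dict mapping each gene to a set of neighbours.
- The within-set statistic is the mean shortest path over connected pairs.
- "Similar node degree" is implemented as exact degree matching: each signature gene is replaced by a distinct random gene of the same degree.

The function names are hypothetical. For a network of about 11,700 genes and 10,000 permutations, a more efficient implementation would be needed.

```python
import random
import statistics
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source`; BFS equals Dijkstra when all edges weigh 1."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def mean_within_set_path(adj, genes):
    """Mean shortest-path length over all connected pairs within a gene set."""
    genes = [g for g in dict.fromkeys(genes) if g in adj]
    lengths = []
    for i, g in enumerate(genes):
        dist = bfs_distances(adj, g)
        lengths.extend(dist[h] for h in genes[i + 1:] if h in dist)
    return statistics.mean(lengths) if lengths else float("inf")


def degree_matched_test(adj, genes, n_perm=10000, seed=0):
    """Empirical p-value that a gene set is closer together than
    degree-matched random sets of the same size (assumption: exact degree)."""
    rng = random.Random(seed)
    genes = [g for g in dict.fromkeys(genes) if g in adj]
    by_degree = {}
    for node, nbrs in adj.items():
        by_degree.setdefault(len(nbrs), []).append(node)
    observed = mean_within_set_path(adj, genes)
    hits = 0
    for _ in range(n_perm):
        used, sample = set(), []
        for g in genes:
            # Distinct random gene with exactly the same degree as g
            pool = [n for n in by_degree[len(adj[g])] if n not in used]
            pick = rng.choice(pool)
            used.add(pick)
            sample.append(pick)
        if mean_within_set_path(adj, sample) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Matching on degree matters because highly connected genes are close to many others by construction. A set enriched for hubs would otherwise look artificially tight.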


3.3 Results

For each of our samples, we obtained information on age, sex, brain pH and PMI. These factors were assessed for significant differences between the control and schizophrenia cohorts to help select the factors used as fixed effects in our model. We observed no significant differences in age or PMI, and the numbers of males and females were well matched between the groups (Table 12). Brain pH, however, was significantly different between the two cohorts (t-test; p = 0.001). P-value distributions for each demographic variable indicated considerable differential expression associated with age, pH and PMI, whereas a fairly uniform distribution was observed for sex. We also found it necessary to correct for “batch effects” (technical artifacts caused by running chips on different days or even years [238]), as they contributed the vast majority of variance in gene expression. Based on these observations, we chose to include only age, pH and batch (in addition to disease) as fixed effects in our model.

Each probe was considered in a model selection procedure to identify probe sets that were differentially expressed between schizophrenia and control samples. After multiple-testing correction, we identified a meta-signature of 39 up-regulated and 86 down-regulated probes at an FDR of 0.1 (Table 14). These correspond to 25 unique up-regulated and 73 unique down-regulated genes. The difference arises because several genes are represented by more than one probe in the signature, which suggests higher confidence in the expression changes found for those genes. Figure 7 shows the expression levels of the top down-regulated probe we identified (mapping to the gene NECAB3). As expected, expression changes were small (~15% expression change) and more evident in some datasets. Our modeling procedure estimates a single disease effect across all studies, and accordingly the direction of expression change is mostly consistent across datasets.

Although our linear modeling approach controlled for the effects of age and brain pH, we also checked our signatures against gene lists for pH and age from our study of normal postmortem human brain in Chapter 2 [235]. The overlap was significant only for our down-regulated signature, which contains 32 genes previously identified as down-regulated with age. Because our profiles are age-corrected and our cohorts age-matched, this suggests that aging and schizophrenia share some expression changes, rather than a confounding effect. We also cross-referenced our schizophrenia signatures with gene lists for sex and PMI (from Chapter 2), the two factors excluded from our model selection approach. We observed a total of three overlapping genes, suggesting that the effects of these factors are likely subtle and do not dominate our results. We also sought to address other factors that we were unable to account for in our approach, such as medication effects and alcohol and drug abuse. Using gene lists provided by the SMRI Online Genomics Database (http://www.stanleygenomics.org), we extracted significant gene lists (p < 0.001; FC > 1.2) for the effects of lifetime alcohol use (23 genes), lifetime drug use (26 genes) and lifetime antipsychotic use (69 genes) in subjects with schizophrenia. A comparison of each of these lists to our meta-signatures identified only two overlapping genes. KCNK1, which is present in our down-regulated signature, also increases with lifetime alcohol use. From the up-regulated signature, LPL appears to increase with lifetime antipsychotic use and to decrease with increased drug use.

To test the robustness of these findings, we used a jackknife procedure, sequentially removing one of the seven studies and performing the meta-analysis on the remaining six, for each study in turn. We expected that results highly influenced by a single data set would not be stable across jackknife runs. Each leave-out iteration resulted in a new meta-signature, which was then ranked by q-value and compared against the final meta-signature. The range of rank correlations between the jackknife iterations and the final meta-signature (0.87-0.99) illustrates the robustness of our meta-signatures, demonstrating that our results are not highly biased by any single dataset. The lowest correlations were observed upon removal of the Bahn and GSE21138 datasets (0.88 and 0.87, respectively). This suggests that these datasets may contribute a slightly stronger signal, particularly to the up-regulated signature. The lack of significant genes at q < 0.1 in the signatures from those jackknife runs corroborates this finding. Finally, the top 100 probes were taken from each jackknife signature, and their intersection was retained to form a ‘core signature’ of 16 down-regulated and 14 up-regulated probes (highlighted in Table 14). We consider these probes to be the most reliable findings from our study, as they are relatively insensitive to the choice of data sets used. In Figure 8, we have assembled the ‘core signatures’ and plotted expression levels within each dataset, with samples separated into control and schizophrenia groups. For some studies we observe a more obvious gradient between the two groups, whereas for others the difference is more subtle.
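The leave-one-study-out procedure can be sketched as follows. The full model-selection pipeline is too long to reproduce here. Instead, the hypothetical `combine_studies` function stands in for the meta-analysis by taking a Stouffer-style sum of per-study t-statistics. This is a swapped-in simplification, not the method used in this chapter. The sketch ranks probes in one direction only (up-regulation) and correlates each leave-one-out ranking with the full ranking using Spearman correlation. It then intersects the top-N probes across runs to form a core signature.

```python
import numpy as np
from scipy.stats import spearmanr


def combine_studies(t_by_study):
    """Hypothetical stand-in for the meta-analysis: Stouffer-style
    combination of per-study t-statistics (rows = studies, cols = probes)."""
    t = np.asarray(t_by_study, dtype=float)
    return t.sum(axis=0) / np.sqrt(t.shape[0])


def jackknife(t_by_study, top_n=100):
    """Leave-one-study-out robustness check.

    Returns the Spearman correlation of each leave-one-out score with the
    full score, and the 'core' probes present in every run's top_n
    (ranked for up-regulation; repeat with -t for down-regulation).
    """
    t = np.asarray(t_by_study, dtype=float)
    full = combine_studies(t)
    correlations, top_sets = [], []
    for i in range(t.shape[0]):
        loo = combine_studies(np.delete(t, i, axis=0))  # drop study i
        correlations.append(float(spearmanr(full, loo)[0]))
        top_sets.append(set(np.argsort(-loo)[:top_n].tolist()))
    return correlations, set.intersection(*top_sets)
```

Intersecting top-N lists, rather than significant lists, keeps a probe in the core set even when one jackknife run loses significance overall, as happened when the Bahn or GSE21138 data were removed.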

To assess the sensitivity of our results to the choice of pre-processing algorithm we re-analyzed our data with four different methods (see Methods). We obtained good agreement between the results of each method and our final meta-signatures despite dramatic changes to the preprocessing procedure.

Additionally, we took the intersection of significant probes from each of the different methods to assemble a list of probes that are completely insensitive to the choice of pre-processing method. This list comprises a total of 5 up-regulated and 8 down-regulated probes, highlighting novel genes and genes that have been previously implicated in independent studies (marked in Table 14).

The set of 98 differentially expressed genes identified in our analysis implicates a variety of genes and functional groups, many of which have been previously reported in the literature. For example, down-regulation of mu-crystallin (CRYM), potassium channel subfamily K member 1 (KCNK1) and F-box protein 9 (FBXO9), and up-regulation of lipoprotein lipase (LPL) and lysyl hydroxylase 2 (PLOD2), are concordant with findings from previous studies [106, 108, 229, 254]. We manually evaluated each differentially expressed gene against literature reports and UniProt definitions to assign genes to high-level functional categories. In the down-regulated signature, genes clustered into functional groups pertaining to various molecular mechanisms of neuronal communication.

On the pre-synaptic side, we found genes involved in cell adhesion (for example, OPCML) and neurotransmitter secretion (for example, APBA2, PCSK2). We also observed genes involved in signalling pathways that elicit metabotropic effects (for example, GNAL, OPN3, CRHR, RGS7, GNB5). Concordant with previous studies, we also identified various genes involved in oxidative phosphorylation (for example, CYP26B1, COQ4, SLC25A15, ATP5C1, SLC25A12) and ubiquitination (for example, FBXO9, COPS7B, USP19, TACC2, DCAF8). In our up-regulated signature, we found a number of transcription-related genes (for example, BAZ1A, CBFA2T2, BBX, ANP32A) and genes involved in translation (for example, EIF3E, EIF2C3, PAIP2B). Other genes include cell organization/maintenance factors (for example, PKP4, PLOD2) and various stress response genes (for example, SMG1). Additionally, both signatures contain a small group of genes with unknown function.

We performed a functional analysis using GO annotations to systematically detect enrichment of biological processes. After multiple-testing correction, we were unable to identify any significant terms using the ORA method, but significant terms were found using the threshold-free GSR algorithm [198]. For the 73 genes with decreased expression in schizophrenia, the top GO categories included energy metabolism, ubiquitination, neurotransmitter transport and various other metabolic processes. The 25 up-regulated genes showed enrichment in various immune-related GO categories, in addition to terms related to cellular localization. While some of these categories agree with the findings from the manual evaluation above, others do not (e.g. immune response). It should be noted that the GSR algorithm characterizes the functions of the top-ranking genes overall, and not necessarily the significant genes (q < 0.1) discussed above.
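The basic idea behind gene score resampling can be sketched to show why a threshold-free method may detect enrichment that ORA misses. In this sketch, a gene set's score is the mean of its member genes' scores (for example, -log10 p-values from the meta-analysis). Its significance is the fraction of random gene sets of the same size that score at least as high. This is a simplified illustration that assumes mean aggregation. ErmineJ's implementation includes further options and optimizations that are omitted here, and the function name is hypothetical.

```python
import numpy as np


def gene_score_resampling(scores, gene_set_idx, n_iter=10000, seed=0):
    """Simplified GSR: mean score of a gene set versus the means of random
    gene sets of the same size drawn from all scored genes.

    scores: one score per gene, larger = stronger evidence
            (e.g. -log10 p-values from the meta-analysis).
    gene_set_idx: indices of the genes annotated to one GO term.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    k = len(gene_set_idx)
    observed = scores[gene_set_idx].mean()
    null = np.array([
        scores[rng.choice(scores.size, size=k, replace=False)].mean()
        for _ in range(n_iter)
    ])
    # Add-one correction so the empirical p-value is never exactly zero
    p = (np.sum(null >= observed) + 1) / (n_iter + 1)
    return float(observed), float(p)
```

Unlike ORA, this approach applies no significance cutoff to individual genes. As a result, many modestly ranked genes annotated to the same term can jointly produce a significant score, even when few of them pass q < 0.1.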

Because the genes we identified were functionally diverse, we hypothesized there might be additional insight gained at the level of gene networks. In particular we asked whether the signature genes had any unusual properties in their protein interaction patterns, compared to carefully selected groups of background genes (see Methods). Taking all 98 genes together, we specifically looked at within-group connectivity, node degree (the number of connections) and path lengths between genes. Our most striking finding is that the genes within our set were significantly closer to one another in the network than expected by chance (p<0.02). This relationship suggests a higher likelihood of functional relationships among the signature genes [160, 249]. In contrast, the signature genes did not possess a particularly high node degree within the network (23rd percentile in the whole network), that is, they tend not to be ‘hubs’.
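The path-length comparison can be sketched as a permutation test: compute the mean shortest-path distance among the signature genes, then see how often random gene sets of the same size are at least as close. This minimal sketch assumes an unweighted, undirected interaction network stored as an adjacency dictionary, skips disconnected pairs, and samples background genes uniformly from all nodes. The analysis described above instead used carefully selected background groups (see Methods), so the function names and the sampling scheme here are illustrative assumptions.

```python
import random
from collections import deque


def build_graph(edges):
    """Undirected adjacency dictionary from (gene, gene) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph, source):
    """Hop counts from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def mean_pairwise_path_length(graph, genes):
    """Mean shortest-path length over connected pairs in `genes` (inf if none)."""
    genes = list(genes)
    lengths = []
    for i, a in enumerate(genes):
        dist = bfs_distances(graph, a)
        lengths.extend(dist[b] for b in genes[i + 1:] if b in dist)
    return sum(lengths) / len(lengths) if lengths else float("inf")


def path_length_permutation_test(graph, genes, n_perm=1000, seed=0):
    """Empirical p-value: how often a random set of equal size is at least as close."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    observed = mean_pairwise_path_length(graph, genes)
    at_least_as_close = sum(
        mean_pairwise_path_length(graph, rng.sample(nodes, len(genes))) <= observed
        for _ in range(n_perm)
    )
    return observed, (at_least_as_close + 1) / (n_perm + 1)


# Toy chain network A-B-C-D-E-F; the pair (A, B) is adjacent.
toy = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")])
print(path_length_permutation_test(toy, ["A", "B"], n_perm=200))
```

Node degree can be read directly from the same structure as `len(graph[gene])`, which is how a low-degree gene set can still turn out to be unusually close together.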

We also evaluated each meta-signature against modules of coexpressed genes in the human cortex as reported in [12]. Our up- and down-regulated signatures significantly overlap with the “turquoise” and “brown” modules, respectively (p < 0.01 and p < 0.05; Table 16). These modules are of interest because they display a notable degree of preservation across datasets in [12], suggesting that differential expression of our signature genes may be disrupting core networks in the human brain. This also reinforces the importance of gene network structure analysis in determining the basis of this disorder.

To characterize our schizophrenia signatures with respect to cellular organization in the cortex, we cross-referenced our ranked meta-signatures with published lists of CNS cell-type markers [247]. An ROC analysis of the astrocyte, oligodendrocyte and neuron marker lists revealed no preferential association of any cell type with our ranked meta-signatures. However, when evaluating only the significant probes (q < 0.1) in our signatures, we found an enrichment of probes mapping to neuronal markers in the down-regulated signature.

Each meta-signature was evaluated against the top 45 candidate schizophrenia genes reported in the SZGene database (http://www.szgene.org/). Agreement of the meta-signature ranking with the SZGene set was assessed using receiver operating characteristic (ROC) curve analysis. The SZGene genes appeared to be randomly distributed across our ranking. We also computed a simple overlap between the 45 candidate genes and our results, identifying OPCML as the only common gene.
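ROC analysis of a gene set against a ranked meta-signature reduces to asking how often a member of the set is ranked above a non-member; the AUC is that fraction over all member/non-member pairs (equivalently, a normalized Mann-Whitney U statistic). The sketch below assumes a ranking of unique gene identifiers ordered from most to least significant and ignores ties; `gene_set_auc` is a hypothetical helper name, not code from this thesis. An AUC near 0.5 corresponds to a set spread randomly across the ranking, while values near 1 indicate members concentrated at the top.

```python
def gene_set_auc(ranked_genes, gene_set):
    """AUC for how highly members of gene_set sit in ranked_genes (best first).

    Assumes ranked_genes contains unique identifiers and there are no ties."""
    members = set(gene_set) & set(ranked_genes)
    n_pos = len(members)
    n_neg = len(ranked_genes) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both members and non-members in the ranking")
    # Walk from the top; each member is correctly ordered against every
    # non-member that still lies below it.
    negatives_below = n_neg
    correct_pairs = 0
    for gene in ranked_genes:
        if gene in members:
            correct_pairs += negatives_below
        else:
            negatives_below -= 1
    return correct_pairs / (n_pos * n_neg)


ranking = ["NECAB3", "RFTN1", "GNAL", "HBQ1", "PPA2"]  # illustrative order only
print(gene_set_auc(ranking, {"NECAB3", "GNAL"}))
```

Genes in the reference set that are absent from the ranking (for example, not measured on the platform) are simply dropped before the AUC is computed.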

We were interested in comparing our re-analysis of these seven data sets to the “hit lists” provided by the data set providers. We first tested whether our meta-signature gene rankings were enriched for genes reported by the original studies, using ROC analysis (Table 15). We observed high AUC scores for most gene sets; however, the Haroutunian and GSE21138 gene sets had exceptionally low scores, possibly in part because those studies added a dimension of variability by generating gene sets for stratified cohorts rather than from a single case-versus-control comparison. While high AUCs suggest some similarity in the results, a more sensitive analysis examines just the very top of the rankings. We therefore computed the overlap of each reference gene set with the meta-signature genes at q < 0.1. This reveals a handful of probes in each study that also appear in our significant gene lists (Table 15).

We also re-analyzed each individual dataset using our linear modeling approach. This allowed a fairer evaluation of the contribution of each dataset to the final meta-signatures, since the original studies used a variety of methods for gene selection. After correcting for multiple testing, only two of the data sets (Altar and Haroutunian) yielded significant genes at q < 0.1. We therefore considered the top 100 probes from each dataset and computed their overlap with our meta-signatures. The overlap is highest for the Bahn and GSE21138 datasets, in accord with the finding that these datasets contribute a stronger signal to the meta-signature than the others. Although they were the only two data sets with significant differential expression after multiple test correction, the Altar and Haroutunian results showed very little overlap with the final meta-signature. We note that, considered independently of our meta-signature, the seven data sets showed no overlap among their top 100 probes. Similarly, there was little correlation of the overall probe rankings among the data sets (all correlations < 0.3, with most close to zero).
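The two comparisons in the preceding paragraph, overlap of the top 100 probes and correlation of whole rankings, can be sketched as follows. This assumes each dataset's result is an ordered list of the same unique probe IDs (at least two) and uses the no-ties Spearman formula; the function names are illustrative and not the code used in this thesis.

```python
def top_n_overlap(ranking_a, ranking_b, n=100):
    """Probes shared by the top n of two rankings."""
    return set(ranking_a[:n]) & set(ranking_b[:n])


def spearman_from_rankings(ranking_a, ranking_b):
    """Spearman correlation between two orderings of the same probes (no ties)."""
    if set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must contain the same probes")
    position_b = {probe: i for i, probe in enumerate(ranking_b)}
    n = len(ranking_a)
    d_squared = sum((i - position_b[probe]) ** 2 for i, probe in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Illustrative probe orderings from two hypothetical datasets.
study_1 = ["206355_at", "212646_at", "220807_at", "205694_at"]
study_2 = ["212646_at", "206355_at", "205694_at", "220807_at"]
print(top_n_overlap(study_1, study_2, n=2), spearman_from_rankings(study_1, study_2))
```

Comparing full rankings in this way is less sensitive to arbitrary cutoffs than the top-100 overlap, which is why both views are informative.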

Overall, these results suggest that our re-analysis is concordant with the analyses conducted by the original study authors, subject to important differences likely attributable to our analytic approach (for example, correction for batch effects), and that meta-analysis reveals commonalities which contribute only weakly to the findings of the individual studies.


Table 11: Schizophrenia datasets

Dataset | Reference | Microarray platform | Brain region(s) | No. of subjects (CTL : SZ)
Stanley Bahn | SMRI database | HG-U133A | Frontal (BA46) | 31 : 34
Stanley AltarC | SMRI database | HG-U133A | Frontal (BA46/10) | 11 : 9
Mclean | HBTRC | HG-U133A | Prefrontal cortex (BA9) | 26 : 19
Mirnics | Garbett K. et al., 2008 [242] | HG-U133A/B | Prefrontal cortex (BA46) | 6 : 9
Haroutunian | Katsel P. et al., 2005 [107] | HG-U133A/B | Frontal (BA10/46) | 29 : 31
GSE17612 | Maycox P. et al., 2009 [102] | HG-U133 Plus 2.0 | Anterior prefrontal cortex (BA10) | 21 : 26
GSE21138 | Narayan S. et al., 2008 [255] | HG-U133 Plus 2.0 | Frontal (BA46) | 29 : 25

CTL, control; SZ, schizophrenia; SMRI, Stanley Medical Research Institute; HBTRC, Harvard Brain Tissue Resource Centre (Mclean66 collection).

Table 12: Summary of demographic variables across combined cohort

Variable | Control | Schizophrenia | P-value
Number of subjects | 153 | 153 |
Age | 56.25 ± 20 | 55.27 ± 19 | p = 0.67
Sex | 101M : 52F | 113M : 40F | p = 0.1
Brain pH | 6.5 ± 0.28 | 6.39 ± 0.29 | p = 0.001
PMI | 21.95 ± 15.3 | 22.65 ± 15.2 | p = 0.69

F, female; M, male; PMI, postmortem interval. There were 319 samples collected across seven datasets, of which 306 passed quality control analysis. The summary demographics (mean ± standard deviation) and t-test p-values for group differences are shown for the subjects used in the analysis. For sex, we report the p-value generated from a chi-squared test for equality of proportions.
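The group comparisons summarized in Table 12 can be approximated from the summary statistics alone. This sketch assumes an unequal-variance (Welch) statistic with a normal approximation to its null distribution, which is reasonable with roughly 300 degrees of freedom, and a Pearson chi-squared test on the 2-by-2 sex table without continuity correction. These are assumptions for illustration; the exact test settings are given in Methods, and because the table's means and standard deviations are rounded, results will match only approximately.

```python
from math import erfc, sqrt


def welch_z_test(mean1, sd1, n1, mean2, sd2, n2):
    """Welch-type statistic from summary data, with a two-sided p-value taken
    from the standard normal (a large-sample approximation to the t test)."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean1 - mean2) / se
    return t, erfc(abs(t) / sqrt(2))


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Values taken from Table 12 (control first, then schizophrenia).
print(welch_z_test(56.25, 20, 153, 55.27, 19, 153))   # age
print(welch_z_test(6.5, 0.28, 153, 6.39, 0.29, 153))  # brain pH
print(chi_squared_2x2(101, 52, 113, 40))              # sex: male/female per group
```

With the Table 12 values, the age comparison gives p ≈ 0.66 and the pH comparison p < 0.001, close to the reported values; the sex table gives a chi-squared statistic of about 2.24 (p ≈ 0.13).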


Table 13: Probe model selection across schizophrenia signatures

Model | Description | Up-regulated signature (q < 0.1) | Down-regulated signature (q < 0.1)
Full model | Disease + Age + pH + Batch + Study | 5 | 15
Model 2 | Disease + Age + Batch + Study | 22 | 42
Model 3 | Disease + pH + Batch + Study | 7 | 20
Model 4 | Disease + I(Age + pH) + Batch + Study | 5 | 6
Model 5 | Disease + Batch + Study | 0 | 3

I(Age + pH) models the case in which age and pH have the same effect on expression. The full model and each of the sub-models used in model selection are described above. Disease, Batch and Study are factors retained in each sub-model; the inclusion or exclusion of Age and pH distinguishes the sub-models. The numbers reported indicate the number of probes in the final meta-signature (q < 0.1) that were best fit by each model.
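Table 13 summarizes a per-probe choice among nested linear models. The sketch below illustrates one way such a choice could be made: fit each model by ordinary least squares and keep the one with the lowest Bayesian information criterion (BIC). BIC is an assumption here (the selection criterion actually used is described in Methods), the Batch and Study terms are omitted for brevity (they would enter as extra dummy columns), and age and pH are assumed to be standardized so that a shared I(Age + pH) coefficient is meaningful. The simulated probe, whose expression depends on disease and age but not pH, and all function names are hypothetical.

```python
import math
import random


def ols_rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit (normal equations)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


# Hypothetical encoding of the Table 13 models; Batch and Study dummies omitted.
MODELS = {
    "Full": lambda s: [1.0, s["disease"], s["age"], s["ph"]],
    "Model 2": lambda s: [1.0, s["disease"], s["age"]],
    "Model 3": lambda s: [1.0, s["disease"], s["ph"]],
    "Model 4": lambda s: [1.0, s["disease"], s["age"] + s["ph"]],  # I(Age + pH)
    "Model 5": lambda s: [1.0, s["disease"]],
}


def best_model_by_bic(samples, expression):
    """Fit every model to one probe and return (lowest-BIC model, all BIC scores)."""
    n = len(expression)
    scores = {}
    for name, design in MODELS.items():
        X = [design(s) for s in samples]
        rss = ols_rss(X, expression)
        scores[name] = n * math.log(rss / n) + len(X[0]) * math.log(n)
    return min(scores, key=scores.get), scores


def simulate_probe(n=200, seed=1):
    """Hypothetical probe: standardized age and pH, true effects of disease and age only."""
    rng = random.Random(seed)
    samples, expression = [], []
    for _ in range(n):
        s = {"disease": rng.randint(0, 1), "age": rng.gauss(0, 1), "ph": rng.gauss(0, 1)}
        samples.append(s)
        expression.append(0.3 * s["disease"] + 0.5 * s["age"] + rng.gauss(0, 0.2))
    return samples, expression


best, scores = best_model_by_bic(*simulate_probe())
print(best, {name: round(bic, 1) for name, bic in scores.items()})
```

Because BIC penalizes each extra coefficient by log(n), an irrelevant pH term is usually not worth including, so the simulated probe would typically be assigned to Model 2.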


Table 14: Schizophrenia meta-signatures

A: Up-regulated in schizophrenia

Probe | Gene symbol | Gene description | Model | Fold change | Q-value | Overlapping factor | Probe specificity
203548_s_at | LPL | lipoprotein lipase | 2 | 1.11 | 9.58E-03 | age | Insensitive
210057_at | SMG1 | SMG1 homolog, phosphatidylinositol 3-kinase-related kinase | 4 | 1.04 | 9.58E-03 | | Non-specific
209069_s_at | Multiple gene mappings | | 2 | 1.11 | 2.05E-02 | | Non-specific
216048_s_at | RHOBTB3 | Rho-related BTB domain containing 3 | 2 | 1.11 | 2.05E-02 | age, pH | Insensitive
213187_x_at | FTL | ferritin, light polypeptide | 2 | 1.11 | 2.18E-02 | |
202619_s_at | PLOD2 | procollagen-lysine, 2-oxoglutarate 5-dioxygenase 2 | Full | 1.12 | 5.07E-02 | age, pH |
202975_s_at | RHOBTB3 | Rho-related BTB domain containing 3 | Full | 1.16 | 6.31E-02 | age, pH | Insensitive
204060_s_at | Multiple gene mappings | | 2 | 1.11 | 7.16E-02 | | Non-specific
209747_at | Multiple gene mappings | | 2 | 1.11 | 7.16E-02 | |
212788_x_at | FTL | ferritin, light polypeptide | Full | 1.09 | 7.16E-02 | |
213501_at | ACOX1 | acyl-CoA oxidase 1, palmitoyl | 2 | 1.06 | 7.16E-02 | |
216762_at | Unknown | | 2 | 1.05 | 7.16E-02 | | Mis-targeted
218345_at | TMEM176A | transmembrane protein 176A | Full | 1.10 | 7.16E-02 | |
219156_at | SYNJ2BP | synaptojanin 2 binding protein | 4 | 1.03 | 7.16E-02 | |
221503_s_at | KPNA3 | karyopherin alpha 3 (importin alpha 4) | 3 | 1.02 | 7.16E-02 | |
59625_at | NOL3 | nucleolar protein 3 | 2 | 1.08 | 7.16E-02 | |
202506_at | SSFA2 | sperm specific antigen 2 | 2 | 1.11 | 7.20E-02 | |
203549_s_at | LPL | lipoprotein lipase | 2 | 1.14 | 7.46E-02 | age |
207543_s_at | P4HA1 | prolyl 4-hydroxylase, alpha polypeptide I | 3 | 1.12 | 7.46E-02 | |
209144_s_at | CBFA2T2 | myeloid translocation gene-related protein 1 | 3 | 1.04 | 7.46E-02 | |
219426_at | EIF2C3 | eukaryotic translation initiation factor 2C, 3 | 2 | 1.05 | 7.46E-02 | |
221868_at | PAIP2B | poly(A) binding protein interacting protein 2B | 2 | 1.21 | 7.46E-02 | |
211992_at | WNK1 | WNK lysine deficient protein kinase 1 | 2 | 1.07 | 8.22E-02 | age |
213015_at | BBX | bobby sox homolog (Drosophila) | 2 | 1.05 | 8.22E-02 | | Insensitive
220522_at | CRB1 | crumbs homolog 1 (Drosophila) | 2 | 1.09 | 8.22E-02 | |
211997_x_at | Multiple gene mappings | | 2 | 1.07 | 8.51E-02 | | Non-specific
200063_s_at | Multiple gene mappings | | 4 | 1.05 | 8.60E-02 | | Non-specific
208697_s_at | EIF3E | eukaryotic translation initiation factor 3, subunit E | 4 | 1.05 | 8.60E-02 | |
213016_at | BBX | bobby sox homolog (Drosophila) | 2 | 1.11 | 8.60E-02 | |
201051_at | ANP32A | acidic (leucine-rich) nuclear phosphoprotein 32 family, member A | 3 | 1.06 | 8.81E-02 | |
204032_at | BCAR3 | breast cancer anti-estrogen resistance 3 | 3 | 1.06 | 8.81E-02 | age |
211994_at | WNK1 | WNK lysine deficient protein kinase 1 | 2 | 1.12 | 8.81E-02 | age |
216520_s_at | Multiple gene mappings | | 4 | 1.05 | 8.81E-02 | | Non-specific
203504_s_at | ABCA1 | ATP-binding cassette, sub-family A, member 1 | Full | 1.12 | 8.98E-02 | | Insensitive
217985_s_at | BAZ1A | ATP-dependent chromatin remodeling protein | 2 | 1.05 | 9.43E-02 | |
218826_at | SLC35F2 | solute carrier family 35, member F2 | 2 | 1.05 | 9.43E-02 | |
201927_s_at | PKP4 | plakophilin 4 | 3 | 1.11 | 9.46E-02 | |
201929_s_at | PKP4 | plakophilin 4 | 3 | 1.16 | 9.46E-02 | | Non-specific
220532_s_at | TMEM176B | transmembrane protein 176 | 2 | 1.13 | 9.46E-02 | |

B: Down-regulated in schizophrenia

Probe | Gene symbol | Gene description | Model | Fold change | Q-value | Overlapping factor | Probe specificity
210720_s_at | NECAB3 | N-terminal EF-hand calcium binding protein | 2 | 0.92 | 6.17E-03 | | Non-specific
212646_at | RFTN1 | raftlin, lipid raft linker 1 | 2 | 0.91 | 6.17E-03 | age |
213924_at | Unknown | | 2 | 0.89 | 6.17E-03 | | Insensitive
206355_at | GNAL | guanine nucleotide binding protein G(olf) subunit alpha | 2 | 0.87 | 7.42E-03 | | Insensitive
220807_at | HBQ1 | hemoglobin, theta 1 | 2 | 0.91 | 7.42E-03 | age |
220741_s_at | PPA2 | pyrophosphatase 2 | 3 | 0.92 | 9.55E-03 | |
205694_at | TYRP1 | tyrosinase-related protein 1 | 2 | 0.88 | 1.20E-02 | | Insensitive
219032_x_at | OPN3 | opsin 3 | 2 | 0.89 | 2.71E-02 | | Insensitive
205510_s_at | Multiple gene mappings | | 2 | 0.93 | 2.74E-02 | |
212987_at | FBXO9 | F-box protein 9 | Full | 0.88 | 3.28E-02 | age | Insensitive
202596_at | ENSA | endosulfine alpha | Full | 0.90 | 4.20E-02 | age | Insensitive
203719_at | ERCC1 | DNA excision repair protein | 2 | 0.94 | 4.20E-02 | |
218328_at | COQ4 | coenzyme Q4 homolog | 4 | 0.93 | 4.20E-02 | |
218653_at | SLC25A15 | solute carrier family 25 (ornithine transporter) member 15 | 2 | 0.94 | 4.20E-02 | |
206290_s_at | RGS7 | regulator of G-protein signaling 7 | 2 | 0.89 | 4.36E-02 | age | Non-specific
218262_at | RMND5B | required for meiotic nuclear division 5 homolog B | 2 | 0.95 | 4.36E-02 | |
219825_at | CYP26B1 | cytochrome P450, family 26, subfamily B | Full | 0.86 | 4.36E-02 | age |
206356_s_at | GNAL | guanine nucleotide binding protein G(olf) subunit alpha | 2 | 0.90 | 4.81E-02 | |
219982_s_at | Multiple gene mappings | | 3 | 0.89 | 4.81E-02 | |
203851_at | IGFBP6 | insulin-like growth factor binding protein 6 | 2 | 0.91 | 5.00E-02 | |
203349_s_at | ETV5 | ets variant 5 | 2 | 0.92 | 6.76E-02 | age, sex | Insensitive
205413_at | MPPED2 | metallophosphoesterase domain containing 2 | Full | 0.89 | 6.76E-02 | age |
219997_s_at | COPS7B | COP9 constitutive photomorphogenic homolog subunit 7B | 4 | 0.97 | 7.28E-02 | |
206209_s_at | CA4 | carbonic anhydrase IV | Full | 0.92 | 7.41E-02 | age |
206215_at | OPCML | opioid binding protein/cell adhesion molecule-like | 2 | 0.94 | 7.41E-02 | age |
207949_s_at | ICA1 | islet cell autoantigen 1 | 2 | 0.95 | 7.41E-02 | age |
209871_s_at | APBA2 | amyloid beta (A4) precursor protein-binding, family A, member 2 | 5 | 0.95 | 7.41E-02 | |
215003_at | DGCR5 | DiGeorge syndrome critical region gene 5 (non-protein coding) | Full | 0.93 | 7.41E-02 | |
201190_s_at | PITPNA | phosphatidylinositol transfer protein alpha | Full | 0.93 | 7.54E-02 | age |
201310_s_at | C5orf13 | chromosome 5 open reading frame 13 | 2 | 0.93 | 7.54E-02 | age |
201694_s_at | EGR1 | early growth response 1 | 2 | 0.89 | 7.54E-02 | |
202688_at | TNFSF10 | tumor necrosis factor (ligand) superfamily, member 10 | 3 | 0.83 | 7.54E-02 | |
204000_at | GNB5 | guanine nucleotide binding protein, beta 5 | 4 | 0.94 | 7.54E-02 | age, pH |
205489_at | CRYM | crystallin mu | Full | 0.87 | 7.54E-02 | age |
221983_at | FAM134A | family with sequence similarity 134, member A | 3 | 0.93 | 7.54E-02 | |
202322_s_at | GGPS1 | geranylgeranyl diphosphate synthase 1 | 5 | 0.95 | 8.55E-02 | | Non-specific
203769_s_at | STS | steroid sulfatase, isozyme S | 3 | 0.91 | 8.55E-02 | |
205003_at | DOCK4 | dedicator of cytokinesis 4 | 3 | 0.95 | 8.55E-02 | |
208870_x_at | ATP5C1 | ATP synthase subunit gamma, mitochondrial | Full | 0.93 | 8.55E-02 | |
212942_s_at | KIAA1199 | | 2 | 0.92 | 8.55E-02 | |
34726_at | CACNB3 | calcium channel voltage-dependent subunit beta 3 | 2 | 0.93 | 8.55E-02 | age |
201328_at | ETS2 | v-ets erythroblastosis virus E26 oncogene homolog 2 | 3 | 0.90 | 8.60E-02 | sex |
203339_at | SLC25A12 | solute carrier family 25, member 12 | 4 | 0.87 | 8.66E-02 | | Non-specific
204679_at | KCNK1 | potassium channel, subfamily K member 1 | 2 | 0.88 | 8.66E-02 | age |
212252_at | CAMKK2 | calcium/calmodulin-dependent protein kinase beta | 2 | 0.94 | 8.66E-02 | age | Insensitive
213366_x_at | ATP5C1 | ATP synthase subunit gamma, mitochondrial | Full | 0.93 | 8.66E-02 | |
202159_at | FARSA | phenylalanyl-tRNA synthetase, alpha subunit | 2 | 0.95 | 8.76E-02 | age |
203188_at | B3GNT1 | beta-1,3-N-acetylglucosaminyltransferase 1 | 2 | 0.95 | 8.76E-02 | age |
204002_s_at | ICA1 | islet cell autoantigen 1 | 2 | 0.96 | 8.76E-02 | age |
204869_at | PCSK2 | neuroendocrine convertase 2 | 2 | 0.89 | 8.76E-02 | age |
205794_s_at | NOVA1 | neuro-oncological ventral antigen 1 | 3 | 0.94 | 8.76E-02 | age |
209093_s_at | GBA | glucosidase beta acid | 2 | 0.95 | 8.76E-02 | |
209699_x_at | Multiple gene mappings | | 2 | 0.93 | 8.76E-02 | | Non-specific
210638_s_at | FBXO9 | F-box protein 9 | Full | 0.88 | 8.76E-02 | age |
218125_s_at | CCDC25 | coiled-coil domain containing 25 | 2 | 0.92 | 8.76E-02 | |
220031_at | OTUD7B | OTU domain containing 7B | 3 | 0.96 | 8.76E-02 | |
202852_s_at | FLJ11506 | alpha- and gamma-adaptin binding protein | 3 | 0.91 | 8.79E-02 | age, PMI |
206490_at | DLGAP1 | PSD-95 binding protein | 3 | 0.94 | 8.79E-02 | age |
205114_s_at | Multiple gene mappings | | 2 | 0.88 | 8.97E-02 | | Non-specific
203476_at | TPBG | trophoblast glycoprotein | 2 | 0.89 | 9.61E-02 | age |
204676_at | TMEM186 | transmembrane protein 186 | 3 | 0.94 | 9.61E-02 | |
205381_at | LRRC17 | leucine rich repeat containing 17 | 3 | 0.91 | 9.61E-02 | |
210874_s_at | HYAL3 | hyaluronoglucosaminidase 3 | 2 | 0.94 | 9.61E-02 | |
211038_s_at | Multiple gene mappings | | 3 | 0.95 | 9.61E-02 | | Non-specific
214285_at | FABP3 | fatty acid binding protein 3 | Full | 0.90 | 9.61E-02 | age |
217946_s_at | SAE1 | SUMO1 activating enzyme subunit 1 | Full | 0.95 | 9.61E-02 | age | Non-specific
221921_s_at | CADM3 | cell adhesion molecule 3 | 2 | 0.97 | 9.61E-02 | |
218569_s_at | KBTBD4 | kelch repeat and BTB domain 4 | 2 | 0.94 | 9.69E-02 | |
201410_at | PLEKHB2 | pleckstrin homology domain | 3 | 0.94 | 9.82E-02 | age |
202250_s_at | DCAF8 | DDB1 and CUL4 associated factor 8 | 3 | 0.94 | 9.82E-02 | |
202289_s_at | TACC2 | transforming acidic coiled-coil containing protein 2 | 2 | 0.94 | 9.82E-02 | |
202667_s_at | SLC39A7 | solute carrier family 39, member 7 | 3 | 0.95 | 9.82E-02 | |
204685_s_at | ATP2B2 | plasma membrane calcium-transporting ATPase 2 | 2 | 0.94 | 9.82E-02 | age |
206023_at | NMU | neuromedin U | Full | 0.90 | 9.82E-02 | age |
211796_s_at | Multiple gene mappings | | Full | 0.92 | 9.82E-02 | | Non-specific
212296_at | PSMD14 | proteasome 26S subunit, non-ATPase regulatory subunit 14 | 3 | 0.95 | 9.82E-02 | |
214092_x_at | SFRS14 | putative splicing factor, arginine/serine-rich 14 | 3 | 0.97 | 9.82E-02 | |
214619_at | CRHR1 | corticotropin releasing hormone receptor 1 | 4 | 0.95 | 9.82E-02 | |
214883_at | THRA | thyroid hormone receptor, alpha | 4 | 0.96 | 9.82E-02 | |
216970_at | Unknown | | 2 | 0.96 | 9.82E-02 | | Mis-targeted
64488_at | Multiple gene mappings | | 2 | 0.95 | 9.82E-02 | | Non-specific
218032_at | SNN | stannin | 2 | 0.96 | 9.84E-02 | age |
205061_s_at | EXOSC9 | exosome component 9 | 3 | 0.96 | 9.97E-02 | | Non-specific
206435_at | B4GALNT1 | glycolipid synthesis | 2 | 0.96 | 9.97E-02 | age |
214674_at | USP19 | ubiquitin specific peptidase 19 | 5 | 0.96 | 9.97E-02 | |
218968_s_at | ZFP64 | zinc finger protein 64 homolog | 2 | 0.94 | 9.97E-02 | |

Each probe is listed with its associated gene symbol, gene description, linear model of best fit (as defined in Table 13) and fold change. The meta-analysis score is given as a q-value, an FDR-adjusted p-value (see [143]). The overlapping factor column identifies genes that also appear in our previously reported age, pH, PMI and sex gene lists. Finally, we have also included a column for probe specificity. Probes identified as ‘mis-targeted’ or ‘non-specific’ were flagged by cross-referencing with lists extracted from http://masker.nci.nih.gov/ev/. Probes identified as ‘insensitive’ show robust expression changes that are completely insensitive to the choice of pre-processing algorithm. Rows highlighted in green indicate ‘core’ signature genes retained after jackknife validation.


Table 15: Comparison of meta-signatures with findings from original studies

Dataset | Significance criteria | Down-regulated: probes | Down-regulated: AUC | Down-regulated: overlap | Up-regulated: probes | Up-regulated: AUC | Up-regulated: overlap
Stanley AltarC | p < 0.05 | 848 | 0.70 | 14 | 34 | 0.78 | 0
Stanley Bahn | p < 0.05 | 69 | 0.85 | 6 | 91 | 0.89 | 5
Mclean [102] | p < 0.05, intensity > 30 | 570 | 0.75 | 13 | 300 | 0.76 | 7
Mirnics [242] | p < 0.05, abs(ALR) > 0.58 | 7 | 0.55 | 0 | 4 | 0.94 | 1
Haroutunian (BA10) [107] | p < 0.05, abs(FC) > 1.4, present calls > 60% | 5 | 0.5 | 0 | 14 | 0.66 | 1
Haroutunian (BA46) [107] | p < 0.05, abs(FC) > 1.4, present calls > 60% | 50 | 0.21 | 0 | 11 | 0.59 | 0
GSE17612 [102] | p < 0.05, intensity > 30 | 548 | 0.74 | 22 | 466 | 0.71 | 7
GSE21138 (Short DOI) [255] | p < 0.05, abs(FC) > 1.25 | 482 | 0.50 | 1 | 173 | 0.57 | 1
GSE21138 (Int DOI) [255] | p < 0.05, abs(FC) > 1.25 | 132 | 0.60 | 0 | 78 | 0.69 | 0
GSE21138 (Long DOI) [255] | p < 0.05, abs(FC) > 1.25 | 37 | 0.63 | 1 | 89 | 0.63 | 1

DOI, duration of illness; AUC, area under the curve; FC, fold change. The findings from each dataset are summarized including only probes used in our analysis. The ‘probes’ columns give the number of probes reported by each study based on study-specific significance criteria. AUC values were computed from an ROC analysis of each gene set against the corresponding ranked meta-signature. Overlap values report the number of probes in each gene set that overlap with probes from the meta-signatures at q < 0.1.


Table 16: Evaluating meta-signatures against brain-specific gene coexpression modules

A: Genes down-regulated in schizophrenia

Module | Module size | Probe overlap | Corrected p-value (BH) | Genes
*brown | 868 | 9 | 0.019 | PITPNA, ENSA, DOCK4, CRYM, OPCML, FBXO9, FABP3
turquoise | 1115 | 9 | 0.06 | C5orf13, IGFBP6, GNB5, MPPED2, RGS7, CAMKK2, RFTN1, CACNB3
grey | 158 | 1 | 0.26 | WDR42A
pink | 98 | 1 | 0.15 | FLJ11506
black | 188 | 1 | 0.28 | B3GNT1
*darkolivegreen | 28 | 2 | 2.27E-03 | SLC25A12, TPBG
yellow | 837 | 2 | 0.72 | DLGAP1, APBA2
green | 502 | 2 | 0.45 | USP19, CADM3

B: Genes up-regulated in schizophrenia

Module | Module size | Probe overlap | Corrected p-value (BH) | Genes
*turquoise | 1115 | 8 | 4.47E-04 | ANP32A, PKP4, SSFA2, WNK1, BBX
*black | 188 | 4 | 2.06E-04 | PLOD2, SYNJ2BP, EIF2C3
*brown | 868 | 7 | 4.5E-04 | RHOBTB3, ABCA1, BCAR3, FTL, ACOX1, TMEM176B
grey | 158 | 1 | 0.053 | EIF3E
blue | 1037 | 1 | 0.616 |

Each meta-signature was cross-referenced against modules of coexpressed genes as reported in [12]. Modules showing an overlap with meta-signature genes are listed. The total number of probes contained in each module and the number of module probes overlapping with each meta-signature are also presented. Additionally, we have included the gene names for the signature probes contained in these modules. Benjamini-Hochberg (BH) correction was applied, and corrected p-values are reported. Modules marked with an asterisk (*) indicate significant overlaps (p < 0.05).


Figure 7: Example of consistent expression changes for a gene across data sets. Expression data within each dataset after covariate correction are presented for the top down-regulated gene, NECAB3. Plots are labeled with the associated dataset. Samples were separated into disease and control cohorts and expression was plotted as a boxplot. Individual sample values were overlaid, with red squares representing control individuals and blue triangles representing subjects with schizophrenia.


Figure 8: Expression changes in the ‘core signatures’. For each probe in the core signatures (that is, probes retained as significant even after the removal of any single study), the corresponding data from each study were extracted and converted to a heat map. Expression values were normalized across all samples within each dataset and, as in Figure 7, the data are corrected for covariates such as batch and age. Rows represent probes and are labeled with their unique gene mapping if one exists. Columns represent samples. Grey bars mark the control samples and black bars mark the schizophrenia samples. Lighter values in the heat map indicate higher expression.
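The core-signature criterion (a probe must remain significant after removing any single study) is a leave-one-study-out jackknife. The sketch below shows the idea under stated assumptions: per-study z-scores for each probe are combined with an unweighted Stouffer method and adjusted with Benjamini-Hochberg at q < 0.1. Stouffer's method is a stand-in chosen for brevity, not necessarily the meta-analysis statistic used in this thesis, and all names and toy values are hypothetical.

```python
from math import erfc, sqrt


def stouffer_p(z_scores):
    """Two-sided p-value of the unweighted Stouffer combination of per-study z-scores."""
    combined = sum(z_scores) / sqrt(len(z_scores))
    return erfc(abs(combined) / sqrt(2))


def bh_qvalues(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def significant_probes(z_by_probe, studies, q_cutoff=0.1):
    """Probes whose combined evidence over `studies` passes the FDR cutoff."""
    probes = sorted(z_by_probe)
    p_values = [stouffer_p([z_by_probe[p][s] for s in studies]) for p in probes]
    return {p for p, q in zip(probes, bh_qvalues(p_values)) if q < q_cutoff}


def core_signature(z_by_probe, q_cutoff=0.1):
    """Probes significant with all studies and after dropping any single study."""
    studies = sorted(next(iter(z_by_probe.values())))
    core = significant_probes(z_by_probe, studies, q_cutoff)
    for left_out in studies:
        kept = [s for s in studies if s != left_out]
        core &= significant_probes(z_by_probe, kept, q_cutoff)
    return core


# Toy data: "steady" is consistently down in seven studies; "one_study" is driven by S1 alone.
studies = [f"S{i}" for i in range(1, 8)]
z = {"steady": {s: -3.0 for s in studies},
     "one_study": {s: (-8.0 if s == "S1" else 0.0) for s in studies}}
z.update({f"null{i}": {s: 0.0 for s in studies} for i in range(8)})
print(significant_probes(z, studies), core_signature(z))
```

In the toy example both probes pass with all seven studies, but only the consistently changed probe survives every leave-one-out round, which is the behaviour the core signatures are meant to capture.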


3.4 Discussion

In this chapter I present expression changes associated with schizophrenia which are consistent across up to seven independent cohorts of subjects. To my knowledge, the degree of validation and confirmation inherent in this analysis is unprecedented. Unlike previous studies, which use PCR assays to check results on the same RNA samples used for microarrays, or which compare at most two cohorts, I have identified changes in expression that are shared across independent subject cohorts, analyzed by laboratories distributed around the world. The results of this chapter provide a new window into the molecular changes that might underlie schizophrenia.

The larger number of down-regulated probes is in agreement with previous reports [98, 108, 227]. Many of the genes we have identified have been previously reported to be expressed in the brain, with some showing neuronal specificity. Some of the genes we report as differentially expressed have been previously implicated in schizophrenia, either through expression profiling studies (KCNK1, CRYM, FBXO9) or genetic association studies (OPCML [234]). We also identify three genes in our signature (up-regulated genes WNK1 and ABCA1 and down-regulated gene SNN) that overlap with results from a comparative analysis of two of the studies we used [102]. Additionally, we found functional gene groups discussed in previous expression studies of schizophrenia. Many of the same metabolic processes were observed in a study of 71 different metabolic gene groups in schizophrenia [229]. Also in agreement, genes related to energy pathways and mitochondrial function were found previously in dorsolateral PFC studies of schizophrenia [254, 256]. The enrichment of immune-related categories among up-regulated genes in our GSR analysis is also concordant with recent findings of over-expression of immune-function genes in schizophrenia [108, 109, 255]. Thus, our results support at least some previous findings and reveal a previously unrecognized similarity across studies.

Our meta-signatures contain a number of interesting new candidate genes, particularly our down-regulated meta-signature, which potentially reflects alterations in neuronal communication. NOVA1 is a regulator of RNA splicing recently found to inhibit the splicing out of exon 6 from the dopamine receptor D2 gene, resulting in D2L, the long isoform of the receptor [257]. With NOVA1 expression decreased in schizophrenia, this inhibition may be relieved, leading to higher than normal levels of the spliced D2S isoform, which is involved in neuron firing and dopamine release. The DLGAP1 gene encodes a protein that interacts with PSD-95 and a complex of other proteins in the postsynaptic density. Decreased expression of this scaffold protein may have consequences for anchoring and organizing receptors and signaling molecules on the postsynaptic side. Moreover, we have identified several genes associated with calcium signaling (CACNB3), binding (SLC25A12, NECAB3) and homeostasis (CCL3, ATP2B2), processes of likely relevance to schizophrenia [258]. We have also identified genes associated with the GPCR signaling pathway. One example is GNAL, a gene encoding the alpha subunit of the G-protein Golf, which is expressed in many regions of the brain. Given the critical roles of G-proteins, it is plausible that GNAL (and other GPCR-related genes) may have a role in the pathophysiology of schizophrenia [259]. GNAL expression has not been previously shown to be affected by schizophrenia, but the gene is located in a chromosomal region (18p11.2) that has been linked to schizophrenia and bipolar disorder. More specifically, a dinucleotide repeat in intron 5 of the GNAL gene has been linked to schizophrenia in some families [260]. These changes in genes concerning synaptic function may reduce neuronal energy demand in the brains of affected patients, which could explain the down-regulation of various oxidative phosphorylation and energy metabolism genes that we observe.

We also sought to examine whether our signature genes could be inferred to share some previously unknown function, by making use of gene network analysis. One way to do this is by the principle of “guilt by association”, which states that genes with shared function are more likely to interact [149]. However, the meta-signature genes have a fairly low number of interaction partners, making “guilt” difficult to ascertain. Another property to examine is path length in the network, where genes that have short paths between them might be more functionally related. Genes with low node degrees would generally be expected to be separated by longer paths, but this was not the case for our gene set; that is, the signature genes are linked by unusually short paths in the network. Additionally, each of our meta-signatures showed a significant overlap with previously identified gene coexpression modules in the human cortex [12]. This suggests a relationship among the genes that is not reflected in current annotations; a network analysis of these schizophrenia genes is carried out in greater detail in the next chapter.

We found that some of the down-regulated schizophrenia genes overlap with genes that decrease in expression with age. Many of the biological processes affected by age (for example, oxidative phosphorylation) also tend to appear as affected processes in schizophrenia, both in this study and in existing profiling studies [106, 108, 229, 254, 256]. These findings suggest that many genes affected by age are also affected by schizophrenia, but they also raise the possibility of confounding effects. A comparison of pH genes with schizophrenia meta-signature genes did not reveal much overlap, although we did observe a significant difference in pH between the control and schizophrenia samples. Lower brain pH in patients with schizophrenia relative to controls is a common finding in postmortem brain studies of schizophrenia [118, 256]. The cause of this decrease is unclear. It has been proposed that lower pH together with increased lactate levels in schizophrenia implicates oxidative stress and energy metabolism in the pathology of the illness [256]. However, the frontal cortex of rats treated with antipsychotics shows similar changes in pH and lactate levels, suggesting that decreased pH may be secondary to medication effects [261]. As these age and pH effects could be confounded with the disease effect, one could filter the list of schizophrenia candidate genes by simply removing known age- and pH-affected genes from the final signature (leaving 31 up- and 51 down-regulated probes) to investigate these effects more thoroughly.

The results from this chapter should be interpreted in the context of several caveats. First, the approach employed is specifically designed to find concordant results across studies, and does not detract from the findings of the original single data set studies. We do suggest, however, that genes found to be commonly differentially expressed by multiple studies are of particularly high value in identifying underlying etiological influences in schizophrenia. As is the case for all postmortem brain studies, we also cannot be sure whether the expression changes we have identified are direct effects of the illness or secondary effects of the disease, medication or other external factors. An additional caveat is that, because we were unable to obtain medication or illicit drug use information for all subjects, we were not able to incorporate this information into our analysis. To help address this, we compared our signatures against gene lists derived from a recent review on convergent antipsychotic mechanisms [262] and observed no overlap. In addition to antipsychotics, the use and abuse of other recreational drugs, as well as smoking, can also confound the study of disease-related gene expression. Due to a lack of sufficient information on these factors, we were unable to strictly control for them in our analysis. However, using gene lists provided by the SMRI Online Genomics Database (http://www.stanleygenomics.org), we were able to make comparisons addressing some of these factors, and identified two overlapping genes. While the small number of overlapping genes suggests that the genes in our signature are largely unaffected by such extraneous factors, we acknowledge that we cannot entirely exclude the possibility that the expression changes we have identified are still influenced by them in some way.

In conclusion, I have contributed the most comprehensive meta-analysis of schizophrenia expression profiling studies to date. The most striking finding is that, despite the heterogeneity of the disorder, we were able to detect a common signature of schizophrenia. Additionally, I elaborate on the biological relevance of our gene list, illustrating the need for further genetic study to understand how these changes in expression relate directly to the illness. The signatures we identified are consistent with current hypotheses of molecular dysfunction in schizophrenia, including alterations in synaptic transmission and energy metabolism. However, the diversity of genes we found suggests that systems biology approaches, exemplified by the analysis of gene network structure, will be of value in determining the basis of this disorder. The approaches used in this chapter should be applicable to other neuropsychiatric disorders if sufficient data are available.


Chapter 4: Gene coexpression network analysis of schizophrenia

4.1 Introduction

Schizophrenia is a severe psychotic disorder for which a comprehensive biological understanding remains elusive. Evidence from neuroimaging, neurocognitive and postmortem brain studies of schizophrenia demonstrates that multiple cellular pathways are related to its pathophysiology [72].

Notably, our knowledge of molecular deficits lacks concerted integration, which can hinder a systems-level interpretation of schizophrenia. Gene expression profiling of postmortem human brain (e.g., with microarrays) has been increasingly used as a means to investigate patterns of molecular disruption in the brains of patients with schizophrenia. One of the most common types of analysis applied to expression profiling data is differential expression, which is used to identify genes that are over- or under-expressed in association with the illness. Candidate genes identified from expression profiling studies in schizophrenia have implicated alterations in different cellular systems, including myelination, synaptic transmission, metabolism and ubiquitination [106, 108, 224, 228, 229, 231, 256]. These findings are not always replicated across individual studies, nor have they been successfully integrated into a comprehensive biological framework.

As presented in Chapter 3, I performed a meta-analysis of differential expression using seven independent expression profiling datasets to identify a set of candidate genes that are consistently differentially expressed in the prefrontal cortex of patients with schizophrenia [263]. The functions reflected in our ‘meta-signature’ of schizophrenia genes are diverse, and the interactions between them are largely unexplored. Because gene function is partly defined by interactions with other genes (at the biochemical, physical interaction, genetic or regulatory levels), it is attractive to use gene networks as an aid in interpreting gene function in the nervous system. In Chapter 3, I evaluated our ‘meta-signature’ of schizophrenia genes in the context of a PPI network, revealing a shared relationship not reflected in the current annotations of these genes. In this chapter I evaluate the ‘meta-signature’ genes within gene coexpression networks to establish further evidence of relationships among them. My aim is not to replicate our findings from the PPI network, as the two types of networks are not directly comparable. Rather, I ask whether our ‘meta-signature’ schizophrenia genes also collectively exhibit unusual properties in brain coexpression networks, and how those properties may differ between healthy controls and individuals with schizophrenia.

A gene coexpression network is an undirected graph constructed from expression profiling data using correlation-based inference methods. The graph nodes correspond to genes, and an edge (or connection) between two genes represents significant coexpression, based on thresholded values of the pairwise correlation coefficient calculated from the expression data [147]. In PPI networks, the edges correspond to binary values indicating the presence or absence of a known interaction between two proteins. The edges in a coexpression network, on the other hand, reflect the correlation structure of the data. While the edges in a coexpression network do contain some biological meaning, a connection between two genes should not be taken to imply a physical interaction between them. In general, gene networks can be analyzed to identify higher-level features of gene-gene relationships based on graph theoretic considerations such as node degree (the number of connections a gene has, e.g. to identify “hubs”) or clustering coefficient (how well connected the neighbours of a gene are to one another) [148, 159, 264]. Evaluating the broader network structure allows us to detect modularity in the graph, i.e. groups of densely connected nodes with sparse connections between groups [265]. Characterization of these tightly knit ‘modules’ can convey useful information, as they may be associated with specific molecular complexes or functions, yielding hypotheses that would be difficult to arrive at through a gene-by-gene analysis.

An especially attractive feature of coexpression analysis compared to PPI networks is that it can exploit data that are condition-specific. Given samples from two different conditions we can consider the occurrence of ‘differential coexpression’ which might reflect changes in regulatory network wiring [266].

We can evaluate modularity between condition-specific networks to elucidate similarities and differences in network topology. Network analyses have recently been applied to a number of postmortem human brain expression profiling datasets to examine general transcriptome patterns of the CNS [12], and to interrogate the molecular basis of neuropsychiatric [10, 13, 183] and neurodegenerative diseases [11, 267]. A recent study by Torkamani and colleagues [13] conducted a network analysis by combining two independent schizophrenia expression profiling datasets. Expression data were merged across studies and separated into control and schizophrenia cohorts. Methods based on weighted gene coexpression network analysis (WGCNA) [148] were then applied to the merged data to create two networks representing the control and schizophrenia brain. Modules of coexpressed genes were identified and characterized by disease association, cell type specificity and functional enrichment. Similar module composition was observed in both the schizophrenia and control networks, highlighting relevant biological themes such as oxidative phosphorylation, energy production and metabolism.

In this chapter, we apply coexpression network analysis to the seven schizophrenia microarray datasets used in Chapter 3 and evaluate differences in network properties. We used a rank aggregation approach [182] to combine coexpression data across studies and generate separate networks representing the control and schizophrenia brain. The two networks exhibit a similar structural topology, suggesting that the overall coexpression structure of the prefrontal cortex is normal in individuals with schizophrenia. We then examined properties of our ‘meta-signature’ schizophrenia genes within each network, and identified features that were not observed with other functionally similar groups of genes or with other brain-related disease gene sets. Clustering of our networks into high-density sub-networks, and the association of our ‘meta-signature’ genes with these clusters, suggest the disruption of processes previously implicated in schizophrenia. In particular, our results provide evidence for dysfunction of oligodendrocytes and myelination-related processes in schizophrenia. Finally, we discuss the challenges to be addressed in future studies of gene coexpression networks in schizophrenia.


4.2 Methods

4.2.1 Data Processing and Quality Control

Expression profiling datasets were selected on the basis of the criteria required for inclusion in the meta-analysis of differential expression described in Chapter 3 and in [263]. Details on each of the seven datasets, including the source citation, can be found in Table 11 of Chapter 3. Data were preprocessed as described; briefly, expression levels were summarized, log2 transformed and normalized for each individual dataset using the R Bioconductor ‘affy’ package [243], with default settings for the RMA algorithm. Sample outliers were removed from each dataset based on an inter-sample correlation analysis, resulting in a total of 306 samples across the seven datasets (153 from schizophrenia subjects, 153 from unaffected controls). For each of the seven datasets, batch information was obtained from the ‘scan date’ stored in the CEL files; chips run on different days were considered different batches, and batch effects for each dataset were removed using the ComBat algorithm [214].

4.2.2 Gene Coexpression Networks

For each dataset, samples were separated into control and schizophrenia cohorts. Probes were mapped to genes using annotations provided in Gemma (http://www.chibi.ubc.ca/Gemma), which are based on stringent methods described in [238]. For genes mapping to multiple probes, the average expression value was retained. Only genes that were represented in all seven datasets were considered, leaving a total of 12,582 genes. This yielded seven expression data matrices for schizophrenia and seven for controls (one for each study). Separate networks were constructed for the schizophrenia and control groups based on previously described methods [182]. Briefly, a gene expression profile similarity matrix was computed for each cohort by taking the absolute value of the Pearson correlation between all possible gene pairs. Correlation values in the similarity matrix were replaced by ranks. These similarity matrices were aggregated by cohort across datasets by taking the mean rank for each gene pair. We previously showed that this aggregation procedure is a robust method for producing high-quality coexpression networks. In keeping with previous work [182], the aggregated matrix was thresholded at 0.5% sparsity, resulting in an adjacency matrix of 392,606 connections for each of the control and schizophrenia cohorts.
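The aggregation step can be illustrated with a small Python sketch. It assumes each dataset is a NumPy array of genes × samples with rows in the same gene order, converts |r| to normalized ranks within each dataset, averages the ranks across datasets and keeps the top 0.5% of gene pairs. The function name and the tie handling are our own choices for illustration, not the original implementation; genes with zero variance would give undefined correlations and are assumed to have been filtered out beforehand.

```python
import numpy as np
from scipy.stats import rankdata


def aggregate_coexpression(datasets, sparsity=0.005):
    """Rank-aggregate |Pearson r| across datasets and keep the top `sparsity` fraction of pairs.

    datasets: list of (n_genes x n_samples) arrays sharing the same gene (row) order.
    Returns a symmetric boolean adjacency matrix with an empty diagonal.
    """
    n_genes = datasets[0].shape[0]
    iu = np.triu_indices(n_genes, k=1)            # each unordered gene pair once
    rank_sum = np.zeros(len(iu[0]))
    for X in datasets:
        sim = np.abs(np.corrcoef(X))[iu]          # |Pearson r| for every gene pair
        rank_sum += rankdata(sim) / len(sim)      # normalized ranks; higher = more coexpressed
    mean_rank = rank_sum / len(datasets)
    n_keep = int(round(sparsity * len(mean_rank)))
    keep = np.argsort(mean_rank)[::-1][:n_keep]   # pairs with the highest mean rank
    adj = np.zeros((n_genes, n_genes), dtype=bool)
    adj[iu[0][keep], iu[1][keep]] = True
    return adj | adj.T                            # make the network undirected
```

For 12,582 genes the pair vector has roughly 79 million entries, so a practical version would need to work in blocks; the loop already processes one dataset at a time.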

4.2.3 Random Coexpression Networks

To evaluate the significance of network measures across the whole network, appropriately randomized null models are required. We devised a procedure that produces a random network with the same number of genes and the same node degree distribution as the original data. Additionally, the node degree of each individual gene is preserved (i.e. each gene still has the same number of connections, but the specific genes to which it is connected are scrambled). All gene pairs were assembled into an adjacency list (2 columns, 392,606 rows) and the genes on one side of each edge were permuted. Resulting edges that represented self-connections and/or duplicate gene pairs (“problem edges”) were isolated and the permutation was re-applied to them. This was done iteratively on the subset of problem gene pairs until the number of “problems” was reduced to ten or fewer. These remaining “problems” were removed from the final random network.
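A minimal Python sketch of this degree-preserving shuffle follows, assuming the network is given as a list of undirected (gene, gene) tuples. Only the second column is permuted, so every gene keeps its number of edge endpoints; problem edges are then reshuffled among themselves until at most `max_problems` remain. The `max_rounds` cap and the function name are hypothetical additions for illustration, not part of the described procedure.

```python
import random


def randomize_edges(edges, max_problems=10, max_rounds=1000, seed=0):
    """Degree-preserving shuffle of an undirected edge list (list of (u, v) tuples)."""
    rng = random.Random(seed)
    left = [u for u, _ in edges]
    right = [v for _, v in edges]
    rng.shuffle(right)                          # permute one side of every edge

    def find_problems():
        # Indices of self-loops and of repeated gene pairs (first copy is kept).
        seen, bad = set(), []
        for i, (u, v) in enumerate(zip(left, right)):
            key = frozenset((u, v))
            if u == v or key in seen:
                bad.append(i)
            else:
                seen.add(key)
        return bad

    problems = find_problems()
    rounds = 0
    while len(problems) > max_problems and rounds < max_rounds:
        values = [right[i] for i in problems]
        rng.shuffle(values)                     # re-permute only the problem edges
        for i, v in zip(problems, values):
            right[i] = v
        problems = find_problems()
        rounds += 1
    bad = set(problems)                         # leftover problems are dropped
    return [(u, v) for i, (u, v) in enumerate(zip(left, right)) if i not in bad]
```

Dropping the few leftover problem edges means a handful of genes lose one or two connections, which mirrors the removal step described above.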

4.2.4 Network Properties

We explored four different network properties, each of which is briefly described below.

Node Degree

Each gene can be characterized by the number of connections it has, that is, the number of other genes with which it is significantly coexpressed. This property is called the node degree. Node degrees were characterized by their distribution. For many biological networks the degree distribution has been characterized as ‘scale-free’, or at least ‘heavy-tailed’. This can be assessed from the quality of a linear fit to the distribution on a log-log scale [268].
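The log-log fit can be sketched as follows, assuming node degrees are supplied as a list of integers. The function regresses log10 of the fraction of genes with degree k on log10 k and returns the slope and R²; zero-degree genes are dropped because their logarithm is undefined. This is an illustrative check only: a high R² by itself is not a formal test of scale-free behaviour.

```python
from collections import Counter

import numpy as np


def loglog_degree_fit(degrees):
    """Fit log10(P(k)) ~ log10(k) by least squares; return (slope, R^2)."""
    counts = Counter(d for d in degrees if d > 0)   # drop isolated genes (log(0) undefined)
    ks = sorted(counts)
    k = np.array(ks, dtype=float)
    p = np.array([counts[d] for d in ks], dtype=float) / sum(counts.values())
    x, y = np.log10(k), np.log10(p)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    r2 = 1.0 - resid.var() / y.var()                # fraction of variance explained
    return float(slope), float(r2)
```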


Shortest Path

The shortest path length measures the shortest distance from one gene to another when traversing edges in the network. In an unweighted network this is the smallest number of edges that must be traversed to get between the two genes. We computed shortest paths using Dijkstra’s algorithm [152]. For each gene, a value is obtained against every other gene in the network and summarized as the mean shortest path length across all genes. Genes without any direct neighbours are treated as missing values.
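Because the networks are unweighted, breadth-first search returns the same hop distances that Dijkstra’s algorithm would. The sketch below assumes an adjacency dict mapping each gene to a set of neighbours and averages the distance from one gene to every gene it can reach. Excluding unreachable genes from the mean is our assumption, since the text does not say how disconnected pairs were handled.

```python
from collections import deque


def mean_shortest_path(adj, source):
    """Mean hop distance from `source` to every gene reachable from it.

    adj: dict gene -> set of neighbouring genes (unweighted, undirected).
    Returns None for genes without neighbours (treated as missing values).
    """
    if not adj.get(source):
        return None
    dist = {source: 0}
    queue = deque([source])
    while queue:                       # breadth-first search = Dijkstra with unit weights
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = [d for gene, d in dist.items() if gene != source]
    return sum(others) / len(others)
```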

Clustering Coefficient

The clustering coefficient of a gene indicates how connected the direct neighbours of the gene are to one another. It is the ratio of the number of connections in the neighbourhood of a node to the number of connections there would be if the neighbourhood were fully connected. The clustering coefficient ranges from zero to one. A value of 1 indicates that all the neighbours of a node are connected to each other, or ‘cliquish’ in nature; a value of 0 indicates that none of the neighbours of a node are connected to each other. This measure can only be computed for nodes that interact with more than one other node.
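The definition above translates directly into code. This sketch assumes an adjacency dict of neighbour sets; the function name is illustrative.

```python
from itertools import combinations


def clustering_coefficient(adj, gene):
    """Fraction of possible links among `gene`'s neighbours that are present.

    adj: dict gene -> set of neighbouring genes. Returns None when the gene has
    fewer than two neighbours, since the ratio is then undefined.
    """
    nbrs = adj.get(gene, set())
    k = len(nbrs)
    if k < 2:
        return None
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)   # observed links / links in a fully connected neighbourhood
```

For a gene whose three neighbours share a single link among themselves, the coefficient is 1/3.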

Assortativity

Assortativity is a property computed at the whole-network level. It is defined as a preference for a network’s nodes to attach to other nodes that are similar in some way, most often evaluated in terms of node degree. In a highly assortative network, for example, highly connected nodes or ‘hubs’ would be connected to other hubs, and nodes with few connections or ‘provincial’ nodes would be connected to other provincial nodes. Using simulated scale-free networks, it has been proposed that increasing assortativity is correlated with longer path lengths and changes in the behaviour of the clustering coefficient [269]; however, these relationships remain to be validated in real biological networks.
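Degree assortativity is commonly computed as Newman’s coefficient: the Pearson correlation between the degrees at either end of each edge, with each undirected edge counted in both directions. The sketch assumes an edge list of tuples; the text does not state which estimator was used, so treat this as one standard choice. For a regular graph (all degrees equal) the correlation is undefined and NumPy returns NaN with a warning.

```python
import numpy as np


def degree_assortativity(edges):
    """Pearson correlation of node degrees across the two ends of every edge."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    # Count each undirected edge in both directions so the measure is symmetric.
    x = [deg[u] for u, v in edges] + [deg[v] for u, v in edges]
    y = [deg[v] for u, v in edges] + [deg[u] for u, v in edges]
    return float(np.corrcoef(x, y)[0, 1])
```

A star graph, where every edge joins the hub to a leaf, is perfectly disassortative (coefficient −1).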

4.2.5 Schizophrenia Meta-signature Network Analysis

The meta-signature gene set of 25 up- and 73 down-regulated schizophrenia genes was obtained from the results of our meta-analysis of differential expression in Chapter 3 [263]. Four genes were removed from the down-regulated gene set as they were not present in the network, leaving a total of 94 ‘schizophrenia genes’. Throughout this chapter we will refer to these gene sets as SZUP and SZDOWN for the up- and down-regulated genes, respectively. Average values of shortest path length and clustering coefficient for the SZUP and SZDOWN gene sets were evaluated within each network. To estimate the relevance of the network measures for SZUP and SZDOWN, we implemented the three controls described below.

Random gene set comparison. For each meta-signature gene set, the average values of shortest path length and clustering coefficient were compared to a background distribution in each network. The background distribution was generated by randomly selecting 1000 gene sets matched to the meta-signature gene set in size and node degree. To ensure a well-matched node degree for each random gene set, selection was done on a per-gene basis by choosing a random gene within ± 50 positions of its node degree rank. Z-scores were then computed to quantify the difference between the mean of the background distribution and the observed values for each network measure of SZUP and SZDOWN (Table 18). For positive z-scores, the p-value reflects the fraction of random gene sets with values higher than the observed value; for negative z-scores, it reflects the fraction of random gene sets with values lower than the observed value.
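The degree-matched background can be sketched in Python as below. It assumes a dict of node degrees for the whole network and a dict of per-gene metric values (NaN where undefined, e.g. clustering coefficient of a gene with fewer than two neighbours). Whether the original procedure excluded the gene set’s own members or repeated picks from the random sets is not stated; this sketch allows both.

```python
import numpy as np


def degree_matched_test(genes, degree, metric, n_sets=1000, window=50, seed=0):
    """z-score and one-sided empirical p-value of a gene set's mean metric
    against degree-matched random gene sets.

    degree: dict gene -> node degree for the whole network.
    metric: dict gene -> per-gene value (NaN if undefined).
    Each random gene is drawn within +/- `window` positions of the real gene's degree rank.
    """
    rng = np.random.default_rng(seed)
    ranked = sorted(degree, key=degree.get)               # genes ordered by node degree
    pos = {g: i for i, g in enumerate(ranked)}
    observed = np.nanmean([metric[g] for g in genes])
    null = np.empty(n_sets)
    for s in range(n_sets):
        picks = [ranked[rng.integers(max(0, pos[g] - window),
                                     min(len(ranked), pos[g] + window + 1))]
                 for g in genes]
        null[s] = np.nanmean([metric[g] for g in picks])
    z = (observed - null.mean()) / null.std(ddof=1)
    # Positive z: fraction of random sets at or above the observed value; negative z: at or below.
    p = np.mean(null >= observed) if z > 0 else np.mean(null <= observed)
    return float(z), float(p)
```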

Functional gene set comparison. Although our meta-signature schizophrenia genes span a range of cellular functions, they share one functional feature: altered expression in schizophrenia. Thus, it is important to assess whether the network properties we observe for our meta-signature gene sets are simply a general property of gene groups with shared functional features. To control for this, we generated functionally characterized gene sets using the Gene Ontology (GO). From the GO database (http://www.geneontology.org/), we obtained 3,230 GO terms whose associated gene sets ranged in size from 10 to 1000 genes. For each GO term we retrieved all human genes annotated with that term to compile a gene group, also referred to as a functional gene set. Each of these functional gene sets was evaluated individually by comparison to a background distribution of randomly selected gene sets (of equivalent size and node degree) within each network. The distribution of z-scores obtained from the 3,230 functional gene groups was plotted for each network and used to evaluate the network properties of the meta-signature gene sets relative to other functionally related gene sets.

Disease gene set comparison. To assess the network properties of our schizophrenia meta-signature genes in relation to other sets of disease-associated genes, we compiled disease gene lists for five different brain-related disorders. Gene sets for Alzheimer’s disease (http://www.alzgene.org/), Parkinson’s disease (http://www.pdgene.org/), multiple sclerosis (http://www.msgene.org/) and schizophrenia (http://www.szgene.org/) were assembled from their respective gene databases. Each database has been compiled from the findings of genetic association studies and provides gene lists on its website. The schizophrenia list obtained from SZGene comprised only the 45 most reliable gene associations, based on an SZGene in-house meta-analysis. We also compiled an autism spectrum disorder gene list from Toro et al [270]. Average values of shortest path length and clustering coefficient were computed for all five disease gene sets. Network measures were compared to a background distribution of randomly selected gene sets, and z-scores were compared against the functional gene set z-score distributions.

4.2.7 Network Clustering

To extract clusters (i.e. groups of densely connected nodes) from the control and schizophrenia networks we implemented two different algorithms, both of which are described below.

WGCNA-related methods. Each adjacency matrix was transformed into a distance matrix by computing the topological overlap between all gene pairs [271]. The topological overlap measure (TOM) between two genes is calculated by comparing their direct connections: if two nodes connect to the same group of other nodes, they are said to have ‘high topological overlap’. We used a generalization of this measure that increases TOM’s sensitivity to longer-range connections between nodes by incorporating the number of m-step neighbours (m = 2) that a pair of nodes share [271]. The TOM matrices were subjected to WGCNA-based methods [148], whereby hierarchical clustering was applied with average linkage and the resulting tree was used to define network clusters. The clusters generated by WGCNA-based methods are referred to as WC1, WC2, etc.
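The sketch below illustrates a generalized TOM for an unweighted network followed by average-linkage clustering of 1 − TOM. It assumes the formula TOM_ij = (|N_m(i) ∩ N_m(j)| + a_ij) / (min(|N_m(i)|, |N_m(j)|) + 1 − a_ij), where N_m(i) is the set of genes within m steps of i (excluding i); check this against [271] before relying on it. Cutting the tree into a fixed number of clusters with SciPy’s `fcluster` is a simplification swapped in here; WGCNA itself uses its own tree-cutting procedure.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def generalized_tom(adj, m=2):
    """Generalized topological overlap for a symmetric boolean adjacency matrix."""
    A = adj.astype(int)
    n = A.shape[0]
    step = A + np.eye(n, dtype=int)
    reach = np.eye(n, dtype=int)
    for _ in range(m):                        # genes reachable in at most m steps
        reach = ((reach @ step) > 0).astype(int)
    np.fill_diagonal(reach, 0)                # a gene is not its own neighbour
    shared = reach @ reach.T                  # |N_m(i) & N_m(j)|
    size = reach.sum(axis=1)                  # |N_m(i)|
    tom = (shared + A) / (np.minimum.outer(size, size) + 1 - A)
    np.fill_diagonal(tom, 1.0)
    return tom


def tom_clusters(adj, m=2, n_clusters=5):
    """Average-linkage clustering on 1 - TOM, cut into `n_clusters` groups."""
    dist = 1.0 - generalized_tom(adj, m)
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```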

MCODE Algorithm. The MCODE [159] plugin for the Cytoscape platform [272] begins by computing a score of local density for each node whereby all neighbouring nodes are connected to each other with at least k-specified edges (k=0,1,2,3…). High scoring nodes become seeds which are expanded in a local search procedure, connecting highly scored nodes to the cluster. The expansion process proceeds until a given score threshold is reached (i.e. % score from the seed node) and re-iterates on the remaining nodes. Network clusters were generated by MCODE at five different k values (k = 2,3,4,5,6) using the default settings. We evaluated the top five clusters generated by MCODE since the gene membership for these clusters remained constant irrespective of choice of k. The clusters generated by MCODE are referred to as MC1, MC2, etc.

4.2.8 Enrichment Analysis

For each network, the top five clusters produced by the two clustering methods were further analyzed for functional enrichment of GO terms using the ROC scoring method in ErmineJ [198]. The ROC method evaluates the ranking of genes in the list to determine whether any gene sets are statistically over-represented. P-values for this method are computed as described in [273] and then corrected for multiple testing using the Benjamini-Hochberg procedure. Clusters were also evaluated for CNS cell type enrichment by cross-referencing the genes in each cluster with published lists of neuron, oligodendrocyte and astrocyte marker genes [247]. Hypergeometric probabilities were computed to evaluate the significance of the overlap for each cluster. For each meta-signature gene set (SZUP and SZDOWN), we also extracted a ranked list of gene coexpression within each network. Briefly, we represented each gene set as a binary vector and multiplied it against the unthresholded (non-sparse) matrix of each network (CTL and SZ). This transforms the matrix into a column vector of 12,582 values, each of which is a sum of normalized ranks reflecting the extent of coexpression of a gene with the given gene set. A standard ROC analysis was performed on the ranked column vector to quantify the enrichment of cell type marker genes.
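The gene-set coexpression vector and its ROC analysis can be sketched as follows. It assumes `rank_matrix` is the unthresholded genes × genes matrix of normalized aggregate ranks and that gene sets and marker lists are boolean vectors in the same gene order. The AUC is computed from the Mann-Whitney U statistic, which equals the area under the ROC curve. Whether self-coexpression on the diagonal was excluded is not stated; here the diagonal is whatever the input matrix contains.

```python
import numpy as np
from scipy.stats import rankdata


def geneset_coexpression(rank_matrix, in_set):
    """Summed coexpression rank of every gene with the members of a gene set.

    rank_matrix: (n_genes x n_genes) normalized coexpression ranks (unthresholded).
    in_set: boolean vector marking the gene set (e.g. SZUP).
    """
    return rank_matrix @ in_set.astype(float)


def auc(scores, is_marker):
    """ROC AUC: probability that a marker gene outscores a non-marker gene (ties count 1/2)."""
    r = rankdata(scores)
    n_pos = int(is_marker.sum())
    n_neg = len(scores) - n_pos
    u = r[is_marker].sum() - n_pos * (n_pos + 1) / 2   # Mann-Whitney U for the markers
    return float(u / (n_pos * n_neg))
```

An AUC of 0.5 corresponds to markers being ranked no better than other genes.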

4.3 Results

We constructed two gene coexpression networks: one representing the control human prefrontal cortex and the other representing the prefrontal cortex in schizophrenia (referred to as CTL and SZ throughout the chapter). Each network comprised 12,582 genes (nodes) and 392,606 coexpression ‘links’ among them. The two networks had similar values of average clustering coefficient (p > 0.1), but the average shortest path length across nodes differed slightly (p < 0.01). Positive assortativity values were observed in both networks, with slightly higher values in CTL. These network properties are summarized in Table 17. Both networks exhibited a ‘heavy-tailed’ node degree distribution, with most genes interacting with few partners and a small proportion of genes displaying ‘hub’-like behaviour, interacting with many genes. In the literature, such distributions are sometimes described as ‘scale-free’. We used a linear regression of the log-scale node degree distribution to examine this in our networks (Figure 9).

While a linear fit explains over 80% of the variance in the node degree distribution (CTL R² = 0.857; SZ R² = 0.872), the fit is poor at the extremes. Based on established criteria, our networks are not ‘scale-free’ [156]. However, the ‘heavy-tailed’ nature of the node degree distributions in our networks is typical of other ‘biologically relevant’ coexpression networks cited in the literature.

The small differences in global network properties observed between CTL and SZ suggest that an overall coexpression structure of the prefrontal cortex is retained between the two. Fifty-seven percent of the edges (224,384 links) are shared by the two networks, much higher than expected by chance. The remaining 168,222 edges are not shared between the two networks (Figure 10). Differences in connectivity between the networks are also indicated by a higher maximum node degree in SZ (935) than in CTL (737), and by the larger number of non-connected nodes in CTL (2356) compared to SZ (2288). These differences could indicate subtle biological differences between the two networks, but are presumably at least partly due to noise.
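As a rough check on “much higher than expected by chance”, assume a simple null in which two networks of the same size place their edges independently and uniformly at random among all gene pairs. This null ignores degree structure, which could change the expectation. The expected overlap is then the edge count times the edge density, roughly 2,000 edges, about two orders of magnitude below the observed 224,384.

```python
from math import comb

n_genes = 12_582        # genes in each network
n_edges = 392_606       # edges in each network
n_shared = 224_384      # edges present in both CTL and SZ

n_pairs = comb(n_genes, 2)              # all possible undirected gene pairs
density = n_edges / n_pairs             # fraction of pairs retained as edges (~0.5%)
# Under independent uniform edge placement, each CTL edge is also an SZ edge
# with probability `density`, so the expected overlap is n_edges * density.
expected_shared = n_edges * density

print(f"edge density: {density:.3%}")
print(f"observed shared fraction: {n_shared / n_edges:.1%}")
print(f"expected shared edges by chance: {expected_shared:,.0f}")
print(f"edges not shared: {n_edges - n_shared:,}")
```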


Previously our group showed that both PPI and coexpression networks show a correlation between node degree and gene multifunctionality [274]. Using the large PPI network constructed for use in Chapter 3, the correlation of node degree with multifunctionality is 0.53. Gillis and Pavlidis [274] found that for coexpression networks the correlation between node degree and multifunctionality tended to be much lower, reaching 0.28 for a network made by aggregating 47 large studies; lower correlations are found for smaller individual datasets. In our networks the correlations were low but significant (0.039 in CTL and 0.036 in SZ), which is consistent with the small number of datasets used. The low correlation suggests a biologically relevant signal in the data.

In addition to comparing average network properties of the SZ and CTL networks with each other, we compared each separately to a node degree-matched random network (see Methods). For features based on connectivity (i.e. shortest path length and clustering coefficient), we found the observed distributions in both networks to be higher than those in the random networks. Shortest path length displayed slightly higher values than in randomized networks (Figure 11). Additionally, genes showed increased clustering into local communities compared to genes from a randomized network with an identical degree distribution (Figure 11). Thus, while the SZ and CTL networks are similar, they are also clearly distinct from random networks with the same node degree distribution.

We next investigated network properties at the level of gene groups, focusing on our previously identified meta-signature of genes differentially expressed in schizophrenia [263]. The meta-signature of 94 genes (25 up-regulated and 69 down-regulated) will be referred to as SZUP and SZDOWN, respectively.

Network properties were assessed for each gene set individually by averaging across all genes in the group. These results are summarized in Table 18. For each gene set we computed the average values of shortest path length, clustering coefficient and node degree, and evaluated the differences observed between the control and schizophrenia networks. In general, both gene sets had a low mean node degree relative to the degree distributions of CTL and SZ, tending not to be ‘hubs’. For the SZUP gene set, we found higher node degree, shorter path length and an increased clustering coefficient in the SZ network, though these differences were not statistically significant (Wilcoxon rank-sum test, p > 0.05). Conversely, the SZDOWN gene set exhibited a decreased node degree, longer path length and a significantly lower clustering coefficient (Wilcoxon rank-sum test, p < 0.05) in the SZ network. Thus, each gene set displayed properties that differ between the two networks, and the two gene sets showed opposite changes in behaviour.
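The between-network comparison can be sketched with SciPy’s two-sided Wilcoxon rank-sum test. The sketch assumes per-gene values for one gene set in each network, with None marking undefined values (e.g. a clustering coefficient for a gene with fewer than two neighbours). Dropping those values, and the function name, are our own illustrative choices.

```python
from scipy.stats import ranksums


def compare_networks(values_ctl, values_sz):
    """Two-sided Wilcoxon rank-sum test of per-gene values in CTL vs SZ.

    Genes with undefined values (None) are dropped from each network separately.
    A negative statistic means CTL values tend to be lower than SZ values.
    """
    ctl = [v for v in values_ctl if v is not None]
    sz = [v for v in values_sz if v is not None]
    stat, p = ranksums(ctl, sz)
    return float(stat), float(p)
```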

To assess the relevance of the changes observed for SZUP and SZDOWN between the two networks, we implemented three different controls. A first control was supplied by comparing the observed network measures for SZUP and SZDOWN to a background distribution of 1000 randomly selected gene sets of matched size and node degree (see Methods). The difference between the observed values and the background was assessed by computing z-scores and p-values, as reported in Table 18. P-values represent the probability of a random gene set having a value higher (for positive z-scores) or lower (for negative z-scores) than the observed network measure. The shortest path length for SZUP was not different from the background in either network. However, values for SZDOWN were substantially different from the background in both networks. Thus, for both gene sets we observed no change in path length between networks. Our strongest result was the behaviour of the clustering coefficient for both gene sets. In the previous paragraph we reported that for SZUP the average clustering coefficient increased in the SZ network compared to the CTL network. The p-values indicate that the increased value in SZ is significant when compared to a background distribution (p = 0.005). For SZDOWN, we reported a higher average clustering coefficient in the CTL network than in the SZ network. A comparison to the background distribution generated a marginally significant p-value (p = 0.06) for the increased value in CTL, indicating a difference from random, albeit a small one. Together, these results converge to highlight two properties of our gene sets: 1) the SZUP genes become highly interconnected in the SZ network, exhibiting a property that is different from the background distribution; and 2) the SZDOWN genes are more interconnected in the CTL network compared to the background distribution, and lose this property in the SZ network.

A second control was applied to examine whether the properties observed for SZUP and SZDOWN are a feature of other functionally grouped sets of genes. This is a more stringent control than simply comparing to random gene sets, because we are interested in properties of our genes that are special compared to functional gene sets. We created 3,230 different functional gene sets based on GO terms and their associated genes. Network measures were computed for each functional gene set and compared against values obtained from randomly selected gene sets of the same size and node degree.

A z-score was computed for each comparison to the background. The distribution of 3,230 z-scores obtained from the GO groups was plotted, and the z-scores for SZUP and SZDOWN were evaluated by comparison (Figure 12). The results corroborate our findings from the background distribution control. For shortest path length, the z-scores for SZUP resembled values from GO groups, whereas the z-scores for SZDOWN were more distinct from the GO group distribution. Together, the control methods revealed that shortest path length is not a special property of the SZUP gene set, and although it is for SZDOWN, this feature does not differ between the CTL and SZ networks. For the clustering coefficient, the z-score for SZUP is distinct from functional gene set values in the SZ network but less so in the CTL network. The opposite is true for the SZDOWN z-score values. Thus, the clustering coefficient is a unique property of our meta-signature genes, based on the differences exhibited between the two networks.

A final control was applied to evaluate whether our meta-signature gene sets share properties with gene sets associated with other brain-related disorders. We assembled gene sets for five different illnesses, based mostly on findings from genetic association studies. For each disease gene set, z-scores were computed based on a background distribution and compared against the functional gene set z-score distribution generated from GO groups. Of particular interest are the results observed for the clustering coefficient in the two networks. Interestingly, the Alzheimer’s disease gene group (red line, Figure 12) exhibited strikingly similar properties to SZUP in both networks despite sharing only one gene with it. Notably, the schizophrenia and Parkinson’s disease gene groups follow a trend similar to, but more subtle than, that of SZDOWN.

We next tested the robustness of the network measures observed for SZUP and SZDOWN using a jackknife procedure. In this process, we removed one of the seven datasets and regenerated aggregate CTL and SZ networks from the remaining six, for each study in turn. This yields seven pairs of jackknife networks. For each jackknifed network, the average shortest path length and clustering coefficient were computed for SZUP and SZDOWN, and values were compared between networks (Figure 13). For SZUP, we observed general agreement across all iterations, with an increase in clustering coefficient and a consistent decrease in path length from CTL to SZ. For SZDOWN, we found that only the clustering coefficient effects were robust to removing single datasets; the path length results proved to be more sensitive. Taken as a whole, these results are consistent with subtle differences in network properties for the SZUP and SZDOWN genes between the two networks.
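The leave-one-dataset-out loop is simple to express generically. In this sketch `build_network` and `measure` are hypothetical caller-supplied functions (for example, one that aggregates the remaining datasets into a network and one that returns the mean clustering coefficient of SZUP); they stand in for whatever pipeline produced the full networks.

```python
def jackknife(datasets, build_network, measure):
    """Leave each dataset out in turn, rebuild the network from the rest,
    and record `measure(network)` for each of the len(datasets) iterations."""
    results = []
    for i in range(len(datasets)):
        rest = datasets[:i] + datasets[i + 1:]     # all datasets except the i-th
        results.append(measure(build_network(rest)))
    return results
```

Run once for the control cohorts and once for the schizophrenia cohorts, this yields the seven pairs of values compared in Figure 13.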

The clustering coefficient of the meta-signature gene sets is a feature of particular interest, as it provides insight into the community structure of nodes. Genes in SZUP and SZDOWN lie in more interlinked neighbourhoods in the SZ and CTL networks, respectively, as suggested by their high clustering coefficients. When only within-gene-set interactions are considered, the average clustering coefficient roughly doubles in value for both SZUP (CC(CTL) = 0.62; CC(SZ) = 0.66) and SZDOWN (CC(CTL) = 0.55; CC(SZ) = 0.46), suggesting that within-gene-set interactions drive interconnectivity in the full networks. Not surprisingly, these genes are highly linked to one another, unlike random gene sets of matched size and node degree. Further, the within-gene-set connectivity of both meta-signature gene sets is increased in the SZ network. For SZUP, this is simply the result of the addition of new links, but in SZDOWN there appears to be a rearrangement of topology, with a combination of addition and removal of links (Figure 14).
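The within-gene-set measure can be sketched by restricting each member’s neighbourhood to other members before computing the clustering coefficient. The sketch assumes an adjacency dict of neighbour sets; skipping members with fewer than two in-set neighbours (rather than counting them as zero) is our assumption, since the text does not specify it.

```python
from itertools import combinations


def within_set_clustering(adj, gene_set):
    """Average clustering coefficient of the sub-network induced by `gene_set`.

    Only links between members are kept; members with fewer than two in-set
    neighbours are skipped. Returns None if no member qualifies.
    """
    members = set(gene_set)
    sub = {g: adj.get(g, set()) & members for g in members}   # induced sub-network
    values = []
    for nbrs in sub.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in sub[u])
        values.append(links / (k * (k - 1) / 2))
    return sum(values) / len(values) if values else None
```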

Our analysis to this point examined either the entire networks or used supervised approaches to select sets of genes for analysis. We complemented this with an unsupervised method based on clustering. This analysis was motivated in part by the observation that the meta-signature gene sets showed significant modularity differences between CTL and SZ. We hypothesized that this might be a more general property, and in particular that we might find ‘modules’ which contain SZUP and/or SZDOWN genes and link them to other cellular functions. To identify modules in each network, we first converted our sparse matrices into TOM matrices, which represent the neighbourhood overlap between all gene pairs.


Hierarchical clustering was then applied to the TOM matrices and the resulting trees were used to define network modules [148]. Each network resulted in 18 modules of varying sizes. A comparison of modules between the two networks was done by computing significance of module overlap using the hypergeometric distribution. A matrix of the resulting log10 transformed p-values is plotted using a heatmap representation in Figure 15. Significant overlaps were observed between modules indicating excellent overlap of gene membership between the control and schizophrenia network modules, concordant with findings reported by Torkamani et al [13].
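The module-overlap comparison can be sketched with an exact hypergeometric upper tail using Python’s integer arithmetic. The sketch assumes modules are collections of gene identifiers drawn from the same universe of `n_genes` genes, and floors p-values at 1e-300 before taking −log10 so that extremely strong overlaps stay finite; both function names are illustrative. The same tail probability applies to the cell-type marker overlaps described in the Methods.

```python
from math import comb, log10


def overlap_pvalue(module_a, module_b, n_genes):
    """P(overlap >= observed) for two gene modules drawn from `n_genes` genes."""
    a, b = set(module_a), set(module_b)
    k = len(a & b)
    total = comb(n_genes, len(b))
    tail = sum(comb(len(a), x) * comb(n_genes - len(a), len(b) - x)
               for x in range(k, min(len(a), len(b)) + 1))
    return tail / total                       # exact integers, converted once at the end


def overlap_matrix(modules_ctl, modules_sz, n_genes):
    """-log10 p-values for every CTL module (rows) against every SZ module (columns)."""
    return [[-log10(max(overlap_pvalue(m1, m2, n_genes), 1e-300)) for m2 in modules_sz]
            for m1 in modules_ctl]
```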

In order to characterize network clusters with regard to central nervous system (CNS) function and cellular organization, we cross-referenced the total gene membership of the top five clusters (ranked by size) from each network with gene lists of (1) CNS cell type markers [247] and (2) biological process terms from the Gene Ontology. Results are summarized in Figure 16 and Table 19. Also reported is the number of schizophrenia meta-signature genes associated with each module. Not surprisingly, a similar enrichment of GO terms was observed in modules of both networks. The WC1 module in both networks was associated with terms related to oxidative phosphorylation and energy production. The WC1 module was also highly enriched with neuronal markers and significantly overlapped with SZDOWN genes in both networks. This is consistent with evidence indicating mitochondrial dysfunction and defects in brain metabolism leading to oxidative stress in schizophrenia [224, 256]. The WC2 module was associated with only one significant GO term, “regulation of action potential” (GO:0019228), suggesting it is a myelination-related module in both networks. As expected, cell type enrichment of the WC2 module identified a large number of oligodendrocyte marker genes in the control network. Interestingly, in the schizophrenia network the enrichment is altered such that more astrocyte marker genes exhibit a significant overlap, in addition to the few oligodendrocyte marker genes that remain. The WC2 module also overlaps significantly with SZUP genes in the schizophrenia network but not in the control network.

We also applied the MCODE algorithm [159], another clustering method, to each of our networks for comparison with the WGCNA-based results. MCODE is designed to identify densely interconnected groups of genes in networks. It uses a weighted clustering coefficient to score nodes and then clusters nodes based on the similarity of their scores. We examined the top five MCODE clusters (ranked by score) for enrichment of GO terms (Table 20) and cell type markers, and for overlap with SZUP and SZDOWN genes (Figure 17). As with the WGCNA-based clustering, we found a myelination-related cluster (MC2) present in both networks. The genes in this cluster overlap with genes in the WC2 module in both networks, although the overlap is more significant in the control network (Figure 18). Concordant with the findings from the WC2 module, MC2 in the schizophrenia network also shows a loss of oligodendrocyte marker genes and a shift towards more astrocyte marker genes.

The results from both clustering methods show that coexpression patterns in the myelination-related module differ between the two networks. The module is largely conserved, but there are small differences, as illustrated by the loss of oligodendrocyte marker genes and the addition of astrocyte marker genes. The WGCNA-based clustering also showed a change in the overlap between the SZUP gene set and the myelination module (WC2). Only two SZUP genes overlapped with the module in the control network, whereas 12 overlapped in the schizophrenia network.

Thus, we examined coexpression patterns associated with all 25 SZUP genes to explore the possibility of a relationship between SZUP and coexpression alterations of the myelination module (see Methods). A ranked vector of gene coexpression was extracted within each network, with values representing how well each gene in the network is coexpressed with SZUP. A standard ROC analysis was applied to the ranked list to quantify the enrichment of cell type marker genes. We found that some of the astrocyte marker genes are coexpressed with SZUP, as they exhibited AUC scores higher than expected at random (0.60-0.62, compared with 0.5 for a random ranking). When only the 12 overlapping SZUP genes are considered, the AUC score increases to 0.68. It is therefore possible that the up-regulation of SZUP genes in schizophrenia contributes to the network alterations of astrocyte genes observed in the myelination module. However, investigation of coexpression patterns at the individual gene level is required before any definite conclusions can be drawn.


Table 17: Whole network properties of the control and schizophrenia brain networks

Property                  Control   Schizophrenia
Non-connected nodes       2356      2288
Maximum node degree       747       935
Mean node degree          77        76
Assortativity             0.219     0.158
Shortest path length      3.34      3.32
Clustering coefficient    0.29      0.29
log-log fit (R²)          0.857     0.872

Table 18: Schizophrenia gene set network properties

                                   Up-regulated (25)     Down-regulated (69)
                                   CTL       SZ          CTL       SZ
Node degree (mean)                 63.9      83.5        127.4     106.2
Non-interacting nodes              2         2           7         3
Edges (within gene set)            12        23          129       144
Shortest path length (mean)        3.28      3.14        3.31      3.48
  Random gene set comparison: Z    -0.58     -0.91       3.15      4.46
  Random gene set comparison: p    0.23      0.15        0         0
Clustering coefficient (mean)      0.35      0.38        0.32*     0.27*
  Random gene set comparison: Z    1.11      2.47        1.51      -0.607
  Random gene set comparison: p    0.14      0.005       0.06      0.28

*Difference between CTL and SZ is significant at p = 0.05.


Table 19: Gene Ontology enrichment of modules identified by WGCNA-based clustering

A: Control Network

Module  GO Term                                                GO ID        Corrected P-value
WC1     mitochondrial electron transport, NADH to ubiquinone   GO:0006120   1.42E-16
        oxidative phosphorylation                              GO:0006119   2.88E-16
        ATP synthesis coupled electron transport               GO:0042773   1.47E-13
        respiratory electron transport chain                   GO:0022904   1.76E-11
        electron transport chain                               GO:0022900   2.76E-10
WC2     regulation of action potential in neuron               GO:0019228   6.93E-03
WC4     keratinocyte differentiation                           GO:0030216   9.98E-03
        sensory perception of chemical stimulus                GO:0007606   0.012
        epidermal cell differentiation                         GO:0009913   0.017
        peptide cross-linking                                  GO:0018149   0.018
        cellular defense response                              GO:0006968   0.028

B: Schizophrenia Network

Module  GO Term                                                GO ID        Corrected P-value
WC1     mitochondrial electron transport, NADH to ubiquinone   GO:0006120   1.42E-16
        oxidative phosphorylation                              GO:0006119   2.88E-16
        ATP synthesis coupled electron transport               GO:0042773   1.47E-13
        respiratory electron transport chain                   GO:0022904   1.76E-11
        electron transport chain                               GO:0022900   2.76E-10
WC2     regulation of action potential in neuron               GO:0019228   6.93E-03
WC4     keratinocyte differentiation                           GO:0030216   9.98E-03
        sensory perception of chemical stimulus                GO:0007606   0.012
        epidermal cell differentiation                         GO:0009913   0.017
        peptide cross-linking                                  GO:0018149   0.018
        cellular defense response                              GO:0006968   0.028

The top five modules (ranked by size) in each network were characterized by enrichment of biological process terms from the Gene Ontology (GO). For each module, significant GO terms (corrected p < 0.05) are listed, up to a maximum of five terms. GO terms are taken from the results of an ROC analysis in ErmineJ [198].


Table 20: Gene Ontology enrichment of modules identified by MCODE clustering

A: Control Network

Cluster GO Term                                                GO ID        Corrected P-value
MC1     gamma-aminobutyric acid signaling pathway              GO:0007214   0.032
        glial cell development                                 GO:0021782   0.045
        ATP synthesis coupled electron transport               GO:0042773   0.048
MC2     regulation of action potential in neuron               GO:0019228   2.36E-03
        regulation of action potential                         GO:0001508   0.034
        gamma-aminobutyric acid signaling pathway              GO:0007214   0.045
        ensheathment of neurons                                GO:0007272   0.047
MC4     nucleotide-excision repair, DNA gap filling            GO:0006297   0.043

B: Schizophrenia Network

Cluster GO Term                                                GO ID        Corrected P-value
MC2     regulation of action potential in neuron               GO:0019228   0.013
        gamma-aminobutyric acid signaling pathway              GO:0007214   0.021
        regulation of action potential                         GO:0001508   0.026
MC3     deoxyribonucleotide catabolic process                  GO:0009264   0.024
        deoxyribonucleotide metabolic process                  GO:0009262   0.029
MC5     ATP synthesis coupled electron transport               GO:0042773   0.014
        mitochondrial electron transport, NADH to ubiquinone   GO:0006120   0.02
        gamma-aminobutyric acid signaling pathway              GO:0007214   0.031

The top five clusters (ranked by score) in each network were characterized by enrichment of biological process terms from the Gene Ontology (GO). For each cluster, significant GO terms (corrected p < 0.05) are listed, up to a maximum of five terms. GO terms are taken from the results of an ROC analysis in ErmineJ [198].


Figure 9: Connectivity distribution of control and schizophrenia networks. Connectivity distributions of the control brain network (A) and the schizophrenia brain network (B) on a log10-log10 scale. The x-axis shows the number of links, and the y-axis shows the number of genes with the corresponding number of links.


Figure 10: Shared edges between networks. To assess node degree differences between networks, node degree values for all 12,582 genes in CTL were plotted against the number of shared edges in SZ. Data were plotted using the ‘hexbin’ R package to display proportions of values rather than the raw data points. Data points that deviate from the identity line indicate differences in gene-to-gene connections for a number of genes between the two networks.


Figure 11: Comparison to random network distributions. For each network, we generated a corresponding random network by swapping edges while maintaining the same node degree distribution. (A, B) The shortest path length distributions of the real networks are shifted slightly higher than the corresponding random network distributions, but the distributions for CTL and SZ are similar. Grey histograms reflect values from the random network, and black histograms represent the real network data. (C, D) Genes cluster into local communities with a high number of interconnections compared with the corresponding random networks. Black dots represent real network data and grey dots represent random network data.
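Degree-preserving randomization of this kind is commonly done with repeated double-edge swaps. Two edges (a, b) and (c, d) are replaced by (a, d) and (c, b), which leaves every node's degree unchanged. The Python sketch below assumes an undirected network without self-loops, given as a list of gene pairs. The number of swaps, the attempt limit and the rejection rules are illustrative choices, not necessarily those used to generate the random networks in this figure.

```python
import random
from collections import Counter


def degrees(edges):
    """Node degree for an undirected edge list."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Randomize an undirected simple graph by double-edge swaps."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    swaps_done, attempts = 0, 0
    while swaps_done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible swap orientations
        if len({a, b, c, d}) < 4:
            continue  # would create a self-loop or a degenerate swap
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue  # would duplicate an existing edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        swaps_done += 1
    return edge_list


# Toy network: a ring of 10 nodes plus two chords.
observed = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5), (2, 7)]
randomized = degree_preserving_rewire(observed, n_swaps=50, seed=1)
print(degrees(observed) == degrees(randomized))
```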


Figure 12: Comparison of gene set properties to functional GO groups. Histograms represent z-score distributions for clustering coefficient (A, B) and shortest path length (C, D) computed across 3,230 different GO groups in the control and schizophrenia networks. Z-scores represent the difference between the mean value of the network measure for a GO group and the mean for random gene sets of the same size and matched node degree. Plotted lines represent the z-scores obtained for the up- and down-regulated meta-signature gene sets and for additional disease gene sets, as labeled in the legend.
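The z-score comparison against degree-matched random gene sets can be sketched in Python as follows. Degree matching is approximated by splitting genes into equal-sized degree bins. Each random set then takes, from every bin, the same number of genes as the query set contains in that bin. The bin count, the number of random sets and the function name are assumptions for illustration, not the exact procedure used in this analysis.

```python
import random
import statistics
from collections import Counter


def degree_matched_zscore(query, value, degree, n_random=1000, n_bins=10, seed=0):
    """Z-score of the mean `value` over `query` relative to degree-matched random sets.

    value:  gene -> per-gene network measure (e.g. clustering coefficient)
    degree: gene -> node degree in the same network
    """
    rng = random.Random(seed)
    genes = sorted(degree, key=degree.get)
    bin_size = -(-len(genes) // n_bins)  # ceiling division
    bins = {}
    for rank, gene in enumerate(genes):
        bins.setdefault(rank // bin_size, []).append(gene)
    bin_of = {gene: b for b, members in bins.items() for gene in members}
    needed = Counter(bin_of[g] for g in query)  # genes to draw per degree bin

    observed = statistics.mean(value[g] for g in query)
    null_means = []
    for _ in range(n_random):
        sample = [g for b, k in needed.items() for g in rng.sample(bins[b], k)]
        null_means.append(statistics.mean(value[g] for g in sample))
    return (observed - statistics.mean(null_means)) / statistics.stdev(null_means)


# Toy data: 100 genes with degree equal to their index; the query genes stand out.
degree = {f"g{i}": i for i in range(100)}
value = {g: 0.0 for g in degree}
query = [f"g{i}" for i in range(0, 100, 10)]
for g in query:
    value[g] = 1.0
print(degree_matched_zscore(query, value, degree, n_random=500, seed=3))
```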


Figure 13: Jackknifed network measures. For each jackknifed network (in which one dataset is removed), we computed the shortest path length and clustering coefficient for SZUP and SZDOWN. To summarize trends observed in the jackknife analysis, we plotted the clustering coefficient and shortest path length found in the CTL and SZ networks. Results for SZUP are shown in A-B, and for SZDOWN in C-D. Each line represents a different jackknifed network, with the legend indicating which dataset was removed.


[Figure 14 panels: A) up-regulated gene set in the CTL and SZ networks; B) down-regulated gene set in the CTL and SZ networks]

Figure 14: Network representation of within-gene-set interactions for schizophrenia meta-signature genes. Genes were extracted from the CTL and SZ networks, along with any edges connecting nodes within the A) up-regulated and B) down-regulated schizophrenia meta-signature gene sets. Additional nodes and edges are observed for the up-regulated gene set in the SZ network. In the down-regulated gene set, re-wiring is reflected by the addition and removal of numerous within-gene-set interactions between the CTL and SZ networks. Yellow nodes represent genes present in both networks, purple nodes represent genes added in the SZ network, and blue nodes represent genes lost in the SZ network. Black lines represent edges retained in both networks, red lines indicate new edges, and blue lines indicate edges lost in SZ.


Figure 15: Comparison of modules between networks (WGCNA). Each network was clustered using WGCNA-based methods [148]. Modules were compared between networks by computing the number of overlapping genes. The significance of each overlap was assessed using the hypergeometric distribution. The resulting log10-transformed p-values for each overlap are plotted with the color scale provided above.


Figure 16: Enrichment of cell type markers in WGCNA modules. The five largest modules identified by WGCNA in each network were characterized by enrichment of cell type markers. For each module, we report the number of genes that overlap with oligodendrocyte (OL), neuron (NEU), and astrocyte (AST) marker genes provided by [247]. Overlap with each meta-signature gene set is also reported. The same x-axis scale is used for each module to allow a fair comparison between networks. Hypergeometric probabilities were computed to evaluate the significance of overlap. ** p < 0.001; *** p < 1E-10


Figure 17: Enrichment of cell type markers in MCODE modules. The top five clusters identified by MCODE in each network were characterized by enrichment of cell type markers. For each cluster, we report the number of genes that overlap with oligodendrocyte (OL), neuron (NEU), and astrocyte (AST) marker genes provided by [247]. Overlap with each meta-signature gene set is also reported. The same x-axis scale is used for each cluster to allow a fair comparison between networks. Hypergeometric probabilities were computed to evaluate the significance of overlap. ** p < 0.001; *** p < 1E-10


Figure 18: Cluster comparison between WGCNA and MCODE clustering algorithms. Each network was clustered using two different algorithms: 1) WGCNA-based methods [148] and 2) the MCODE algorithm [159]. The top five clusters from each method were compared against each other within each network. The number of overlapping genes was computed, and hypergeometric probabilities were used to assess the significance of module overlap. The resulting log10-transformed p-values for each overlap are plotted with the color scale provided above.


4.4 Discussion

Our network-based approach for evaluating gene coexpression provides a novel assessment of coexpression patterns across seven large schizophrenia microarray datasets. We implemented a rank aggregation approach to combine networks across datasets, which revealed interesting patterns of molecular connectivity in the control and schizophrenia postmortem human brain. Overall, the two coexpression networks were very similar to one another. This is consistent with the findings of Torkamani et al. [13], whose study involved two datasets, both of which were also included in our analysis. The networks shared a similar node degree distribution, and the average values of path length and clustering coefficient taken across all nodes in the network were not significantly different. However, two networks that are similar in average network properties can still differ substantially in their underlying topology.
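The rank aggregation step is defined in the Methods. As a rough illustration only, the Python sketch below assumes three things. Within each dataset, all gene pairs are ranked by their correlation (rank 1 = strongest). Ranks are then averaged across datasets. Finally, the best-ranked fraction of pairs is kept as network edges. The `top_fraction` threshold and the use of signed correlations are hypothetical choices, and ties are broken arbitrarily.

```python
def aggregate_edges(correlations_per_dataset, top_fraction=0.005):
    """Select network edges by mean correlation rank across datasets.

    correlations_per_dataset: list of dicts, each mapping a gene pair -> correlation;
    every dataset is assumed to contain the same gene pairs.
    """
    pairs = list(correlations_per_dataset[0])
    rank_sum = dict.fromkeys(pairs, 0)
    for corr in correlations_per_dataset:
        # Rank pairs within one dataset, strongest correlation first.
        ordered = sorted(pairs, key=lambda p: corr[p], reverse=True)
        for rank, pair in enumerate(ordered, start=1):
            rank_sum[pair] += rank
    n_datasets = len(correlations_per_dataset)
    mean_rank = {p: rank_sum[p] / n_datasets for p in pairs}
    n_keep = max(1, int(len(pairs) * top_fraction))
    return sorted(pairs, key=mean_rank.get)[:n_keep]


# Toy correlations for three gene pairs in three hypothetical datasets.
ds1 = {("A", "B"): 0.9, ("A", "C"): 0.1, ("B", "C"): 0.5}
ds2 = {("A", "B"): 0.8, ("A", "C"): 0.2, ("B", "C"): 0.7}
ds3 = {("A", "B"): 0.5, ("A", "C"): 0.4, ("B", "C"): 0.6}
print(aggregate_edges([ds1, ds2, ds3], top_fraction=0.67))
```

Averaging ranks rather than raw correlations keeps any single dataset with unusually high correlations overall from dominating the combined network.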

To evaluate differences in gene-gene connectivity between networks, we focused on the network properties of a specific group of relevant genes. We used a list of 95 differentially expressed ‘schizophrenia genes’ reported in our previous meta-analysis [263]. This gene list was divided into two groups: 1) genes up-regulated in schizophrenia and 2) genes down-regulated in schizophrenia. The network properties of each gene set were examined within the control and schizophrenia networks to identify any distinguishing features of these ‘schizophrenia genes’. Importantly, we applied several controls to ensure that any network features identified were specific to the ‘schizophrenia genes’. Such stringent controls are a necessary aspect of network analysis, although they are not typically applied in existing studies. The clustering coefficient, a measure that gives insight into the community structure of nodes, proved to be an informative characteristic of both gene sets. The SZUP genes exhibited a distinctive increase in clustering coefficient, indicating a high level of interconnectivity in the schizophrenia network. In contrast, the SZDOWN genes displayed high interconnectivity in the CTL network, which diminished in the SZ network. This loss of interconnectivity was shown by the clustering coefficient falling to a value representative of the background distribution. Further, we demonstrated that this differential interconnectivity of the ‘schizophrenia genes’ between networks is not observed for other functionally similar groups of genes or for most other brain-related disease gene groups.
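For reference, the local clustering coefficient of a gene is the fraction of pairs of its network neighbors that are themselves connected, and a gene set's value is the mean over its members. The minimal Python sketch below assumes the network is stored as a dictionary mapping each gene to the set of its neighbors. It assigns 0 to genes with fewer than two neighbors, a common convention; how such genes were treated in this analysis is not stated here.

```python
from itertools import combinations


def local_clustering(adj, gene):
    """Fraction of pairs of `gene`'s neighbors that are directly connected."""
    neighbors = adj[gene]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for degree < 2; set to 0 by convention here
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


def mean_clustering(adj, gene_set):
    """Mean local clustering coefficient over a gene set (e.g. SZUP or SZDOWN)."""
    return sum(local_clustering(adj, g) for g in gene_set) / len(gene_set)


# Toy network: a triangle a-b-c plus a gene d attached only to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(local_clustering(adj, "a"), mean_clustering(adj, ["b", "c"]))
```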

An assessment of modularity across all nodes in each network was performed using unsupervised methods based on clustering. As the modules identified in a network can be contingent on the algorithms applied, we used two different methods and highlighted the similarities observed between them. Both algorithms identified a ‘myelination-related’ module/cluster which consistently appeared in both networks.

The module was largely conserved, but differences were observed in gene membership. In the schizophrenia network, the number of oligodendrocyte marker genes in the module fell to half the number present in the control network module, suggesting alteration of myelination-related processes. Myelination is the process by which oligodendrocytes envelop axons in a myelin sheath, allowing faster action potential conduction and information flow between brain regions. A wide range of white matter abnormalities have been revealed in schizophrenia [275], and genetic studies have contributed a number of myelin- and oligodendrocyte-related candidate genes (e.g., APOD, PLP1, MAG) [276-278]. Moreover, myelination-related genes have also been found to be down-regulated in gene expression studies [106, 279].

Interestingly, the loss of oligodendrocyte marker genes in the myelination-related module was coupled with a large increase in astrocyte marker genes. In addition, genes in SZUP were found to be coexpressed with some of the astrocyte markers. Astrocytes are the most abundant glial cell type in the brain and play multiple roles in organizing and maintaining brain structure and function. An example pertinent to our results is the direct role of astrocytes in promoting myelination, supported by studies in vitro and in vivo. It has been shown that astrocytes secrete factors that influence the rate of myelin ensheathment by oligodendrocytes [280]. Further, astrocytes have a major influence on remyelination. Oligodendrocytes preferentially remyelinate axons in areas containing astrocytes [281], and transplantation of astrocytes into demyelinated lesions enhanced endogenous remyelination [282]. Taken together with our network finding of a shift towards more astrocyte markers and fewer oligodendrocyte markers, there is evidence for the recruitment of astrocytes in response to abnormal myelination in schizophrenia. This might in turn imply the presence of mild astrogliosis in the PFC of individuals with schizophrenia. However, this issue is still a matter of debate, with studies reporting findings for and against it [283]. Also, a large number of genes in SZDOWN were found to overlap with the oxidative phosphorylation-related module in both networks. Studies of white and grey matter abnormalities in patients with mitochondrial encephalomyopathy have shown that white matter is particularly vulnerable to damage by oxidative stress [284]. Our ‘schizophrenia genes’ are not functionally similar based on current annotations. Nevertheless, the network analysis shows that they are intertwined through gene-gene relationships derived from coexpression. We have used this comprehensive panel of genes to support a model linking together broad areas of dysfunction that may contribute to the pathophysiology of schizophrenia. However, our interpretation of the network model is based on GO enrichment and should not be considered direct biological evidence. Further investigation at the individual gene level is needed to provide a higher-resolution explanation.

For the remainder of the discussion, we summarize our findings in the context of the caveats of coexpression network analysis and propose some possible interpretations and avenues for further research. In this study we aggregated coexpression networks across seven independent studies to generate a network representation of the postmortem PFC in healthy controls and individuals with schizophrenia. By aggregating data across studies we aimed to increase the reliability of observed interactions whilst reducing the chance of identifying spurious interactions. Seven datasets constitute a much larger cohort than is found in the current literature of schizophrenia network studies [13], but our study could still benefit substantially from additional data. Aggregating across a larger number of datasets has been shown to produce networks more comparable to PPI networks [182, 246] and is likely to yield more reliable interactions. Another important feature of our study is that we tested the robustness of our results. When conducting meta-analysis across heterogeneous data sources, it is imperative to determine whether the observed signal is being driven by a particular dataset. The jackknife procedure applied to our networks demonstrates that our findings, though subtle, are not overly sensitive to the choice of data used.
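The leave-one-dataset-out jackknife can be expressed generically. In this Python sketch, `build_network` and `measure` are hypothetical placeholders for the network aggregation step and the statistic of interest (for example, the mean clustering coefficient of SZUP). The spread of the returned values indicates how strongly any single dataset drives the result.

```python
def jackknife(datasets, build_network, measure):
    """Recompute `measure` on networks built with each dataset left out in turn.

    datasets: dict mapping dataset name -> expression data (any type)
    Returns a dict mapping the left-out dataset name -> statistic.
    """
    results = {}
    for left_out in datasets:
        kept = {name: data for name, data in datasets.items() if name != left_out}
        results[left_out] = measure(build_network(kept))
    return results


def jackknife_range(results):
    """Max minus min of the jackknifed statistic: a simple sensitivity summary."""
    return max(results.values()) - min(results.values())


# Toy stand-ins: each "dataset" is a number and the "network" is their sum.
toy = {"study1": 1.0, "study2": 2.0, "study3": 3.0}
res = jackknife(toy, build_network=lambda d: sum(d.values()), measure=lambda net: net / 2)
print(res, jackknife_range(res))
```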


A final comment concerns the interpretation of coexpression networks and the underlying relevance of the gene-gene interactions we observe in them. Other biological networks (e.g., PPI networks) have edges that represent well-defined biological interactions. In contrast, the edges in a coexpression network represent the correlation structure of the data. The edges are derived from pairwise correlation coefficients calculated from the expression data of the genes, and they depend on the threshold applied to infer the networks. A connection between two genes in a coexpression network does not necessarily correspond to a connection in PPI networks or regulatory networks [285]. Thus, when considering gene coexpression networks it is important not to mistake gene-to-gene connections for direct physical interactions.

We have contributed the largest meta-analysis of gene coexpression in schizophrenia to date. We evaluated various topological properties of the control and schizophrenia networks, revealing a shared coexpression structure between them. Characterization of functional clusters in each network with cell-type marker genes revealed differences that link together disease-related processes. Differentially expressed genes in schizophrenia are also associated with biologically relevant clusters, providing evidence for systems-level dysfunction. Further research is required to disentangle these network findings and to distinguish primary from secondary disease phenomena. We hope, however, that our study will encourage new directions in the network biology of schizophrenia. Finally, our work suggests that coexpression network analysis is difficult but promising. Future work in this area should therefore proceed with careful interpretation to avoid being misled by the data.


Chapter 5: Conclusion

5.1 Summary of Major Findings

A decade ago, gene expression was investigated for only a few genes at a time; these genes were carefully selected for analysis on the basis of a hypothesis regarding their possible involvement in the disease under study. With the advent of microarrays and other expression profiling technologies, it is now possible to perform genome-wide expression analysis in a hypothesis-free manner. Gene expression profiling of the postmortem human brain is an active area of research contributing to the sustained effort to understand the neuropathological underpinnings of schizophrenia and other psychiatric illnesses. The focus of my thesis was to use expression profiling data from the postmortem human brain to evaluate gene expression changes across studies within a meta-analytical framework. The first objective was to evaluate gene expression changes in the control human brain.

These results provided a meta-signature of gene expression changes associated with four factors that can have a potentially confounding effect in postmortem human brain research: age, sex, brain pH and PMI. The second objective was to identify differentially expressed genes in the prefrontal cortex of individuals with schizophrenia and to evaluate those genes in the context of gene coexpression networks.

I focused on identifying consistencies in expression change across studies, with the ultimate aim of finding a common biological theme to explain the neural basis of schizophrenia. Remarkably, this second objective resulted in the discovery of a robust signature of 98 ‘schizophrenia genes’ that showed expression changes associated with the illness. Moreover, I showed that these ‘schizophrenia genes’ exhibit unique network properties, providing insight into alterations of relevant molecular processes associated with schizophrenia.

In Chapter 2, I explored gene expression changes associated with four different factors. These factors are associated with large expression changes that can mask the detection of expression patterns attributable to the psychiatric disease under study. Of the many factors affecting RNA, I focused specifically on age, sex, brain pH and PMI. Previous studies examined each of these factors using a single dataset [127, 128, 189]. The unique aspect of Chapter 2 is that I identified expression profiles that were mostly consistent across eleven different datasets comprising over 400 samples. For each factor, a meta-signature of up- and down-regulated genes was identified, implicating an assortment of relevant cellular processes. A significant overlap was observed for each meta-signature when validated against independent gene lists extracted from the literature, yet a large proportion of the findings were also novel. Comparisons of the meta-signatures against one another identified shared biological themes, which suggest potential relationships between the factors. Moreover, I found that previously identified candidate schizophrenia genes appear in the meta-signatures, reinforcing the need for careful consideration of these factors in postmortem human brain research.

In Chapters 3 and 4, I turned my focus to examining gene expression patterns associated with schizophrenia. Using a total of seven schizophrenia microarray datasets, I had the unprecedented opportunity to create a large combined cohort comprising 153 individuals with schizophrenia and 153 healthy controls. This combined cohort represents the largest, most comprehensive integration of schizophrenia expression profiling studies to date. Using this cohort I applied two types of gene expression analysis: differential expression in Chapter 3 and coexpression network analysis in Chapter 4. In Chapter 3, linear modeling was used to identify differentially expressed genes in schizophrenia with careful consideration of the factors examined in Chapter 2. To account for the effects of age, pH, study and batch, each was included as a covariate in the model. A meta-signature of 98 genes was found to be significantly differentially expressed at a false discovery rate of 0.1, highlighting several novel genes and a handful of previously identified genes. The results of Chapter 3 make a substantial case for meta-analysis. Existing single-dataset studies in schizophrenia report transcriptome alterations related to processes such as synaptic transmission, energy metabolism, and immune function [98, 108, 224, 256], but the individual genes that show expression changes are not often replicated. The reported findings from the individual studies used in our meta-analysis also lacked consensus. Yet, when combined in a meta-analytical framework with careful control of covariates, these seven studies together identify a signature of genes that exhibit significant and consistent changes in expression. Moreover, this schizophrenia signature reflected biologically relevant changes. These included transcripts associated with aspects of neuronal communication and with processes affected as a consequence of changes in synaptic functioning.
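The false discovery rate threshold can be illustrated with the widely used Benjamini-Hochberg step-up procedure. This Python sketch assumes that procedure; the specific correction is defined in the Chapter 3 methods, not in this summary. It takes one p-value per gene and returns the indices of the genes declared significant at FDR 0.1. The p-values would come, for example, from the schizophrenia term of a per-gene linear model with age, pH, study and batch as covariates, a step not reproduced here.

```python
def benjamini_hochberg(pvalues, fdr=0.1):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    n_reject = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is under its threshold.
        if pvalues[i] <= rank / m * fdr:
            n_reject = rank
    return sorted(order[:n_reject])


pvals = [0.01, 0.04, 0.03, 0.5, 0.02]  # toy per-gene p-values
print(benjamini_hochberg(pvals, fdr=0.1))
```

Because the rule is step-up, a gene whose p-value misses its own threshold can still be declared significant if a gene ranked below it passes.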

In Chapter 4, the schizophrenia signature was examined using network analysis to relate these genes to one another and to functional modules of coexpression. Two aggregated coexpression networks of the brain were generated: one representing healthy controls and the other representing individuals with schizophrenia. As previously observed in the literature [13], the two networks were similar in overall structure. However, differences were observed when network properties were examined for the up- and down-regulated ‘schizophrenia genes’ from Chapter 3. A particular property of interest was the clustering coefficient, which reflects the degree of coexpression among a gene's neighbors. The up- and down-regulated genes differed in average clustering coefficient between networks. This behavior was not seen in other functionally similar groups of genes or in most brain-related disease gene groups.

Functional characterization of modules in each network identified a ‘myelination-related’ cluster. While the majority of genes in the myelination-related module were retained between networks, we observed changes in the assignment of cell-type markers and ‘schizophrenia genes’ to this cluster. Specifically, we found increased coexpression of astrocyte marker genes with the module in schizophrenia, coupled with a loss of coexpression with oligodendrocyte markers. This is suggestive of the recruitment of astrocytes in response to abnormal myelination in the illness.

For the remainder of this chapter I will discuss the results from Chapters 2-4 in more detail, highlighting contributions to the field and particular strengths and weaknesses of the study. I will also attempt to pull together the findings from my work and provide a coherent, yet speculative, biological interpretation.

Finally, I will close with some potential applications for future work.

5.2 Contribution to Field of Study

Existing expression profiling studies of schizophrenia have provided few replicable candidate genes. This is partly explained by technical differences between studies. These include the cohorts (patients of varying duration of illness), the microarray platform, the pre-processing methods, and the statistical methods applied for differential expression analysis. When studies have been conducted in different ways, their results are no longer directly comparable. The meta-analytical framework demonstrated in Chapter 3 provided a unique opportunity to reduce the effects of some of these sources of technical variation and focus on biological variation. For instance, the raw data were obtained for each dataset and pre-processed using the same algorithm. Using only probes common to all datasets, the combined data matrix was treated as a single dataset for linear modeling. Sources of biological variation were also carefully partitioned through statistical modeling to extract patterns of expression associated specifically with the disease. Importantly, I applied multiple testing correction and identified a set of genes that show significant differential expression in schizophrenia at an FDR of 0.1. I believe that the major contribution of my thesis is the identification of the most reliable set of gene expression changes associated with schizophrenia to date.

Most studies of schizophrenia in the postmortem microarray literature have not applied standard FDR thresholds, since doing so usually results in few or no significant genes. It is important to take this into consideration when trying to understand the discordance in findings across studies. If gene lists are not identified using proper statistical considerations, it is unlikely that one could extract common findings.

Thus, using the methods described in my thesis I was able to extract consistencies across studies and draw conclusions that would not have been possible from a simple comparison of published findings. I have provided a robust set of expression changes that reflect relevant biological processes, which will be discussed in detail in later sections of this chapter. Hopefully, these genes will contribute to a line of future research that seeks more direct evidence for their involvement in schizophrenia and leads to novel targets for treatment strategies in this illness.

An additional contribution of this work comes from the results of Chapter 2. The gene lists identified for the four factors provide critical information for identifying problematic genes when investigating the postmortem human brain in psychiatric illnesses such as schizophrenia. To increase the utility of my findings to the scientific community, these meta-signatures have been made available at http://www.chibi.ubc.ca/postmortem-brain.


5.3 Strengths and Limitations

Schizophrenia is a disease of the brain. Thus, direct investigation of the postmortem brain provides a window into the affected biological mechanisms at the site of illness. Such informative findings would not be possible with sampling of the cerebrospinal fluid (CSF), urine, serum, blood or other tissues. The problem with studying RNA in the postmortem brain in schizophrenia is that the expression differences associated with the illness are very small.

RNA is a useful quantitative phenotype, intermediate between DNA and protein, and it is particularly amenable to study with expression profiling technologies. Gene expression profiling enables genome-wide exploration of RNA expression levels across two or more sample types. The differences observed between control and affected samples, collectively referred to as an expression signature, constitute a useful endophenotype for the condition under study. An expression signature is especially attractive for polygenic, complex disorders such as schizophrenia, where the search extends beyond a single causative gene. The work presented in my dissertation is based on data obtained from microarray studies of RNA expression in schizophrenia. Microarrays are an affordable option for gene expression profiling compared to sequence-based methods such as RNA-Seq. However, there has been a move towards next-generation sequencing (NGS) approaches for characterizing transcriptomes, as they offer lower background noise, greater sensitivity and more precise quantification [286].

Both technologies remain challenging to apply to postmortem brain in schizophrenia. They require high-quality RNA to measure the relatively small changes in brain gene expression involved (changes of ~15-20%).
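The sketch below illustrates why changes of this size are hard to detect. It converts a percentage change to the log2 scale used in most microarray analyses. It then estimates how many subjects per group a two-sided, two-sample comparison would need to detect that change, using the standard normal approximation n = 2(z_{1-α/2} + z_{1-β})^2 σ^2 / δ^2. The inputs are hypothetical assumptions chosen for illustration: a between-subject standard deviation of log2 expression of σ = 0.3, α = 0.05 and 80% power. They are not values estimated from the thesis data. The helper `n_per_group` is likewise hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(percent_change: float, sd_log2: float,
                alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate subjects per group to detect a percentage change in mean
    expression with a two-sided, two-sample test (normal approximation)."""
    delta = math.log2(1.0 + percent_change / 100.0)    # effect size on the log2 scale
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)               # quantile for the desired power
    n = 2.0 * (z_alpha + z_beta) ** 2 * sd_log2 ** 2 / delta ** 2
    return math.ceil(n)


# Hypothetical assumption: between-subject SD of log2 expression is 0.3.
for pct in (15, 20):
    delta = math.log2(1 + pct / 100)
    print(f"{pct}% change: log2 FC = {delta:.3f}, n per group = {n_per_group(pct, 0.3)}")
```

Under these assumptions, the sketch gives about 35 subjects per group for a 15% change and 21 for a 20% change. These figures apply to a single gene tested in isolation. Correcting for the thousands of genes on an array lowers the effective α and raises these numbers considerably.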

There is uncertainty as to whether the small changes in expression are brought about by limitations of using postmortem brain tissue, or whether the subtlety of the expression changes is a feature of the disease. One limitation is that postmortem human brain specimens can contain variable amounts of grey and white matter, each of which is heterogeneous in cell type [99]. This dilutes biological signals, so that genes of interest may exhibit changes in expression that appear even smaller than they really are. The studies included in my analyses used bulk cortical tissue, in which the cellular complexity of the samples was lost during harvesting. The molecular deficits identified in our analysis therefore cannot be definitively localized to a distinct cell type, and changes in rare cell types may have gone undetected.

Expression profiling is rarely performed in a cell-type-specific manner in studies of schizophrenia, but if we are to continue to work with postmortem brain tissue it would be highly beneficial to do so. Cell-type-specific transcriptomics relies on the successful identification of the cell type of interest. A commonly used technique is laser capture microdissection (LCM). LCM uses a laser to excise cells of interest, identified under a microscope, from mounted thin tissue sections that have been either fixed or frozen. It has been used in recent studies of schizophrenia to facilitate the identification of lamina-specific molecular markers [287] and to examine expression differences in the supragranular and infragranular layers of the PFC [112]. However, tissue fixation can degrade nucleic acids, and there is a heightened risk of contamination when extracting cells from intact tissue. These approaches can also be more prone to false negatives (particularly for low-abundance transcripts) because of the small amounts of RNA collected.

In general, the interpretation of results from postmortem studies is problematic because cause and effect are difficult to disentangle. Brain tissue is obtained from patients who have died after living with the disease for various lengths of time, often having received medication. It is uncertain whether the expression changes we observe contribute to causing the disease, or whether they are a downstream consequence of having it. Furthermore, if we are observing a consequence, it is likely not attributable exclusively to the disease, because individuals were exposed to various environmental influences and died from different causes. In addition, factors such as age, sex, brain pH and PMI, which are associated with large gene expression changes, can mask the disease effect.

Standard practices for minimizing the effects of extraneous factors include sample matching and treating these factors as covariates in regression models. In Chapter 2 of my dissertation, I used control human brain data to assess the impact of these factors on gene expression. In Chapter 3, I incorporated these findings to assess possible confounding effects when evaluating gene expression changes associated with schizophrenia. Even with well-matched cohorts and the inclusion of age as a covariate, we detected a schizophrenia signature that resembled the signature expected from aberrant age matching. We observed an overlap between our age gene list and the ‘schizophrenia genes’. This could indicate that genes affected by age are also affected by schizophrenia, but it also raises the possibility of confounding. Thus, while these practices help reduce the variability introduced by extraneous factors, we cannot completely rule out their effects.

When dealing with such small effect sizes, larger sample sizes are necessary to obtain the statistical power to detect a change. Current expression profiling studies of schizophrenia have small sample sizes (N ≈ 20), limited by the collections available from brain banks. There is therefore a need to take advantage of integrative approaches. The findings of my thesis converge on a case for meta-analyses of postmortem human brain expression profiling studies. Meta-analysis improves the power to detect subtle changes in gene expression and to identify consistent changes despite the diversity of studies, as observed in my work. It is particularly useful for small datasets that show inconsistencies across studies, as is the case with gene expression studies in schizophrenia. I have contributed the largest meta-analyses of schizophrenia expression data for both differential expression and coexpression, but each could benefit from the inclusion of more datasets. In particular, our coexpression networks could be substantially improved with more datasets of larger sample size. Aggregating across a larger number of datasets has been shown to yield networks more comparable to protein-protein interaction (PPI) networks [182, 246], and is likely to give more reliable interactions. As more schizophrenia expression profiling data become available, they can be incorporated into the meta-analytical frameworks developed in my thesis to contribute statistical power and help refine my results.
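Combining studies can recover a signal that no single study detects. The sketch below illustrates this with Fisher's method, one standard way of combining independent p-values for the same gene across studies. It is a generic example, and the text does not claim it matches the combination procedure used in Chapter 3. The function name and the four p-values are hypothetical.

```python
import math
from typing import Sequence


def fisher_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Combine k independent p-values (each in (0, 1]) with Fisher's method."""
    k = len(pvalues)
    # Fisher's statistic is X = -2 * sum(ln p_i); keep X / 2 for convenience.
    half_x = -sum(math.log(p) for p in pvalues)
    # Under the null, X is chi-square with 2k degrees of freedom, whose upper
    # tail has a closed form for even df:
    #   P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, series = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        series += term
    return math.exp(-half_x) * series


# Hypothetical p-values for one gene in four independent studies.
study_p = [0.04, 0.06, 0.05, 0.08]
combined = fisher_combined_pvalue(study_p)
print(f"smallest single-study p = {min(study_p)}, combined p = {combined:.4f}")
```

None of the four hypothetical studies reaches p < 0.01 on its own, yet the combined p-value is about 0.003. Note one design caveat: Fisher's method ignores the direction of change. It is therefore often applied separately to one-sided p-values for up- and down-regulation, or replaced by a signed approach such as Stouffer's weighted Z-method.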

5.4 Interpretation of Findings

There are many hypotheses about the underlying etiology of schizophrenia. Past and present research in the field demonstrates disturbances at different levels of brain functioning, which translate into putative pathologies of schizophrenia. The research conducted in my thesis applies different meta-analytical approaches to interrogate expression alterations in schizophrenia. The results point to deficits at different levels of cellular functioning that are suggestive of neuropathological alterations in the prefrontal cortex of individuals with schizophrenia. In this section, I attempt to weave together speculative thoughts with evidence from the literature to provide a plausible biological interpretation of my findings.

Schizophrenia has been described as a disease of the synapse, in which the fundamental pathology involves a convergence of factors leading to dysfunction of synaptic transmission [288]. In support of this theory, our meta-signature of differentially expressed ‘schizophrenia genes’ included several genes related to synapse function. One might therefore assume that a loss of communication between neurons results from a decrease in the total number of neurons. However, morphometric studies of postmortem brain report that this is not the case. Rather, there is a consensus for increased neuronal density [289, 290]. The increased density results from a reduction in the overall volume of tissue, with no loss of neurons but a reduction in the neuropil (i.e. dendrites, axon terminals, synapses, glial cell processes and microvasculature).

In addition to synaptic function, the genes identified by differential expression in Chapter 3 are involved in a number of other biological processes. The relationships among these genes are still not clear, but coexpression analysis, used as a complementary approach, suggested a possible explanation. Clustering of the coexpression networks revealed a neuron marker-enriched ‘oxidative phosphorylation’ module that was highly conserved between the control and schizophrenia brain. This high conservation implies that the expression of genes in this module remains highly correlated in both groups.

More than half of the down-regulated genes in our schizophrenia meta-signature belonged to this module in both the control and schizophrenia networks, suggesting down-regulation of the ‘oxidative phosphorylation’ module as a whole. This is further supported by the enrichment of ‘oxidative phosphorylation’ GO terms in the down-regulated meta-signature. The enrichment indicates that related genes are also down-regulated, even though their individual changes are not significant.

Oxidative stress has been suggested to contribute to the pathophysiology of schizophrenia through mechanisms that likely involve aberrant inflammatory responses, mitochondrial dysfunction, hypoactive NMDA receptors and oligodendrocyte abnormalities [291]. Oxidative stress occurs when cellular antioxidant defense mechanisms fail to counterbalance the endogenous reactive oxygen species (ROS) and reactive nitrogen species (RNS) generated by normal oxidative metabolism. Excessive amounts of ROS and RNS can induce reactions that have detrimental effects on brain function. For instance, peroxide and the hydroxyl radical are thought to react with the polyunsaturated fatty acids present in myelin sheaths, directly triggering demyelination [292]. In vitro studies using purified myelin have also demonstrated the vulnerability of myelin to oxidative stress [293].

There are numerous lines of evidence for myelin-related dysfunction in schizophrenia [294], including:

- imaging and neurocytochemical evidence [81, 295];
- similarities with demyelinating diseases (e.g. metachromatic leukodystrophy [296]);
- myelin-related gene abnormalities [106];
- morphologic abnormalities in oligodendrocytes [297].

Cluster analysis of the coexpression networks in Chapter 4 of my thesis also supports this idea. A ‘myelination-related’ module was identified in both the control and schizophrenia brain. While this module was mostly conserved, there was a substantial loss of coexpression with the oligodendrocyte marker genes in schizophrenia. This suggests that the expression of oligodendrocyte marker genes is altered in schizophrenia such that they are no longer coexpressed with the other genes in the module. One possible interpretation of these results is that oligodendroglial dysfunction is a secondary event resulting from an insult imposed by oxidative stress, as suggested by the down-regulation of the ‘oxidative phosphorylation’ module. The myelin sheath supports efficient axonal conduction, and thereby the strength of synaptic transmission. This interpretation therefore remains in keeping with the notion of schizophrenia as a disease of the synapse.

The loss of oligodendrocyte marker coexpression from the ‘myelination-related’ module was coupled with increased coexpression of astrocyte markers in schizophrenia. Astrocytes have a major influence on remyelination. Oligodendrocytes preferentially remyelinate axons in areas containing astrocytes [281], and transplantation of astrocytes into demyelinated lesions enhances endogenous remyelination [282]. A possible explanation is that astrocytes respond to oligodendrocyte dysfunction in schizophrenia. They may be recruited in a partial attempt to remyelinate axons and restore synaptic function, and therefore exhibit expression patterns in concert with genes of the myelination module.


Taken together, the findings from my dissertation suggest a cascade of dysfunction at the molecular level that might lead to aberrant synaptic transmission in schizophrenia. Again, we are faced with the cause-and-effect dilemma. The proposed cascade begins with oxidative stress, but we are unable to determine whether this is a cause of schizophrenia or a downstream effect. If we accept the latter, we must also consider that oxidative stress may be triggered by extraneous factors rather than by the disease itself. For example, several studies of rat brain have demonstrated that chronic administration of antipsychotics (haloperidol and clozapine) can induce oxidative damage [298, 299]. In the cohorts used in postmortem expression studies, most or all of the schizophrenia subjects have generally received antipsychotics.

In our study we were unable to control for the effects of antipsychotics, as this information was not available for all subjects. It therefore remains to be determined whether the hypothesized oxidative damage is exclusively a disease effect, exclusively a medication effect, or a combination of both in which antipsychotic use imposes an additional burden on cells.

An important aspect of my proposed interpretation is that it is based on the differential coexpression patterns observed between the brains of controls and individuals with schizophrenia. A complex illness such as schizophrenia requires a systems-level perspective. In that spirit, I have provided an interpretation that can explain at least some aspects of the pathophysiology of schizophrenia.

5.5 Potential Applications and Future Directions

In this thesis I have presented results obtained from the integration of data across large-scale gene expression datasets. I demonstrated the potential of combining data across studies to provide a ‘big picture’ perspective on molecular abnormalities associated with schizophrenia. Incorporating additional data from independent cohorts will be an important means of validating the changes identified.

The meta-analytical approaches described in my thesis are primarily focused on microarray data. Thus, another area for future research will be the modification of these approaches to facilitate their application to sequence-based data.


The biological interpretation of my findings is limited to the information that can be harnessed from gene expression data. I believe that the integration of other large-scale data types will be an important means of bridging these knowledge gaps. By coupling genome sequence information with gene expression profiling, it may be possible to correlate changes in expression with nearby or distant polymorphisms and mutations that reside in regulatory regions. Expression quantitative trait locus (eQTL) analysis is a major advance in integrating large-scale genomic and genetic datasets to evaluate the consequences of genetic variation on gene expression [300]. The rationale for this approach is that expression levels can be treated as quantitative traits, so these gene expression phenotypes can be mapped to particular genomic loci. In the context of schizophrenia, several groups have investigated whether schizophrenia risk is mediated in part by common variants that influence gene expression [301-303]. If SNP data were available for the cohorts used in our analyses, it would be particularly interesting to investigate possible disease-associated polymorphic loci that modulate the expression of the genes identified in my analysis.
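Treating expression as a quantitative trait, as described above, can be sketched as a simple additive eQTL test for one SNP-gene pair. Expression is regressed on genotype coded as allele dosage (0, 1 or 2 copies). The function name and data are hypothetical. Real eQTL analyses also add covariates (for example age, sex, brain pH and PMI), test many SNP-gene pairs and correct for multiple testing.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def eqtl_regression(dosage: Sequence[int], expression: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of expression ~ dosage; returns (slope, t-statistic)."""
    n = len(dosage)
    d_bar, e_bar = mean(dosage), mean(expression)
    sxx = sum((d - d_bar) ** 2 for d in dosage)
    sxy = sum((d - d_bar) * (e - e_bar) for d, e in zip(dosage, expression))
    slope = sxy / sxx                  # change in expression per additional allele
    intercept = e_bar - slope * d_bar
    # Residual variance estimated with n - 2 degrees of freedom.
    rss = sum((e - (intercept + slope * d)) ** 2 for d, e in zip(dosage, expression))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se_slope


# Hypothetical data for six subjects: allele dosage and log2 expression.
dosage = [0, 0, 1, 1, 2, 2]
expression = [5.0, 5.2, 5.9, 6.1, 7.0, 6.8]
slope, t_stat = eqtl_regression(dosage, expression)
print(f"slope = {slope:.2f}, t = {t_stat:.1f}")
```

Here each additional allele raises expression by about 0.9 units, with a large t-statistic. A p-value would come from a t distribution with n - 2 degrees of freedom (for example via `scipy.stats.t.sf`). That step is omitted to keep the sketch dependency-free.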

The integration of my results with animal model studies will be instrumental in directly linking gene expression changes to observable phenotypes in brain tissue and behaviour. The meta-signature of ‘schizophrenia genes’ identified in this work represents expression changes that are consistent across individuals with the disease and may reflect an underlying pathophysiology. To assess the impact of individual gene expression changes, one could mimic a change by creating knock-down or knock-in mouse models and evaluating the resulting behavioural and neuropathological abnormalities. A recent study demonstrated the utility of this approach by evaluating reduced expression levels of GAD1 in NPY-expressing interneurons in mice [304].

Another promising avenue in psychiatric research is that it is now possible to directly reprogram fibroblasts from affected patients into human induced pluripotent stem cells (hiPSCs), and subsequently to differentiate these disorder-specific hiPSCs into neurons [305]. These neurons have been shown to exhibit schizophrenia-specific cellular phenotypes such as diminished neuronal connectivity, decreased neurite number, and reduced PSD95 protein levels and glutamate receptor expression [306]. As schizophrenia is thought to be a neurodevelopmental disorder, hiPSCs will be particularly useful for observing the abnormal development of neurons in vitro. One could also use these cells to examine specific synaptic defects that are believed to contribute to the illness. Because these neurons carry the genetic background of patients who developed schizophrenia, they can be considered a close representation of neurons in the brain. Expression changes observed in these cells are free of the limitations of postmortem studies. However, the cells are exposed to an environment that is not entirely representative of in vivo conditions.

A comparison of expression signatures obtained from hiPSC-derived neurons against our meta-signatures obtained from postmortem brain will illustrate similarities and differences between the two approaches.

Many genes have been implicated in schizophrenia, along with many biological pathways that link these genes. What is needed is concerted integration across genetic and genomic studies in humans and animal models, to provide a more complete understanding of the underlying pathophysiology of schizophrenia. The overarching goal of future research in schizophrenia should be to identify convergence at the level of molecular mechanisms. The results of the meta-analyses described in this thesis stand alone, but they are also a key component in helping to accomplish this goal.


References

1. Luo, Z. and D.H. Geschwind, Microarray applications in neuroscience. Neurobiol Dis, 2001. 8(2): p. 183-93.
2. Mirnics, K., et al., Analysis of complex brain disorders with gene expression microarrays: schizophrenia as a disease of the synapse. Trends Neurosci, 2001. 24(8): p. 479-86.
3. Colantuoni, C., et al., Temporal dynamics and genetic control of transcription in the human prefrontal cortex. Nature, 2011. 478(7370): p. 519-23.
4. Kang, H.J., et al., Spatio-temporal transcriptome of the human brain. Nature, 2011. 478(7370): p. 483-9.
5. Owen, M.J., N. Craddock, and M.C. O'Donovan, Schizophrenia: genes at last? Trends Genet, 2005. 21(9): p. 518-25.
6. Plomin, R., M.J. Owen, and P. McGuffin, The genetic basis of complex human behaviors. Science, 1994. 264(5166): p. 1733-9.
7. Choi, K.H., et al., Putative psychosis genes in the prefrontal cortex: combined analysis of gene expression microarrays. BMC Psychiatry, 2008. 8: p. 87.
8. de Magalhaes, J.P., J. Curado, and G.M. Church, Meta-analysis of age-related gene expression profiles identifies common signatures of aging. Bioinformatics, 2009. 25(7): p. 875-81.
9. Elashoff, M., et al., Meta-analysis of 12 genomic studies in bipolar disorder. J Mol Neurosci, 2007. 31(3): p. 221-43.
10. Gaiteri, C. and E. Sibille, Differentially expressed genes in major depression reside on the periphery of resilient gene coexpression networks. Front Neurosci, 2011. 5: p. 95.
11. Miller, J.A., S. Horvath, and D.H. Geschwind, Divergence of human and mouse brain transcriptome highlights Alzheimer disease pathways. Proc Natl Acad Sci U S A, 2010. 107(28): p. 12698-703.
12. Oldham, M.C., et al., Functional organization of the transcriptome in human brain. Nat Neurosci, 2008. 11(11): p. 1271-82.
13. Torkamani, A., et al., Coexpression network analysis of neural tissue reveals perturbations in developmental processes in schizophrenia. Genome Res, 2010. 20(4): p. 403-12.
14. Gould, T.D. and H.K. Manji, The molecular medicine revolution and psychiatry: bridging the gap between basic neuroscience research and clinical psychiatry. J Clin Psychiatry, 2004. 65(5): p. 598-604.
15. Uhl, G.R. and R.W. Grow, The burden of complex genetics in brain disorders. Arch Gen Psychiatry, 2004. 61(3): p. 223-9.
16. Jablensky, A., Epidemiology of schizophrenia: the global burden of disease and disability. Eur Arch Psychiatry Clin Neurosci, 2000. 250(6): p. 274-85.
17. American Psychiatric Association, Diagnostic and statistical manual of mental disorders (4th ed., text rev.). 2000, Washington, DC.
18. Green, M.F., et al., Approaching a consensus cognitive battery for clinical trials in schizophrenia: the NIMH-MATRICS conference to select cognitive domains and test criteria. Biol Psychiatry, 2004. 56(5): p. 301-7.
19. Kuperberg, G. and S. Heckers, Schizophrenia and cognitive function. Curr Opin Neurobiol, 2000. 10(2): p. 205-10.

20. Hafner, H., Gender differences in schizophrenia. Psychoneuroendocrinology, 2003. 28 Suppl 2: p. 17-54.
21. Goldner, E.M., et al., Prevalence and incidence studies of schizophrenic disorders: a systematic review of the literature. Can J Psychiatry, 2002. 47(9): p. 833-43.
22. Carlsson, A., N. Waters, and M.L. Carlsson, Neurotransmitter interactions in schizophrenia--therapeutic implications. Biol Psychiatry, 1999. 46(10): p. 1388-95.
23. Palermo-Neto, J., Dopaminergic systems. Dopamine receptors. Psychiatr Clin North Am, 1997. 20(4): p. 705-21.
24. Carlsson, A., The current status of the dopamine hypothesis of schizophrenia. Neuropsychopharmacology, 1988. 1(3): p. 179-86.
25. Castner, S.A. and P.S. Goldman-Rakic, Enhancement of working memory in aged monkeys by a sensitizing regimen of dopamine D1 receptor stimulation. J Neurosci, 2004. 24(6): p. 1446-50.
26. Goldman-Rakic, P.S., E.C. Muly, 3rd, and G.V. Williams, D(1) receptors in prefrontal cells and circuits. Brain Res Brain Res Rev, 2000. 31(2-3): p. 295-301.
27. Abi-Dargham, A., et al., Prefrontal dopamine D1 receptors and working memory in schizophrenia. J Neurosci, 2002. 22(9): p. 3708-19.
28. Karlsson, P., et al., PET study of D(1) dopamine receptor binding in neuroleptic-naive patients with schizophrenia. Am J Psychiatry, 2002. 159(5): p. 761-7.
29. Weinberger, D.R., Implications of normal brain development for the pathogenesis of schizophrenia. Arch Gen Psychiatry, 1987. 44(7): p. 660-9.
30. Tsai, G. and J.T. Coyle, Glutamatergic mechanisms in schizophrenia. Annu Rev Pharmacol Toxicol, 2002. 42: p. 165-79.
31. Krystal, J.H., et al., Subanesthetic effects of the noncompetitive NMDA antagonist, ketamine, in humans. Psychotomimetic, perceptual, cognitive, and neuroendocrine responses. Arch Gen Psychiatry, 1994. 51(3): p. 199-214.
32. Luby, E.D., et al., Model psychoses and schizophrenia. Am J Psychiatry, 1962. 119: p. 61-7.
33. Javitt, D.C. and S.R. Zukin, Recent advances in the phencyclidine model of schizophrenia. Am J Psychiatry, 1991. 148(10): p. 1301-8.
34. Lane, H.Y., et al., Sarcosine or D-serine add-on treatment for acute exacerbation of schizophrenia: a randomized, double-blind, placebo-controlled study. Arch Gen Psychiatry, 2005. 62(11): p. 1196-204.
35. Patil, S.T., et al., Activation of mGlu2/3 receptors as a new approach to treat schizophrenia: a randomized Phase 2 clinical trial. Nat Med, 2007. 13(9): p. 1102-7.
36. Benes, F.M., et al., Deficits in small interneurons in prefrontal and cingulate cortices of schizophrenic and schizoaffective patients. Arch Gen Psychiatry, 1991. 48(11): p. 996-1001.
37. Benes, F.M. and S. Berretta, GABAergic interneurons: implications for understanding schizophrenia and bipolar disorder. Neuropsychopharmacology, 2001. 25(1): p. 1-27.
38. Akbarian, S., et al., Gene expression for glutamic acid decarboxylase is reduced without loss of neurons in prefrontal cortex of schizophrenics. Arch Gen Psychiatry, 1995. 52(4): p. 258-66.
39. Volk, D.W., et al., Decreased glutamic acid decarboxylase67 messenger RNA expression in a subset of prefrontal cortical gamma-aminobutyric acid neurons in subjects with schizophrenia. Arch Gen Psychiatry, 2000. 57(3): p. 237-45.
40. Kondziella, D., et al., How do glial-neuronal interactions fit into current neurotransmitter hypotheses of schizophrenia? Neurochem Int, 2007. 50(2): p. 291-301.

41. Jentsch, J.D. and R.H. Roth, The neuropsychopharmacology of phencyclidine: from NMDA receptor hypofunction to the dopamine hypothesis of schizophrenia. Neuropsychopharmacology, 1999. 20(3): p. 201-25.
42. Jafari, S., F. Fernandez-Enright, and X.F. Huang, Structural contributions of antipsychotic drugs to their therapeutic profiles and metabolic side effects. J Neurochem, 2012. 120(3): p. 371-84.
43. Cardno, A.G. and Gottesman, II, Twin studies of schizophrenia: from bow-and-arrow concordances to star wars Mx and functional genomics. Am J Med Genet, 2000. 97(1): p. 12-7.
44. Cardno, A.G., et al., Heritability estimates for psychotic disorders: the Maudsley twin psychosis series. Arch Gen Psychiatry, 1999. 56(2): p. 162-8.
45. Kety, S.S., The significance of genetic factors in the etiology of schizophrenia: results from the national study of adoptees in Denmark. J Psychiatr Res, 1987. 21(4): p. 423-9.
46. Chakravarti, A., Population genetics--making sense out of sequence. Nat Genet, 1999. 21(1 Suppl): p. 56-60.
47. McClellan, J.M., E. Susser, and M.C. King, Schizophrenia: a common disease caused by multiple rare alleles. Br J Psychiatry, 2007. 190: p. 194-9.
48. Straub, R.E., et al., Genetic variation in the 6p22.3 gene DTNBP1, the human ortholog of the mouse dysbindin gene, is associated with schizophrenia. Am J Hum Genet, 2002. 71(2): p. 337-48.
49. Funke, B., et al., Association of the DTNBP1 locus with schizophrenia in a U.S. population. Am J Hum Genet, 2004. 75(5): p. 891-8.
50. Schwab, S.G., et al., Support for association of schizophrenia with genetic variation in the 6p22.3 gene, dysbindin, in sib-pair families with linkage and in an additional sample of triad families. Am J Hum Genet, 2003. 72(1): p. 185-90.
51. Williams, N.M., et al., Identification in 2 independent samples of a novel schizophrenia risk haplotype of the dystrobrevin binding protein gene (DTNBP1). Arch Gen Psychiatry, 2004. 61(4): p. 336-44.
52. Numakawa, T., et al., Evidence of novel neuronal functions of dysbindin, a susceptibility gene for schizophrenia. Hum Mol Genet, 2004. 13(21): p. 2699-708.
53. Stefansson, H., et al., Neuregulin 1 and susceptibility to schizophrenia. Am J Hum Genet, 2002. 71(4): p. 877-92.
54. Harrison, P.J. and A.J. Law, Neuregulin 1 and schizophrenia: genetics, gene expression, and neurobiology. Biol Psychiatry, 2006. 60(2): p. 132-40.
55. Blackwood, D.H., et al., Schizophrenia and affective disorders--cosegregation with a translocation at chromosome 1q42 that directly disrupts brain-expressed genes: clinical and P300 findings in a family. Am J Hum Genet, 2001. 69(2): p. 428-33.
56. Millar, J.K., et al., Disruption of two novel genes by a translocation co-segregating with schizophrenia. Hum Mol Genet, 2000. 9(9): p. 1415-23.
57. Hennah, W., et al., Haplotype transmission analysis provides evidence of association for DISC1 to schizophrenia and suggests sex-dependent effects. Hum Mol Genet, 2003. 12(23): p. 3151-9.
58. Hodgkinson, C.A., et al., Disrupted in schizophrenia 1 (DISC1): association with schizophrenia, schizoaffective disorder, and bipolar disorder. Am J Hum Genet, 2004. 75(5): p. 862-72.
59. Bassett, A.S., et al., 22q11 deletion syndrome in adults with schizophrenia. Am J Med Genet, 1998. 81(4): p. 328-37.
60. Murphy, K.C., L.A. Jones, and M.J. Owen, High rates of schizophrenia in adults with velo-cardio-facial syndrome. Arch Gen Psychiatry, 1999. 56(10): p. 940-5.

61. Williams, N.M., et al., Strong evidence that GNB1L is associated with schizophrenia. Hum Mol Genet, 2008. 17(4): p. 555-66.
62. Glaser, B., et al., No association between the putative functional ZDHHC8 single nucleotide polymorphism rs175174 and schizophrenia in large European samples. Biol Psychiatry, 2005. 58(1): p. 78-80.
63. Coon, H., et al., Genomic scan for genes predisposing to schizophrenia. Am J Med Genet, 1994. 54(1): p. 59-71.
64. St Clair, D., Copy number variation and schizophrenia. Schizophr Bull, 2009. 35(1): p. 9-12.
65. Stefansson, H., et al., Common variants conferring risk of schizophrenia. Nature, 2009. 460(7256): p. 744-7.
66. Steinberg, S., et al., Common variants at VRK2 and TCF4 conferring risk of schizophrenia. Hum Mol Genet, 2011. 20(20): p. 4076-81.
67. Li, T., et al., Common variants in major histocompatibility complex region and TCF4 gene are significantly associated with schizophrenia in Han Chinese. Biol Psychiatry, 2010. 68(7): p. 671-3.
68. Austin, J., Schizophrenia: an update and review. J Genet Couns, 2005. 14(5): p. 329-40.
69. Carter, C.J., Schizophrenia susceptibility genes directly implicated in the life cycles of pathogens: cytomegalovirus, influenza, herpes simplex, rubella, and Toxoplasma gondii. Schizophr Bull, 2009. 35(6): p. 1163-82.
70. Harrison, P.J., Schizophrenia susceptibility genes and neurodevelopment. Biol Psychiatry, 2007. 61(10): p. 1119-20.
71. Walsh, T., et al., Rare structural variants disrupt multiple genes in neurodevelopmental pathways in schizophrenia. Science, 2008. 320(5875): p. 539-43.
72. Harrison, P.J., The neuropathology of schizophrenia. A critical review of the data and their interpretation. Brain, 1999. 122 (Pt 4): p. 593-624.
73. Miller, E.K. and J.D. Cohen, An integrative theory of prefrontal cortex function. Annu Rev Neurosci, 2001. 24: p. 167-202.
74. Paus, T., M. Keshavan, and J.N. Giedd, Why do many psychiatric disorders emerge during adolescence? Nat Rev Neurosci, 2008. 9(12): p. 947-57.
75. Feinberg, I., Schizophrenia: caused by a fault in programmed synaptic elimination during adolescence? J Psychiatr Res, 1982. 17(4): p. 319-34.
76. Keshavan, M.S., S. Anderson, and J.W. Pettegrew, Is schizophrenia due to excessive synaptic pruning in the prefrontal cortex? The Feinberg hypothesis revisited. J Psychiatr Res, 1994. 28(3): p. 239-65.
77. Lawrie, S.M. and S.S. Abukmeil, Brain abnormality in schizophrenia. A systematic and quantitative review of volumetric magnetic resonance imaging studies. Br J Psychiatry, 1998. 172: p. 110-20.
78. Shenton, M.E., et al., A review of MRI findings in schizophrenia. Schizophr Res, 2001. 49(1-2): p. 1-52.
79. Pantelis, C., et al., Structural brain imaging evidence for multiple pathological processes at different stages of brain development in schizophrenia. Schizophr Bull, 2005. 31(3): p. 672-96.
80. Mathalon, D.H., et al., Progressive brain volume changes and the clinical course of schizophrenia in men: a longitudinal magnetic resonance imaging study. Arch Gen Psychiatry, 2001. 58(2): p. 148-57.


81. White, T., M. Nelson, and K.O. Lim, Diffusion tensor imaging in psychiatric disorders. Top Magn Reson Imaging, 2008. 19(2): p. 97-109.
82. Kubicki, M., et al., Cingulate fasciculus integrity disruption in schizophrenia: a magnetic resonance diffusion tensor imaging study. Biol Psychiatry, 2003. 54(11): p. 1171-80.
83. Minzenberg, M.J., et al., Meta-analysis of 41 functional neuroimaging studies of executive function in schizophrenia. Arch Gen Psychiatry, 2009. 66(8): p. 811-22.
84. Plum, F., Prospects for research on schizophrenia. 3. Neurophysiology. Neuropathological findings. Neurosci Res Program Bull, 1972. 10(4): p. 384-8.
85. Garey, L.J., et al., Reduced dendritic spine density on cerebral cortical pyramidal neurons in schizophrenia. J Neurol Neurosurg Psychiatry, 1998. 65(4): p. 446-53.
86. Glantz, L.A. and D.A. Lewis, Decreased dendritic spine density on prefrontal cortical pyramidal neurons in schizophrenia. Arch Gen Psychiatry, 2000. 57(1): p. 65-73.
87. Rajkowska, G., L.D. Selemon, and P.S. Goldman-Rakic, Neuronal and glial somal size in the prefrontal cortex: a postmortem morphometric study of schizophrenia and Huntington disease. Arch Gen Psychiatry, 1998. 55(3): p. 215-24.
88. Pakkenberg, B., Post-mortem study of chronic schizophrenic brains. Br J Psychiatry, 1987. 151: p. 744-52.
89. Selemon, L.D. and P.S. Goldman-Rakic, The reduced neuropil hypothesis: a circuit based model of schizophrenia. Biol Psychiatry, 1999. 45(1): p. 17-25.
90. English, J.A., et al., The neuroproteomics of schizophrenia. Biol Psychiatry, 2011. 69(2): p. 163-72.
91. Akbarian, S., The molecular pathology of schizophrenia--focus on histone and DNA modifications. Brain Res Bull, 2010. 83(3-4): p. 103-7.
92. Roth, T.L., et al., Epigenetic mechanisms in schizophrenia. Biochim Biophys Acta, 2009. 1790(9): p. 869-77.
93. Huang, H.S. and S. Akbarian, GAD1 mRNA expression and DNA methylation in prefrontal cortex of subjects with schizophrenia. PLoS ONE, 2007. 2(8): p. e809.
94. Abdolmaleky, H.M., et al., Hypermethylation of the reelin (RELN) promoter in the brain of schizophrenic patients: a preliminary report. Am J Med Genet B Neuropsychiatr Genet, 2005. 134B(1): p. 60-6.
95. Grayson, D.R., et al., Reelin promoter hypermethylation in schizophrenia. Proc Natl Acad Sci U S A, 2005. 102(26): p. 9341-6.
96. Abdolmaleky, H.M., et al., Hypomethylation of MB-COMT promoter is a major risk factor for schizophrenia and bipolar disorder. Hum Mol Genet, 2006. 15(21): p. 3132-45.
97. Iwamoto, K., et al., DNA methylation status of SOX10 correlates with its downregulation and oligodendrocyte dysfunction in schizophrenia. J Neurosci, 2005. 25(22): p. 5376-81.
98. Mirnics, K., et al., Molecular characterization of schizophrenia viewed by microarray analysis of gene expression in prefrontal cortex. Neuron, 2000. 28(1): p. 53-67.
99. Sequeira, P.A., M.V. Martin, and M.P. Vawter, The first decade and beyond of transcriptional profiling in schizophrenia. Neurobiol Dis, 2012. 45(1): p. 23-36.
100. Vawter, M.P., et al., Reduction of synapsin in the hippocampus of patients with bipolar disorder and schizophrenia. Mol Psychiatry, 2002. 7(6): p. 571-8.
101. Hemby, S.E., et al., Gene expression profile for schizophrenia: discrete neuron transcription patterns in the entorhinal cortex. Arch Gen Psychiatry, 2002. 59(7): p. 631-40.

102. Maycox, P.R., et al., Analysis of gene expression in two large schizophrenia cohorts identifies multiple changes associated with nerve terminal function. Mol Psychiatry, 2009. 14(12): p. 1083-94.
103. Hashimoto, T., et al., Alterations in GABA-related transcriptome in the dorsolateral prefrontal cortex of subjects with schizophrenia. Mol Psychiatry, 2008. 13(2): p. 147-61.
104. Duncan, C.E., et al., Prefrontal GABA(A) receptor alpha-subunit expression in normal postnatal human development and schizophrenia. J Psychiatr Res, 2010. 44(10): p. 673-81.
105. Straub, R.E., et al., Allelic variation in GAD1 (GAD67) is associated with schizophrenia and influences cortical function and gene expression. Mol Psychiatry, 2007. 12(9): p. 854-69.
106. Hakak, Y., et al., Genome-wide expression analysis reveals dysregulation of myelination-related genes in chronic schizophrenia. Proc Natl Acad Sci U S A, 2001. 98(8): p. 4746-51.
107. Katsel, P., et al., Variations in differential gene expression patterns across multiple brain regions in schizophrenia. Schizophr Res, 2005. 77(2-3): p. 241-52.
108. Arion, D., et al., Molecular evidence for increased expression of genes related to immune and chaperone function in the prefrontal cortex in schizophrenia. Biol Psychiatry, 2007. 62(7): p. 711-21.
109. Saetre, P., et al., Inflammation-related genes up-regulated in schizophrenia brains. BMC Psychiatry, 2007. 7: p. 46.
110. Harrison, P.J., Using our brains: the findings, flaws, and future of postmortem studies of psychiatric disorders. Biol Psychiatry, 2011. 69(2): p. 102-3.
111. Bezzi, P. and A. Volterra, A neuron-glia signalling network in the active brain. Curr Opin Neurobiol, 2001. 11(3): p. 387-94.
112. Arion, D., et al., Infragranular gene expression disturbances in the prefrontal cortex in schizophrenia: signature of altered neural development? Neurobiol Dis, 2010. 37(3): p. 738-46.
113. Benes, F.M., et al., Regulation of the GABA cell phenotype in hippocampus of schizophrenics and bipolars. Proc Natl Acad Sci U S A, 2007. 104(24): p. 10164-9.
114. Harris, L.W., et al., The cerebral microvasculature in schizophrenia: a laser capture microdissection study. PLoS ONE, 2008. 3(12): p. e3964.
115. Emmert-Buck, M.R., et al., Laser capture microdissection. Science, 1996. 274(5289): p. 998-1001.
116. Harrison, P.J., et al., The relative importance of premortem acidosis and postmortem interval for human brain gene expression studies: selective mRNA vulnerability and comparison with their encoded proteins. Neurosci Lett, 1995. 200(3): p. 151-4.
117. Kingsbury, A.E., et al., Tissue pH as an indicator of mRNA preservation in human post-mortem brain. Brain Res Mol Brain Res, 1995. 28(2): p. 311-8.
118. Lipska, B.K., et al., Critical factors in gene expression in postmortem human brain: Focus on studies in schizophrenia. Biol Psychiatry, 2006. 60(6): p. 650-8.
119. Bahn, S., et al., Gene expression profiling in the post-mortem human brain--no cause for dismay. J Chem Neuroanat, 2001. 22(1-2): p. 79-94.
120. Atz, M., et al., Methodological considerations for gene expression profiling of human brain. J Neurosci Methods, 2007. 163(2): p. 295-309.
121. Deep-Soboslay, A., et al., Psychiatric brain banking: three perspectives on current trends and future directions. Biol Psychiatry, 2011. 69(2): p. 104-12.

122. Haroutunian, V., et al., The human homolog of the QKI gene affected in the severe dysmyelination "quaking" mouse phenotype: downregulated in multiple brain regions in schizophrenia. Am J Psychiatry, 2006. 163(10): p. 1834-7.
123. Oni-Orisan, A., et al., Altered vesicular glutamate transporter expression in the anterior cingulate cortex in schizophrenia. Biol Psychiatry, 2008. 63(8): p. 766-75.
124. Li, J.Z., et al., Systematic changes in gene expression in postmortem human brains associated with tissue pH and terminal medical conditions. Hum Mol Genet, 2004. 13(6): p. 609-16.
125. Tomita, H., et al., Effect of agonal and postmortem factors on gene expression profile: quality control in microarray analyses of postmortem human brain. Biol Psychiatry, 2004. 55(4): p. 346-52.
126. Li, J.Z., et al., Sample matching by inferred agonal stress in gene expression analyses of the brain. BMC Genomics, 2007. 8: p. 336.
127. Lu, T., et al., Gene regulation and DNA damage in the ageing human brain. Nature, 2004. 429(6994): p. 883-91.
128. Vawter, M.P., et al., Gender-specific gene expression in post-mortem human brain: localization to sex chromosomes. Neuropsychopharmacology, 2004. 29(2): p. 373-84.
129. Mirnics, K. and J. Pevsner, Progress in the use of microarray technology to study the neurobiology of disease. Nat Neurosci, 2004. 7(5): p. 434-9.
130. Velculescu, V.E., et al., Serial analysis of gene expression. Science, 1995. 270(5235): p. 484-7.
131. Brenner, S., et al., Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat Biotechnol, 2000. 18(6): p. 630-4.
132. Lee, J., et al., Effects of RNA degradation on gene expression analysis of human postmortem tissues. FASEB J, 2005. 19(10): p. 1356-8.
133. Popova, T., et al., Effect of RNA quality on transcript intensity levels in microarray analysis of human post-mortem brain tissues. BMC Genomics, 2008. 9: p. 91.
134. Schroeder, A., et al., The RIN: an RNA integrity number for assigning integrity values to RNA measurements. BMC Mol Biol, 2006. 7: p. 3.
135. Mei, R., et al., Probe selection for high-density oligonucleotide arrays. Proc Natl Acad Sci U S A, 2003. 100(20): p. 11237-42.
136. Flikka, K., et al., XHM: a system for detection of potential cross hybridizations in DNA microarrays. BMC Bioinformatics, 2004. 5: p. 117.
137. Shi, L., et al., The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol, 2006. 24(9): p. 1151-61.
138. Hubbell, E., W.M. Liu, and R. Mei, Robust estimators for expression analysis. Bioinformatics, 2002. 18(12): p. 1585-92.
139. Irizarry, R.A., et al., Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res, 2003. 31(4): p. e15.
140. Wu, Z., R.A. Irizarry, R. Gentleman, F. Martinez-Murillo, and F. Spencer, A model-based background adjustment for oligonucleotide expression arrays. Journal of the American Statistical Association, 2004. 99(9): p. 909-917.
141. Leek, J.T., et al., Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet, 2010. 11(10): p. 733-9.
142. Benjamini, Y. and Y. Hochberg, Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B, 1995. 57(1): p. 11.

143. Storey, J.D. and R. Tibshirani, Statistical significance for genomewide studies. Proc Natl Acad Sci U S A, 2003. 100(16): p. 9440-5.
144. Borate, B.R., et al., Comparison of threshold selection methods for microarray gene co-expression matrices. BMC Res Notes, 2009. 2: p. 240.
145. Ruan, J., A.K. Dean, and W. Zhang, A general co-expression network-based approach to gene expression analysis: comparison and applications. BMC Syst Biol, 2010. 4: p. 8.
146. Elo, L.L., et al., Systematic construction of gene coexpression networks with applications to human T helper cell differentiation process. Bioinformatics, 2007. 23(16): p. 2096-103.
147. Lee, H.K., et al., Coexpression analysis of human genes across many microarray data sets. Genome Res, 2004. 14(6): p. 1085-94.
148. Zhang, B. and S. Horvath, A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol, 2005. 4: p. Article17.
149. Oliver, S., Guilt-by-association goes global. Nature, 2000. 403(6770): p. 601-3.
150. Albert, R., H. Jeong, and A.L. Barabasi, Internet: Diameter of the World-Wide Web. Nature, 1999. 401: p. 2.
151. Watts, D.J. and S.H. Strogatz, Collective dynamics of 'small-world' networks. Nature, 1998. 393(6684): p. 440-2.
152. Dijkstra, E.W., A note on two problems in connexion with graphs. Numerische Mathematik, 1959. 1: p. 269-271.
153. Jordan, I.K., et al., Conservation and coevolution in the scale-free human gene coexpression network. Mol Biol Evol, 2004. 21(11): p. 2058-70.
154. Tsaparas, P., et al., Global similarity and local divergence in human and mouse gene co-expression networks. BMC Evol Biol, 2006. 6: p. 70.
155. van Noort, V., B. Snel, and M.A. Huynen, The yeast coexpression network has a small-world, scale-free architecture and can be explained by a simple model. EMBO Rep, 2004. 5(3): p. 280-4.
156. Stumpf, M.P. and M.A. Porter, Mathematics. Critical truths about power laws. Science, 2012. 335(6069): p. 665-6.
157. Hanisch, D., et al., Co-clustering of biological networks and gene expression data. Bioinformatics, 2002. 18 Suppl 1: p. S145-54.
158. Ideker, T., et al., Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics, 2002. 18 Suppl 1: p. S233-40.
159. Bader, G.D. and C.W. Hogue, An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics, 2003. 4: p. 2.
160. Zhou, X., M.C. Kao, and W.H. Wong, Transitive functional annotation by shortest-path analysis of gene expression data. Proc Natl Acad Sci U S A, 2002. 99(20): p. 12783-8.
161. Barrett, T., et al., NCBI GEO: mining tens of millions of expression profiles--database and tools update. Nucleic Acids Res, 2007. 35(Database issue): p. D760-5.
162. Brazma, A., et al., ArrayExpress--a public repository for microarray gene expression data at the EBI. Nucleic Acids Res, 2003. 31(1): p. 68-71.
163. Edgar, R., M. Domrachev, and A.E. Lash, Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res, 2002. 30(1): p. 207-10.
164. Tseng, G.C., D. Ghosh, and E. Feingold, Comprehensive literature review and statistical considerations for microarray meta-analysis. Nucleic Acids Res, 2012.

165. Campain, A. and Y.H. Yang, Comparison study of microarray meta-analysis methods. BMC Bioinformatics, 2010. 11: p. 408.
166. Fisher, R.A., Statistical Methods for Research Workers. 1925, Edinburgh: Oliver and Boyd.
167. Hwang, D., et al., A data integration methodology for systems biology. Proc Natl Acad Sci U S A, 2005. 102(48): p. 17296-301.
168. Parmigiani, G., et al., A cross-study comparison of gene expression studies for the molecular classification of lung cancer. Clin Cancer Res, 2004. 10(9): p. 2922-7.
169. Breitling, R., et al., Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett, 2004. 573(1-3): p. 83-92.
170. Choi, J.K., et al., Combining multiple microarray studies and modeling interstudy variation. Bioinformatics, 2003. 19 Suppl 1: p. i84-90.
171. Dawany, N.B. and A. Tozeren, Asymmetric microarray data produces gene lists highly predictive of research literature on multiple cancer types. BMC Bioinformatics, 2010. 11: p. 483.
172. Tusher, V.G., R. Tibshirani, and G. Chu, Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A, 2001. 98(9): p. 5116-21.
173. Park, T., et al., Combining multiple microarrays in the presence of controlling variables. Bioinformatics, 2006. 22(14): p. 1682-9.
174. Yi, S.G. and T. Park, Integrated analysis of the heterogeneous microarray data. BMC Bioinformatics, 2011. 12 Suppl 5: p. S3.
175. Yu, T., et al., Dimension reduction and mixed-effects model for microarray meta-analysis of cancer. Front Biosci, 2008. 13: p. 2714-20.
176. Varrault, A., et al., Zac1 regulates an imprinted gene network critically involved in the control of embryonic growth. Dev Cell, 2006. 11(5): p. 711-22.
177. Srivastava, G.P., et al., Identification of transcription factor's targets using tissue-specific transcriptomic data in Arabidopsis thaliana. BMC Syst Biol, 2010. 4 Suppl 2: p. S2.
178. Choi, J.K., et al., Differential coexpression analysis using microarray data and its application to human cancer. Bioinformatics, 2005. 21(24): p. 4348-55.
179. Segal, E., et al., A module map showing conditional activity of expression modules in cancer. Nat Genet, 2004. 36(10): p. 1090-8.
180. Mabbott, N.A., et al., Meta-analysis of lineage-specific gene expression signatures in mouse leukocyte populations. Immunobiology, 2010. 215(9-10): p. 724-36.
181. Ucar, D., et al., Construction of a reference gene association network from multiple profiling data: application to data analysis. Bioinformatics, 2007. 23(20): p. 2716-24.
182. Gillis, J. and P. Pavlidis, The role of indirect connections in gene networks in predicting function. Bioinformatics, 2011. 27(13): p. 1860-6.
183. Voineagu, I., et al., Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature, 2011. 474(7351): p. 380-4.
184. Piomelli, D., Cannabinoid activity curtails cocaine craving. Nat Med, 2001. 7(10): p. 1099-100.
185. Mirnics, K., P. Levitt, and D.A. Lewis, Critical appraisal of DNA microarrays in psychiatric genomics. Biol Psychiatry, 2006. 60(2): p. 163-76.
186. Erraji-Benchekroun, L., et al., Molecular aging in human prefrontal cortex is selective and continuous throughout adult life. Biol Psychiatry, 2005. 57(5): p. 549-58.

187. Galfalvy, H.C., et al., Sex genes for genomic analysis in human brain: internal controls for comparison of probe level data extraction. BMC Bioinformatics, 2003. 4: p. 37.
188. Reinius, B., et al., An evolutionarily conserved sexual signature in the primate brain. PLoS Genet, 2008. 4(6): p. e1000100.
189. Mexal, S., et al., Brain pH has a significant impact on human postmortem hippocampal gene expression profiles. Brain Res, 2006. 1106(1): p. 1-11.
190. Borozan, I., et al., MAID: an effect size based model for microarray data integration across laboratories and platforms. BMC Bioinformatics, 2008. 9: p. 305.
191. Cahan, P., et al., Meta-analysis of microarray results: challenges, opportunities, and recommendations for standardization. Gene, 2007. 401(1-2): p. 12-8.
192. Rhodes, D.R., et al., Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci U S A, 2004. 101(25): p. 9309-14.
193. R Development Core Team, R: A language and environment for statistical computing. 2005, R Foundation for Statistical Computing: Vienna, Austria.
194. Fisher, R.A., Combining independent tests of significance. American Statistician, 1948. 2(3): p. 30.
195. Rhodes, D.R., et al., Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Res, 2002. 62(15): p. 4427-33.
196. Hess, A. and H. Iyer, Fisher's combined p-value for detecting differentially expressed genes using Affymetrix expression arrays. BMC Genomics, 2007. 8: p. 96.
197. Ashburner, M., et al., Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 2000. 25(1): p. 25-9.
198. Lee, H.K., et al., ErmineJ: tool for functional analysis of gene expression data sets. BMC Bioinformatics, 2005. 6: p. 269.
199. Vawter, M.P., et al., Mitochondrial-related gene expression changes are sensitive to agonal-pH state: implications for brain disorders. Mol Psychiatry, 2006. 11(7): p. 615, 663-79.
200. Colantuoni, C., et al., Age-related changes in the expression of schizophrenia susceptibility genes in the human prefrontal cortex. Brain Struct Funct, 2008.
201. Hong, M.G., et al., Transcriptome-wide assessment of human brain and lymphocyte senescence. PLoS ONE, 2008. 3(8): p. e3024.
202. Bartke, A., Impact of reduced insulin-like growth factor-1/insulin signaling on aging in mammals: novel findings. Aging Cell, 2008. 7(3): p. 285-90.
203. Oh, S., G.C. Tseng, and E. Sibille, Reciprocal phylogenetic conservation of molecular aging in mouse and human brain. Neurobiol Aging, 2009.
204. Canales, R.D., et al., Evaluation of DNA microarray results with quantitative gene expression platforms. Nat Biotechnol, 2006. 24(9): p. 1115-22.
205. Shippy, R., et al., Using RNA sample titrations to assess microarray platform performance and normalization techniques. Nat Biotechnol, 2006. 24(9): p. 1123-31.
206. Dobbin, K.K., et al., Interlaboratory comparability study of cancer gene expression analysis using oligonucleotide microarrays. Clin Cancer Res, 2005. 11(2 Pt 1): p. 565-72.
207. Irizarry, R.A., et al., Multiple-laboratory comparison of microarray platforms. Nat Methods, 2005. 2(5): p. 345-50.

208. Petersen, D., et al., Three microarray platforms: an analysis of their concordance in profiling gene expression. BMC Genomics, 2005. 6(1): p. 63.
209. Pedotti, P., et al., Can subtle changes in gene expression be consistently detected with different microarray platforms? BMC Genomics, 2008. 9: p. 124.
210. Ernst, C., et al., Confirmation of region-specific patterns of gene expression in the human brain. Neurogenetics, 2007. 8(3): p. 219-24.
211. Khaitovich, P., et al., Regional patterns of gene expression in human and chimpanzee brains. Genome Res, 2004. 14(8): p. 1462-73.
212. Roth, R.B., et al., Gene expression analyses reveal molecular relationships among 20 regions of the human CNS. Neurogenetics, 2006. 7(2): p. 67-80.
213. Berchtold, N.C., et al., Gene expression changes in the course of normal brain aging are sexually dimorphic. Proc Natl Acad Sci U S A, 2008. 105(40): p. 15605-10.
214. Johnson, W.E., C. Li, and A. Rabinovic, Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 2007. 8(1): p. 118-27.
215. El Idrissi, A., Taurine improves learning and retention in aged mice. Neurosci Lett, 2008. 436(1): p. 19-22.
216. Jiang, C.H., et al., The effects of aging on gene expression in the hypothalamus and cortex of mice. Proc Natl Acad Sci U S A, 2001. 98(4): p. 1930-4.
217. Loerch, P.M., et al., Evolution of the aging brain transcriptome and synaptic regulation. PLoS ONE, 2008. 3(10): p. e3329.
218. Terzi, D., et al., Regulators of G protein signaling in neuropsychiatric disorders. Prog Mol Biol Transl Sci, 2009. 86: p. 299-333.
219. Levitt, P., et al., Making the case for a candidate vulnerability gene in schizophrenia: Convergent evidence for regulator of G-protein signaling 4 (RGS4). Biol Psychiatry, 2006. 60(6): p. 534-7.
220. Falls, D.L., Neuregulins: functions, forms, and signaling strategies. Exp Cell Res, 2003. 284(1): p. 14-30.
221. Corfas, G., K. Roy, and J.D. Buxbaum, Neuregulin 1-erbB signaling and the molecular/cellular basis of schizophrenia. Nat Neurosci, 2004. 7(6): p. 575-80.
222. Cashion, A.B., M.J. Smith, and P.M. Wise, Glutamic acid decarboxylase 67 (GAD67) gene expression in discrete regions of the rostral preoptic area change during the oestrous cycle and with age. J Neuroendocrinol, 2004. 16(8): p. 711-6.
223. Siegmund, K.D., et al., DNA methylation in the human cerebral cortex is dynamically regulated throughout the life span and involves differentiated neurons. PLoS ONE, 2007. 2(9): p. e895.
224. Iwamoto, K., M. Bundo, and T. Kato, Altered expression of mitochondria-related genes in postmortem brains of patients with bipolar disorder or schizophrenia, as revealed by large-scale DNA microarray analysis. Hum Mol Genet, 2005. 14(2): p. 241-53.
225. Pongrac, J., et al., Gene expression profiling with DNA microarrays: advancing our understanding of psychiatric disorders. Neurochem Res, 2002. 27(10): p. 1049-63.
226. Vawter, M.P., et al., Microarray analysis of gene expression in the prefrontal cortex in schizophrenia: a preliminary study. Schizophr Res, 2002. 58(1): p. 11-20.
227. Iwamoto, K. and T. Kato, Gene expression profiling in schizophrenia and related mental disorders. Neuroscientist, 2006. 12(4): p. 349-61.

228. Altar, C.A., et al., Deficient hippocampal neuron expression of proteasome, ubiquitin, and mitochondrial genes in multiple schizophrenia cohorts. Biol Psychiatry, 2005. 58(2): p. 85-96.
229. Middleton, F.A., et al., Gene expression profiling reveals alterations of specific metabolic pathways in schizophrenia. J Neurosci, 2002. 22(7): p. 2718-29.
230. Aston, C., L. Jiang, and B.P. Sokolov, Microarray analysis of postmortem temporal cortex from patients with schizophrenia. J Neurosci Res, 2004. 77(6): p. 858-66.
231. Dracheva, S., et al., Myelin-associated mRNA and protein expression deficits in the anterior cingulate cortex and hippocampus in elderly schizophrenia patients. Neurobiol Dis, 2006. 21(3): p. 531-40.
232. Allen, N.C., et al., Systematic meta-analyses and field synopsis of genetic association studies in schizophrenia: the SzGene database. Nat Genet, 2008. 40(7): p. 827-34.
233. Mathieson, I., M.R. Munafo, and J. Flint, Meta-analysis indicates that common variants at the DISC1 locus are not associated with schizophrenia. Mol Psychiatry, 2011.
234. O'Donovan, M.C., et al., Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat Genet, 2008. 40(9): p. 1053-5.
235. Mistry, M. and P. Pavlidis, A cross-laboratory comparison of expression profiling data from normal human postmortem brain. Neuroscience, 2010. 167(2): p. 384-95.
236. Choi, K.H., et al., Gene expression and genetic variation data implicate PCLO in bipolar disorder. Biol Psychiatry, 2011. 69(4): p. 353-9.
237. Liu, C., et al., Whole-genome association mapping of gene expression in the human prefrontal cortex. Mol Psychiatry, 2010. 15(8): p. 779-84.
238. Barnes, M., et al., Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms. Nucleic Acids Res, 2005. 33(18): p. 5914-23.
239. Baum, A.E., et al., Meta-analysis of two genome-wide association studies of bipolar disorder reveals important points of agreement. Mol Psychiatry, 2008. 13(5): p. 466-7.
240. Liu, Y., et al., Meta-analysis of genome-wide association data of bipolar disorder and major depressive disorder. Mol Psychiatry, 2011. 16(1): p. 2-4.
241. Leek, J.T. and J.D. Storey, Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet, 2007. 3(9): p. 1724-35.
242. Garbett, K., et al., Transcriptome alterations in the prefrontal cortex of subjects with schizophrenia who committed suicide. Neuropsychopharmacol Hung, 2008. 10(1): p. 9-14.
243. R Development Core Team, R: A language and environment for statistical computing. 2011, R Foundation for Statistical Computing: Vienna, Austria.
244. Bolstad, B.M., et al., A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 2003. 19(2): p. 185-93.
245. Sibille, E., et al., A molecular signature of depression in the amygdala. Am J Psychiatry, 2009. 166(9): p. 1011-24.
246. Gillis, J., M. Mistry, and P. Pavlidis, Gene function analysis in complex data sets using ErmineJ. Nat Protoc, 2010. 5(6): p. 1148-59.
247. Cahoy, J.D., et al., A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function. J Neurosci, 2008. 28(1): p. 264-78.

248. Chatr-aryamontri, A., et al., MINT: the Molecular INTeraction database. Nucleic Acids Res, 2007. 35(Database issue): p. D572-4.
249. Chua, H.N., W.K. Sung, and L. Wong, Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions. Bioinformatics, 2006. 22(13): p. 1623-30.
250. Gilbert, D., Biomolecular interaction network database. Brief Bioinform, 2005. 6(2): p. 194-8.
251. Lynn, D.J., et al., InnateDB: facilitating systems-level analyses of the mammalian innate immune response. Mol Syst Biol, 2008. 4: p. 218.
252. Prasad, T.S., K. Kandasamy, and A. Pandey, Human Protein Reference Database and Human Proteinpedia as discovery tools for systems biology. Methods Mol Biol, 2009. 577: p. 67-79.
253. Razick, S., G. Magklaras, and I.M. Donaldson, iRefIndex: a consolidated protein interaction database with provenance. BMC Bioinformatics, 2008. 9: p. 405.
254. Glatt, S.J., et al., Comparative gene expression analysis of blood and brain provides concurrent validation of SELENBP1 up-regulation in schizophrenia. Proc Natl Acad Sci U S A, 2005. 102(43): p. 15533-8.
255. Narayan, S., et al., Molecular profiles of schizophrenia in the CNS at different stages of illness. Brain Res, 2008. 1239: p. 235-48.
256. Prabakaran, S., et al., Mitochondrial dysfunction in schizophrenia: evidence for compromised brain metabolism and oxidative stress. Mol Psychiatry, 2004. 9(7): p. 684-97, 643.
257. Park, E., et al., Regulatory roles of hnRNP M and Nova-1 in the alternative splicing of the dopamine D2 receptor pre-mRNA. J Biol Chem, 2011.
258. Eyles, D.W., J.J. McGrath, and G.P. Reynolds, Neuronal calcium-binding proteins and schizophrenia. Schizophr Res, 2002. 57(1): p. 27-34.
259. Manji, H.K., G proteins: implications for psychiatry. Am J Psychiatry, 1992. 149(6): p. 746-60.
260. Schwab, S.G., et al., Support for a chromosome 18p locus conferring susceptibility to functional psychoses in families with schizophrenia, by association and linkage analysis. Am J Hum Genet, 1998. 63(4): p. 1139-52.
261. Halim, N.D., et al., Increased lactate levels and reduced pH in postmortem brains of schizophrenics: medication confounds. J Neurosci Methods, 2008. 169(1): p. 208-13.
262. Thomas, E.A., Molecular profiling of antipsychotic drug function: convergent mechanisms in the pathology and treatment of psychiatric disorders. Mol Neurobiol, 2006. 34(2): p. 109-28.
263. Mistry, M., J. Gillis, and P. Pavlidis, Genome-wide expression profiling of schizophrenia using a large combined cohort. Mol Psychiatry, 2012.
264. Spirin, V. and L.A. Mirny, Protein complexes and functional modules in molecular networks. Proc Natl Acad Sci U S A, 2003. 100(21): p. 12123-8.
265. Newman, M.E., Modularity and community structure in networks. Proc Natl Acad Sci U S A, 2006. 103(23): p. 8577-82.
266. Gillis, J. and P. Pavlidis, A methodology for the analysis of differential coexpression across the human lifespan. BMC Bioinformatics, 2009. 10: p. 306.
267. Miller, J.A., M.C. Oldham, and D.H. Geschwind, A systems level analysis of transcriptional changes in Alzheimer's disease and normal aging. J Neurosci, 2008. 28(6): p. 1410-20.
268. Albert, R., H. Jeong, and A.L. Barabasi, Internet: Diameter of the World-Wide Web. Nature, 1999. 401: p. 2.

269. Xulvi-Brunet, R. and I.M. Sokolov, Reshuffling scale-free networks: From random to assortative. Physical Review E, 2004. 70(6).
270. Toro, R., et al., Key role for gene dosage and synaptic homeostasis in autism spectrum disorders. Trends Genet, 2010. 26(8): p. 363-72.
271. Yip, A.M. and S. Horvath, Gene network interconnectedness and the generalized topological overlap measure. BMC Bioinformatics, 2007. 8: p. 22.
272. Smoot, M.E., et al., Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics, 2011. 27(3): p. 431-2.
273. Breslin, T., P. Eden, and M. Krogh, Comparing functional annotation analyses with Catmap. BMC Bioinformatics, 2004. 5: p. 193.
274. Gillis, J. and P. Pavlidis, The impact of multifunctional genes on "guilt by association" analysis. PLoS ONE, 2011. 6(2): p. e17258.
275. Walterfang, M., et al., Neuropathological, neurogenetic and neuroimaging evidence for white matter pathology in schizophrenia. Neurosci Biobehav Rev, 2006. 30(7): p. 918-48.
276. Hansen, T., et al., Apolipoprotein D is associated with long-term outcome in patients with schizophrenia. Pharmacogenomics J, 2006. 6(2): p. 120-5.
277. Qin, W., et al., A family-based association study of PLP1 and schizophrenia. Neurosci Lett, 2005. 375(3): p. 207-10.
278. Yang, Y.F., et al., Possible association of the MAG locus with schizophrenia in a Chinese Han cohort of family trios. Schizophr Res, 2005. 75(1): p. 11-9.
279. Haroutunian, V., et al., Variations in oligodendrocyte-related gene expression across multiple cortical regions: implications for the pathophysiology of schizophrenia. Int J Neuropsychopharmacol, 2007. 10(4): p. 565-73.
280. Watkins, T.A., et al., Distinct stages of myelination regulated by gamma-secretase and astrocytes in a rapidly myelinating CNS coculture system. Neuron, 2008. 60(4): p. 555-69.
281. Talbott, J.F., et al., Endogenous Nkx2.2+/Olig2+ oligodendrocyte precursor cells fail to remyelinate the demyelinated adult rat spinal cord in the absence of astrocytes. Exp Neurol, 2005. 192(1): p. 11-24.
282. Franklin, R.J., A.J. Crang, and W.F. Blakemore, Transplanted type-1 astrocytes facilitate repair of demyelinating lesions by host oligodendrocytes in adult rat spinal cord. J Neurocytol, 1991. 20(5): p. 420-30.
283. Schnieder, T.P. and A.J. Dwork, Searching for neuropathology: gliosis in schizophrenia. Biol Psychiatry, 2011. 69(2): p. 134-9.
284. Brockmann, K., et al., Succinate in dystrophic white matter: a proton magnetic resonance spectroscopy finding characteristic for complex II deficiency. Ann Neurol, 2002. 52(1): p. 38-46.
285. Xulvi-Brunet, R. and H. Li, Co-expression networks: graph properties and topological comparisons. Bioinformatics, 2010. 26(2): p. 205-14.
286. Marioni, J.C., et al., RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res, 2008. 18(9): p. 1509-17.
287. Arion, D., et al., Molecular markers distinguishing supragranular and infragranular layers in the human prefrontal cortex. Eur J Neurosci, 2007. 25(6): p. 1843-54.
288. Frankle, W.G., J. Lerma, and M. Laruelle, The synaptic hypothesis of schizophrenia. Neuron, 2003. 39(2): p. 205-16.

289. Chana, G., et al., Two-dimensional assessment of cytoarchitecture in the anterior cingulate cortex in major depressive disorder, bipolar disorder, and schizophrenia: evidence for decreased neuronal somal size and increased neuronal density. Biol Psychiatry, 2003. 53(12): p. 1086-98.
290. Selemon, L.D., G. Rajkowska, and P.S. Goldman-Rakic, Abnormally high neuronal density in the schizophrenic cortex. A morphometric analysis of prefrontal area 9 and occipital area 17. Arch Gen Psychiatry, 1995. 52(10): p. 805-18; discussion 819-20.
291. Bitanihirwe, B.K. and T.U. Woo, Oxidative stress in schizophrenia: an integrated approach. Neurosci Biobehav Rev, 2011. 35(3): p. 878-93.
292. Halliwell, B., Reactive oxygen species and the central nervous system. J Neurochem, 1992. 59(5): p. 1609-23.
293. Bongarzone, E.R., J.M. Pasquini, and E.F. Soto, Oxidative damage to proteins and lipids of CNS myelin produced by in vitro generated reactive oxygen species. J Neurosci Res, 1995. 41(2): p. 213-21.
294. Davis, K.L., et al., White matter changes in schizophrenia: evidence for myelin-related dysfunction. Arch Gen Psychiatry, 2003. 60(5): p. 443-56.
295. Cannon, T.D., et al., Regional gray matter, white matter, and cerebrospinal fluid distributions in schizophrenic patients, their siblings, and controls. Arch Gen Psychiatry, 1998. 55(12): p. 1084-91.
296. Hyde, T.M., J.C. Ziegler, and D.R. Weinberger, Psychiatric disturbances in metachromatic leukodystrophy. Insights into the neurobiology of psychosis. Arch Neurol, 1992. 49(4): p. 401-6.
297. Uranova, N., et al., Electron microscopy of oligodendroglia in severe mental illness. Brain Res Bull, 2001. 55(5): p. 597-610.
298. Agostinho, F.R., et al., Effects of chronic haloperidol and/or clozapine on oxidative stress parameters in rat brain. Neurochem Res, 2007. 32(8): p. 1343-50.
299. Martins, M.R., et al., Antipsychotic-induced oxidative stress in rat brain. Neurotox Res, 2008. 13(1): p. 63-9.
300. Gilad, Y., S.A. Rifkin, and J.K. Pritchard, Revealing the architecture of gene regulation: the promise of eQTL studies. Trends Genet, 2008. 24(8): p. 408-15.
301. de Jong, S., et al., Expression QTL analysis of top loci from GWAS meta-analysis highlights additional schizophrenia candidate genes. Eur J Hum Genet, 2012.
302. Richards, A.L., et al., Schizophrenia susceptibility alleles are enriched for alleles that affect gene expression in adult human brain. Mol Psychiatry, 2012. 17(2): p. 193-201.
303. Vawter, M.P., F. Mamdani, and F. Macciardi, An integrative functional genomics approach for discovering biomarkers in schizophrenia. Brief Funct Genomics, 2011. 10(6): p. 387-99.
304. Garbett, K.A., et al., Novel animal models for studying complex brain disorders: BAC-driven miRNA-mediated in vivo silencing of gene expression. Mol Psychiatry, 2010. 15(10): p. 987-95.
305. Brennand, K.J., et al., Modeling psychiatric disorders at the cellular and network levels. Mol Psychiatry, 2012.
306. Brennand, K.J., et al., Modelling schizophrenia using human induced pluripotent stem cells. Nature, 2011. 473(7346): p. 221-5.


Appendix

Appendix A: ‘Core’ meta-signature lists for age, brain pH, PMI and sex

Age Down-regulated

GeneSymbol | GeneName | Meta Q-value
AAK1 | AP2 associated kinase 1 | 3.48E-04
ACOT8 | acyl-CoA thioesterase 8 | 5.20E-05
ACP1 | acid phosphatase 1, soluble | 1.09E-04
ACTN2 | actinin, alpha 2 | 7.53E-06
ACTR10 | actin-related protein 10 homolog (S. cerevisiae) | 5.81E-06
ACTR3B | ARP3 actin-related protein 3 homolog B (yeast) | 1.74E-03
ACVR1B | activin A receptor, type IB | 2.35E-06
ADAM23 | ADAM metallopeptidase domain 23 | 1.99E-05
ADAMTSL1 | ADAMTS-like 1 | 4.41E-05
ADCY1 | adenylate cyclase 1 (brain) | 1.20E-07
ADCY2 | adenylate cyclase 2 (brain) | 1.18E-09
ADD2 | adducin 2 (beta) | 6.33E-07
ADRA2A | adrenergic, alpha-2A-, receptor | 1.28E-04
AGMAT | agmatine ureohydrolase (agmatinase) | 8.44E-05
AKT3 | v-akt murine thymoma viral oncogene homolog 3 (protein kinase B, gamma) | 1.44E-08
AL390170 | unknown | 4.98E-08
ALAS1 | aminolevulinate, delta-, synthase 1 | 3.43E-05
AMACR | alpha-methylacyl-CoA racemase | 5.43E-05
ANK2 | ankyrin 2, neuronal | 1.64E-04
ANKRD13C | ankyrin repeat domain 13C | 5.51E-09
AP1M1 | adaptor-related protein complex 1, mu 1 subunit | 2.78E-09
AP1S1 | adaptor-related protein complex 1, sigma 1 subunit | 2.35E-05
AP2M1 | adaptor-related protein complex 2, mu 1 subunit | 1.39E-05
AP2S1 | adaptor-related protein complex 2, sigma 1 subunit | 1.65E-06
APC | adenomatous polyposis coli | 7.15E-07
ARF1 | ADP-ribosylation factor 1 | 1.94E-05
ARF3 | ADP-ribosylation factor 3 | 4.70E-05
ARFIP2 | ADP-ribosylation factor interacting protein 2 | 1.60E-04
ARHGAP20 | Rho GTPase activating protein 20 | 6.85E-05
ARHGEF12 | Rho guanine nucleotide exchange factor (GEF) 12 | 3.41E-05
ARHGEF2 | rho/rac guanine nucleotide exchange factor (GEF) 2 | 1.29E-05
ARHGEF9 | Cdc42 guanine nucleotide exchange factor (GEF) 9 | 1.62E-07
ARL4C | ADP-ribosylation factor-like 4C | 2.05E-04
ARL6 | ADP-ribosylation factor-like 6 | 2.20E-05
ARPC2 | actin related protein 2/3 complex, subunit 2, 34kDa | 1.12E-04
ARPC5 | actin related protein 2/3 complex, subunit 5, 16kDa | 3.45E-04
ARPP-21 | cyclic AMP-regulated phosphoprotein, 21 kD | 2.79E-05
ARRB2 | arrestin, beta 2 | 9.86E-07
ASTN1 | astrotactin 1 | 2.18E-04

ATP2B1 | ATPase, Ca++ transporting, plasma membrane 1 | 3.24E-06
ATP2B2 | ATPase, Ca++ transporting, plasma membrane 2 | 1.04E-04
ATP2C1 | ATPase, Ca++ transporting, type 2C, member 1 | 9.69E-05
ATP5F1 | ATP synthase, H+ transporting, mitochondrial F0 complex, subunit B1 | 8.29E-05
ATP5G3 | ATP synthase, H+ transporting, mitochondrial F0 complex, subunit C3 (subunit 9) | 1.28E-04
ATP6AP2 | ATPase, H+ transporting, lysosomal accessory protein 2 | 4.26E-07
ATP6V0A1 | ATPase, H+ transporting, lysosomal V0 subunit a1 | 7.96E-06
ATP6V0B | ATPase, H+ transporting, lysosomal 21kDa, V0 subunit b | 4.15E-05
ATP6V0D1 | ATPase, H+ transporting, lysosomal 38kDa, V0 subunit d1 | 3.38E-04
ATP6V1A | ATPase, H+ transporting, lysosomal 70kDa, V1 subunit A | 4.24E-05
ATP6V1C1 | ATPase, H+ transporting, lysosomal 42kDa, V1 subunit C1 | 7.44E-10
ATP6V1D | ATPase, H+ transporting, lysosomal 34kDa, V1 subunit D | 2.40E-04
ATP8A2 | ATPase, aminophospholipid transporter-like, class I, type 8A, member 2 | 1.55E-05
ATRNL1 | attractin-like 1 | 2.37E-08
ATXN1 | ataxin 1 | 1.46E-07
ATXN10 | ataxin 10 | 1.77E-05
AZIN1 | antizyme inhibitor 1 | 1.16E-05
B3GALT6 | UDP-Gal:betaGal beta 1,3-galactosyltransferase polypeptide 6 | 4.86E-06
B3GAT1 | beta-1,3-glucuronyltransferase 1 (glucuronosyltransferase P) | 5.23E-04
B3GNT1 | UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 1 | 5.43E-04
B3GNT2 | UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 2 | 5.53E-05
B4GALT6 | UDP-Gal:betaGlcNAc beta 1,4-galactosyltransferase, polypeptide 6 | 8.34E-05
BAD | BCL2-associated agonist of cell death | 3.55E-04
BASP1 | brain abundant, membrane attached signal protein 1 | 2.30E-07
BBS7 | Bardet-Biedl syndrome 7 | 8.38E-05
BCAP29 | B-cell receptor-associated protein 29 | 1.88E-06
BCKDK | branched chain ketoacid dehydrogenase kinase | 2.24E-06
BCL11A | B-cell CLL/lymphoma 11A (zinc finger protein) | 1.29E-05
BDNF | brain-derived neurotrophic factor | 5.50E-04
BECN1 | beclin 1, autophagy related | 2.14E-06
BEX1 | brain expressed, X-linked 1 | 2.97E-07
BRAP | BRCA1 associated protein | 1.07E-04
BRF1 | BRF1 homolog, subunit of RNA polymerase III transcription initiation factor IIIB (S. cerevisiae) | 5.43E-06
BRF2 | BRF2, subunit of RNA polymerase III transcription initiation factor, BRF1-like | 4.37E-06
BRP44 | brain protein 44 | 2.77E-04
C10orf46 | chromosome 10 open reading frame 46 | 4.59E-04
C11orf41 | chromosome 11 open reading frame 41 | 2.04E-07
C12orf44 | chromosome 12 open reading frame 44 | 8.89E-05
C14orf138 | chromosome 14 open reading frame 138 | 1.48E-05
C16orf42 | chromosome 16 open reading frame 42 | 5.58E-06
C17orf76 | chromosome 17 open reading frame 76 | 3.21E-05
C18orf1 | chromosome 18 open reading frame 1 | 1.59E-05
C18orf10 | chromosome 18 open reading frame 10 | 1.32E-05

C19orf10 | chromosome 19 open reading frame 10 | 2.18E-04
C19orf42 | chromosome 19 open reading frame 42 | 1.49E-05
C1orf31 | chromosome 1 open reading frame 31 | 1.60E-10
C1orf59 | chromosome 1 open reading frame 59 | 7.17E-09
C1orf95 | chromosome 1 open reading frame 95 | 3.93E-04
C1orf96 | chromosome 1 open reading frame 96 | 1.71E-06
C1QTNF4 | C1q and tumor necrosis factor related protein 4 | 3.21E-07
C20orf112 | chromosome 20 open reading frame 112 | 5.20E-05
C5orf13 | chromosome 5 open reading frame 13 | 4.29E-05
C5orf30 | chromosome 5 open reading frame 30 | 2.78E-09
C6orf106 | chromosome 6 open reading frame 106 | 1.00E-05
C6orf153 | chromosome 6 open reading frame 153 | 1.29E-05
C6orf154 | chromosome 6 open reading frame 154 | 4.43E-05
C6orf206 | chromosome 6 open reading frame 206 | 1.10E-04
C7orf44 | chromosome 7 open reading frame 44 | 2.88E-05
C8orf46 | chromosome 8 open reading frame 46 | 2.44E-06
C9orf127 | chromosome 9 open reading frame 127 | 1.08E-03
C9orf16 | chromosome 9 open reading frame 16 | 7.24E-10
CA10 | carbonic anhydrase X | 8.96E-09
CA11 | carbonic anhydrase XI | 3.15E-06
CABP1 | calcium binding protein 1 | 3.54E-04
CACNB1 | calcium channel, voltage-dependent, beta 1 subunit | 4.71E-06
CACNB2 | calcium channel, voltage-dependent, beta 2 subunit | 2.06E-05
CACNB3 | calcium channel, voltage-dependent, beta 3 subunit | 1.29E-06
CACNG3 | calcium channel, voltage-dependent, gamma subunit 3 | 4.18E-04
CALB1 | calbindin 1, 28kDa | 1.02E-06
CALB2 | calbindin 2 | 1.40E-05
CALM1 | calmodulin 1 (phosphorylase kinase, delta) | 3.78E-04
CALM3 | calmodulin 3 (phosphorylase kinase, delta) | 4.30E-05
CALU | calumenin | 2.35E-06
CAMK1 | calcium/calmodulin-dependent protein kinase I | 1.31E-07
CAMK2B | calcium/calmodulin-dependent protein kinase II beta | 1.08E-04
CAMK4 | calcium/calmodulin-dependent protein kinase IV | 1.47E-04
CAMKK2 | calcium/calmodulin-dependent protein kinase kinase 2, beta | 5.47E-08
CAMSAP1 | calmodulin regulated spectrin-associated protein 1 | 2.27E-04
CAP2 | CAP, adenylate cyclase-associated protein, 2 (yeast) | 1.07E-03
CAPRIN1 | cell cycle associated protein 1 | 1.36E-04
CAPZA2 | capping protein (actin filament) muscle Z-line, alpha 2 | 3.27E-05
CARS | cysteinyl-tRNA synthetase | 3.02E-04
CASK | calcium/calmodulin-dependent serine protein kinase (MAGUK family) | 9.59E-08
CBLN4 | cerebellin 4 precursor | 4.51E-06
CBX6 | chromobox homolog 6 | 1.14E-03
CCDC85A | coiled-coil domain containing 85A | 3.07E-06
CCDC85B | coiled-coil domain containing 85B | 1.57E-04
CCK | cholecystokinin | 1.63E-04
CCKBR | cholecystokinin B receptor | 1.44E-04
CCM2 | cerebral cavernous malformation 2 | 2.05E-05


CCNA1 | cyclin A1 | 2.20E-06
CCNC | cyclin C | 1.69E-04
CCND2 | cyclin D2 | 4.15E-05
CCNG2 | cyclin G2 | 3.27E-05
CCNY | cyclin Y | 2.75E-06
CCRK | cell cycle related kinase | 1.09E-04
CD200 | CD200 molecule | 5.43E-06
CD47 | CD47 molecule | 1.65E-04
CDC34 | cell division cycle 34 homolog (S. cerevisiae) | 7.42E-06
CDC40 | cell division cycle 40 homolog (S. cerevisiae) | 1.46E-07
CDC42 | cell division cycle 42 (GTP binding protein, 25kDa) | 6.87E-05
CDC42EP3 | CDC42 effector protein (Rho GTPase binding) 3 | 2.09E-04
CDH12 | cadherin 12, type 2 (N-cadherin 2) | 6.24E-05
CDH13 | cadherin 13, H-cadherin (heart) | 1.08E-04
CDH8 | cadherin 8, type 2 | 1.22E-07
CDK5 | cyclin-dependent kinase 5 | 2.59E-04
CDK5R1 | cyclin-dependent kinase 5, regulatory subunit 1 (p35) | 7.58E-06
CDK5R2 | cyclin-dependent kinase 5, regulatory subunit 2 (p39) | 8.07E-07
CDKL5 | cyclin-dependent kinase-like 5 | 1.93E-04
CDKN2D | cyclin-dependent kinase inhibitor 2D (p19, inhibits CDK4) | 4.87E-06
CDKN3 | cyclin-dependent kinase inhibitor 3 | 1.09E-06
CDV3 | CDV3 homolog (mouse) | 4.98E-08
CECR6 | cat eye syndrome chromosome region, candidate 6 | 3.96E-05
CHGB | chromogranin B (secretogranin 1) | 5.99E-06
CHRM3 | cholinergic receptor, muscarinic 3 | 1.61E-04
CHST2 | carbohydrate (N-acetylglucosamine-6-O) sulfotransferase 2 | 6.75E-06
CINP | cyclin-dependent kinase 2-interacting protein | 3.41E-09
CKAP4 | cytoskeleton-associated protein 4 | 4.49E-05
CLCN4 | chloride channel 4 | 4.04E-07
CLTA | clathrin, light chain (Lca) | 8.29E-05
CLTB | clathrin, light chain (Lcb) | 1.28E-03
CLYBL | citrate lyase beta like | 3.20E-05
CNIH2 | cornichon homolog 2 (Drosophila) | 4.04E-06
CNKSR2 | connector enhancer of kinase suppressor of Ras 2 | 2.62E-05
CNR1 | cannabinoid receptor 1 (brain) | 6.04E-04
COMMD7 | COMM domain containing 7 | 2.08E-05
COPS4 | COP9 constitutive photomorphogenic homolog subunit 4 (Arabidopsis) | 8.85E-06
COPS7A | COP9 constitutive photomorphogenic homolog subunit 7A (Arabidopsis) | 4.36E-05
COPS8 | COP9 constitutive photomorphogenic homolog subunit 8 (Arabidopsis) | 4.31E-05
CPLX3 | complexin 3 | 9.84E-05
CRH | corticotropin releasing hormone | 3.76E-04
CRHBP | corticotropin releasing hormone binding protein | 2.14E-06
CRIP2 | cysteine-rich protein 2 | 1.17E-04
CRK | v-crk sarcoma virus CT10 oncogene homolog (avian) | 3.33E-04
CRMP1 | collapsin response mediator protein 1 | 3.35E-05
CRYM | crystallin, mu | 4.71E-05


CSNK2A1 | casein kinase 2, alpha 1 polypeptide | 3.63E-04
CTSB | cathepsin B | 5.95E-08
CUGBP1 | CUG triplet repeat, RNA binding protein 1 | 3.22E-04
CUGBP2 | CUG triplet repeat, RNA binding protein 2 | 1.99E-04
CUL2 | cullin 2 | 2.23E-04
CX3CL1 | chemokine (C-X3-C motif) ligand 1 | 1.45E-05
CXADR | coxsackie virus and adenovirus receptor | 6.13E-05
CYB561D1 | cytochrome b-561 domain containing 1 | 2.34E-04
CYC1 | cytochrome c-1 | 3.87E-04
CYCS | cytochrome c, somatic | 1.61E-04
CYP26B1 | cytochrome P450, family 26, subfamily B, polypeptide 1 | 2.14E-04
DACT3 | dapper, antagonist of beta-catenin, homolog 3 (Xenopus laevis) | 4.51E-06
DBC1 | deleted in bladder cancer 1 | 6.31E-09
DCBLD1 | discoidin, CUB and LCCL domain containing 1 | 1.55E-04
DCTN3 | dynactin 3 (p22) | 1.30E-05
DCUN1D5 | DCN1, defective in cullin neddylation 1, domain containing 5 (S. cerevisiae) | 2.45E-05
DCX | doublecortin | 1.56E-04
DDN | dendrin | 8.43E-05
DDOST | dolichyl-diphosphooligosaccharide-protein glycosyltransferase | 1.20E-05
DDT | D-dopachrome tautomerase | 5.47E-05
DENR | density-regulated protein | 2.64E-05
DGKB | diacylglycerol kinase, beta 90kDa | 2.97E-07
DGKI | diacylglycerol kinase, iota | 6.89E-05
DGUOK | deoxyguanosine kinase | 1.79E-04
DHPS | deoxyhypusine synthase | 2.29E-06
DIRAS1 | DIRAS family, GTP-binding RAS-like 1 | 2.96E-04
DKFZP564O0823 | DKFZP564O0823 protein | 3.90E-04
DLAT | dihydrolipoamide S-acetyltransferase | 2.20E-06
DLG1 | discs, large homolog 1 (Drosophila) | 8.56E-07
DLG2 | discs, large homolog 2 (Drosophila) | 1.81E-05
DLG3 | discs, large homolog 3 (Drosophila) | 7.65E-08
DLG4 | discs, large homolog 4 (Drosophila) | 1.36E-06
DLGAP1 | discs, large (Drosophila) homolog-associated protein 1 | 6.65E-05
DLGAP2 | discs, large (Drosophila) homolog-associated protein 2 | 5.95E-08
DLX1 | distal-less homeobox 1 | 4.04E-06
DLX2 | distal-less homeobox 2 | 1.98E-06
DMXL2 | Dmx-like 2 | 1.05E-05
DNAJA1 | DnaJ (Hsp40) homolog, subfamily A, member 1 | 9.45E-06
DNM1L | dynamin 1-like | 2.93E-05
DOK5 | docking protein 5 | 2.58E-06
DOK6 | docking protein 6 | 4.28E-06
DPF1 | D4, zinc and double PHD fingers family 1 | 2.21E-05
DPP6 | dipeptidyl-peptidase 6 | 3.92E-04
DPYSL4 | dihydropyrimidinase-like 4 | 5.19E-04
DR1 | down-regulator of transcription 1, TBP-binding (negative cofactor 2) | 1.90E-04
DUSP14 | dual specificity phosphatase 14 | 7.56E-07


DUSP3 | dual specificity phosphatase 3 | 7.65E-08
DYNC1I1 | dynein, cytoplasmic 1, intermediate chain 1 | 1.51E-05
DYNC1LI1 | dynein, cytoplasmic 1, light intermediate chain 1 | 1.13E-04
DYRK2 | dual-specificity tyrosine-(Y)-phosphorylation regulated kinase 2 | 2.62E-04
ECHDC1 | enoyl Coenzyme A hydratase domain containing 1 | 7.05E-07
EDF1 | endothelial differentiation-related factor 1 | 2.64E-08
EEF1A2 | eukaryotic translation elongation factor 1 alpha 2 | 1.52E-07
EFNB3 | ephrin-B3 | 1.98E-04
EGR3 | early growth response 3 | 1.12E-05
EHBP1 | EH domain binding protein 1 | 9.61E-05
EIF2S1 | eukaryotic translation initiation factor 2, subunit 1 alpha, 35kDa | 3.81E-05
EIF4E2 | eukaryotic translation initiation factor 4E family member 2 | 2.57E-04
ELAVL1 | ELAV (embryonic lethal, abnormal vision, Drosophila)-like 1 (Hu antigen R) | 1.68E-05
ELOVL6 | ELOVL family member 6, elongation of long chain fatty acids (FEN1/Elo2, SUR4/Elo3-like, yeast) | 4.49E-05
ENC1 | ectodermal-neural cortex (with BTB-like domain) | 6.81E-05
ENSA | endosulfine alpha | 3.56E-05
EPHA4 | EPH receptor A4 | 1.70E-06
EPHA5 | EPH receptor A5 | 4.78E-05
EPHB2 | EPH receptor B2 | 3.83E-04
EPS15 | epidermal growth factor receptor pathway substrate 15 | 8.89E-06
ERC2 | ELKS/RAB6-interacting/CAST family member 2 | 2.04E-05
ERCC1 | excision repair cross-complementing rodent repair deficiency, complementation group 1 (includes overlapping antisense sequence) | 4.19E-05
ETV1 | ets variant 1 | 2.41E-06
EXOC6 | exocyst complex component 6 | 4.57E-04
EXOSC2 | exosome component 2 | 4.29E-05
EXOSC4 | exosome component 4 | 2.37E-04
EXPH5 | exophilin 5 | 2.64E-08
EXT1 | exostoses (multiple) 1 | 3.41E-04
FABP3 | fatty acid binding protein 3, muscle and heart (mammary-derived growth inhibitor) | 5.96E-05
FAM110B | family with sequence similarity 110, member B | 1.69E-04
FAM131A | family with sequence similarity 131, member A | 1.75E-04
FAM32A | family with sequence similarity 32, member A | 1.67E-04
FAM49A | family with sequence similarity 49, member A | 6.22E-07
FAM5B | family with sequence similarity 5, member B | 5.95E-05
FAM5C | family with sequence similarity 5, member C | 5.40E-04
FAM83H | family with sequence similarity 83, member H | 9.19E-06
FARSA | phenylalanyl-tRNA synthetase, alpha subunit | 4.37E-05
FBXL16 | F-box and leucine-rich repeat protein 16 | 2.75E-06
FBXL2 | F-box and leucine-rich repeat protein 2 | 2.07E-07
FBXO31 | F-box protein 31 | 4.89E-06
FBXW11 | F-box and WD repeat domain containing 11 | 1.03E-04
FBXW2 | F-box and WD repeat domain containing 2 | 1.12E-04
FBXW5 | F-box and WD repeat domain containing 5 | 5.60E-07
FBXW7 | F-box and WD repeat domain containing 7 | 2.35E-04
FGF12 | fibroblast growth factor 12 | 1.01E-05

FGF13 | fibroblast growth factor 13 | 1.33E-04
FJX1 | four jointed box 1 (Drosophila) | 2.41E-06
FKBP11 | FK506 binding protein 11, 19 kDa | 9.33E-05
FKBP1B | FK506 binding protein 1B, 12.6 kDa | 1.40E-03
FLJ11506 | alpha- and gamma-adaptin-binding protein p34 | 5.64E-04
FLJ22536 | hypothetical locus LOC401237 | 1.17E-06
FLJ25076 | similar to CG4502-PA | 1.70E-06
FRMPD4 | FERM and PDZ domain containing 4 | 2.13E-04
GABBR2 | gamma-aminobutyric acid (GABA) B receptor, 2 | 2.79E-05
GABRA1 | gamma-aminobutyric acid (GABA) A receptor, alpha 1 | 2.48E-06
GAD1 | glutamate decarboxylase 1 (brain, 67kDa) | 7.83E-07
GAD2 | glutamate decarboxylase 2 (pancreatic islets and brain, 65kDa) | 3.48E-08
GALNT2 | UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 2 (GalNAc-T2) | 2.67E-04
GAP43 | growth associated protein 43 | 1.46E-04
GARS | glycyl-tRNA synthetase | 7.51E-05
GFPT1 | glutamine-fructose-6-phosphate transaminase 1 | 1.65E-08
GFRA2 | GDNF family receptor alpha 2 | 1.27E-06
GHITM | growth hormone inducible transmembrane protein | 9.65E-06
GLRB | glycine receptor, beta | 4.14E-07
GNAO1 | guanine nucleotide binding protein (G protein), alpha activating activity polypeptide O | 4.14E-07
GNAZ | guanine nucleotide binding protein (G protein), alpha z polypeptide | 5.94E-07
GNB1 | guanine nucleotide binding protein (G protein), beta polypeptide 1 | 5.04E-05
GNB5 | guanine nucleotide binding protein (G protein), beta 5 | 2.32E-06
GNG3 | guanine nucleotide binding protein (G protein), gamma 3 | 1.77E-04
GOSR2 | golgi SNAP receptor complex member 2 | 7.52E-09
GPHN | gephyrin | 5.67E-07
GPI | glucose phosphate isomerase | 3.58E-04
GPM6A | glycoprotein M6A | 6.49E-05
GPR26 | G protein-coupled receptor 26 | 6.98E-05
GPR6 | G protein-coupled receptor 6 | 3.42E-08
GRB2 | growth factor receptor-bound protein 2 | 4.32E-06
GRIA1 | glutamate receptor, ionotropic, AMPA 1 | 4.94E-05
GRIK2 | glutamate receptor, ionotropic, kainate 2 | 3.80E-04
GRIK5 | glutamate receptor, ionotropic, kainate 5 | 1.20E-05
GRLF1 | glucocorticoid receptor DNA binding factor 1 | 9.01E-05
GRM7 | glutamate receptor, metabotropic 7 | 2.83E-09
GRPEL1 | GrpE-like 1, mitochondrial (E. coli) | 6.72E-07
GSPT1 | G1 to S phase transition 1 | 6.02E-06
GTF2F1 | general transcription factor IIF, polypeptide 1, 74kDa | 9.98E-05
GUCY1B3 | guanylate cyclase 1, soluble, beta 3 | 3.00E-05
GUF1 | GUF1 GTPase homolog (S. cerevisiae) | 2.71E-05
GULP1 | GULP, engulfment adaptor PTB domain containing 1 | 2.42E-05
HAGH | hydroxyacylglutathione hydrolase | 2.41E-04
HCCS | holocytochrome c synthase (cytochrome c heme-lyase) | 1.52E-06
HDGFRP3 | hepatoma-derived growth factor, related protein 3 | 1.21E-04


HIST1H2BK | histone cluster 1, H2bk | 2.50E-04
HIVEP2 | human immunodeficiency virus type I enhancer binding protein 2 | 3.05E-06
HLF | hepatic leukemia factor | 3.48E-08
HMGCLL1 | 3-hydroxymethyl-3-methylglutaryl-Coenzyme A lyase-like 1 | 9.17E-08
HMGCR | 3-hydroxy-3-methylglutaryl-Coenzyme A reductase | 3.57E-08
HMOX2 | heme oxygenase (decycling) 2 | 7.20E-04
HMP19 | HMP19 protein | 4.08E-04
HPCA | hippocalcin | 8.14E-08
HPCAL4 | hippocalcin like 4 | 1.42E-04
HRAS | v-Ha-ras Harvey rat sarcoma viral oncogene homolog | 1.27E-07
HRASLS | HRAS-like suppressor | 1.98E-06
HS2ST1 | heparan sulfate 2-O-sulfotransferase 1 | 8.87E-08
HS6ST3 | heparan sulfate 6-O-sulfotransferase 3 | 1.30E-04
HSBP1 | heat shock factor binding protein 1 | 3.15E-06
HSD17B12 | hydroxysteroid (17-beta) dehydrogenase 12 | 1.25E-09
HSPA12A | heat shock 70kDa protein 12A | 5.19E-07
HSPA4 | heat shock 70kDa protein 4 | 1.23E-06
HTR2A | 5-hydroxytryptamine (serotonin) receptor 2A | 2.60E-08
HTR2C | 5-hydroxytryptamine (serotonin) receptor 2C | 1.70E-06
IDH3B | isocitrate dehydrogenase 3 (NAD+) beta | 1.85E-04
IDS | iduronate 2-sulfatase | 2.98E-06
IGF1 | insulin-like growth factor 1 (somatomedin C) | 2.50E-04
ILF3 | interleukin enhancer binding factor 3, 90kDa | 1.53E-04
IMP4 | IMP4, U3 small nucleolar ribonucleoprotein, homolog (yeast) | 9.61E-09
INPP4A | inositol polyphosphate-4-phosphatase, type I, 107kDa | 8.99E-08
IQSEC1 | IQ motif and Sec7 domain 1 | 1.52E-04
ITPKA | inositol 1,4,5-trisphosphate 3-kinase A | 5.47E-05
JTB | jumping translocation breakpoint | 1.67E-05
KALRN | kalirin, RhoGEF kinase | 9.05E-05
KATNB1 | katanin p80 (WD repeat containing) subunit B 1 | 5.00E-06
KCNAB1 | potassium voltage-gated channel, shaker-related subfamily, beta member 1 | 4.38E-07
KCNF1 | potassium voltage-gated channel, subfamily F, member 1 | 6.87E-05
KCNIP1 | Kv channel interacting protein 1 | 7.41E-07
KCNIP3 | Kv channel interacting protein 3, calsenilin | 1.97E-04
KCNJ3 | potassium inwardly-rectifying channel, subfamily J, member 3 | 1.96E-04
KCNJ6 | potassium inwardly-rectifying channel, subfamily J, member 6 | 2.18E-04
KCNJ9 | potassium inwardly-rectifying channel, subfamily J, member 9 | 5.67E-06
KCNK1 | potassium channel, subfamily K, member 1 | 3.40E-06
KCNK3 | potassium channel, subfamily K, member 3 | 1.66E-05
KCNQ2 | potassium voltage-gated channel, KQT-like subfamily, member 2 | 1.77E-04
KIAA0090 | KIAA0090 | 3.15E-06
KIAA0317 | KIAA0317 | 2.45E-05
KIAA1045 | KIAA1045 | 1.01E-06
KIAA1468 | KIAA1468 | 3.65E-04
KIAA1549 | KIAA1549 | 5.64E-06


KIF2A | kinesin heavy chain member 2A | 8.41E-07
KIF3B | kinesin family member 3B | 3.23E-05
KIF3C | kinesin family member 3C | 9.82E-06
KIFAP3 | kinesin-associated protein 3 | 8.99E-08
KITLG | KIT ligand | 1.02E-04
KLF16 | Kruppel-like factor 16 | 3.56E-04
KLHDC5 | kelch domain containing 5 | 1.98E-04
KPNA1 | karyopherin alpha 1 (importin alpha 5) | 1.39E-05
KPNA6 | karyopherin alpha 6 (importin alpha 7) | 1.12E-03
KRAS | v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog | 2.59E-06
LAGE3 | L antigen family, member 3 | 3.23E-04
LANCL1 | LanC lantibiotic synthetase component C-like 1 (bacterial) | 7.67E-05
LANCL2 | LanC lantibiotic synthetase component C-like 2 (bacterial) | 1.38E-05
LARGE | like-glycosyltransferase | 1.59E-06
LARP5 | La ribonucleoprotein domain family, member 5 | 5.03E-06
LDOC1 | leucine zipper, down-regulated in cancer 1 | 4.01E-06
LDOC1L | leucine zipper, down-regulated in cancer 1-like | 9.98E-05
LINGO1 | leucine rich repeat and Ig domain containing 1 | 7.66E-06
LINGO2 | leucine rich repeat and Ig domain containing 2 | 8.11E-06
LMO4 | LIM domain only 4 | 1.53E-06
LOC150568 | hypothetical LOC150568 | 1.07E-04
LOC283951 | hypothetical protein LOC283951 | 2.19E-04
LOC552889 | hypothetical protein LOC552889 | 9.32E-06
LPPR4 | plasticity related gene 1 | 8.44E-08
LRPAP1 | low density lipoprotein receptor-related protein associated protein 1 | 1.70E-06
LRRC20 | leucine rich repeat containing 20 | 2.55E-07
LRRC7 | leucine rich repeat containing 7 | 1.16E-06
LRRC8B | leucine rich repeat containing 8 family, member B | 6.63E-04
LRRTM1 | leucine rich repeat transmembrane neuronal 1 | 2.20E-04
LSM4 | LSM4 homolog, U6 small nuclear RNA associated (S. cerevisiae) | 4.39E-07
LY6E | lymphocyte antigen 6 complex, locus E | 1.24E-04
LZTS1 | leucine zipper, putative tumor suppressor 1 | 8.06E-05
MAD2L1BP | MAD2L1 binding protein | 2.42E-06
MAGED1 | melanoma antigen family D, 1 | 4.52E-05
MAL2 | mal, T-cell differentiation protein 2 | 3.27E-04
MAN1A1 | mannosidase, alpha, class 1A, member 1 | 1.65E-09
MANEAL | mannosidase, endo-alpha-like | 2.93E-05
MAP1LC3A | microtubule-associated protein 1 light chain 3 alpha | 5.12E-04
MAP2 | microtubule-associated protein 2 | 3.04E-04
MAP3K13 | mitogen-activated protein kinase kinase kinase 13 | 5.23E-06
MAPK1 | mitogen-activated protein kinase 1 | 1.66E-05
MAPK10 | mitogen-activated protein kinase 10 | 6.65E-07
MAPK11 | mitogen-activated protein kinase 11 | 5.53E-06
MAPK14 | mitogen-activated protein kinase 14 | 2.27E-05
MAPK8 | mitogen-activated protein kinase 8 | 2.48E-04
MAPK9 | mitogen-activated protein kinase 9 | 2.59E-06
MAPKAP1 | mitogen-activated protein kinase associated protein 1 | 1.47E-05

MAPT | microtubule-associated protein tau | 6.70E-04
MARCH4 | membrane-associated ring finger (C3HC4) 4 | 3.43E-04
MAST3 | microtubule associated serine/threonine kinase 3 | 3.84E-04
MCAT | malonyl CoA:ACP acyltransferase (mitochondrial) | 6.35E-06
MCHR1 | melanin-concentrating hormone receptor 1 | 2.55E-05
MCOLN3 | mucolipin 3 | 8.69E-05
MDH2 | malate dehydrogenase 2, NAD (mitochondrial) | 2.05E-05
MED6 | mediator complex subunit 6 | 4.31E-05
MEF2A | myocyte enhancer factor 2A | 1.75E-04
MEF2C | myocyte enhancer factor 2C | 1.63E-08
MEF2D | myocyte enhancer factor 2D | 1.02E-07
MFSD3 | major facilitator superfamily domain containing 3 | 4.09E-07
MFSD4 | major facilitator superfamily domain containing 4 | 1.71E-06
MLLT11 | myeloid/lymphoid or mixed-lineage leukemia (trithorax homolog, Drosophila); translocated to, 11 | 1.62E-04
MMD | monocyte to macrophage differentiation-associated | 1.78E-04
MOAP1 | modulator of apoptosis 1 | 8.12E-04
MPPED1 | metallophosphoesterase domain containing 1 | 3.85E-05
MRPL14 | mitochondrial ribosomal protein L14 | 7.44E-10
MRPL28 | mitochondrial ribosomal protein L28 | 1.06E-05
MRPL33 | mitochondrial ribosomal protein L33 | 7.81E-05
MRPL9 | mitochondrial ribosomal protein L9 | 1.80E-04
MRPS12 | mitochondrial ribosomal protein S12 | 1.02E-07
MSH2 | mutS homolog 2, colon cancer, nonpolyposis type 1 (E. coli) | 2.38E-04
MTMR9 | myotubularin related protein 9 | 1.15E-08
MTX2 | metaxin 2 | 6.81E-05
MYH10 | myosin, heavy chain 10, non-muscle | 4.29E-07
MYO16 | myosin XVI | 1.80E-04
MYO5A | myosin VA (heavy chain 12, myoxin) | 5.10E-04
NAPA | N-ethylmaleimide-sensitive factor attachment protein, alpha | 1.75E-04
NAT14 | N-acetyltransferase 14 (GCN5-related, putative) | 3.13E-06
NAV3 | neuron navigator 3 | 6.61E-05
NBL1 | neuroblastoma, suppression of tumorigenicity 1 | 3.23E-04
NCALD | neurocalcin delta | 1.93E-05
NCAM1 | neural cell adhesion molecule 1 | 2.62E-04
NDFIP1 | Nedd4 family interacting protein 1 | 2.52E-04
NDN | necdin homolog (mouse) | 4.65E-05
NDRG3 | NDRG family member 3 | 3.49E-05
NDUFS2 | NADH dehydrogenase (ubiquinone) Fe-S protein 2, 49kDa (NADH-coenzyme Q reductase) | 2.64E-08
NDUFS4 | NADH dehydrogenase (ubiquinone) Fe-S protein 4, 18kDa (NADH-coenzyme Q reductase) | 4.19E-04
NDUFV1 | NADH dehydrogenase (ubiquinone) flavoprotein 1, 51kDa | 3.96E-05
NEFL | neurofilament, light polypeptide | 3.92E-08
NEGR1 | neuronal growth regulator 1 | 8.34E-05
NELL2 | NEL-like 2 (chicken) | 6.86E-11
NETO2 | neuropilin (NRP) and tolloid (TLL)-like 2 | 6.94E-04
NFKBIE | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, epsilon | 2.61E-07
NHP2L1 | NHP2 non-histone chromosome protein 2-like 1 (S. cerevisiae) | 2.02E-07

NIPSNAP1 | nipsnap homolog 1 (C. elegans) | 5.29E-04
NLGN4X | neuroligin 4, X-linked | 1.64E-04
NLK | nemo-like kinase | 3.74E-07
NMU | neuromedin U | 1.15E-08
NNAT | neuronatin | 5.99E-06
NOVA1 | neuro-oncological ventral antigen 1 | 2.04E-05
NPM3 | nucleophosmin/nucleoplasmin, 3 | 2.46E-04
NPTX2 | neuronal pentraxin II | 1.13E-07
NPTXR | neuronal pentraxin receptor | 3.42E-06
NRCAM | neuronal cell adhesion molecule | 2.14E-06
NRG1 | neuregulin 1 | 3.24E-06
NRGN | neurogranin (protein kinase C substrate, RC3) | 4.18E-04
NRN1 | neuritin 1 | 1.00E-04
NRSN1 | neurensin 1 | 2.38E-04
NSF | N-ethylmaleimide-sensitive factor | 5.19E-05
NUDCD1 | NudC domain containing 1 | 4.37E-08
NUDT21 | nudix (nucleoside diphosphate linked moiety X)-type motif 21 | 8.01E-06
NXPH1 | neurexophilin 1 | 3.93E-04
OCIAD1 | OCIA domain containing 1 | 3.69E-05
OLFM1 | olfactomedin 1 | 7.98E-06
OPCML | opioid binding protein/cell adhesion molecule-like | 8.68E-07
OPTN | optineurin | 1.94E-05
ORC5L | origin recognition complex, subunit 5-like (yeast) | 1.82E-06
OSBPL1A | oxysterol binding protein-like 1A | 2.91E-05
OTUB1 | OTU domain, ubiquitin aldehyde binding 1 | 2.38E-04
OTUB2 | OTU domain, ubiquitin aldehyde binding 2 | 1.92E-04
OXCT1 | 3-oxoacid CoA transferase 1 | 8.44E-05
P2RX5 | purinergic receptor P2X, ligand-gated ion channel, 5 | 1.02E-07
PACSIN1 | protein kinase C and casein kinase substrate in neurons 1 | 1.02E-07
PAK1 | p21 protein (Cdc42/Rac)-activated kinase 1 | 2.66E-06
PAK6 | p21 protein (Cdc42/Rac)-activated kinase 6 | 3.42E-04
PANK2 | pantothenate kinase 2 | 5.03E-06
PAP2D | phosphatidic acid phosphatase type 2 | 1.15E-07
PARK2 | Parkinson disease (autosomal recessive, juvenile) 2, parkin | 1.75E-04
PARP2 | poly (ADP-ribose) polymerase 2 | 3.80E-05
PART1 | prostate androgen-regulated transcript 1 | 6.56E-05
PCDH7 | protocadherin 7 | 4.96E-09
PCLO | piccolo (presynaptic cytomatrix protein) | 6.05E-05
PCMT1 | protein-L-isoaspartate (D-aspartate) O-methyltransferase | 1.37E-03
PCTK1 | PCTAIRE protein kinase 1 | 1.98E-04
PCTK2 | PCTAIRE protein kinase 2 | 6.43E-05
PDCD2 | programmed cell death 2 | 2.12E-04
PDE2A | phosphodiesterase 2A, cGMP-stimulated | 3.05E-06
PDIA6 | protein disulfide isomerase family A, member 6 | 5.33E-04
PDK3 | pyruvate dehydrogenase kinase, isozyme 3 | 1.02E-04
PENK | proenkephalin | 9.06E-05
PFN2 | profilin 2 | 6.19E-05
PGBD5 | piggyBac transposable element derived 5 | 5.33E-04

PGM2L1 | phosphoglucomutase 2-like 1 | 8.85E-04
PHF14 | PHD finger protein 14 | 2.29E-05
PHF20L1 | PHD finger protein 20-like 1 | 9.57E-06
PHTF2 | putative homeodomain transcription factor 2 | 2.51E-06
PIAS2 | protein inhibitor of activated STAT, 2 | 1.53E-05
PIN1 | peptidylprolyl cis/trans isomerase, NIMA-interacting 1 | 3.86E-04
PINK1 | PTEN induced putative kinase 1 | 7.59E-05
PIP5K1B | phosphatidylinositol-4-phosphate 5-kinase, type I, beta | 3.41E-09
PITPNA | phosphatidylinositol transfer protein, alpha | 1.46E-04
PITPNB | phosphatidylinositol transfer protein, beta | 4.61E-05
PKIG | protein kinase (cAMP-dependent, catalytic) inhibitor gamma | 1.17E-06
PKP4 | plakophilin 4 | 3.31E-05
PLCB1 | phospholipase C, beta 1 (phosphoinositide-specific) | 1.95E-05
PLCL2 | phospholipase C-like 2 | 1.82E-06
PLD3 | phospholipase D family, member 3 | 6.55E-06
PLEKHB2 | pleckstrin homology domain containing, family B (evectins) member 2 | 2.50E-04
PLK2 | polo-like kinase 2 (Drosophila) | 2.15E-09
PMM1 | phosphomannomutase 1 | 5.37E-11
PNMA1 | paraneoplastic antigen MA1 | 4.31E-05
PNOC | prepronociceptin | 3.61E-06
POLB | polymerase (DNA directed), beta | 8.80E-06
POLR1D | polymerase (RNA) I polypeptide D, 16kDa | 3.91E-09
POP4 | processing of precursor 4, ribonuclease P/MRP subunit (S. cerevisiae) | 2.78E-04
PPEF1 | protein phosphatase, EF-hand calcium binding domain 1 | 8.62E-06
PPFIA2 | protein tyrosine phosphatase, receptor type, f polypeptide (PTPRF), interacting protein (liprin), alpha 2 | 7.04E-04
PPIE | peptidylprolyl isomerase E (cyclophilin E) | 2.16E-08
PPM1E | protein phosphatase 1E (PP2C domain containing) | 1.27E-04
PPP1R14C | protein phosphatase 1, regulatory (inhibitor) subunit 14C | 6.24E-06
PPP1R1A | protein phosphatase 1, regulatory (inhibitor) subunit 1A | 5.96E-05
PPP1R7 | protein phosphatase 1, regulatory (inhibitor) subunit 7 | 7.96E-06
PPP2R1A | protein phosphatase 2 (formerly 2A), regulatory subunit A, alpha isoform | 7.85E-09
PPP2R2C | protein phosphatase 2 (formerly 2A), regulatory subunit B, gamma isoform | 2.29E-04
PPP2R5C | protein phosphatase 2, regulatory subunit B', gamma isoform | 1.72E-06
PPP3CB | protein phosphatase 3 (formerly 2B), catalytic subunit, beta isoform | 3.74E-10
PPP5C | protein phosphatase 5, catalytic subunit | 7.47E-06
PRAF2 | PRA1 domain family, member 2 | 2.59E-04
PREP | prolyl endopeptidase | 1.37E-05
PRICKLE2 | prickle homolog 2 (Drosophila) | 1.73E-07
PRKAR1A | protein kinase, cAMP-dependent, regulatory, type I, alpha (tissue specific extinguisher 1) | 4.22E-07
PRKCD | protein kinase C, delta | 2.91E-05
PRKCG | protein kinase C, gamma | 4.70E-04
PRKCZ | protein kinase C, zeta | 3.24E-09
PRMT6 | protein arginine methyltransferase 6 | 4.86E-06
PRPS2 | phosphoribosyl pyrophosphate synthetase 2 | 6.27E-06

PRR7 | proline rich 7 (synaptic) | 1.39E-04
PSMA1 | proteasome (prosome, macropain) subunit, alpha type, 1 | 6.97E-06
PSMB2 | proteasome (prosome, macropain) subunit, beta type, 2 | 4.81E-04
PSMB5 | proteasome (prosome, macropain) subunit, beta type, 5 | 1.60E-10
PSMB6 | proteasome (prosome, macropain) subunit, beta type, 6 | 5.13E-04
PSMB7 | proteasome (prosome, macropain) subunit, beta type, 7 | 1.65E-06
PSMD1 | proteasome (prosome, macropain) 26S subunit, non-ATPase, 1 | 5.62E-06
PSMD13 | proteasome (prosome, macropain) 26S subunit, non-ATPase, 13 | 1.17E-05
PSMD7 | proteasome (prosome, macropain) 26S subunit, non-ATPase, 7 | 3.10E-06
PSMD8 | proteasome (prosome, macropain) 26S subunit, non-ATPase, 8 | 7.66E-06
PSME3 | proteasome (prosome, macropain) activator subunit 3 (PA28 gamma; Ki) | 2.60E-08
PTHLH | parathyroid hormone-like hormone | 1.44E-05
PTP4A1 | protein tyrosine phosphatase type IVA, member 1 | 1.89E-06
PTPN3 | protein tyrosine phosphatase, non-receptor type 3 | 1.35E-05
PTPRD | protein tyrosine phosphatase, receptor type, D | 7.42E-06
PTPRF | protein tyrosine phosphatase, receptor type, F | 5.44E-06
PTPRN | protein tyrosine phosphatase, receptor type, N | 1.36E-06
PTPRN2 | protein tyrosine phosphatase, receptor type, N polypeptide 2 | 1.12E-05
PTPRO | protein tyrosine phosphatase, receptor type, O | 7.65E-08
PTPRR | protein tyrosine phosphatase, receptor type, R | 4.08E-04
PTS | 6-pyruvoyltetrahydropterin synthase | 4.28E-04
PUM2 | pumilio homolog 2 (Drosophila) | 3.58E-05
R3HDM1 | R3H domain containing 1 | 3.74E-10
RAB11FIP2 | RAB11 family interacting protein 2 (class I) | 1.84E-04
RAB11FIP4 | RAB11 family interacting protein 4 (class II) | 2.01E-07
RAB15 | RAB15, member RAS oncogene family | 6.88E-06
RAB2A | RAB2A, member RAS oncogene family | 4.04E-06
RAB33A | RAB33A, member RAS oncogene family | 3.57E-08
RAB3A | RAB3A, member RAS oncogene family | 1.84E-13
RAB3B | RAB3B, member RAS oncogene family | 3.74E-10
RAB40B | RAB40B, member RAS oncogene family | 2.15E-09
RAB40C | RAB40C, member RAS oncogene family | 1.92E-04
RAB4A | RAB4A, member RAS oncogene family | 1.46E-07
RAB6A | RAB6A, member RAS oncogene family | 8.38E-07
RAB6B | RAB6B, member RAS oncogene family | 1.84E-04
RABIF | RAB interacting factor | 7.60E-07
RAC3 | ras-related C3 botulinum toxin substrate 3 (rho family, small GTP binding protein Rac3) | 2.14E-06
RAD51C | RAD51 homolog C (S. cerevisiae) | 3.40E-04
RALA | v-ral simian leukemia viral oncogene homolog A (ras related) | 4.59E-05
RANBP9 | RAN binding protein 9 | 1.22E-04
RAP1GAP | RAP1 GTPase activating protein | 8.44E-08
RAP1GDS1 | RAP1, GTP-GDP dissociation stimulator 1 | 2.39E-06
RASAL1 | RAS protein activator like 1 (GAP1 like) | 6.31E-09
RASGRF1 | Ras protein-specific guanine nucleotide-releasing factor 1 | 1.33E-04

162

GeneSymbol GeneName Meta Q-value RASGRP1 RAS guanyl releasing protein 1 (calcium and DAG-regulated) 9.24E-08 RASL10A RAS-like, family 10, member A 1.08E-05 RASL10B RAS-like, family 10, member B 2.32E-06 RBP4 retinol binding protein 4, plasma 3.25E-05 RCN2 reticulocalbin 2, EF-hand calcium binding domain 3.58E-07 REEP1 receptor accessory protein 1 2.10E-04 REEP5 receptor accessory protein 5 1.19E-04 RER1 retention in 1 homolog (S. RER1 cerevisiae) 2.05E-05 RFC3 replication factor C (activator 1) 3, 38kDa 3.25E-06 RFTN1 raftlin, lipid raft linker 1 3.26E-09 RGS12 regulator of G-protein signaling 12 1.15E-05 RGS4 regulator of G-protein signaling 4 1.20E-07 RGS6 regulator of G-protein signaling 6 3.25E-06 RGS7 regulator of G-protein signaling 7 1.17E-05 RHBDD2 rhomboid domain containing 2 9.74E-07 RIMBP2 RIMS binding protein 2 4.67E-04 RIT2 Ras-like without CAAX 2 1.03E-04 RNASEH1 ribonuclease H1 2.27E-06 RNF150 ring finger protein 150 2.26E-05 RNF187 ring finger protein 187 1.24E-07 RPRM reprimo, TP53 dependent G2 arrest mediator candidate 1.84E-08 RPS6KA3 ribosomal protein S6 kinase, 90kDa, polypeptide 3 4.29E-07 RPUSD3 RNA pseudouridylate synthase domain containing 3 1.97E-05 Rtf1, Paf1/RNA polymerase II complex component, homolog RTF1 (S. 
cerevisiae) 1.25E-05 RTN1 reticulon 1 1.12E-04 RTN2 reticulon 2 2.25E-04 RTN4 reticulon 4 1.34E-04 runt-related transcription factor 1; translocated to, 1 (cyclin D- RUNX1T1 related) 9.03E-07 SAE1 SUMO1 activating enzyme subunit 1 2.41E-07 SARS seryl-tRNA synthetase 2.97E-05 SCAMP1 secretory carrier membrane protein 1 6.37E-08 SCAMP5 secretory carrier membrane protein 5 1.28E-09 SCG2 secretogranin II (chromogranin C) 2.39E-04 SCG5 secretogranin V (7B2 protein) 9.34E-07 SCN2A sodium channel, voltage-gated, type II, alpha subunit 4.78E-04 SCN2B sodium channel, voltage-gated, type II, beta 3.76E-04 SCN3B sodium channel, voltage-gated, type III, beta 1.55E-07 SCN8A sodium channel, voltage gated, type VIII, alpha subunit 1.09E-04 SEC23IP SEC23 interacting protein 3.28E-09 SEPT11 septin 11 3.23E-05 SEPT6 septin 6 6.04E-06 SERINC3 serine incorporator 3 1.94E-05 serpin peptidase inhibitor, clade F (alpha-2 antiplasmin, SERPINF1 pigment epithelium derived factor), member 1 1.13E-07 SERPINI1 serpin peptidase inhibitor, clade I (neuroserpin), member 1 2.85E-04 SEZ6L seizure related 6 homolog (mouse)-like 6.90E-05 SFRS2 splicing factor, arginine/serine-rich 2 1.29E-07 163

GeneSymbol GeneName Meta Q-value SFRS2B splicing factor, arginine/serine-rich 2B 2.32E-05 SGIP1 SH3-domain GRB2-like (endophilin) interacting protein 1 2.41E-07 SKAP2 src kinase associated phosphoprotein 2 2.87E-04 solute carrier family 1 (neuronal/epithelial high affinity SLC1A1 glutamate transporter, system Xag), member 1 3.64E-04 solute carrier family 24 (sodium/potassium/calcium exchanger), SLC24A3 member 3 5.39E-09 solute carrier family 25 (mitochondrial carrier; oxoglutarate SLC25A11 carrier), member 11 5.79E-05 solute carrier family 25 (mitochondrial carrier; adenine SLC25A4 nucleotide translocator), member 4 9.61E-09 SLC25A44 solute carrier family 25, member 44 5.70E-09 SLC25A46 solute carrier family 25, member 46 3.35E-06 SLC27A2 solute carrier family 27 (fatty acid transporter), member 2 7.24E-10 SLC30A3 solute carrier family 30 (zinc transporter), member 3 1.08E-03 SLC30A5 solute carrier family 30 (zinc transporter), member 5 1.10E-07 SLC30A9 solute carrier family 30 (zinc transporter), member 9 8.64E-07 SLC39A3 solute carrier family 39 (zinc transporter), member 3 8.81E-04 SLITRK1 SLIT and NTRK-like family, member 1 6.23E-04 SLITRK5 SLIT and NTRK-like family, member 5 2.02E-05 SMAD3 SMAD family member 3 6.87E-05 SMAP1 small ArfGAP 1 5.31E-04 SWI/SNF related, matrix associated, actin dependent regulator SMARCA2 of chromatin, subfamily a, member 2 6.37E-06 SMURF1 SMAD specific E3 ubiquitin protein 1 1.67E-04 SNAP25 synaptosomal-associated protein, 25kDa 4.94E-05 SNX3 sorting nexin 3 9.01E-05 SNX4 sorting nexin 4 1.08E-04 SOCS2 suppressor of cytokine signaling 2 3.00E-08 SOX11 SRY (sex determining region Y)-box 11 2.41E-06 SPA17 sperm autoantigenic protein 17 1.17E-06 SQLE squalene epoxidase 8.89E-04 steroid-5-alpha-reductase, alpha polypeptide 1 (3-oxo-5 alpha- SRD5A1 steroid delta 4-dehydrogenase alpha 1) 1.38E-06 SRPRB signal recognition particle receptor, B subunit 1.85E-04 SST somatostatin 4.88E-04 SSTR1 somatostatin receptor 1 3.14E-05 STAT1 signal 
transducer and activator of transcription 1, 91kDa 1.58E-04 STK24 serine/threonine kinase 24 (STE20 homolog, yeast) 1.11E-07 STK25 serine/threonine kinase 25 (STE20 homolog, yeast) 6.59E-04 STK32C serine/threonine kinase 32C 6.66E-09 STMN1 stathmin 1/oncoprotein 18 1.46E-06 STMN3 stathmin-like 3 7.05E-07 STOML1 stomatin (EPB72)-like 1 1.43E-05 STYK1 serine/threonine/tyrosine kinase 1 1.59E-05 SUB1 SUB1 homolog (S. cerevisiae) 9.41E-09 SUCLA2 succinate-CoA ligase, ADP-forming, beta subunit 2.52E-06 SUMO3 SMT3 suppressor of mif two 3 homolog 3 (S. cerevisiae) 1.26E-05 SVOP SV2 related protein homolog (rat) 1.38E-06 SYN2 synapsin II 7.70E-07

164

GeneSymbol GeneName Meta Q-value SYNGR3 synaptogyrin 3 1.28E-05 SYP synaptophysin 5.21E-04 SYT1 synaptotagmin I 1.29E-06 SYT11 synaptotagmin XI 1.62E-07 SYT5 synaptotagmin V 1.01E-03 TAC1 tachykinin, precursor 1 8.89E-06 TAC3 tachykinin 3 4.97E-06 tetratricopeptide repeat, ankyrin repeat and coiled-coil TANC2 containing 2 2.78E-07 TBC1D9 TBC1 domain family, member 9 (with GRAM domain) 5.63E-07 TCP11L1 t-complex 11 (mouse)-like 1 4.70E-06 TFCP2 transcription factor CP2 1.38E-05 TFRC transferrin receptor (p90, CD71) 8.30E-13 THOC7 THO complex 7 homolog (Drosophila) 1.63E-08 of inner mitochondrial membrane 13 homolog TIMM13 (yeast) 4.45E-07 translocase of inner mitochondrial membrane 17 homolog A TIMM17A (yeast) 2.16E-08 TLN2 talin 2 1.65E-06 TM2D2 TM2 domain containing 2 1.89E-06 TM2D3 TM2 domain containing 3 1.93E-06 TMEM121 transmembrane protein 121 9.86E-07 TMEM132B transmembrane protein 132B 5.98E-05 TMEM132D transmembrane protein 132D 2.94E-07 TMEM155 transmembrane protein 155 2.30E-05 TMEM158 transmembrane protein 158 6.72E-05 TMEM160 transmembrane protein 160 1.93E-07 TMEM169 transmembrane protein 169 8.37E-09 TMEM59L transmembrane protein 59-like 2.53E-06 TMEM65 transmembrane protein 65 1.52E-06 TMEM9 transmembrane protein 9 3.18E-05 TNFAIP1 tumor necrosis factor, alpha-induced protein 1 (endothelial) 6.47E-06 TOM1L2 target of myb1-like 2 (chicken) 2.67E-04 TOR1A torsin family 1, member A (torsin A) 9.19E-06 TOX3 TOX high mobility group box family member 3 2.39E-06 TPBG trophoblast glycoprotein 8.69E-05 TPM1 tropomyosin 1 (alpha) 3.18E-05 TRAF5 TNF receptor-associated factor 5 6.13E-05 TRAPPC2L trafficking protein particle complex 2-like 1.67E-04 TRIO triple functional domain (PTPRF interacting) 4.80E-05 transient receptor potential cation channel, subfamily C, TRPC1 member 1 7.19E-08 TSFM Ts translation elongation factor, mitochondrial 2.68E-05 TSPAN17 tetraspanin 17 1.10E-05 TSSC1 tumor suppressing subtransferable candidate 1 5.87E-07 TTC9B 
tetratricopeptide repeat domain 9B 3.65E-09 TUB tubby homolog (mouse) 2.90E-04 TUBA4A tubulin, alpha 4a 1.51E-05 TUBB2A tubulin, beta 2A 6.28E-05 TUSC3 tumor suppressor candidate 3 6.62E-07 165

GeneSymbol GeneName Meta Q-value UBE2A ubiquitin-conjugating enzyme E2A (RAD6 homolog) 8.42E-07 UBE2B ubiquitin-conjugating enzyme E2B (RAD6 homolog) 7.54E-07 UBE2D2 ubiquitin-conjugating enzyme E2D 2 (UBC4/5 homolog, yeast) 1.16E-05 UBE2G1 ubiquitin-conjugating enzyme E2G 1 (UBC7 homolog, yeast) 2.84E-06 UBE2J1 ubiquitin-conjugating enzyme E2, J1 (UBC6 homolog, yeast) 1.09E-04 UBE2N ubiquitin-conjugating enzyme E2N (UBC13 homolog, yeast) 5.62E-06 UBE2T ubiquitin-conjugating enzyme E2T (putative) 5.04E-05 UBE2V2 ubiquitin-conjugating enzyme E2 variant 2 1.05E-10 UBE3A ubiquitin protein ligase E3A 1.29E-05 UCHL1 ubiquitin carboxyl-terminal esterase L1 (ubiquitin thiolesterase) 5.16E-07 ULK2 unc-51-like kinase 2 (C. elegans) 8.89E-06 UROS uroporphyrinogen III synthase 4.79E-05 VAMP2 vesicle-associated membrane protein 2 (synaptobrevin 2) 7.44E-08 VIP vasoactive intestinal peptide 2.59E-04 VKORC1L1 vitamin K epoxide reductase complex, subunit 1-like 1 6.55E-06 VLDLR very low density lipoprotein receptor 5.95E-05 VSNL1 visinin-like 1 2.72E-09 WARS tryptophanyl-tRNA synthetase 9.35E-06 WDR13 WD repeat domain 13 1.33E-05 WDR23 WD repeat domain 23 2.85E-06 WDR86 WD repeat domain 86 5.45E-08 WIPI2 WD repeat domain, phosphoinositide interacting 2 2.20E-06 WSB2 WD repeat and SOCS box-containing 2 9.85E-06 XKR4 XK, Kell blood group complex subunit-related family, member 4 3.95E-05 YAF2 YY1 associated factor 2 2.24E-07 YARS tyrosyl-tRNA synthetase 2.00E-04 YKT6 YKT6 v-SNARE homolog (S. cerevisiae) 1.03E-04 tyrosine 3-/tryptophan 5-monooxygenase YWHAH activation protein, eta polypeptide 6.81E-05 YWHAZ.bApr07 unknown 5.94E-05 ZCCHC17 zinc finger, CCHC domain containing 17 6.57E-06 ZDHHC3 zinc finger, DHHC-type containing 3 1.86E-05 ZMAT2 zinc finger, matrin type 2 1.44E-04 ZNF263 zinc finger protein 263 1.62E-04 ZNF689 zinc finger protein 689 3.86E-04 ZNF711 zinc finger protein 711 3.08E-05 ZNRF1 zinc and ring finger 1 1.77E-04

Age Up-regulated

GeneSymbol	GeneName	Meta Q-value
SPEN	spen homolog, transcriptional regulator (Drosophila)	2.86E-05
ZAK	sterile alpha motif and leucine zipper containing kinase AZK	1.17E-05
ANTXR1	anthrax toxin receptor 1	1.62E-06
ZBTB16	zinc finger and BTB domain containing 16	1.97E-04
ZFP36L1	zinc finger protein 36, C3H type-like 1	2.51E-05
SEPT9	septin 9	3.26E-04
ZXDC	ZXD family zinc finger C	2.41E-04
AAAS	achalasia, adrenocortical insufficiency, alacrimia (Allgrove, triple-A)	1.07E-03
ZIC2	Zic family member 2 (odd-paired homolog, Drosophila)	5.54E-04
IL28RA	interleukin 28 receptor, alpha (interferon, lambda receptor)	1.35E-04
ZNF302	zinc finger protein 302	5.23E-05
ZBTB20	zinc finger and BTB domain containing 20	6.86E-04
ZNF423	zinc finger protein 423	1.31E-04
ZC3HAV1	zinc finger CCCH-type, antiviral 1	1.34E-04
SRRM2	serine/arginine repetitive matrix 2	1.38E-04
ZBTB7A	zinc finger and BTB domain containing 7A	6.96E-04
CFH	complement factor H	4.58E-06
KIAA0841	KIAA0841	6.90E-04
SEPT6	septin 6	2.00E-04
ZMYM6	zinc finger, MYM-type 6	2.59E-04
SIGLEC8	sialic acid binding Ig-like lectin 8	1.67E-04
DPF3	D4, zinc and double PHD fingers, family 3	1.08E-04
ZNF609	zinc finger protein 609	2.08E-04
ZFAND6	zinc finger, AN1-type domain 6	1.04E-05
ZBTB33	zinc finger and BTB domain containing 33	8.13E-04
IGFBP5	insulin-like growth factor binding protein 5	3.60E-06
SCN7A	sodium channel, voltage-gated, type VII, alpha	6.82E-05
PHF3	PHD finger protein 3	2.17E-04
ARHGEF6	Rac/Cdc42 guanine nucleotide exchange factor (GEF) 6	2.31E-04
HBP1	HMG-box transcription factor 1	2.70E-04
AKAP1	A kinase (PRKA) anchor protein 1	3.87E-03
SMARCD2	SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily d, member 2	4.86E-04
ZNF235	zinc finger protein 235	2.96E-04
ITGB4	integrin, beta 4	8.21E-04
PAICS	phosphoribosylaminoimidazole carboxylase, phosphoribosylaminoimidazole succinocarboxamide synthetase	3.21E-06
DTNA	dystrobrevin, alpha	1.73E-03
EEF2K	eukaryotic elongation factor-2 kinase	2.49E-03
SPON1	spondin 1, extracellular matrix protein	1.11E-03
PBXIP1	pre-B-cell leukemia homeobox interacting protein 1	5.02E-04
MTSS1	metastasis suppressor 1	4.98E-04
SAFB2	scaffold attachment factor B2	3.51E-03
AHCYL1	S-adenosylhomocysteine hydrolase-like 1	4.59E-04
NUP160	nucleoporin 160kDa	7.59E-04
NXT2	nuclear transport factor 2-like export factor 2	4.13E-04
RHOBTB3	Rho-related BTB domain containing 3	4.78E-04
RGN	regucalcin (senescence marker protein-30)	5.89E-05
SMAD5	SMAD family member 5	2.86E-05
GMPR	guanosine monophosphate reductase	2.07E-04
CPNE3	copine III	2.35E-04
CD59	CD59 molecule, complement regulatory protein	1.08E-04
TRAF1	TNF receptor-associated factor 1	2.53E-04
PPP2R1B	protein phosphatase 2 (formerly 2A), regulatory subunit A, beta isoform	2.10E-05
ADORA3	adenosine A3 receptor	4.57E-05
SMC1A	structural maintenance of chromosomes 1A	5.88E-05
TMEM63A	transmembrane protein 63A	8.82E-04
CALCOCO2	calcium binding and coiled-coil domain 2	4.95E-06
LRP10	low density lipoprotein receptor-related protein 10	3.21E-04
SLC13A3	solute carrier family 13 (sodium-dependent dicarboxylate transporter), member 3	1.66E-03
SCAND2	SCAN domain containing 2 pseudogene	1.01E-05
PLIN	perilipin	1.05E-06
VCAN	versican	1.20E-04
JMJD2B	jumonji domain containing 2B	1.96E-05
LPL	lipoprotein lipase	1.93E-04
GPLD1	glycosylphosphatidylinositol specific phospholipase D1	3.71E-07
RSU1	Ras suppressor protein 1	2.00E-03
CALML4	calmodulin-like 4	2.43E-03
CPS1	carbamoyl-phosphate synthetase 1, mitochondrial	7.80E-06
STAG2	stromal antigen 2	1.41E-04
CHD6	chromodomain helicase DNA binding protein 6	2.44E-04
FGR	Gardner-Rasheed feline sarcoma viral (v-fgr) oncogene homolog	5.43E-06
FYN	FYN oncogene related to SRC, FGR, YES	9.37E-05
SMC5	structural maintenance of chromosomes 5	5.89E-05
PPP1R3C	protein phosphatase 1, regulatory (inhibitor) subunit 3C	1.67E-04
IKBKB	inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase beta	2.18E-05
OSBPL2	oxysterol binding protein-like 2	2.52E-06
PPP1R3D	protein phosphatase 1, regulatory (inhibitor) subunit 3D	2.06E-05
LAMA4	laminin, alpha 4	2.01E-04
NTRK2	neurotrophic tyrosine kinase, receptor, type 2	7.68E-05
APOD	apolipoprotein D	1.59E-04
OMD	osteomodulin	5.23E-04
RHOQ	ras homolog gene family, member Q	1.35E-04
KIAA0323	KIAA0323	4.89E-05
SGMS1	sphingomyelin synthase 1	3.53E-04
NPHP3	nephronophthisis 3 (adolescent)	9.58E-05
LSM14A	LSM14A, SCD6 homolog A (S. cerevisiae)	2.82E-03
DDIT4	DNA-damage-inducible transcript 4	1.92E-04
PPP1R1B	protein phosphatase 1, regulatory (inhibitor) subunit 1B	2.50E-04
ANGPT1	angiopoietin 1	1.75E-04
MAP3K6	mitogen-activated protein kinase kinase kinase 6	2.15E-04
TNK2	tyrosine kinase, non-receptor, 2	3.70E-04
SLC7A2	solute carrier family 7 (cationic amino acid transporter, y+ system), member 2	1.52E-05
ADD3	adducin 3 (gamma)	1.57E-03
SOS2	son of sevenless homolog 2 (Drosophila)	1.27E-07
ARHGEF10	Rho guanine nucleotide exchange factor (GEF) 10	1.21E-03
DBT	dihydrolipoamide branched chain transacylase E2	4.73E-05
SIPA1L3	signal-induced proliferation-associated 1 like 3	3.71E-07
ADH1B	alcohol dehydrogenase 1B (class I), beta polypeptide	4.02E-05
TRAM1	translocation associated membrane protein 1	3.94E-04
BRD8	bromodomain containing 8	7.76E-04
AFF1	AF4/FMR2 family, member 1	2.04E-04
RELA	v-rel reticuloendotheliosis viral oncogene homolog A (avian)	4.00E-04
CD58	CD58 molecule	2.71E-04
PCGF2	polycomb group ring finger 2	4.37E-05
C19orf36	chromosome 19 open reading frame 36	3.87E-05
BCL2	B-cell CLL/lymphoma 2	1.12E-04
CALD1	caldesmon 1	1.21E-03
GFAP	glial fibrillary acidic protein	1.43E-03
TIMP3	TIMP metallopeptidase inhibitor 3	5.23E-06
CADM1	cell adhesion molecule 1	3.02E-04
MTCP1	mature T-cell proliferation 1	2.54E-04
DAB2	disabled homolog 2, mitogen-responsive phosphoprotein (Drosophila)	9.45E-05
PDLIM3	PDZ and LIM domain 3	4.55E-04
CD22	CD22 molecule	1.72E-05
TNS1	tensin 1	5.20E-05
EPS8L1	EPS8-like 1	1.84E-04
AEBP2	AE binding protein 2	2.64E-04
ASPH	aspartate beta-hydroxylase	6.94E-04
NKTR	natural killer-tumor recognition sequence	2.14E-07
KTN1	kinectin 1 (kinesin receptor)	2.98E-04
SORBS1	sorbin and SH3 domain containing 1	1.28E-04
CXCR4	chemokine (C-X-C motif) receptor 4	6.22E-05
EMP3	epithelial membrane protein 3	4.28E-04
LEPR	leptin receptor	9.30E-06
BAIAP3	BAI1-associated protein 3	4.53E-04
WWOX	WW domain containing oxidoreductase	1.33E-06
SNAP23	synaptosomal-associated protein, 23kDa	3.50E-06
WNK1	WNK lysine deficient protein kinase 1	1.08E-04
RYR1	ryanodine receptor 1 (skeletal)	8.68E-05
COL6A1	collagen, type VI, alpha 1	2.20E-03
STK3	serine/threonine kinase 3 (STE20 homolog, yeast)	1.03E-04
LATS2	LATS, large tumor suppressor, homolog 2 (Drosophila)	2.56E-04
CNOT2	CCR4-NOT transcription complex, subunit 2	3.74E-06
LAMA2	laminin, alpha 2	7.53E-04
RBL2	retinoblastoma-like 2 (p130)	7.45E-04
HEBP2	heme binding protein 2	6.29E-04
KIAA0240	KIAA0240	2.10E-05
TBL1X	transducin (beta)-like 1X-linked	2.66E-07
AKAP13	A kinase (PRKA) anchor protein 13	2.08E-04
BCL6	B-cell CLL/lymphoma 6	3.94E-06
CD40	CD40 molecule, TNF receptor superfamily member 5	1.89E-04
ITPR2	inositol 1,4,5-triphosphate receptor, type 2	1.29E-06
MXI1	MAX interactor 1	1.84E-04
ARAF	v-raf murine sarcoma 3611 viral oncogene homolog	5.43E-04
WHSC2	Wolf-Hirschhorn syndrome candidate 2	2.50E-04
CCDC69	coiled-coil domain containing 69	1.44E-04
CDKN2A	cyclin-dependent kinase inhibitor 2A (melanoma, p16, inhibits CDK4)	8.38E-05
HLA-DPB1	major histocompatibility complex, class II, DP beta 1	1.18E-04
RNASE4	ribonuclease, RNase A family, 4	3.65E-04
IQCK	IQ motif containing K	9.86E-04
CEP350	centrosomal protein 350kDa	5.68E-05
AHNAK	AHNAK nucleoprotein	1.44E-04
CSF1	colony stimulating factor 1 (macrophage)	7.05E-04
BBS2	Bardet-Biedl syndrome 2	5.63E-04
ITGAV	integrin, alpha V (vitronectin receptor, alpha polypeptide, antigen CD51)	1.89E-04
CAPN2	calpain 2, (m/II) large subunit	8.68E-05
EPOR	erythropoietin receptor	1.58E-05
MBOAT5	unknown	1.04E-05
NR2C1	nuclear receptor subfamily 2, group C, member 1	9.73E-06
FMO3	flavin containing monooxygenase 3	6.11E-05
NEBL	nebulette	4.02E-04
INSR	insulin receptor	2.00E-03
OFD1	oral-facial-digital syndrome 1	1.03E-05
ITPKB	inositol 1,4,5-trisphosphate 3-kinase B	1.17E-05
LGALS3	lectin, galactoside-binding, soluble, 3	2.94E-07
SSPN	sarcospan (Kras oncogene-associated gene)	9.29E-06
TRIM4	tripartite motif-containing 4	1.50E-06
SPN	sialophorin	1.87E-06
C9orf150	chromosome 9 open reading frame 150	1.33E-06
HSPBAP1	HSPB (heat shock 27kDa) associated protein 1	1.42E-03
MYST3	MYST histone acetyltransferase (monocytic leukemia) 3	1.97E-04
GGA2	golgi associated, gamma adaptin ear containing, ARF binding protein 2	3.05E-06
ACACB	acetyl-Coenzyme A carboxylase beta	2.94E-07
CBFB	core-binding factor, beta subunit	1.07E-03
TREX1	three prime repair exonuclease 1	3.96E-05
RBBP6	retinoblastoma binding protein 6	6.14E-06
EMP2	epithelial membrane protein 2	2.90E-03
ANXA4	annexin A4	2.94E-06
LPP	LIM domain containing preferred translocation partner in lipoma	1.23E-04
MBD2	methyl-CpG binding domain protein 2	1.97E-04
SWAP70	SWAP-70 protein	1.41E-04
PTGER3	prostaglandin E receptor 3 (subtype EP3)	6.82E-05
SORBS3	sorbin and SH3 domain containing 3	5.88E-04
CAST	calpastatin	3.76E-06
GYPC	glycophorin C (Gerbich blood group)	1.07E-03
FER1L3	unknown	1.92E-05
FMNL2	formin-like 2	1.35E-04
AXL	AXL receptor tyrosine kinase	1.34E-04
TPP1	tripeptidyl peptidase I	1.45E-05
VIM	vimentin	1.27E-04
FZD7	frizzled homolog 7 (Drosophila)	1.35E-04
FAM107A	family with sequence similarity 107, member A	1.17E-05
ECHDC2	enoyl Coenzyme A hydratase domain containing 2	3.55E-04
PRKCH	protein kinase C, eta	6.17E-04
SMOX	spermine oxidase	4.06E-05
MTM1	myotubularin 1	6.95E-05
EIF2AK2	eukaryotic translation initiation factor 2-alpha kinase 2	1.72E-04
BCAR3	breast cancer anti-estrogen resistance 3	3.35E-04
DGKG	diacylglycerol kinase, gamma 90kDa	7.69E-04
PTRF	polymerase I and transcript release factor	1.00E-03
TNRC6A	trinucleotide repeat containing 6A	1.37E-04
EHD1	EH-domain containing 1	5.08E-04
ERG	v-ets erythroblastosis virus E26 oncogene homolog (avian)	5.50E-06
MYST4	MYST histone acetyltransferase (monocytic leukemia) 4	2.08E-05
PGCP	plasma glutamate carboxypeptidase	4.96E-04
RAB31	RAB31, member RAS oncogene family	3.45E-05
KIAA1627	KIAA1627 protein	1.52E-08
PHKA2	phosphorylase kinase, alpha 2 (liver)	8.70E-08
GPR125	G protein-coupled receptor 125	1.11E-04
MGST2	microsomal glutathione S-transferase 2	1.67E-04
TAZ	tafazzin	4.58E-05
RBPMS	RNA binding protein with multiple splicing	1.52E-08
BCL9	B-cell CLL/lymphoma 9	2.51E-06
USP54	ubiquitin specific peptidase 54	1.44E-04
EMCN	endomucin	1.89E-04
N4BP1	NEDD4 binding protein 1	6.30E-05
TAF4	TAF4 RNA polymerase II, TATA box binding protein (TBP)-associated factor, 135kDa	1.94E-09
BRD1	bromodomain containing 1	5.97E-04
C10orf104	chromosome 10 open reading frame 104	2.81E-04
KIAA0494	KIAA0494	2.44E-04
HLA-DPA1	major histocompatibility complex, class II, DP alpha 1	4.24E-04
ANP32E	acidic (leucine-rich) nuclear phosphoprotein 32 family, member E	5.23E-05
RRBP1	ribosome binding protein 1 homolog 180kDa (dog)	9.45E-05
NFIA	nuclear factor I/A	1.44E-03
STOM	stomatin	1.62E-06
SLC22A5	solute carrier family 22 (organic cation/carnitine transporter), member 5	7.60E-08
IRF2	interferon regulatory factor 2	4.85E-04
TGM2	transglutaminase 2 (C polypeptide, protein-glutamine-gamma-glutamyltransferase)	3.38E-04
ASAHL	unknown	6.95E-04
THOC2	THO complex 2	7.28E-04
PTGDS	prostaglandin D2 synthase 21kDa (brain)	2.78E-04
EZH1	enhancer of zeste homolog 1 (Drosophila)	1.92E-04
PLEC1	plectin 1, intermediate filament binding protein 500kDa	2.04E-04
CHD1	chromodomain helicase DNA binding protein 1	1.53E-04
KCNN3	potassium intermediate/small conductance calcium-activated channel, subfamily N, member 3	5.00E-04
FGFR1	fibroblast growth factor receptor 1	2.66E-07
RASSF8	Ras association (RalGDS/AF-6) domain family (N-terminal) member 8	2.75E-05
CHST3	carbohydrate (chondroitin 6) sulfotransferase 3	8.31E-06
LOC441108	hypothetical gene supported by AK128882	1.71E-05
SPTBN1	spectrin, beta, non-erythrocytic 1	2.94E-07
C2orf24	chromosome 2 open reading frame 24	4.06E-05
TXNIP	thioredoxin interacting protein	1.03E-04
NCOA3	nuclear receptor coactivator 3	4.53E-04
DDR2	discoidin domain receptor tyrosine kinase 2	3.07E-05
KIAA0913	KIAA0913	2.14E-04
ECM2	extracellular matrix protein 2, female organ and adipocyte specific	1.84E-04
GSTM5	glutathione S-transferase mu 5	4.55E-04
MED12	mediator complex subunit 12	5.42E-05
ACIN1	apoptotic chromatin condensation inducer 1	2.26E-04
SLC14A1	solute carrier family 14 (urea transporter), member 1 (Kidd blood group)	3.50E-06
NFATC3	nuclear factor of activated T-cells, cytoplasmic, calcineurin-dependent 3	4.93E-04
SOX13	SRY (sex determining region Y)-box 13	1.89E-04
ST7L	suppression of tumorigenicity 7 like	1.71E-05
TFPI	tissue factor pathway inhibitor (lipoprotein-associated coagulation inhibitor)	3.51E-05
CDC14A	CDC14 cell division cycle 14 homolog A (S. cerevisiae)	6.95E-05
PPFIA1	protein tyrosine phosphatase, receptor type, f polypeptide (PTPRF), interacting protein (liprin), alpha 1	4.08E-04
CDC42BPA	CDC42 binding protein kinase alpha (DMPK-like)	1.59E-03
GSN	gelsolin (amyloidosis, Finnish type)	1.71E-05
ID4	inhibitor of DNA binding 4, dominant negative helix-loop-helix protein	3.05E-06
GPC4	glypican 4	2.32E-04
SDC2	syndecan 2	3.49E-04
MAX	MYC associated factor X	1.69E-04
METTL7A	methyltransferase like 7A	2.94E-05
SMAD4	SMAD family member 4	4.77E-04
STK10	serine/threonine kinase 10	6.11E-05
FGF2	fibroblast growth factor 2 (basic)	1.11E-03
LPIN1	lipin 1	6.70E-04
ALDH1L1	aldehyde dehydrogenase 1 family, member L1	1.72E-04
CCDC101	coiled-coil domain containing 101	1.90E-06
ASCL1	achaete-scute complex homolog 1 (Drosophila)	1.04E-05
ALDH6A1	aldehyde dehydrogenase 6 family, member A1	1.49E-03
GALM	galactose mutarotase (aldose 1-epimerase)	3.34E-05
DVL2	dishevelled, dsh homolog 2 (Drosophila)	3.32E-05
C16orf35	chromosome 16 open reading frame 35	5.30E-04
PTBP1	polypyrimidine tract binding protein 1	2.28E-03
MASP1	mannan-binding lectin serine peptidase 1 (C4/C2 activating component of Ra-reactive factor)	8.82E-04
AEBP1	AE binding protein 1	1.22E-04
ALDH4A1	aldehyde dehydrogenase 4 family, member A1	1.20E-04
ALOX5	arachidonate 5-lipoxygenase	2.29E-04
BCAT2	branched chain aminotransferase 2, mitochondrial	9.83E-04
C1orf162	chromosome 1 open reading frame 162	3.38E-04
CANX	calnexin	1.11E-03
CC2D1A	coiled-coil and C2 domain containing 1A	1.29E-05
CCDC66	coiled-coil domain containing 66	3.07E-05
CDKN2C	cyclin-dependent kinase inhibitor 2C (p18, inhibits CDK4)	2.74E-06
CFLAR	CASP8 and FADD-like apoptosis regulator	3.35E-04
CFTR	cystic fibrosis transmembrane conductance regulator (ATP-binding cassette sub-family C, member 7)	9.37E-05
CLN5	ceroid-lipofuscinosis, neuronal 5	8.68E-05
CNOT4	CCR4-NOT transcription complex, subunit 4	8.56E-04
FAM105B	family with sequence similarity 105, member B	8.43E-04
GATM	glycine amidinotransferase (L-arginine:glycine amidinotransferase)	6.05E-05
HFE	hemochromatosis	3.02E-04
IFT140	intraflagellar transport 140 homolog (Chlamydomonas)	8.31E-06
IGF1R	insulin-like growth factor 1 receptor	2.66E-05
IL13RA1	interleukin 13 receptor, alpha 1	2.54E-04
IQGAP1	IQ motif containing GTPase activating protein 1	8.21E-04
LIMK2	LIM domain kinase 2	9.73E-06
LMNA	lamin A/C	4.65E-05
LSS	lanosterol synthase (2,3-oxidosqualene-lanosterol cyclase)	8.82E-04
MGST1	microsomal glutathione S-transferase 1	2.41E-03
MYO1C	myosin IC	1.37E-04
NEK3	NIMA (never in mitosis gene a)-related kinase 3	1.04E-05
NRP1	neuropilin 1	1.86E-04
NUPR1	nuclear protein 1	7.80E-06
PCBP2	poly(rC) binding protein 2	1.46E-05
PELI2	pellino homolog 2 (Drosophila)	1.74E-04
PHF1	PHD finger protein 1	6.17E-04
PLXNA2	plexin A2	4.65E-05
PTPN13	protein tyrosine phosphatase, non-receptor type 13 (APO-1/CD95 (Fas)-associated phosphatase)	7.97E-05
QKI	quaking homolog, KH domain RNA binding (mouse)	2.52E-04
RIOK3	RIO kinase 3 (yeast)	2.35E-05
SLC15A1	solute carrier family 15 (oligopeptide transporter), member 1	5.20E-04
SLC16A9	solute carrier family 16, member 9 (monocarboxylic acid transporter 9)	9.46E-05
SMARCC1	SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily c, member 1	1.18E-04
SP100	SP100 nuclear antigen	2.41E-04
TCF3	transcription factor 3 (E2A immunoglobulin enhancer binding factors E12/E47)	9.58E-05
TIMP4	TIMP metallopeptidase inhibitor 4	2.56E-04
TLE1	transducin-like enhancer of split 1 (E(sp1) homolog, Drosophila)	3.55E-04
TOR1AIP1	torsin A interacting protein 1	1.22E-04

pH Down-regulated

GeneSymbol	GeneName	Meta Q-value
MAPKAPK2	mitogen-activated protein kinase-activated protein kinase 2	2.45E-05

pH Up-regulated

GeneSymbol	GeneName	Meta Q-value
LARGE	like-glycosyltransferase	7.04E-04
HISPPD2A	histidine acid phosphatase domain containing 2A	2.96E-04
KCNAB1	potassium voltage-gated channel, shaker-related subfamily, beta member 1	7.04E-04
RPS6KA3	ribosomal protein S6 kinase, 90kDa, polypeptide 3	9.71E-04
SYN2	synapsin II	7.04E-04
TUSC3	tumor suppressor candidate 3	1.39E-03
NNAT	neuronatin	1.67E-03
IDH3B	isocitrate dehydrogenase 3 (NAD+) beta	1.03E-03
SARS	seryl-tRNA synthetase	7.04E-04
KIF3C	kinesin family member 3C	1.94E-04
NPTX2	neuronal pentraxin II	5.98E-04
ENSA	endosulfine alpha	7.04E-04
KATNB1	katanin p80 (WD repeat containing) subunit B 1	1.03E-03

PMI Down-regulated

GeneSymbol	GeneName	Meta Q-value
BRD8	bromodomain containing 8	1.34E-04
ARHGEF7	Rho guanine nucleotide exchange factor (GEF) 7	8.64E-04
DAPK1	death-associated protein kinase 1	1.89E-03
CAMK2G	calcium/calmodulin-dependent protein kinase II gamma	3.86E-03

PMI Up-regulated

GeneSymbol	GeneName	Meta Q-value
GOSR2	golgi SNAP receptor complex member 2	5.96E-04
CYB5B	cytochrome b5 type B (outer mitochondrial membrane)	8.63E-04
MAX	MYC associated factor X	1.56E-03
MGMT	O-6-methylguanine-DNA methyltransferase	1.17E-03
GRLF1	glucocorticoid receptor DNA binding factor 1	4.77E-04
PTGER3	prostaglandin E receptor 3 (subtype EP3)	1.18E-03
EXT1	exostoses (multiple) 1	1.01E-03
TPM1	tropomyosin 1 (alpha)	1.79E-03
SNTB1	syntrophin, beta 1 (dystrophin-associated protein A1, 59kDa, basic component 1)	5.22E-03
EPHB2	EPH receptor B2	2.16E-03
CD44	CD44 molecule (Indian blood group)	1.01E-03
MAP2	microtubule-associated protein 2	2.24E-04
GRM7	glutamate receptor, metabotropic 7	1.92E-03
IDS	iduronate 2-sulfatase	1.60E-03
SMAD3	SMAD family member 3	4.21E-04
NRG1	neuregulin 1	2.23E-04
SPTBN1	spectrin, beta, non-erythrocytic 1	2.22E-03
ATP2B4	ATPase, Ca++ transporting, plasma membrane 4	1.17E-03
RGS12	regulator of G-protein signaling 12	9.45E-04
C9orf116	chromosome 9 open reading frame 116	2.24E-04
FCAR	Fc fragment of IgA, receptor for	1.45E-04
STAT1	signal transducer and activator of transcription 1, 91kDa	3.79E-03
PPP1R1A	protein phosphatase 1, regulatory (inhibitor) subunit 1A	1.10E-04
UBE2D2	ubiquitin-conjugating enzyme E2D 2 (UBC4/5 homolog, yeast)	2.00E-03
GLP1R	glucagon-like peptide 1 receptor	1.92E-03
POU6F1	POU class 6 homeobox 1	6.06E-05
CPSF6	cleavage and polyadenylation specific factor 6, 68kDa	2.33E-03
ARID4A	AT rich interactive domain 4A (RBP1-like)	1.16E-03
RARRES1	retinoic acid receptor responder (tazarotene induced) 1	4.03E-04
CDK5R1	cyclin-dependent kinase 5, regulatory subunit 1 (p35)	5.10E-04
SFRP4	secreted frizzled-related protein 4	1.01E-03
DCT	dopachrome tautomerase (dopachrome delta-isomerase, tyrosine-related protein 2)	6.38E-04
FZR1	fizzy/cell division cycle 20 related 1 (Drosophila)	3.65E-05
ABCC9	ATP-binding cassette, sub-family C (CFTR/MRP), member 9	7.67E-04
TRIM3	tripartite motif-containing 3	6.67E-03
CLSTN2	calsyntenin 2	4.47E-03
GNAO1	guanine nucleotide binding protein (G protein), alpha activating activity polypeptide O	8.64E-04
NUDCD3	NudC domain containing 3	1.85E-03
GPR161	G protein-coupled receptor 161	1.01E-03
MBNL3	muscleblind-like 3 (Drosophila)	6.12E-05
CCDC28B	coiled-coil domain containing 28B	9.91E-04
RBPMS	RNA binding protein with multiple splicing	3.83E-03
TNRC4	trinucleotide repeat containing 4	8.64E-04
NRF1	nuclear respiratory factor 1	8.58E-04
MCM4	minichromosome maintenance complex component 4	8.78E-04
BRAP	BRCA1 associated protein	2.26E-04
DDX54	DEAD (Asp-Glu-Ala-Asp) box polypeptide 54	4.12E-04
GTSE1	G-2 and S-phase expressed 1	5.89E-04
ATP8A2	ATPase, aminophospholipid transporter-like, class I, type 8A, member 2	1.77E-03
ARTN	artemin	7.67E-04
ADRA1A	adrenergic, alpha-1A-, receptor	4.36E-03
MEF2B	myocyte enhancer factor 2B	2.45E-04
CDH6	cadherin 6, type 2, K-cadherin (fetal kidney)	2.51E-03
NR4A2	nuclear receptor subfamily 4, group A, member 2	8.64E-04
MYB	v-myb myeloblastosis viral oncogene homolog (avian)	1.17E-03
DIAPH2	diaphanous homolog 2 (Drosophila)	1.01E-03
IKZF1	IKAROS family zinc finger 1 (Ikaros)	2.24E-04
ACRV1	acrosomal vesicle protein 1	1.55E-03
ERG	v-ets erythroblastosis virus E26 oncogene homolog (avian)	9.24E-05
ABAT	4-aminobutyrate aminotransferase	7.58E-04
WWOX	WW domain containing oxidoreductase	6.12E-05
ITGB3	integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61)	1.44E-04
ESR2	estrogen receptor 2 (ER beta)	8.78E-04
GRP	gastrin-releasing peptide	1.92E-04
HES2	hairy and enhancer of split 2 (Drosophila)	1.43E-03
KLK11	kallikrein-related peptidase 11	3.14E-05
DYRK1A	dual-specificity tyrosine-(Y)-phosphorylation regulated kinase 1A	2.24E-04
TOP3A	topoisomerase (DNA) III alpha	1.54E-03
SCAND2	SCAN domain containing 2 pseudogene	1.16E-03
HPGD	hydroxyprostaglandin dehydrogenase 15-(NAD)	3.34E-03
PIK3CG	phosphoinositide-3-kinase, catalytic, gamma polypeptide	9.99E-04
BCL2	B-cell CLL/lymphoma 2	1.56E-03

Male Up-regulated

GeneSymbol	GeneName	Meta Q-value
JARID1D	jumonji, AT rich interactive domain 1D	5.05E-58
USP9Y	ubiquitin specific peptidase 9, Y-linked (fat facets-like, Drosophila)	5.04E-35
EIF1AY	eukaryotic translation initiation factor 1A, Y-linked	1.14E-96
CYorf15B	chromosome Y open reading frame 15B	1.61E-138
DDX3Y	DEAD (Asp-Glu-Ala-Asp) box polypeptide 3, Y-linked	8.84E-38
UTY	ubiquitously transcribed tetratricopeptide repeat gene, Y-linked	3.49E-81
RPS4Y1	ribosomal protein S4, Y-linked 1	7.87E-05
TTTY15	testis-specific transcript, Y-linked 15	4.25E-04
CYorf15A	chromosome Y open reading frame 15A	9.58E-107
ZFY	zinc finger protein, Y-linked	4.80E-18
NBL1	neuroblastoma, suppression of tumorigenicity 1	2.75E-71
PRMT2	protein arginine methyltransferase 2	4.95E-83

Female Up-regulated

GeneSymbol	GeneName	Meta Q-value
XIST	X (inactive)-specific transcript (non-protein coding)	1.30E-03
HDHD1A	haloacid dehalogenase-like hydrolase domain containing 1A	9.75E-04
UTX	ubiquitously transcribed tetratricopeptide repeat, X chromosome	1.25E-14
JARID1C	jumonji, AT rich interactive domain 1C	9.41E-95
USP9X	ubiquitin specific peptidase 9, X-linked	5.06E-04
STS	steroid sulfatase (microsomal), isozyme S	9.75E-04
ZFX	zinc finger protein, X-linked	1.12E-04
PNPLA4	patatin-like phospholipase domain containing 4	3.52E-09
MAP2K3	mitogen-activated protein kinase kinase 3	1.99E-03
LYST	lysosomal trafficking regulator	1.64E-10
SRRM2	serine/arginine repetitive matrix 2	9.31E-08
EMCN	endomucin	1.72E-12
NKTR	natural killer-tumor recognition sequence	3.10E-05
PCM1	pericentriolar material 1	1.38E-07
EPS8L1	EPS8-like 1	1.45E-05