META-ANALYSIS OF GENE EXPRESSION IN MOUSE MODELS OF NEURODEGENERATIVE DISORDERS

by

Cuili Zhuang

B.Sc. Biology, The University of British Columbia, 2009

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Bioinformatics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

April 2017

© Cuili Zhuang, 2017

Abstract

There is intense interest in understanding the molecular mechanisms that contribute to neurodegenerative disorders (NDs), which involve complex interplays of genetic and environmental factors. Catching the early events involved in disease initiation requires investigation of pre-symptomatic brain samples. It is difficult to capture early molecular events using post-mortem human brain samples, since these samples represent the late phase of the disorder, with progressive brain damage and neurodegeneration. Disease mouse models are developed to study disease progression and pathophysiology. Here, I focus on two of the most studied NDs: Alzheimer’s disease (AD) and Huntington’s disease (HD). Mouse models developed for the same disease (AD or HD) often share similar phenotypes mimicking human disease symptoms, which suggests potential common underlying mechanisms of disease initiation and progression across mouse models of the same disease. Investigation of gene expression profiles of pre-symptomatic animals from different mouse models may shed light on the mechanisms that occur in the early disease phase. Gene expression profiling analyses have been performed on mouse models, and some of these studies investigated molecular changes in the pre-symptomatic phase of AD or HD. However, their findings have not reached a clear consensus. To identify shared molecular changes across mouse models, I conducted a systematic meta-analysis of gene expression in mouse models of AD and HD, consisting of 369 gene expression profiles from 23 independent studies. The goal of this project is to identify transcriptional alterations shared among different mouse models of each disease, especially changes during the early disease phase that may be linked to disease-causing mechanisms, as well as potential common cross-disease changes.
For both disorders, the results showed subtle but biologically interpretable changes shared across mouse models in the early disease phase that may contribute to early disease progression: dysregulation of genes involved in cholesterol biosynthesis and the complement system in AD mouse models, and of genes encoding mitochondrial respiratory chain complexes in HD mouse models. Cross-disease similarities in the late phase suggested that different brain regions may share mechanisms in response to neuronal loss and toxic aggregates.


Preface

The idea to perform a meta-analysis of gene expression of neurodegenerative disorders was initiated by my supervisor Dr. Paul Pavlidis. With his guidance, I was responsible for the design and analysis of the research presented. The motivation for investigating cross-disease similarities was from the NeuroGEM project, our collaboration with Dr. Jörg Gsponer and his group. I was responsible for the writing of this thesis, with useful suggestions and editing from Dr. Paul Pavlidis and Dr. Lilah Toker. None of the chapters or combination of this research has been published yet. However, manuscripts are planned.


Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgements
Dedication
Chapter 1: Introduction
1.1 Alzheimer’s disease
1.1.1 AD pathological hallmarks: plaques and neurofibrillary tangles
1.1.2 AD genetic risk factors
1.2 Huntington’s disease
1.2.1 HD disease-causing gene: HTT
1.2.2 HD genetic modifiers
1.3 Common mechanisms in AD and HD.
1.4 Investigate early events involved in AD and HD initiation with disease mouse models in pre-symptomatic phase.
1.5 Mouse models of AD and HD
1.5.1 AD mouse models
1.5.2 HD mouse models
1.6 Cell-type proportion changes in the brains of AD and HD.
1.7 Transcriptomic analyses in AD and HD mouse models.
1.7.1 Transcriptomic analyses in AD mouse models
1.7.2 Transcriptomic analyses in HD mouse models
1.8 Meta-analysis of gene expression
1.8.1 Methods of meta-analysis
1.8.2 Meta-analysis on gene expression in AD.
1.8.3 Meta-analysis on gene expression in HD.
1.9 Motivations
Chapter 2: Materials and Methods
2.1 Data retrieval from GEO
2.2 Data pre-processing and quality control
2.3 Dividing samples into early and late disease phases and combining data sets.
2.4 Estimate cell-type proportion changes
2.5 Fitting linear mixed-effects models and applying jackknife procedure to rank genes
2.5.1 Linear mixed-effects model to correct for between-study variations.
2.5.2 Linear mixed-effects model to correct for between-study variations and cell-type proportion changes
2.5.3 Jackknife procedure for gene ranking
2.6 Functional enrichment analysis.
Chapter 3: Results.
3.1 Estimation of cell-type proportion changes
3.1.1 Estimation of cell-type proportion changes in AD mouse models
3.1.2 Estimation of cell-type proportion changes in HD mouse models
3.2 Meta-analysis of gene expression in Alzheimer’s disease mouse models
3.3 Meta-analysis of gene expression in Huntington’s disease mouse models
3.4 Cross-disease comparison revealed similarities in the late phase of AD and HD.
Chapter 4: Discussion and Conclusion
4.1 Consistent transcriptomic alterations were identified across different mouse models for each disorder.
4.2 Applying cell population proportion correction revealed transcriptional changes of cell-type specific regulatory events.
4.3 Shared gene expression changes and implications in AD mouse models
4.4 Shared gene expression changes and implications in HD mouse models
4.5 Cross-disease commonalities in the late disease phase.
4.6 Limitations and future work.
4.7 Conclusion
Bibliography


List of Tables

Table 1.1. Summary of Alzheimer’s disease mouse models analyzed.
Table 1.2. Summary of Huntington’s disease mouse models analyzed.
Table 2.1. Summary of selected gene expression profiling studies for AD and HD.
Table 2.2. Samples removed from analysis in each study.
Table 2.3. Summary of selected AD mouse model studies.
Table 2.4. Summary of selected HD mouse model studies.
Table 3.1. Top-ranked cell-type marker genes before and after marker gene profiles correction.
Table 3.2. Top 50 up-regulated genes for AD mouse models in the early phase after marker gene profiles correction.
Table 3.3. Top 50 down-regulated genes for AD mouse models in the early phase after marker gene profiles correction.
Table 3.4. Top 50 up-regulated genes for AD mouse models in the late phase after marker gene profiles correction.
Table 3.5. Top 50 down-regulated genes for AD mouse models in the late phase after marker gene profiles correction.
Table 3.6. Top 50 up-regulated genes for HD mouse models in the early phase after marker gene profiles correction.
Table 3.7. Top 50 down-regulated genes for HD mouse models in the early phase after marker gene profiles correction.
Table 3.8. Top 50 up-regulated genes for HD mouse models in the late phase after marker gene profiles correction.
Table 3.9. Top 50 down-regulated genes for HD mouse models in the late phase after marker gene profiles correction.
Table 3.10. Comparisons between top genes and top hits reported in original studies for the late AD phase.
Table 3.11. Comparisons between top genes and top hits reported in original studies for the early HD phase.
Table 3.12. Comparisons between top genes and top hits reported in original studies for the late HD phase.


List of Figures

Figure 1.1. Amyloid-β precursor protein (APP) processing.
Figure 1.2. Huntingtin gene processing.
Figure 2.1. Overview of the workflow.
Figure 3.1. Marker gene profiles of neurons in AD mouse models.
Figure 3.2. Marker gene profiles of glial cells in AD mouse models.
Figure 3.3. Marker gene profiles of neurons in HD mouse models.
Figure 3.4. Marker gene profiles of glial cells in HD mouse models.
Figure 3.5. Number of differentially expressed genes (FDR < 0.05) in AD mouse models before and after marker gene profiles correction.
Figure 3.6. Top 20 up and down-regulated genes for AD early phase after marker gene profiles correction.
Figure 3.7. Top 20 up and down-regulated genes for AD late phase after marker gene profiles correction.
Figure 3.8. Number of differentially expressed genes (FDR < 0.05) in HD mouse models before and after marker gene profiles correction.
Figure 3.9. Top 20 up and down-regulated genes for HD early phase after marker gene profiles correction.
Figure 3.10. Top 20 up and down-regulated genes for HD late phase after marker gene profiles correction.
Figure 4.1. Expressions of Trem2 in the late AD phase before and after marker gene profiles correction.
Figure 4.2. Expressions of Ddit4l in the late HD phase before and after marker gene profiles correction.
Figure 4.3. Expressions of Msmo1 in the late AD phase before and after marker gene profiles correction.
Figure 4.4. Expressions of Nrep in the late HD phase before and after marker gene profiles correction.


Glossary

AD: Alzheimer's disease
ANOVA: analysis of variance
BAC: bacterial artificial chromosome
DE: differentially expressed
FDR: false discovery rate
GEO: Gene Expression Omnibus
GO: Gene Ontology
GWAS: genome-wide association study
KI: knock-in
KO: knock-out
HD: Huntington's disease
LMM: linear mixed-effects model
MSN: medium spiny neuron
ND: neurodegenerative disorder
NFT: neurofibrillary tangle
polyQ: polyglutamine
PCA: principal component analysis
YAC: yeast artificial chromosome


Acknowledgements

First and foremost, I would like to express my special thanks to my supervisor, Dr. Paul Pavlidis, for his brilliant guidance and patience throughout the project. I am grateful for the valuable advice from my committee members, Dr. Weihong Song and Dr. Gabriela Cohen Freue. Many thanks to all the authors who contributed to the public data. I owe thanks to all members of the Pavlidis lab. To the Gemma curation team, especially James Liu and Nathaniel Lim, who helped me a great deal in retrieving public data. To Ogan Mancarci and Dr. Lilah Toker, who helped me with the marker gene profiles estimation analysis. To Dr. Sanja Rogic, who took great care of me in the lab. To everyone else in the lab, for the inspiring discussions in science and all the fun activities. I would like to thank the Canadian Institutes of Health Research (CIHR) Strategic Training Program in Bioinformatics and the Department of Psychiatry, UBC, for their financial support. Finally, my deepest appreciation and thanks to my family.


Dedication

To John Tse


Chapter 1: Introduction

There is intense interest in understanding the molecular mechanisms that contribute to neurodegenerative disorders (NDs), which involve complex interplays of genetic and environmental factors. NDs are characterized by progressive loss of neurons in the human brain (i.e. neurodegeneration). Examples of NDs include Alzheimer's disease (AD), Huntington's disease (HD), Parkinson's disease, and amyotrophic lateral sclerosis [1]. In this thesis, I focus on AD and HD, which are two of the most studied NDs. Despite the discovery of several disease-causing genes for these two disorders, the initiating mechanisms are still poorly understood [2], [3]. My goal is to identify transcriptional alterations in mouse models of AD and HD separately, focusing on early disease phases, and to investigate potential cross-disease similarities. Because there are multiple mouse models for both AD and HD, I take a meta-analysis approach to integrate data from different studies for each disorder. To introduce these topics, I first review the current understanding of the neuropathology of AD and HD, cross-disease common mechanisms, and commonly used mouse models for AD and HD. Secondly, I review cellular composition changes in the brains of AD and HD. Lastly, I review transcriptomic analyses in AD and HD mouse models and meta-analysis methods for gene expression studies.

1.1 Alzheimer’s disease

Alzheimer’s disease (AD) is the most common form of dementia [4]. It has been estimated that about 10% of North Americans over the age of 65 will develop AD, and the burden on the public healthcare system is rapidly increasing due to population ageing [5], [6]. Patients often present with memory impairment and progress to other cognitive and behavioural dysfunctions [3].

1.1.1 AD pathological hallmarks: amyloid plaques and neurofibrillary tangles

Accumulations of amyloid plaques and neurofibrillary tangles (NFTs) in the brain are the key pathological hallmarks of AD. The hippocampus, a region crucial for memory and spatial navigation, is one of the first brain regions where amyloid plaques and NFT depositions are observed. The plaques and tangles spread to other brain regions as the disease progresses [3].

Amyloid plaques are extracellular deposits of insoluble forms of amyloid β-protein (Aβ). The majority of amyloid-β precursor protein (APP) is processed in the non-amyloidogenic pathway, which does not produce Aβ. In the amyloidogenic pathway, APP is cleaved by β-secretase and subsequently by γ-secretase [7], [8] (Figure 1.1). The process produces primarily two Aβ isoforms: a 40-amino-acid amyloid β-peptide (Aβ40) and a 42-amino-acid amyloid β-peptide (Aβ42); Aβ42 is the major species associated with the plaques [9]. β-secretase is encoded by beta-site amyloid-β precursor protein-cleaving enzyme 1 (BACE1) [7]. Mutations in the amyloid precursor protein gene (APP) at the β-secretase cleavage site increase β-secretase cleavage and subsequently Aβ production. One such mutation, named the Swedish mutation, has been identified in some early-onset familial AD cases [10]. Presenilins, which have two homologs, presenilin1 (PSEN1) and presenilin2 (PSEN2), are the catalytic components of the γ-secretase complex [8]. Mutations in APP at the γ-secretase cleavage site increase production of Aβ42 [9], [11]. Several γ-secretase cleavage site mutations have been discovered in familial AD cases, such as the Florida mutation and the London mutation [10]. Mutations in PSEN1 and PSEN2 found in familial AD also increase the ratio of Aβ42 [8].

The other AD pathological hallmark, NFTs, are intracellular aggregates of hyperphosphorylated tau protein [9], [11]. Tau protein is encoded by the microtubule-associated protein tau gene (MAPT). Tau-associated microtubules are mainly found in axons, the output projection of a neuron [12]. Abnormal phosphorylation of tau in the AD brain increases its tendency to aggregate and leads to the accumulation of NFTs. NFTs can disrupt microtubule stabilization and cause neuronal damage [13]. However, there are no known mutations in MAPT associated with AD so far.

Though amyloid plaques and NFTs are AD pathological hallmarks, they are not unique to AD brains. Small amounts of plaques and NFTs can be observed in the brains of normally ageing individuals [14]. This makes it difficult to distinguish pre-symptomatic AD brains from normal brains that have also accumulated small amounts of plaques and NFTs.


Figure 1.1. Amyloid-β precursor protein (APP) processing. The majority of APP is processed in the non-amyloidogenic pathway, in which it is cleaved by α-secretase and then by γ-secretase. This pathway produces the P3 peptide, which does not aggregate into Aβ plaques. In the amyloidogenic pathway, APP is cleaved by β-secretase and subsequently by γ-secretase. Protein and fragment lengths are not drawn to scale.

1.1.2 AD genetic risk factors

As mentioned, mutations in APP, PSEN1 and PSEN2 cause familial AD. However, less than 1% of AD patients carry these mutations. The rest are sporadic cases with an age of onset around 65 and unclear causes [3]. Yet, these sporadic cases show an estimated heritability of over 60% [15], which suggests that other genetic factors can contribute to AD pathogenesis. The most established AD risk factor is apolipoprotein E (ApoE, encoded by APOE). ApoE mediates cholesterol metabolism in the brain and is found in Aβ plaques and NFTs [16], [17]. There are three major allelic variants: ApoE ɛ2 (low-risk), ApoE ɛ3 (neutral), and ApoE ɛ4 (high-risk), corresponding to isoforms APOE2, APOE3 and APOE4. The ApoE ɛ4 allele is present in 25-30% of the general population and in 40-60% of late-onset AD cases [15], [18]. It has been shown that the APOE4 isoform is less efficient than APOE3 at transporting cholesterol from astrocytes to neurons, which leads to synaptic dysfunction [19], [20]. A recent study showed that ApoE enhances APP transcription in human and mouse neurons by activating the mitogen-activated protein kinase (MAPK) cascade, an effect that is cholesterol-independent [21].

There are several other risk factors suggested by genome-wide association studies (GWAS), such as triggering receptor expressed on myeloid cells 2 (TREM2) [22], transmembrane protein CD33 [23], [24], clusterin (CLU) and phosphatidylinositol binding clathrin assembly protein (PICALM) [25]. Some of these risk genes could be related to Aβ clearance and neuroinflammation-associated pathways [3], [22]. However, these risk genes can only explain a portion of the genetic risk, and there is still intense interest in identifying more risk genes.

1.2 Huntington’s disease

Huntington’s disease (HD) is another ND associated with misfolded protein accumulations. HD is caused by alterations in the huntingtin gene (HTT) and is characterized by motor symptoms, cognitive impairment and psychiatric abnormalities [26]. The prevalence of HD is around 1 in 10,000 in Western populations, with a higher occurrence in populations of European descent than in those of African or Asian descent [27]. Currently, there is no cure for the disease; however, treatments to manage the symptoms can improve patients’ quality of life [27]. The neuropathological changes are predominantly detected in the striatum, a region associated with cognitive and motor functions. HD is characterized by the selective loss of medium spiny neurons (MSNs) in the striatum, which leads to motor and cognitive symptoms [26].

1.2.1 HD disease-causing gene: HTT

The disease-causing gene in HD, HTT, was discovered in the early 1990s. Expression of HTT can be found throughout the body, including in neurons and glial cells in the adult brain. The huntingtin protein interacts with many proteins and has important roles in neuronal development, neuronal maintenance, and cell adhesion [26].
HD is caused by a cytosine-adenine-guanine (CAG) trinucleotide repeat expansion in exon 1 of HTT, and the severity and age of onset are correlated with the length of the expansion [26], [27]. Patients with over 40 CAG repeats often have mid-life onset, while repeats of 36-39 show incomplete penetrance [27]. Mutant HTT (mHTT) encodes the full-length huntingtin protein with an expanded polyglutamine (polyQ) tract and an HTT exon1 fragment containing the polyQ expansion [27] (Figure 1.2). Aggregates of these HTT fragments are the major component of the intracellular inclusions observed in postmortem human brains and in some HD mouse models [26]. It has been hypothesized that the expanded CAG repeats may result in a gain of function, which alters protein interactions and disrupts some of the functions of normal HTT [28]. However, the mechanisms by which mutant huntingtin and the polyQ expansion cause HD are not yet well defined [26], [27].
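As a small illustration of the repeat-length ranges quoted above, the following Python sketch counts the longest uninterrupted run of CAG codons in a DNA string and maps a repeat count to the ranges mentioned in the text. It is not part of the analysis in this thesis. The function names and category labels are hypothetical, and treating exactly 40 repeats as part of the expanded range is an assumption, since the text only states "over 40" and "36-39".

```python
import re


def longest_cag_run(seq):
    """Return the number of CAG units in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def repeat_category(n_repeats):
    """Map a CAG repeat count to the ranges quoted in the text.

    Assumption: a count of exactly 40 is grouped with the expanded range.
    """
    if n_repeats >= 40:
        return "expanded range, often mid-life onset"
    if 36 <= n_repeats <= 39:
        return "incomplete penetrance"
    return "below the ranges discussed in the text"


# Toy sequence: a start codon, 42 CAG units, then an unrelated codon.
toy_exon1 = "ATG" + "CAG" * 42 + "CCG"
print(longest_cag_run(toy_exon1), repeat_category(longest_cag_run(toy_exon1)))
```

Real repeat sizing is done with dedicated genotyping assays; the regular expression here only handles a clean, uninterrupted repeat in a plain string.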

Figure 1.2. Huntingtin gene processing. HTT: huntingtin; polyQ: polyglutamine; mHTT: mutant HTT. (1) Expression of HTT normally produces an HTT mRNA, which is translated into the full-length HTT protein. (2) Mutant HTT can also produce an mRNA that contains only mHTT exon1 with the expanded CAG repeat sequence, which is translated into the mHTT exon1 fragment with expanded polyQ. (3) Post-translational proteolytic processing of the full-length HTT protein produces products of various lengths, one of which is the HTT exon1 fragment. Gene and protein lengths are not drawn to scale.

1.2.2 HD genetic modifiers

Though HD is a monogenic disease, patients still show clinical variability. The repeat length can only explain a portion of the disease variation, and the rest of the variation shows some degree of heritability. This suggests that other factors, referred to as genetic modifiers, can also influence disease progression [28], [29]. Genetic modifiers are genes that can alter the disease process and thereby affect the age of onset and the severity of the symptoms [28]. Identification of genetic modifiers and the processes involved can provide therapeutic targets for intervention, such as delaying the age of onset [28]. In GWAS, several candidate genes have been proposed as genetic modifiers for HD. Examples include glutamate receptor, ionotropic kainate 2 (GRIK2) and peroxisome proliferator-activated receptor gamma coactivator 1-alpha (PPARGC1A). However, not all independent GWAS detect variants of these two candidate genes that are associated with the HD clinical course [28]. So far, there are no well-established HD genetic modifiers, and the search for bona fide modifiers continues [27].

1.3 Common mechanisms in AD and HD.

AD and HD are two clinically distinct NDs with different genetic components. However, abnormal accumulation and aggregation of misfolded proteins is a key pathological similarity between these two NDs [30], which suggests that common cross-disease mechanisms and pathways may exist in AD and HD. For example, mitochondrial dysfunction has been observed in both AD and HD [31]. Naia et al. (2016) suggested that the misfolded protein aggregates (Aβ and tau in AD and mHTT in HD) induce the mitochondrial defects [31]. Another cross-disease commonality between AD and HD is increased induction of autophagy [32], [33]. Autophagy is an important process for removing damaged organelles and protein aggregates. However, in AD and HD, the autophagy process is defective and unable to efficiently eliminate misfolded proteins [32], [33]. Increased induction of autophagy could be neuroprotective in response to toxic protein aggregates, and autophagy has been proposed as a therapeutic target for both AD and HD [32], [33]. It is still unknown whether these common changes across AD and HD occur in a disease phase dependent manner (i.e. changes happen in the early phase only, the late phase only, or both phases). Common cross-disease mechanisms in the early phase (if any) may imply important disease-causing mechanisms of AD and HD.
Therefore, analyzing gene expression profiles of AD and HD mouse models and comparing the results from samples of the early and late disease phases may provide insight into whether there are cross-disease mechanisms in each phase.

1.4 Investigate early events involved in AD and HD initiation with disease mouse models in pre-symptomatic phase.

In AD and HD, pathological changes in the brain often precede the occurrence of clinical symptoms [27], [34], [35]. For example, AD patients have a prolonged pre-symptomatic phase, and neuropathological changes precede clinical symptoms by years [14]. In pre-symptomatic HD patients, subtle white-matter atrophy and cognitive impairment occur before the onset of motor symptoms [36].

Post-mortem brain samples of AD and HD patients are available to study molecular changes in the diseased brains, though the supply is limited. However, these post-mortem brain samples represent the late phase of the disorder, and it would be challenging to distinguish causes from effects of neurodegeneration. Therefore, to investigate early events involved in disease initiation, pre-symptomatic brain samples are needed. The supply of HD pre-symptomatic brain samples (i.e. brain samples of pre-symptomatic mHTT carriers) is extremely limited, and pre-symptomatic AD brains are difficult to distinguish from normal ageing brains. Instead, studies are often done on mouse models to examine early changes in AD and HD, especially changes that occur before neurodegeneration and behavioural changes. Studying early molecular changes in mouse models improves our understanding of human disease pathogenesis.

1.5 Mouse models of AD and HD

Catching the early events involved in disease initiation requires investigation of pre-symptomatic brain samples, which can help to distinguish causes from effects of neurodegeneration. Most mouse models have a well-documented timeline of disease initiation and progression; therefore, it is possible to categorize samples from different models into early and late disease phases based on the occurrence of specific phenotypes. Samples from pre-symptomatic mice can elucidate molecular changes before symptoms develop. Identifying the gene expression changes shared by different mouse models in the pre-symptomatic phase may reveal convergent molecular changes underlying the onset of the disease.

1.5.1 AD mouse models

To elucidate the complex genetic factors that are associated with AD, various transgenic mouse models have been developed based on known mutations in familial AD cases and other AD-related genes [35].
The most commonly used models overexpress one or more human transgenes (APP, PSEN1, PSEN2) that contain known AD-associated mutations (amyloid transgenic models) [35]. Examples are Tg2576, J20, 5xFAD and TASTPM (Table 1.1). These models accumulate Aβ plaques, but not all of them develop NFTs and neuronal loss [35]. Some of them develop cognitive impairment at a very early age (e.g. 1-2 month-old J20 mice) [35]. Though there are no known MAPT mutations that are associated with AD, researchers have successfully introduced mutated human MAPT to induce NFTs in tau transgenic models. However, these models often do not accumulate Aβ plaques [37], [38]. A few knock-out (KO) models have been developed to investigate the functions of APP and related genes, such as App KO and amyloid precursor-like protein 2 (Aplp2) KO [39]. There are other models that introduce genes that are not known to directly contribute to AD, such as a transgenic model expressing neutralizing anti-nerve growth factor (NGF) immunoglobulins (anti-NGF AD11) [40]. This model develops age-related plaques and tangles in the brain as well as other AD-like phenotypes [40]. Despite the differences in mechanisms, many of these animal models show a similar age-dependent progression in AD-related decline of cognitive functions to that observed in humans [35]. See Table 1.1 for the AD mouse models analyzed in this thesis and their age-dependent phenotypes.

1.5.2 HD mouse models

Early disease mechanisms, and how mHTT and its fragments cause toxicity in the human brain, remain unclear. To study the underlying disease mechanisms, toxin-induced and genetic mouse models of HD have been developed. Toxin-induced models were developed before the discovery of the disease-causing gene, HTT. These models mimic some motor and cognitive symptoms of HD by injection of excitotoxins (e.g. quinolinic acid) or mitochondrial inhibitors (e.g. 3-nitropropionic acid) to induce targeted neuronal loss in the striatum.
However, there is no gradual progression of neuronal loss in these models, and therefore they are not effective models for studying HD progression [41]. Transgenic and knock-in models, on the other hand, introduce full-length mHTT, the mHTT exon1 fragment, or CAG repeats into the murine genome to replicate HD-like disease progression (see Table 1.2). These mouse models are able to recapitulate some of the changes observed in HD patients [41]. R6/2 is one of the best characterized and most commonly used transgenic HD mouse models. In this model, sequences of exon1 of human HTT (N-terminal fragment) are randomly inserted into the mouse genome; the model quickly develops HD-like phenotypes at around 1 month and rarely survives beyond 4 months [42], [43]. Full-length transgenic models use vectors, such as a yeast artificial chromosome (YAC) or bacterial artificial chromosome (BAC), to express the entire human HTT with various CAG repeat lengths [43]. N-terminal fragment and full-length transgenic models express both the transgene and murine Htt. Finally, in knock-in models, murine Htt exon1 is replaced with human mHTT exon1 or CAG repeats of various lengths (e.g. HdhQ lines and CAG knock-in lines). In these models, only the human mHTT is expressed. The progression of the HD-like phenotype correlates with the number of CAG repeats [41]. Overall, N-terminal fragment and full-length transgenic models develop phenotypes earlier than knock-in models. See Table 1.2 for the HD mouse models analyzed in this thesis, their age-dependent phenotypes and their repeat lengths.

Table 1.1. Summary of Alzheimer’s disease mouse models analyzed. Samples are grouped into early or late phase based on the appearance of cognitive impairment. --: Phenotype is not observed; NA: data not available; m: months; CI: Cognitive impairment; NL: Neuronal loss Phenotype Model Promoter Mouse Models Modification Data set Age Phase Pla- Types CI Tangles NL ques hamster GSE36237 5m early ------prion Tg2576 [44]– human APP with Swedish GSE1556 12m late yes yes -- -- protein [46] mutation KM670/671NL promoter GSE15056 17m late yes yes -- -- human APP with Swedish mutation KM670/671NL J20 [47], [48] GSE14499 7m late yes yes -- -- and Indiana mutation APP V717F human APP with Swedish GSE52022 4m late yes yes yes yes mutation KM670/671NL, Florida mutation APP 5xFAD [49] I716V, London mutation 14- Amyloid APP V717I, PSEN1 GSE50521 late yes yes yes yes murine 15m transgenic M146L, and PSEN1 Thy1 models L286V tissue- 2-4m early ------specific TAS10 [50], human APP with Swedish regulatory GSE64398 8m late yes ------[51] mutation KM670/671NL elements 18m late yes yes -- -- human PSEN1 with 2 - TPM [52] GSE64398 early NA -- -- NA mutation M146V 18m TASTPM [52], 2-4m early ------[53] GSE64398 human APP with Swedish 8-18m late yes yes -- -- (heterozygous) mutation KM670/671NL and human PSEN1 2m early ------TASTPM M146V GSE64398 4m late NA yes -- -- (homozygous) 8-18m late yes yes -- --


Phenotype Model Promoter Mouse Models Modification Data set Age Phase Pla- Types CI Tangles NL ques TAU [37], human MAPT with 2-4m early ------NA TAU GSE64398 CaMKII [50] mutation P301L 8-18m late yes -- yes NA transgenic promoter human MAPT with models rTg4510 [38] GSE53480 4m late yes -- yes -- mutation P301L

App KO [39] App homozygous null GSE48622 2m early ------

KO Aplp2 KO [39] Aplp2 homozygous null GSE48622 2m early ------models neuronal neuronal-specific rat N-dCKO [54] App/Aplp2 double- GSE48622 2m early ------promoter conditional KO human cytomegal 3-6m early ------ovirus neutralizing anti-nerve Other (CMV) AD11 [55] growth factor (NGF) GSE63617 early immunoglobulins 15m late yes yes yes yes region promoter


Table 1.2. Summary of Huntington’s disease mouse models analyzed. Samples are grouped into early or late phase based on the appearance of motor symptoms. --: Phenotype is not observed; NA: data not available; m: months Phenotype Model Repeat Promoter Mouse Models Data set Age Phase Motor Neuronal Weight Types size Symptoms Loss Loss 121-133 GSE62210 2m late yes -- -- human 2.5-3m late yes -- yes N-terminal R6/2 [42], [56], [57], <200 GSE48104 HTT transgenic 3m late yes -- NA promoter [58] 209 GSE9857 ~300 GSE26317 5m late yes NA yes 2m early -- NA -- BACHD-ΔN17 [59] 97 GSE64386 7m early minimal NA -- human Full-length 11m late yes NA yes HTT transgenic promoter 12m late yes yes weight YAC128 [60] 128 GSE19676 gain 24m late NA NA NA murine D9-N171-98Q [57], 98 GSE25232 14m late yes NA yes CAG repeat Darpp-32 [61] knock-in murine Hdh4_Q80 [62] 80 GSE9375 12m early ------Htt CHL2Q150 [63] 150 GSE10202 22m late yes -- yes 3m early -- NA -- HdhQ92 [64] 92 GSE7958 18m late NA NA NA GSE9038 1-2.5m early NA -- NA human HTT Q111 murine Hdh [64]–[66] 111 exon 1 GSE50379 3m early -- NA -- Htt knock-in 6m early -- NA -- HdhQ150 [67] 118-130 GSE32417 12m late yes NA -- 18m late yes NA yes


1.6 Cell-type proportion changes in the brains of AD and HD

Neurodegenerative disorders are generally characterized by neuronal loss and neuroinflammation [26], [68]. In AD patients and in some mouse models, loss of pyramidal neurons and other types of neurons in the hippocampus contributes to the neuronal loss [68], [69]. HD patients are characterized by progressive loss of MSNs in the striatum [26]. Therefore, the cellular composition of diseased and healthy brain samples can differ substantially as the disease progresses. Most expression profiling studies of AD and HD use bulk tissue from the affected region. Studies have pointed out that cell-type proportion changes are one of the driving forces for expression changes in bulk tissues [70], [71]. Therefore, both cellular composition differences and cell-type specific expression differences can contribute to the overall expression changes in bulk tissue samples between disease models and healthy controls.

Some studies have assessed the cellular composition changes in the brains of AD and HD by examining the expression changes of cell-type specific marker genes. For example, a few studies have found significant expression changes of neuron-specific marker genes in whole brain tissues of humans and mouse models of AD and HD, and recognized that expression profiles reflect cell population changes [70], [72], [73]. The transcriptome studies used in this thesis were performed on bulk tissue, and cell counts of different cell types were not directly assessed in these brain samples. To address the cellular composition problem, methods for estimating cell-type proportions from bulk tissue transcriptome profiles have been developed, generally referred to as cell-type deconvolution. A commonly used approach is to estimate cell-type proportions from the expression of cell-type specific marker genes [74]. Recently, Mancarci et al. (2016) assembled a comprehensive list of cell-type marker genes for 32 major brain cell types across the brain, and developed a cell-type deconvolution method, which estimates marker gene profiles based on principal component analysis (PCA) of the expression of the marker genes [75]. The marker gene profiles can be used as a proxy for cell-type proportions when cellular compositions are not directly measured.

1.7 Transcriptomic analyses in AD and HD mouse models

Transcriptomic analysis comparing disease mouse models and controls at different disease phases can reveal molecular and cellular changes specific to the disease phase. While there are multiple transcriptome profiling technologies (e.g. microarrays and RNA sequencing), all of the studies in my project used microarrays. Microarray technology uses microscopic arrays of known probe sequences on glass slides to detect labeled sample RNA by hybridization followed by fluorescence microscopy [76]. With the rapid growth of microarray data, public repositories like Gene Expression Omnibus (GEO) and ArrayExpress were created to help researchers efficiently share, search, and reuse valuable gene expression profile data sets [77], [78].

The availability of microarray data sets from independent but related studies enables meta-analysis of different AD and HD mouse models and makes it possible to test the hypothesis that gene expression profiles of different mouse models can reveal common expression changes. There are some expression profiling studies using brain samples of AD and HD mouse models in the early disease phase. However, the findings reported in these studies are not completely concordant. More studies are available for AD and HD mouse models in the late phase, and these studies tend to have more similar findings, based on what is reported in their associated publications.
1.7.1 Transcriptomic analyses in AD mouse models

In expression profiling studies of AD mouse models in the early phase, up-regulation of genes in the classical complement cascade and changes in synapse-related genes have been reported [52], [79], [80]. However, D’Onofrio et al. (2011) also report altered expression of inflammation and immune response related genes, which was not observed in Hong et al. (2016) [79], [80]. Studies of AD mouse models in the late phase often report expression changes of genes related to inflammation and the immune system [50], [81], [82].

1.7.2 Transcriptomic analyses in HD mouse models

Among the HD mouse model studies with early phase samples, one study found altered expression of genes involved in energy metabolism, cell cycle, and other pathways in the very early disease phase [65], while another study reported genes involved in chromatin structure [83]. The mouse studies that involved late phase samples reported dysregulated genes involved in neuronal signaling, G-protein receptor signaling, transport, inflammation, and insulin signaling among different HD mouse models [84].

1.8 Meta-analysis of gene expression

Meta-analysis, “the analysis of analyses”, is a process that uses statistical techniques to combine and analyze information from multiple studies [85]. Meta-analysis has the benefits of enhanced statistical power and reliability of the results. It can overcome the limitations of a single study (e.g. small sample size), resolve inconsistent results across studies, and discover new associations not detected in the original studies [86].

1.8.1 Methods of meta-analysis

One common approach used in meta-analysis is to combine summary results, such as ranks or p-values, from individual studies [85], [87]. Another common approach is to integrate raw data from individual studies and analyze them as a whole. This approach is also termed ‘mega-analysis’. Integrated microarray data sets are often treated the same way as a single study and can be analyzed by traditional microarray analysis methods [86]. Linear mixed-effects models (LMMs) are widely used in microarray analyses [88]. An LMM can model multiple sources of variation, such as mouse model-specific effects, laboratory effects, and differences between disease models and controls [89]. An LMM involves both fixed effects and random effects. Fixed effects generally refer to the factors whose effects we are interested in estimating; each level of these factors is assumed to have a constant effect on the measured value (e.g. effects of treatment or sex). Random effects here refer to factors that contribute to the variation but are not the subject of main interest. Levels of random effects are assumed to arise from a normal distribution. For example, variations originating from different studies and microarray platforms can be modelled as random effects.
One of my goals was to identify dysregulated genes during the early disease phase that are shared across mouse models of the same disease, by comparing gene expression levels of controls and disease models; such genes may implicate shared disease-causing mechanisms. The differences between control mice and disease models can be modelled as fixed effects. Since the disease samples came from different mouse models, the impact of model-specific effects that may be less reflective of the human disease should be minimized. The mouse-model-specific effects can be modelled as random effects in an LMM. Therefore, I chose the LMM as the meta-analysis method for my thesis.

1.8.2 Meta-analysis on gene expression in AD

To the best of my knowledge, there is a lack of meta-analyses of gene expression in AD mouse models. A few meta-analyses have been performed on human samples. Winkler and Fox (2013) re-analyzed data from laser capture microdissected hippocampal neurons and grey matter of severe AD cases and controls from two studies by combining ranks of genes from each study. They identified dysregulated genes involved in transcription, translation, and cell death [90]. There is a need to search for common gene expression changes across multiple AD mouse models to look for early molecular changes.

1.8.3 Meta-analysis on gene expression in HD

Kuhn et al. (2007) performed a meta-analysis of transgenic and knock-in HD mouse models with striatal samples in early and late HD disease phases and revealed distinct signatures at different disease phases [72]. The signals detected in the early phase were reported to be involved in striatal signaling [72]. However, the meta-analysis was done a decade ago, and more recently published HD studies have used mouse models other than those listed in Kuhn et al. (2007) and have included early mouse brain samples. Therefore, more HD mouse models can be incorporated for a more comprehensive meta-analysis, which may yield novel insights into HD pathogenesis.

1.9 Motivations

While molecular changes revealed by investigations of human brain samples and of mouse models in the late disease phase are of interest, they may reflect downstream effects and may not be easily related to early pathogenic processes. The research findings from mouse models in the early and late disease phases of the same disorder show some degree of agreement, which may indicate that different mouse models share some pathological mechanisms. However, there are discrepancies in the findings, especially in the early phase. Differences between mouse models can be one of the contributing factors, but differences in experimental design, sample size, microarray platforms, and data processing methods can also influence the results. These observations motivated me to undertake a meta-analysis of gene expression across different mouse models, which can account for between-study differences. Meta-analysis of gene expression profiles of mouse models has the potential to identify cross-mouse-model and disease-phase-specific transcriptional changes.
By examining early changes, we may infer disease-causing mechanisms, while the late changes may reveal consequential changes in the diseased brain. Finally, cross-disease comparison may reveal shared mechanisms in AD and HD.

In chapter 2, I will describe the selection and processing of the gene expression profiling data sets of AD and HD used in the meta-analysis, the application of linear mixed-effects models and marker gene profile correction to prioritize genes in each disease phase of AD and HD respectively, and the functional enrichment analysis of the gene rankings. I will describe the results in chapter 3. In chapter 4, I will discuss the shared transcriptomic alterations found in each disorder and their implications, especially early changes that may be associated with pathogenesis, as well as cross-disease commonalities, limitations, and conclusions.


Chapter 2: Materials and Methods

To identify expression alterations shared among different mouse models of AD and HD respectively, and potential common changes shared between the two disorders, I conducted a systematic meta-analysis of gene expression in mouse models of AD and HD. In this chapter, I will present the materials and research methods. An overview is given in Figure 2.1. Analyses were performed in R version 3.3.1 [91]. R code used in the analysis is available from the author.

Figure 2.1. Overview of the workflow. AD: Alzheimer’s disease mouse models; HD: Huntington’s disease mouse models. The number of samples includes both mouse model samples and controls.

2.1 Data retrieval from GEO

I retrieved gene expression profiling studies of mice with the disease keywords “Alzheimer” or “Huntington” from Gene Expression Omnibus¹ (GEO) and ArrayExpress² [77], [78]. I further filtered the data sets based on the following three criteria:

1. Study has two or more mouse samples per group.
2. Study contains mouse models of the disease, and proper wildtype samples for comparison.
3. Study must contain samples from brain regions where the first neuropathological changes occur (i.e. hippocampal samples for AD, and striatal samples for HD).

Initially, 25 independent studies from GEO and ArrayExpress met the selection criteria. I downloaded expression data and experimental design metadata from GEO. Probeset and gene annotations of the corresponding Affymetrix, Illumina, and Agilent platforms were obtained from Gemma, a resource for expression profile re-use and meta-analysis [92].

Upon further examination, data sets GSE36981 and GSE18551 were removed, leaving a final selection of 23 data sets (Table 2.1). AD data set GSE36981 had samples processed in batches by mouse genotype, and therefore the genotype effect was confounded by the batch effect. HD data sets GSE18551 and GSE19676 assessed the same striatal mRNA samples on different microarray platforms [93]. GSE18551 was discarded due to the number of outliers (see Section 2.2).

For both AD and HD, most of the mouse models were used in only one study (18 different mouse models). Two mouse models were shared across two studies; one mouse model was used in three different studies; and one mouse model was used in four different studies. However, the studies that used the same mouse model collected brain samples at different ages (Table 1.1, Table 1.2). Overall, all of the studies analyzed used either unique mouse models or mouse models of different ages.

¹ See http://www.ncbi.nlm.nih.gov/geo/, accessed on February 1, 2016
² See https://www.ebi.ac.uk/arrayexpress/, accessed on February 1, 2016

Table 2.1. Summary of selected gene expression profiling studies for AD and HD.

Disease  Phase  Number of  Number of     Number of  Number of        Total    Total Unique
                Studies    Mouse Models  Controls   Disease Samples  Samples  Genes (c)
AD       early  4          9             47         69               116      10853
AD       late   8          8             36         56               92       10366
HD       early  6          5             28         28               56       13129
HD       late   10         7             50         55               105      12454
Total           23 (a)     21 (b)        161        208              369      14524 (c)

(a) Two AD studies and three HD studies are categorized in both early and late disease phases.
(b) There are shared mouse models between early and late phases, and across studies.
(c) Total unique genes are counts after removing genes with low expression values.


2.2 Data pre-processing and quality control

To evaluate the quality of the raw Affymetrix expression data obtained from GEO, I applied quality control procedures as described in [94]. Briefly, the quality of each microarray was assessed first by the 3'/5' ratio for RNA quality, the scale factor, the average background, and the percent present, using the Bioconductor package simpleaffy [95], and then by relative log expression (RLE) and normalized unscaled standard errors (NUSE), using the Bioconductor package affyPLM [96]. Samples were removed if the arrays failed two or more of the above measurements. All samples from the targeted brain tissues profiled on Affymetrix arrays passed the quality control procedures. Such raw data quality control procedures were not available for Agilent and Illumina arrays.

To standardize data processing, Affymetrix and Agilent arrays were background corrected with the Robust Multi-Array Average (RMA) method using the affy [97] and limma [98] R packages, followed by quantile normalization and log2 transformation. Illumina arrays were quantile normalized and log2 transformed. Samples from brain tissues other than hippocampus or striatum, and samples from non-disease mouse models, were discarded after normalization (Table 2.2).

I applied further quality control by identifying outlier samples that were more than two standard deviations away from the mean sample-to-sample Pearson correlation within a data set. Two 18-month-old TAU mice and one 2-month-old TAU mouse from data set GSE64398 were outliers. HD data set GSE19676 had one outlier (Table 2.2). Samples were also removed when control and disease model samples were batch-confounded or when a batch contained only one sample (Table 2.2). After excluding all disqualified samples, I corrected for batch effects in each data set using ComBat [99] if batch information was available.

2.3 Dividing samples into early and late disease phases and combining data sets
To define the early and late phases of AD and HD, I used the time point when mouse models first develop phenotypes that are similar to the earliest clinical symptoms used for diagnosis in AD and HD. The phenotypes of the AD and HD mouse models analyzed were based on the behavioral data from the original publications, or from the publications cited in the original papers (Table 1.1 and Table 1.2). Mild cognitive decline is an early diagnostic symptom in AD patients [35]. Cognitive impairment is often assessed by the water maze test in AD mouse models. Therefore, AD mouse samples that did not display impairment in memory and learning as measured by water maze tests were categorized as early phase AD samples, and the rest as late phase AD samples (Table 1.1). In HD patients, motor symptoms such as involuntary movement are some of the earliest symptoms used for diagnosis [27]. Motor deficits are often measured by the rotarod test and gait test in HD mouse models [41]. For HD mouse models, the absence or presence of motor symptoms as measured by the rotarod test and gait test defined the samples as early or late phase respectively (Table 1.2).

Expression profiles of samples from different studies were then aggregated into four integrated groups by disease and phase: early AD, late AD, early HD, and late HD. To allow cross-platform comparison, within each data set I removed non-specific probes (i.e. probes that mapped to multiple genes), probes that did not map to any gene, and probes that had missing expression values in one or more samples. When more than one probe mapped to a gene, I retained only the probe with the highest median expression value to represent that gene. Not all genes are available on all platforms used by the studies; I selected genes that were present on more than ⅔ of the platforms as a compromise between maximizing the number of genes in the analysis and the requirement to have multiple measurements to perform a meta-analysis.
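The probe and gene selection rules above can be illustrated with a short sketch. This is not the R code used in the thesis; it is a minimal Python re-expression under stated assumptions. The data layout (dictionaries keyed by probe or platform ID) and the function names `collapse_probes` and `genes_on_most_platforms` are hypothetical and chosen only for illustration.

```python
from statistics import median


def collapse_probes(expr, probe_to_genes):
    """Keep one probe per gene: the one with the highest median expression.

    expr           -- {probe_id: [expression value per sample]} (assumed layout)
    probe_to_genes -- {probe_id: [gene symbols the probe maps to]} (assumed layout)
    """
    best = {}  # gene -> (median, values) of the best probe seen so far
    for probe, values in expr.items():
        genes = probe_to_genes.get(probe, [])
        if len(genes) != 1:
            continue  # drop unmapped probes and probes mapping to several genes
        if any(v is None for v in values):
            continue  # drop probes with a missing value in any sample
        gene, med = genes[0], median(values)
        if gene not in best or med > best[gene][0]:
            best[gene] = (med, values)
    return {gene: values for gene, (_, values) in best.items()}


def genes_on_most_platforms(platform_genes):
    """Return the genes present on more than two thirds of the platforms.

    platform_genes -- {platform_id: set of gene symbols measured on it}
    """
    n = len(platform_genes)
    counts = {}
    for genes in platform_genes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    # Integer form of count / n > 2/3, which avoids floating-point comparison.
    return {gene for gene, c in counts.items() if 3 * c > 2 * n}
```

In this sketch, ties between probes with equal medians are resolved in favor of the probe encountered first; the thesis does not state how such ties were handled.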
For each disorder, two integrated data sets were created by combining samples across studies from each disease phase. Within each integrated data set, gene expression values were quantile normalized to harmonize scales across studies. I then filtered each data set to remove non-expressed genes. To set the threshold for filtering, I was guided by the expression level of sex-specific genes [100]. The signal for sex-specific genes in the non-expressing sex (e.g., Y-linked genes in females) can be taken as a rough indicator of background levels. The median expression value of non-expressed sex-specific genes from all samples was 5.2; I therefore removed genes with expression values lower than 6, a more stringent threshold. See Table 2.1 for the number of genes in each disease phase after gene filtering.

Most of the AD mouse models analyzed in this project were transgenic mouse models with transgenes under the control of murine Thy1 tissue-specific regulatory elements (Table 1.1). The microarray probesets for Thy1 mapped to both these transgenes and the endogenous copy, which artificially increased the measured expression of Thy1 in transgenic mouse models. Therefore, Thy1 was removed from the meta-analysis in AD mouse models.

2.4 Estimate cell-type proportion changes

To estimate relative cell-type proportions of disease models and controls, I applied the marker gene profile estimation method described in Mancarci et al. (2016) [75] to the integrated expression profiles of each disease phase from AD and HD separately (a total of four sets). Mancarci et al. (2016) provide lists of marker genes for over 30 cell types for multiple brain regions (including hippocampus and striatum), and the marker gene profiles can be used as a proxy for cell-type proportion changes [75]. For each disorder, the expression values of the pre-selected marker genes were first corrected for between-study variations for each disease phase and then used as input for marker gene profile estimation (see Sec. 2.5.1 for details).
I estimated marker gene profiles with the cell-type specific marker genes provided in Mancarci et al. (2016) [75]. For AD samples, profiles for three glial cell types (microglia, astrocytes, oligodendrocytes) and three neuronal cell types (pyramidal cells, dentate granule cells, GABAergic cells) were estimated from markers specific to the hippocampus. For HD samples, profiles of the same three glial cell types and two neuronal cell types (cholinergic neurons and medium spiny neurons (MSNs)) were estimated from markers specific to the striatum. Marker gene profiles were normalized to a range between 0 and 1, where the sample with the highest profile was assigned 1 and the sample with the lowest was assigned 0. To test whether the profiles were significantly different between disease mouse models and controls for each cell type, I applied the Wilcoxon rank sum test and estimated the false discovery rates (FDR) using the Benjamini-Hochberg procedure [101].

2.5 Fitting linear mixed-effects models and applying jackknife procedure to rank genes

Linear mixed-effects models (LMMs) allow modelling of multiple sources of variation, such as mouse model specific effects, laboratory effects, and differences between disease models and controls [89]. For each disease phase group (four groups in total), I fitted two linear mixed-effects models for each gene using the “lmer” function in R package lme4 version 1.1-12, via maximum likelihood estimation (lmer(REML = F))³ [102]. The first LMM corrected for between-study variations without correction for the marker gene profiles; the second LMM corrected for both between-study variations and marker gene profiles. After fitting each LMM, I performed an analysis of variance (ANOVA) to test for the significance of the fixed effect of disease state (i.e. disease and normal states) and obtained a p-value for each gene with the “anova” function in R package stats version 3.3.1 [103]. Ranking of up- or down-regulated genes was based on p-values in ascending order and the direction of expression changes between mouse models and controls. The FDR estimated by the Benjamini-Hochberg procedure was computed for each gene. Significantly differentially expressed (DE) genes had FDR < 0.05. A jackknife procedure was applied to yield more robust gene rankings. Details are given in the following sections.

³ The lmer function uses restricted maximum likelihood (REML) estimation by default. By setting REML = F, the lmer function fits the LMM with maximum likelihood, which allows comparison of LMMs that have different fixed effects.

2.5.1 Linear mixed-effects model to correct for between-study variations

I fitted an LMM on a gene-specific level to correct for between-study variations, without considering the effects of cell-type proportion differences between disease mouse models and control animals. The model included a fixed effect estimating the expression difference of a gene between disease and normal states. This is the main effect of interest, as the expression changes of a gene between disease models and controls may be correlated with disease progression. Other than differences in disease state, sources of variation also included experimental design and procedures, microarray platforms, mouse models, sample age, gender, and other unobserved sources of variation. Such variations were not the subject of interest but influenced the expression level. Most data sets here contained unique mouse models sampled at a specific age, and some platforms were used in only one study (Table 2.3 and Table 2.4). These sources of variation were confounded with the study design and thus impossible to model as separate random effects. Instead, I estimated one combined random effect (the effect of the study) to incorporate variations from the abovementioned sources, assuming that studies are independent from each other. A random intercept model was applied to estimate a separate intercept for each study.
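As a brief aside on the multiple-testing step mentioned in Sections 2.4 and 2.5, the Benjamini-Hochberg adjustment and the FDR < 0.05 cut-off can be sketched as follows. This is an illustrative Python version, not the R code used in the thesis; the function names `benjamini_hochberg` and `significant_genes` and the dictionary input are assumptions made for the example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(pvalues_by_gene, alpha=0.05):
    """Return the sorted gene names whose FDR is below alpha.

    pvalues_by_gene -- {gene symbol: p-value} (hypothetical input layout)
    """
    genes = list(pvalues_by_gene)
    fdr = benjamini_hochberg([pvalues_by_gene[g] for g in genes])
    return sorted(g for g, q in zip(genes, fdr) if q < alpha)
```

For example, `significant_genes({"g1": 0.001, "g2": 0.2, "g3": 0.03, "g4": 0.01})` keeps `g1`, `g3`, and `g4`, because only `g2` has an adjusted value above 0.05.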
Interaction effects between disease states and mouse models could not be modelled because study and mouse model were confounded. Therefore, I chose to use an additive model. For each gene in a specific disease phase, the expression level (y) of sample i in study k is:

Equation 1
\[ y_{ik} = \beta_1 X_{ik} + \alpha_k + \varepsilon_{ik} \]

where $\alpha_k \sim N(0, \delta)$, $\varepsilon_{ik} \sim N(0, \sigma)$, and

\[ X_{ik} = \begin{cases} 1, & \text{disease} \\ 0, & \text{normal} \end{cases} \]

$\alpha_k$ is the random effect representing the combined random effects from study k, modelled as following a normal distribution with variance $\delta$. Studies were assumed to be independent. $\beta_1$ is the effect size coefficient of the fixed effect of disease state. $\varepsilon_{ik}$ is the error term (normally distributed with variance $\sigma$).

The expression values of the pre-selected marker genes were corrected for study variations before they were used as input for cell-type proportion estimation (see Sec. 2.4). I adjusted the input expression of each marker gene by subtracting the estimated $\alpha_k$ from the observed expression.

2.5.2 Linear mixed-effects model to correct for between-study variations and cell-type proportion changes

Correcting for cell-type proportion changes between disease and normal states may reveal underlying expression changes at the transcriptional level rather than cellular composition differences. To estimate the fixed effect of the expression difference of a gene between disease mouse models and control animals, while correcting for the effects of study and of cell-type proportion changes estimated by marker gene profiles (MGP), I modelled the estimated marker gene profiles of each cell type as fixed effects in addition to the random effect of study and the fixed effect of disease state in the LMM.

For AD mouse models, marker gene profiles of hippocampal glial cells (microglia, astrocytes, and oligodendrocytes) and neurons (pyramidal cells, dentate granule cells, GABAergic cells) were included in the LMM. For each gene in a specific AD disease phase, the expression level (y) of sample i in study k is:

Equation 2
\[
\begin{aligned}
y_{ik} = {} & \beta_1 X_{ik} + \alpha_k \\
& + \beta_2\,\text{Microglia MGP} + \beta_3\,\text{Astrocyte MGP} \\
& + \beta_4\,\text{Oligodendrocyte MGP} \\
& + \beta_5\,\text{Pyramidal cell MGP} \\
& + \beta_6\,\text{GABAergic cell MGP} \\
& + \beta_7\,\text{Dentate granule cell MGP} + \varepsilon_{ik}
\end{aligned}
\]

where $\alpha_k \sim N(0, \delta)$, $\varepsilon_{ik} \sim N(0, \sigma)$, and

\[ X_{ik} = \begin{cases} 1, & \text{disease} \\ 0, & \text{normal} \end{cases} \]

$\alpha_k$ is the random effect representing the combined random effects from study k, modelled as following a normal distribution with variance $\delta$. Studies were assumed to be independent.

For HD mouse models, striatal neuronal cell types included cholinergic neurons and medium spiny neurons. For each gene in a specific HD disease phase, the expression level (y) of sample i in study k is:

Equation 3
\[
\begin{aligned}
y_{ik} = {} & \beta_1 X_{ik} + \alpha_k \\
& + \beta_2\,\text{Microglia MGP} + \beta_3\,\text{Astrocyte MGP} \\
& + \beta_4\,\text{Oligodendrocyte MGP} \\
& + \beta_5\,\text{Cholinergic neuron MGP} \\
& + \beta_6\,\text{Medium spiny neuron MGP} + \varepsilon_{ik}
\end{aligned}
\]

where $\alpha_k \sim N(0, \delta)$, $\varepsilon_{ik} \sim N(0, \sigma)$, and

\[ X_{ik} = \begin{cases} 1, & \text{disease} \\ 0, & \text{normal} \end{cases} \]

$\beta_2$ to $\beta_6$ are the coefficients of the fixed effects of the marker gene profiles of each cell type.

2.5.3 Jackknife procedure for gene ranking

When a study has a strong influence on the gene ranking, the ranking changes considerably when the study is removed from the meta-analysis. To improve robustness against strong influence from a single study, I applied a jackknife procedure. Given k studies in a disease phase, the jackknife procedure involved repeating the meta-analysis k times. In each iteration, the ranking of a gene was estimated by leaving out the samples from the $j$th study ($j \in \{1 \ldots k\}$) (i.e. the “leave-one-out” gene ranking), thus removing the influence of the $j$th study. After k iterations, k sets of “leave-one-out” gene rankings ($R_j$, $j \in \{1 \ldots k\}$), each estimated from k-1 studies, were obtained and the average rank of a gene ($\bar{R}$) was calculated:

\[ \bar{R} = \frac{1}{k} \sum_{j=1}^{k} R_j \]

The final gene rankings were assigned by ranking the average ranks ($\bar{R}$) of all genes in ascending order.

2.6 Functional enrichment analysis

To infer functional roles of the ranked list of genes from the jackknife procedure (see Sec. 2.5.3), I performed functional enrichment analysis using ErmineJ version 3.0.2 [104] to determine the enrichment of Gene Ontology (GO) terms in my ranked list. GO terms are hierarchical controlled vocabularies of gene product annotations in three categories: “biological process”, “molecular function”, and “cellular component” [105]. I applied the precision-recall method in ErmineJ, which is a rank-based method scored by the average precision [106]. The input was the ranked list of up- or down-regulated genes for each disease phase, and the background gene list was limited to the expressed genes of the specific brain region of the disease phase, derived from the sample arrays. Each run tested gene sets of sizes between 10 and 500 for 200000 iterations. ErmineJ can account for gene multifunctionality, which refers to genes being annotated in multiple pathways or GO terms. Such “multifunctional” genes, when highly ranked, can impact the enrichment analysis [106]. Therefore, gene functional groups were prioritized by multifunctionality score (lower-scoring pathways were given higher priority).
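The jackknife ranking of Section 2.5.3, whose output is the input to the enrichment analysis above, can be sketched as follows. This is a minimal Python illustration rather than the R implementation used in the thesis. The per-iteration ranking is passed in as a function, and `toy_rank_by_mean_p` is a hypothetical stand-in for the full per-gene LMM fit and p-value ranking, included only so that the example runs on toy data; the sketch also assumes that every gene receives a rank in every iteration.

```python
def rank_ascending(scores):
    """Map each gene to its rank (1 = smallest score); ties broken by gene name."""
    ordered = sorted(scores, key=lambda g: (scores[g], g))
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def jackknife_ranking(studies, rank_genes):
    """Average leave-one-study-out ranks and rank genes by that average.

    studies    -- list with one entry per study (any structure rank_genes accepts)
    rank_genes -- function(list of studies) -> {gene: rank}; stands in for the
                  meta-analysis that is repeated in every iteration
    Returns (final_rank, mean_rank), both as {gene: value}.
    """
    k = len(studies)
    totals = {}
    for j in range(k):
        subset = studies[:j] + studies[j + 1:]  # leave out the j-th study
        for gene, rank in rank_genes(subset).items():
            totals[gene] = totals.get(gene, 0) + rank
    mean_rank = {gene: total / k for gene, total in totals.items()}
    return rank_ascending(mean_rank), mean_rank


def toy_rank_by_mean_p(studies):
    """Hypothetical stand-in: rank genes by their mean p-value across studies."""
    genes = studies[0].keys()
    mean_p = {g: sum(s[g] for s in studies) / len(studies) for g in genes}
    return rank_ascending(mean_p)


# Toy input: one {gene: p-value} dictionary per study (made-up numbers).
demo_studies = [
    {"A": 0.01, "B": 0.20, "C": 0.50},
    {"A": 0.02, "B": 0.10, "C": 0.40},
    {"A": 0.30, "B": 0.05, "C": 0.35},
]
final_rank, mean_rank = jackknife_ranking(demo_studies, toy_rank_by_mean_p)
```

On the toy data, gene A ranks first only when the third study is left out, so its average rank falls behind that of gene B, which illustrates how the averaging dampens the influence of a single study.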


Table 2.2. Samples removed from analysis in each study. Percentage of samples removed from total samples of the data set are shown in parentheses.

Disease  Study     Total removed  Samples removed
AD       GSE14499  20 (76.9%)     Removed entorhinal cortex samples, entorhinal brain-derived neurotrophic factor treated samples, and wildtype mice underwent sham surgery.
AD       GSE36237  48 (75%)       Removed prefrontal cortex samples and phosphodiesterase 9A treated samples.
AD       GSE53480  8 (50%)        Removed transcription factor EB treated samples.
AD       GSE63617  89 (74.8%)     Removed basal forebrain and cortex samples; hippocampal samples from 1-month-old mice were removed because genotype effect was confounded by the batch effect.
AD       GSE64398  226 (67.9%)    Removed cerebellum and cortex samples. GSM1570447 was the only sample in batch 2013-06-26 and was removed; two 18-month-old TAU mice (GSM1570269, GSM1570366) and one 2-month-old TAU mouse (GSM1570292) were outliers and removed.
HD       GSE9038   12 (50%)       Removed cerebellum samples.
HD       GSE19676  1 (5.6%)       GSM491339 was an outlier and removed.
HD       GSE26317  24 (51.1%)     Removed cerebellum, cerebral cortex samples, and samples treated with histone deacetylase (HDAC) inhibitor, HDACi 4b.
HD       GSE48104  8 (50%)        Downstream Regulatory Element Antagonist Modulator (DREAM) knockout mice were removed.
HD       GSE50379  6 (50%)        Removed mGluR5 knockout mice.
HD       GSE62210  6 (50%)        Removed p62 knockout mice.


Table 2.3. Summary of selected AD mouse model studies. Number of control and case samples are shown in parentheses (control/case). Genes refer to the number of unique genes mapped. Amyloid: Amyloid transgenic models, TAU: TAU transgenic models, KO: Knock-out models

Model Types   Mouse Model(s)                                               Study (Data set)       Phase(s)     Samples       Platform          Genes
Amyloid       Tg2576                                                       GSE36237 [107]         early        16 (8/8)      GPL1261           18118
                                                                           GSE1556 [45]           late         4 (2/2)       GPL81             8237
                                                                           GSE15056 [108]         late         4 (2/2)       GPL7202           19459
              5xFAD                                                        GSE52022 [109]         late         4 (2/2)       GPL1261           18118
                                                                           GSE50521 [82]          late         12 (6/6)      GPL6096           16743
              J20                                                          GSE14499 [110]         late         6 (2/4)       GPL1261           18118
Amyloid, TAU  TAS10, TPM, TASTPM, TAU                                      GSE64398 [50], [111]   early, late  108 (38/70)   GPL6885           17339
TAU           rTg4510                                                      GSE53480 [112]         late         8 (4/4)       GPL1261           18118
KO            Aplp2 KO, App KO, App/Aplp2 double-conditional KO (NdC-KO)   GSE48622 [113]         early        16 (4/12)     GPL1261           18118
Other         Anti-NGF AD11 (AD11)                                         GSE63617 [79]          early, late  30 (15/15)    GPL7042, GPL7202  19459


Table 2.4. Summary of selected HD mouse model studies. Number of control and case samples are shown in parentheses (control/case). Genes refer to the number of unique genes mapped. N-terminal: N-terminal transgenic models, CAG KI: CAG repeat knock-in, exon 1 KI: Human HTT exon 1 knock-in, Full-length: Full-length transgenic models

Model Types  Mouse Model(s)  Study (Data set)  Phase(s)     Samples      Platform  Genes
N-terminal   R6/2            GSE62210 [114]    late         6 (3/3)      GPL6246   20841
                             GSE48104 [56]     late         8 (4/4)      GPL1261   18118
                             GSE9857 [72]      late         18 (9/9)     GPL1261   18118
                             GSE26317 [58]     late         8 (4/4)      GPL6103   17544
CAG KI       D9-N171-98Q     GSE25232 [57]     late         7 (3/4)      GPL6885   17339
             Hdh480Q         GSE9375 [72]      early        6 (3/3)      GPL81     8237
             CHL2Q150        GSE10202 [72]     late         8 (4/4)      GPL1261   18118
exon 1 KI    HdhQ92          GSE7958 [72]      early, late  12 (6/6)     GPL1261   18118
             HdhQ111         GSE9038 [65]      early        12 (6/6)     GPL1261   18118
                             GSE50379 [66]     early        6 (3/3)      GPL6246   20841
             HdhQ150         GSE32417 [83]     early, late  29 (14/15)   GPL6246   20841
Full-length  BACHD-ΔN17      GSE64386 [59]     early, late  24 (12/12)   GPL6885   17339
             YAC128          GSE19676 [93]     late         17 (7/10)    GPL6333   20551


Chapter 3: Results

I re-analyzed 369 gene expression profiles from 10 Alzheimer’s disease (AD) mouse model data sets and 13 Huntington’s disease (HD) mouse model data sets. After data pre-processing and quality control, samples were categorized into early and late disease phases based on the reported appearance of cognitive impairment in AD mouse models and of motor symptoms in HD mouse models. For each disease phase, marker gene profiles were estimated from a set of pre-selected markers for different neuronal and glial cell types and used as a proxy for cell-type proportion changes. Differentially expressed (DE) genes between disease states (i.e. disease and normal states) were identified by fitting linear mixed-effects models (LMMs) that corrected for between-study variation and marker gene profiles. To improve robustness, I applied a jackknife procedure to prioritize genes (see Figure 2.1 for the workflow and Chapter 2: Materials and Methods for details).

With the goal of identifying transcriptional alterations shared among different mouse models of each disease in the early and late phases, I conducted four separate meta-analyses, one each on samples from the early AD, late AD, early HD, and late HD phases. Each analysis yielded two ranked gene lists (up-regulated and down-regulated), and functional enrichment analysis was performed on each list. The full results of ranked gene lists, enrichment analysis, and marker gene profiles are available from the author.

The marker gene profile estimates were largely consistent with the expected cell-type changes in the early and late phase mouse samples. The decrease in the number of DE genes after marker gene profiles correction in the late phase (Figure 3.5 and Figure 3.8) suggested that the late-phase gene expression changes were driven in substantial part by changes in marker gene profiles.

In AD mouse models, the early changes were subtle and the late changes showed stronger signals in the top-ranked genes. The HD mouse models, in contrast, showed relatively strong signals in the top-ranked genes from the early phase onward. Within the data sets analyzed, there was little evidence of cross-disease similarity in the early phase. However, in the late phase a few Gene Ontology (GO) terms were enriched among the top-ranked genes of both diseases, such as “G-protein coupled receptor signaling pathway” and “regulation of MAPK cascade”.
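As a reading aid, the type of model described above can be written schematically as follows. This is a generic sketch and not a reproduction of Equations 1 to 3 in Chapter 2. Apart from β1, the fixed-effect coefficient of disease state reported in Tables 3.2 to 3.5, all symbols are assumed notation: y is the expression of gene g in sample i of study j, D is a disease-state indicator, M is the marker gene profile of cell type k, and u is a study-level random intercept (the random-intercept form is itself an assumption about how study was adjusted for).

```latex
\[
y_{gij} = \beta_{0g} + \beta_{1g}\, D_{ij}
        + \sum_{k=1}^{K} \gamma_{gk}\, M_{kij}
        + u_{gj} + \varepsilon_{gij},
\qquad
u_{gj} \sim \mathcal{N}\!\left(0, \sigma_{u,g}^{2}\right), \quad
\varepsilon_{gij} \sim \mathcal{N}\!\left(0, \sigma_{g}^{2}\right)
\]
```

Under this notation, dropping the sum over marker gene profiles gives a model without marker gene profiles correction, so the corrected model carries K additional fixed-effect parameters per gene.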


3.1 Estimation of cell-type proportion changes

Neurodegenerative disorders (NDs) are generally characterized by neuronal loss and neuroinflammation, and cellular composition changes become more prominent as the disease progresses [26], [68]. Because bulk tissue transcriptome profiles represent average gene expression across all cell types, cell-type proportion estimation is needed to determine how much of the overall expression change is due to changes in cellular composition. Overall, the marker gene profile estimates were largely consistent with the expected changes in the early and late phase mouse samples: relative to controls, mouse model samples in the early phase were expected to show minimal or small cellular composition changes, and late phase samples were expected to show more pronounced changes. The marker gene profiles provided a reliable proxy for cell-type proportions. These estimates can be included in the LMMs to adjust for gene expression changes that are explained by changes in marker gene profiles.

The number of significant DE genes (FDR < 0.05) in the late disease phase of AD and HD was dramatically reduced after marker gene profiles correction (Figure 3.5 and Figure 3.8). One explanation is that the gene expression changes in the late phase were driven in substantial part by changes in cellular proportions, as estimated by marker gene profiles. The LMMs that corrected for marker gene profiles had additional marker gene profile parameters (Equations 2 and 3) and were therefore more complex than the LMM without marker gene profiles correction (Equation 1). The added complexity could reduce the power to detect DE genes and result in a lower number of DE genes after correction. However, the gene rankings also changed considerably after the correction in the late phase, which is unlikely to result from a more complex LMM alone. The Spearman correlation coefficients between gene rankings before and after correction were 0.38 for AD and 0.48 for HD in the late phase, compared with 0.78 and 0.71, respectively, in the early phase. The marker gene profiles correction affected both the number of DE genes and the gene rankings, which indicated that cellular composition changes (as measured by marker gene profiles) contributed to bulk tissue gene expression changes, especially in the late disease phase.

3.1.1 Estimation of cell-type proportion changes in AD mouse models

During the early phase in AD, the marker gene profile estimates showed no significant changes in neurons or glial cells (Figure 3.1 and Figure 3.2). During the late phase, the estimates indicated reduced dentate granule cells and pyramidal cells (Figure 3.1) and increased astrocytes and microglia (Figure 3.2) in AD mouse models. The number of significant DE genes in the late phase was greatly reduced after marker gene profiles correction: 13.73% of the total genes were DE before correction, dropping to 0.16% after correction (Figure 3.5).

3.1.2 Estimation of cell-type proportion changes in HD mouse models

In contrast to AD, the estimates for HD mouse models in the early phase suggested significant changes in medium spiny neuron (MSN) profiles before the onset of motor symptoms (Figure 3.3). In the late phase only, the estimated profiles indicated decreased expression of MSN markers, cholinergic neuron markers and astrocyte markers, and increased expression of oligodendrocyte markers (Figure 3.3 and Figure 3.4). In the late phase, 30.19% of the total genes were DE before correction, and 13.9% after correction (Figure 3.8).

I present my meta-analysis results for AD and HD in the next two sections. The results are based on ranked gene lists after marker gene profiles correction, unless otherwise specified.
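The comparison of gene rankings before and after correction in Section 3.1 rests on the Spearman correlation, which is the Pearson correlation of ranks with tied values sharing an averaged rank. The following is a minimal standard-library sketch of that calculation. The function names and the toy scores are hypothetical and are not taken from the thesis pipeline, whose software is not described in this chapter.

```python
from statistics import mean


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-gene scores before and after correction (toy values only).
before = {"GeneA": 5.1, "GeneB": 3.2, "GeneC": 3.2, "GeneD": 0.4}
after = {"GeneA": 2.0, "GeneB": 4.5, "GeneC": 1.1, "GeneD": 0.3}
genes = sorted(before)
rho = spearman([before[g] for g in genes], [after[g] for g in genes])
print(f"Spearman rho = {rho:.2f}")
```

A value near 1 means the correction left the ordering of genes largely intact, while lower values, such as those reported for the late phase, mean the ordering changed substantially.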


Figure 3.1. Marker gene profiles of neurons in AD mouse models. Marker gene profiles are scaled to the range 0 to 1. ** FDR < 0.05, *** FDR < 0.01.

Figure 3.2. Marker gene profiles of glial cells in AD mouse models. Marker gene profiles are scaled to the range 0 to 1. ** FDR < 0.05, *** FDR < 0.01.

Figure 3.3. Marker gene profiles of neurons in HD mouse models. Marker gene profiles are scaled to the range 0 to 1. ** FDR < 0.05, *** FDR < 0.01.

Figure 3.4. Marker gene profiles of glial cells in HD mouse models. Marker gene profiles are scaled to the range 0 to 1. ** FDR < 0.05, *** FDR < 0.01.


3.2 Meta-analysis of gene expression in Alzheimer’s disease mouse models

The meta-analysis of 116 gene expression profiles of mouse models and controls in the early phase revealed small but consistent expression changes (Figure 3.6). Only four up-regulated and three down-regulated genes among the top 40 hits were significant at FDR < 0.05 (Table 3.2 and Table 3.3). Functional enrichment analysis indicated that a few of the top hits were functionally related. Three of the top up-regulated genes (Sqle, Msmo1, Nsdhl) are involved in cholesterol biosynthesis. C1qa, C1qb and C1qc, which are involved in the classical complement cascade, were also among the top up-regulated genes in the early phase (Table 3.2).

In contrast to the subtle changes in the early phase, the expression changes of the top hits were stronger in the late phase (Figure 3.7). The cholesterol-related genes that were up-regulated in the early phase (Sqle, Msmo1, Nsdhl) were also among the top up-regulated genes in the late phase, suggesting that cholesterol biosynthesis could be chronically up-regulated in AD mouse models from the early phase onward. The analysis also detected up-regulation of a known AD risk gene, Trem2, which had been reported in the original publications of two of the studies analyzed (Table 3.10). Several GO terms were significantly enriched in the top down-regulated genes, including “regulation of MAPK cascade”, “G-protein coupled receptor signaling pathway”, “dendrite”, “regulation of neurogenesis” and others.

As expected, the marker gene profiles correction affected the ranking of marker genes, especially in the late phase. Before correction, the top 100 dysregulated genes in the late phase included 16 up-regulated microglia markers, 2 up-regulated astrocyte markers, and 2 down-regulated neuron markers (Table 3.1). After correcting for marker gene profiles, the number of cell-type markers among the top hits was drastically reduced (Table 3.1). The substantial drop in the number of DE genes in the late phase after marker gene profiles correction (from 13.73% of all genes tested to less than 1%) suggested that expression changes in late AD were driven in substantial part by changes in marker gene profiles (Figure 3.5). Interestingly, before correcting for marker gene profiles, the GO terms “immune response” and “inflammatory response” were significantly enriched in the top up-regulated genes in the late phase. However, some of the genes annotated with these terms were used as microglia markers in the marker gene profile estimation, and after correction neither term remained enriched in the late phase (both had FDR > 0.9).

The gene rankings of early and late AD showed low correlation, indicating that gene rankings may be phase specific. The Spearman correlation coefficient between the early and late phase rankings of all genes was 0.25 before marker gene profiles correction and 0.28 after. Very few of the top 100 dysregulated genes overlapped across disease phases. Specifically, only the top genes involved in cholesterol biosynthesis (Sqle, Msmo1, Nsdhl) were dysregulated in the same direction in both phases (FDR < 0.05), both before and after marker gene profiles correction.
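The overlap check described above can be illustrated with a short sketch. The gene lists are the first seven rows of Tables 3.2 to 3.5 (rankings after marker gene profiles correction), truncated for brevity, whereas the analysis in the text used the top 100 dysregulated genes. The function name is hypothetical. Comparing up-regulated lists with each other and down-regulated lists with each other ensures that any shared gene changes in the same direction in both phases.

```python
def shared_top_genes(list_a, list_b, n):
    """Return genes found in the top n of both ranked lists, sorted by symbol."""
    return sorted(set(list_a[:n]) & set(list_b[:n]))


# First seven genes of each ranked list, copied from Tables 3.2 to 3.5.
early_up = ["Prokr2", "Sqle", "Msmo1", "Enkur", "Ring1", "Zfp330", "Nsdhl"]
late_up = ["Man2b1", "Idh1", "Cd68", "Msmo1", "Sqle", "Cidea", "Nsdhl"]
early_down = ["Ttyh2", "Irgm2", "Ddit3", "Arap3", "Eef2", "Tap2", "Nkain2"]
late_down = ["Inppl1", "Vars2", "Map3k5", "Gucy1a3", "Sema5b", "Camkk2", "Crybb1"]

shared_up = shared_top_genes(early_up, late_up, n=7)
shared_down = shared_top_genes(early_down, late_down, n=7)

print(shared_up)    # ['Msmo1', 'Nsdhl', 'Sqle']
print(shared_down)  # []
```

Even on these truncated lists, the only genes shared between phases are the three cholesterol biosynthesis genes.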

Figure 3.5. Number of differentially expressed genes (FDR < 0.05) in AD mouse models before and after marker gene profiles correction. On top of each bar are the number of differentially expressed (DE) genes and percentage of DE genes over the total genes tested.
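The counts and percentages of DE genes shown in this kind of figure can be derived from per-gene P-values as sketched below. This chapter does not restate how FDR values were computed, so the sketch assumes the Benjamini-Hochberg procedure. The function names and the toy P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def summarize_de(p_values, threshold=0.05):
    """Count genes with adjusted P-value below threshold; return count and percent."""
    fdr = benjamini_hochberg(p_values)
    n_de = sum(q < threshold for q in fdr)
    return n_de, 100.0 * n_de / len(p_values)


# Toy P-values for five hypothetical genes.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.6]
n_de, pct = summarize_de(toy_p)
print(f"{n_de} DE genes ({pct:.1f}% of genes tested)")
```

In this toy example the third and fourth genes have raw P-values below 0.05 but adjusted values above it, which is why the DE count depends on the adjustment and not on the raw P-values alone.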


Figure 3.6. Top 20 up and down-regulated genes for AD early phase after marker gene profiles correction. Expression values are corrected for studies and marker gene profiles. For display purposes, each row is z-score transformed expression of a gene. Grey cells represent missing values. Each column is a brain sample with control samples marked as red and AD mouse models marked as blue. The horizontal line separates the up-regulated genes (top) and the down-regulated genes (bottom).
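The row-wise z-score transformation mentioned in the caption can be sketched as follows, with missing values (the grey cells) left untouched. The use of the sample standard deviation and the function name are assumptions made for illustration; the thesis does not specify these details here.

```python
from statistics import mean, stdev


def zscore_row(values):
    """Z-score one gene's expression values, skipping missing entries (None)."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        # Not enough data to estimate spread; treat the row as missing.
        return [None] * len(values)
    mu = mean(observed)
    sd = stdev(observed)  # sample standard deviation (assumed)
    if sd == 0:
        # Constant row: every observed value sits at the mean.
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - mu) / sd for v in values]


# Toy expression row with one missing value.
row = [1.0, None, 3.0]
print(zscore_row(row))
```

Scaling each gene separately makes rows with different baseline expression comparable in a heatmap, at the cost of hiding absolute expression levels.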


Figure 3.7. Top 20 up and down-regulated genes for AD late phase after marker gene profiles correction. Expression values are corrected for studies and marker gene profiles. For display purposes, each row is z-score transformed expression of a gene. Grey cells represent missing values. Each column is a brain sample with control samples marked as red and AD mouse models marked as blue.


3.3 Meta-analysis of gene expression in Huntington’s disease mouse models

In contrast to the subtle expression changes observed in early AD, top-ranked genes identified from 56 early HD mouse samples showed consistent and strong signals of differential expression after marker gene profiles correction (Figure 3.9). Mitochondrial genes were enriched in the top down-regulated genes. Some of these genes encode subunits of mitochondrial respiratory chain complexes, which are embedded in the inner mitochondrial membrane. Ndufa12 and Ndufs3 encode subunits of mitochondrial complex I and Coa3 encodes a subunit of mitochondrial complex IV [115], [116]. A few other mitochondria-related genes were also down-regulated. Clpx encodes an AAA+ chaperone protein in the mitochondrial matrix, and Ptpmt1 encodes a mitochondrial phosphatase that is localized to the inner mitochondrial membrane [117], [118]. Down-regulation of mitochondrial membrane genes and related genes suggested that mitochondrial dysfunction may occur early in HD progression in mouse models. Among the top 20 up-regulated genes, five (Prkcq, Sema3e, Dusp26, Myadm, Serpine2) are annotated with the GO term “regulation of cell adhesion”, and Rab11b from the Rab11 subfamily is linked to early synaptic dysfunction before the onset of HD [119].

Top genes in the late phase showed even stronger expression changes than those in the early phase. Five of the top 20 up-regulated genes were annotated with the GO term “positive regulation of hormone secretion pathway”. Among these top genes, Doc2b is a calcium sensor involved in neurotransmitter secretion [120]. Isl1 is expressed in striatal cholinergic interneurons and may play a role in maintaining neuronal survival [121]. Tac1 encodes Substance P, a neuropeptide released by a subpopulation of MSNs [122]. Many GO terms were significantly enriched in the top down-regulated genes in late HD, including “neuron projection development”, “G-protein coupled receptor signaling pathway”, “regulation of MAPK cascade” and other terms.

As in the AD meta-analysis (Section 3.2), marker gene rankings changed after marker gene profiles correction: the top-ranked down-regulated MSN markers before correction were no longer among the top 100 genes after correction (Table 3.1). In the late HD phase, the marker gene profile changes explained some of the gene expression changes between mouse models and controls and reduced the number of DE genes by more than 50% (Figure 3.8). However, a large proportion (13.9%) of the genes were still significantly dysregulated after marker gene profiles correction (Figure 3.8).

The gene rankings of early and late HD showed a Spearman correlation coefficient of 0.38 before marker gene profiles correction and 0.09 after. Six of the top 100 dysregulated genes overlapped between the early and late phases before correction (Dhrs1, Hdac1, Mapk8ip2, Smoc1, Ube2l3, Zyx), and four after correction (Enpp6, Hhip, Plxnb3, Wnt2). The results suggested that gene rankings were likely phase specific and that marker gene profiles correction substantially affected gene rankings in HD.

Figure 3.8. Number of differentially expressed genes (FDR < 0.05) in HD mouse models before and after marker gene profiles correction. On top of each bar are the number of differentially expressed (DE) genes and percentage of DE genes over the total genes tested.


Figure 3.9. Top 20 up and down-regulated genes for HD early phase after marker gene profiles correction. Expression values are corrected for studies and marker gene profiles. For display purposes, each row is z-score transformed expression of a gene. Grey cells represent missing values. Each column is a brain sample with control samples marked as red and HD mouse models marked as blue.


Figure 3.10. Top 20 up and down-regulated genes for HD late phase after marker gene profiles correction. Expression values are corrected for studies and marker gene profiles. For display purposes, each row is z-score transformed expression of a gene. Grey cells represent missing values. Each column is a brain sample with control samples marked as red and HD mouse models marked as blue.


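Table 3.1. Top-ranked cell-type marker genes before and after marker gene profiles correction. Cell-type markers present among the top 50 up-regulated and top 50 down-regulated genes in the early and late phases of each disease, before and after marker gene profiles correction. MGP: marker gene profiles. ●: Genes are not differentially expressed (DE) before correction. *: Genes are DE but not top 50 ranked before correction.

Disease | Phase | Cell type | Regulation | Before MGP correction: Count | Before MGP correction: Markers | After MGP correction: Count | After MGP correction: Markers
AD | early | Astrocytes | up | -- | -- | -- | --
AD | early | Astrocytes | down | 4 | Cyp2d22, Hsd11b1, Osmr, Slc15a2 | 3 | Cpq, Fgfr3, Lamb2
AD | early | Microglia | up | -- | -- | 1 | C1qc
AD | early | Microglia | down | 1 | Irgm2 | 3 | Cd52, Herpud1, Irgm2
AD | early | Oligodendrocytes | up | -- | -- | 1 | Ugt8a
AD | early | Oligodendrocytes | down | -- | -- | -- | --
AD | early | Dentate granule cells | up | -- | -- | 1 | Cdhr1
AD | early | Dentate granule cells | down | 1 | Trpc6 | -- | --
AD | early | GABAergic cells | up | -- | -- | -- | --
AD | early | GABAergic cells | down | 1 | Entpd4 | 1 | Entpd4
AD | late | Astrocytes | up | 2 | Gfap, Slc14a1 | 1 | Slc14a1
AD | late | Astrocytes | down | 1 | Fjx1 | 3 | Atp1a2●, Mfge8, Slc1a3●
AD | late | Microglia | up | 16 | Anxa3, C1qc, Cd52, Cd68, Ctsh, Ctsz, Cyba, Grn, Gusb, Havcr2, Ifngr1, Lag3, Ly86, Slc11a1, Tgfbr2, Tyrobp | 2 | Cd68, Tyrobp
AD | late | Microglia | down | -- | -- | 1 | Crybb1●
AD | late | GABAergic cells | up | -- | -- | -- | --
AD | late | GABAergic cells | down | 1 | Cort | -- | --
AD | late | Pyramidal cells | up | -- | -- | -- | --
AD | late | Pyramidal cells | down | 1 | Nr4a2 | -- | --
HD | early | Astrocytes | up | -- | -- | -- | --
HD | early | Astrocytes | down | 2 | Gldc, Slc1a2 | 1 | Slc1a2
HD | early | Oligodendrocytes | up | 1 | Gltp | 1 | Gltp
HD | early | Oligodendrocytes | down | -- | -- | -- | --
HD | early | Cholinergic neurons | up | -- | -- | -- | --
HD | early | Cholinergic neurons | down | 1 | Trpc3 | -- | --
HD | early | Medium spiny neurons | up | -- | -- | 3 | Chst15●, Lmo7●, Rbfox3●
HD | early | Medium spiny neurons | down | 5 | Dmkn, Gabrd, Homer1, Pde10a, Pde1b | -- | --
HD | late | Astrocytes | up | -- | -- | -- | --
HD | late | Astrocytes | down | 2 | Fads1, Mfge8 | 1 | Fads1
HD | late | Oligodendrocytes | up | -- | -- | -- | --
HD | late | Oligodendrocytes | down | -- | -- | 4 | Fa2h*, Mal*, Tspan2*, Ugt8a*
HD | late | Medium spiny neurons | up | -- | -- | 2 | Nsg2●, Rgs8*
HD | late | Medium spiny neurons | down | 8 | Dmkn, Gabrd, Homer1, Itpka, Itpr1, Pde10a, Ppp1r1b, Rgs9 | -- | --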

Rgs8 was down-regulated (FDR < 0.05) before correction and up-regulated after correction.


Table 3.2. Top 50 up-regulated genes for AD mouse models in the early phase after marker gene profiles correction. P-values and the fixed-effect coefficient of disease state (β1) are calculated from the linear mixed-effects model that included all samples of the disease phase and adjusted for study and marker gene profiles. In the case of tied ranks, the averaged rank is assigned. SE: standard error.

AD early phase: up-regulated top genes

Rank | Gene | Name | P-value | FDR | β1 | β1 SE
1 | Prokr2 | prokineticin receptor 2 | 1.80E-04 | 0.0835 | 0.2344 | 0.0608
2 | Sqle | squalene epoxidase | 9.70E-06 | 0.0132 | 0.1754 | 0.0381
3 | Msmo1 | methylsterol monoxygenase 1 | 5.82E-06 | 0.0105 | 0.18 | 0.0381
4.5 | Enkur | enkurin, TRPC channel interacting protein | 1.83E-04 | 0.0835 | 0.2397 | 0.0623
4.5 | Ring1 | ring finger protein 1 | 1.15E-06 | 0.0031 | 0.1843 | 0.0361
6 | Zfp330 | zinc finger protein 330 | 3.57E-06 | 0.0078 | 0.1113 | 0.023
7 | Nsdhl | NAD(P) dependent steroid dehydrogenase-like | 1.98E-04 | 0.0835 | 0.1684 | 0.044
8 | Bdh2 | 3-hydroxybutyrate dehydrogenase, type 2 | 2.72E-04 | 0.1018 | 0.1764 | 0.0472
9 | Fam229b | family with sequence similarity 229, member B | 1.85E-04 | 0.0835 | 0.1031 | 0.0268
10 | Smarcb1 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily b, member 1 | 4.95E-04 | 0.1293 | 0.0693 | 0.0194
11 | 2700094K13Rik | RIKEN cDNA 2700094K13 gene | 6.21E-04 | 0.1414 | 0.108 | 0.0308
12 | C1qb | complement component 1, q subcomponent, beta polypeptide | 3.77E-04 | 0.1153 | 0.102 | 0.028
13 | Slc29a1 | solute carrier family 29 (nucleoside transporters), member 1 | 8.02E-04 | 0.1476 | 0.1073 | 0.0313
14 | Rasal2 | RAS protein activator like 2 | 4.04E-04 | 0.1153 | 0.1758 | 0.0464
15 | Dynlrb1 | dynein light chain roadblock-type 1 | 3.02E-03 | 0.227 | 0.0666 | 0.0221
16 | Nckap1 | NCK-associated protein 1 | 9.18E-04 | 0.1533 | 0.1249 | 0.0369
17 | Ube2b | ubiquitin-conjugating enzyme E2B | 9.80E-04 | 0.1564 | 0.0613 | 0.0182
18 | Etnk1 | ethanolamine kinase 1 | 9.60E-04 | 0.1564 | 0.1869 | 0.0535
19.5 | Epha3 | Eph receptor A3 | 2.46E-03 | 0.207 | 0.1245 | 0.0404
19.5 | Dbnl | drebrin-like | 2.50E-03 | 0.207 | 0.0683 | 0.0222
21 | Tecr | trans-2,3-enoyl-CoA reductase | 5.15E-03 | 0.2732 | 0.0668 | 0.0235
22 | Tonsl | tonsoku-like, DNA repair protein | 1.21E-03 | 0.1693 | 0.1248 | 0.0377
23 | Acat2 | acetyl-Coenzyme A acetyltransferase 2 | 1.37E-03 | 0.1746 | 0.1099 | 0.0335
24 | Mocs3 | molybdenum cofactor synthesis 3 | 2.26E-03 | 0.2045 | 0.1073 | 0.0335
25 | Pdzd3 | PDZ domain containing 3 | 7.87E-04 | 0.1476 | 0.141 | 0.041
26 | Dhx35 | DEAH (Asp-Glu-Ala-His) box polypeptide 35 | 1.82E-03 | 0.1797 | 0.0957 | 0.0301
27 | Rnase1 | ribonuclease, RNase A family, 1 (pancreatic) | 5.74E-04 | 0.1385 | 0.2237 | 0.0633
28 | C1qa | complement component 1, q subcomponent, alpha polypeptide | 3.96E-04 | 0.1153 | 0.1087 | 0.0299
29 | C1qc | complement component 1, q subcomponent, C chain | 1.18E-03 | 0.1688 | 0.0909 | 0.0274
30 | Harbi1 | harbinger transposase derived 1 | 1.89E-03 | 0.1818 | 0.1271 | 0.0389
31 | Elovl6 | ELOVL family member 6, elongation of long chain fatty acids (yeast) | 7.66E-03 | 0.3112 | 0.0821 | 0.0304
32 | Gm561 | predicted gene 561 | 1.73E-02 | 0.3908 | 0.0655 | 0.0271
33 | Ugt8a | UDP galactosyltransferase 8A | 6.10E-03 | 0.2885 | 0.0811 | 0.0291
34 | Dnah2 | dynein, axonemal, heavy chain 2 | 1.07E-03 | 0.1622 | 0.218 | 0.0652
35 | Zfp810 | zinc finger protein 810 | 2.30E-03 | 0.2055 | 0.1205 | 0.0388
36 | D630045J12Rik | RIKEN cDNA D630045J12 gene | 7.47E-04 | 0.1475 | 0.1149 | 0.0321
37 | Chst3 | carbohydrate (chondroitin 6/keratan) sulfotransferase 3 | 9.71E-03 | 0.3343 | 0.1071 | 0.0409
38 | Tmem223 | transmembrane protein 223 | 9.47E-03 | 0.3317 | 0.0669 | 0.0254
39 | Comt | catechol-O-methyltransferase | 6.09E-04 | 0.1414 | 0.2056 | 0.0561
40 | Cdk5rap1 | CDK5 regulatory subunit associated protein 1 | 2.46E-03 | 0.207 | 0.144 | 0.0466
41 | Tmem43 | transmembrane protein 43 | 3.93E-03 | 0.2538 | 0.0757 | 0.0258
42 | Arhgap4 | Rho GTPase activating protein 4 | 4.00E-04 | 0.1153 | 0.0936 | 0.0258
43 | Gm5617 | predicted gene 5617 | 3.70E-04 | 0.1153 | 0.3359 | 0.0873
44 | Cdc42bpa | CDC42 binding protein kinase alpha | 4.88E-03 | 0.273 | 0.1212 | 0.0413
45 | Slc25a26 | solute carrier family 25 (mitochondrial carrier, phosphate carrier), member 26 | 4.18E-03 | 0.254 | 0.0847 | 0.0291
46 | Tfrc | transferrin receptor | 6.18E-03 | 0.2885 | 0.1216 | 0.0438
47 | Myadml2 | myeloid-associated differentiation marker-like 2 | 1.16E-02 | 0.3648 | 0.1395 | 0.0533
48 | Pacs2 | phosphofurin acidic cluster sorting protein 2 | 8.51E-03 | 0.3222 | 0.0521 | 0.0195
49 | Casp6 | caspase 6 | 1.33E-02 | 0.3723 | 0.1104 | 0.044
50 | Ctxn3 | cortexin 3 | 2.73E-03 | 0.2176 | 0.4844 | 0.1541


Table 3.3. Top 50 down-regulated genes for AD mouse models in the early phase after marker gene profiles correction. P-values and the fixed-effect coefficient of disease state (β1) are calculated from the linear mixed-effects model that included all samples of the disease phase and adjusted for study and marker gene profiles. In the case of tied ranks, the averaged rank is assigned. SE: standard error.

AD early phase: down-regulated top genes

Rank | Gene | Name | P-value | FDR | β1 | β1 SE
1 | Ttyh2 | tweety family member 2 | 3.57E-05 | 0.0378 | -0.1427 | 0.0333
2 | Irgm2 | immunity-related GTPase family M member 2 | 9.28E-06 | 0.0132 | -0.2253 | 0.0488
3 | Ddit3 | DNA-damage inducible transcript 3 | 1.40E-04 | 0.0835 | -0.1366 | 0.0349
4 | Arap3 | ArfGAP with RhoGAP domain, ankyrin repeat and PH domain 3 | 4.59E-04 | 0.1246 | -0.0878 | 0.0244
5 | Eef2 | eukaryotic translation elongation factor 2 | 6.38E-04 | 0.1414 | -0.1527 | 0.0437
6 | Tap2 | transporter 2, ATP-binding cassette, sub-family B (MDR/TAP) | 1.49E-03 | 0.1746 | -0.1286 | 0.0397
7 | Nkain2 | Na+/K+ transporting ATPase interacting 2 | 1.67E-03 | 0.1779 | -0.1501 | 0.0468
8 | Sema4b | sema domain, immunoglobulin domain (Ig), transmembrane domain (TM) and short cytoplasmic domain, (semaphorin) 4B | 9.04E-04 | 0.1533 | -0.0952 | 0.028
9 | Orc2 | origin recognition complex, subunit 2 | 3.50E-03 | 0.239 | -0.1043 | 0.0351
10 | Cgrrf1 | cell growth regulator with ring finger domain 1 | 2.49E-04 | 0.0977 | -0.1105 | 0.0294
11 | Ecd | ecdysoneless homolog (Drosophila) | 6.35E-05 | 0.0431 | -0.0941 | 0.0228
12 | Cpq | carboxypeptidase Q | 1.53E-03 | 0.1746 | -0.1231 | 0.0381
13 | Dctd | dCMP deaminase | 2.18E-03 | 0.1987 | -0.0873 | 0.0279
14 | Smn1 | survival motor neuron 1 | 1.51E-03 | 0.1746 | -0.0741 | 0.0228
15 | Knop1 | lysine rich nucleolar protein 1 | 2.04E-03 | 0.1908 | -0.068 | 0.0216
16 | Matn4 | matrilin 4 | 1.89E-03 | 0.1818 | -0.11 | 0.0347
17 | Decr1 | 2,4-dienoyl CoA reductase 1, mitochondrial | 2.45E-03 | 0.207 | -0.1167 | 0.0378
18 | Nipal3 | NIPA-like domain containing 3 | 6.44E-03 | 0.2961 | -0.1847 | 0.0652
19 | Entpd4 | ectonucleoside triphosphate diphosphohydrolase 4 | 1.99E-04 | 0.0835 | -0.1801 | 0.0471
20 | Ciart | circadian associated repressor of transcription | 3.29E-03 | 0.2352 | -0.1348 | 0.0451
21 | Wbp11 | WW domain binding protein 11 | 3.66E-04 | 0.1153 | -0.1516 | 0.0415
22 | Stk19 | serine/threonine kinase 19 | 1.59E-03 | 0.1765 | -0.0905 | 0.0281
23 | Cox6a2 | cytochrome c oxidase subunit VIa polypeptide 2 | 5.05E-03 | 0.2732 | -0.1909 | 0.067
24 | Psmd13 | proteasome (prosome, macropain) 26S subunit, non-ATPase, 13 | 1.79E-03 | 0.1794 | -0.0676 | 0.0212
25 | Itpr2 | inositol 1,4,5-triphosphate receptor 2 | 3.01E-03 | 0.227 | -0.0811 | 0.0268
26 | Tmem248 | transmembrane protein 248 | 6.68E-04 | 0.1417 | -0.1419 | 0.0389
27.5 | Slc7a5 | solute carrier family 7 (cationic amino acid transporter, y+ system), member 5 | 2.52E-03 | 0.207 | -0.1094 | 0.0355
27.5 | Herpud1 | homocysteine-inducible, endoplasmic reticulum stress-inducible, ubiquitin-like domain member 1 | 2.68E-03 | 0.2168 | -0.082 | 0.0268
29 | Creld1 | cysteine-rich with EGF-like domains 1 | 4.69E-03 | 0.2686 | -0.0651 | 0.0227
30 | Tatdn2 | TatD DNase domain containing 2 | 1.29E-03 | 0.1733 | -0.1078 | 0.0328
31 | Fgfr3 | fibroblast growth factor receptor 3 | 5.60E-03 | 0.2827 | -0.1313 | 0.0467
32 | Pik3r3 | phosphatidylinositol 3 kinase, regulatory subunit, polypeptide 3 (p55) | 7.61E-03 | 0.3112 | -0.1138 | 0.042
33 | Nudcd3 | NudC domain containing 3 | 8.83E-04 | 0.1533 | -0.091 | 0.0268
34 | Ica1 | islet cell autoantigen 1 | 7.49E-03 | 0.3104 | -0.0826 | 0.0305
35 | Ecsit | ECSIT signalling integrator | 5.69E-04 | 0.1385 | -0.1291 | 0.0366
36 | B3gat3 | beta-1,3-glucuronyltransferase 3 (glucuronosyltransferase I) | 7.31E-04 | 0.147 | -0.0823 | 0.0238
37 | Slc25a10 | solute carrier family 25 (mitochondrial carrier, dicarboxylate transporter), member 10 | 3.50E-03 | 0.239 | -0.0824 | 0.0277
38 | Cd52 | CD52 antigen | 1.06E-02 | 0.3514 | -0.0936 | 0.0362
39 | Eno2 | enolase 2, gamma neuronal | 3.77E-03 | 0.2496 | -0.0461 | 0.0156
40 | Bcl7b | B cell CLL/lymphoma 7B | 5.29E-05 | 0.0409 | -0.1354 | 0.0324
41 | Crhbp | corticotropin releasing hormone binding protein | 1.58E-04 | 0.0835 | -0.1937 | 0.0498
42 | Rbm3 | RNA binding motif protein 3 | 8.47E-04 | 0.1508 | -0.204 | 0.0597
43 | Hes5 | hairy and enhancer of split 5 (Drosophila) | 1.46E-03 | 0.1746 | -0.2071 | 0.0638
44 | Dbp | D site albumin promoter binding protein | 4.66E-03 | 0.2686 | -0.158 | 0.055
45 | Aplp2 | amyloid beta (A4) precursor-like protein 2 | 4.70E-03 | 0.2686 | -0.1521 | 0.053
46 | Bag6 | BCL2-associated athanogene 6 | 7.68E-03 | 0.3112 | -0.0526 | 0.0195
47 | 1810026J23Rik | RIKEN cDNA 1810026J23 gene | 6.55E-03 | 0.2963 | -0.0598 | 0.0217
48 | Ripk3 | receptor-interacting serine-threonine kinase 3 | 1.44E-03 | 0.1746 | -0.1275 | 0.0392
49 | Lamb2 | laminin, beta 2 | 6.99E-03 | 0.3013 | -0.1013 | 0.037
50 | Atp13a3 | ATPase type 13A3 | 1.78E-03 | 0.1794 | -0.1649 | 0.0501

Table 3.4. Top 50 up-regulated genes for AD mouse models in the late phase after marker gene profiles correction. P-values and the fixed-effect coefficient of disease state (β1) are calculated from the linear mixed-effects model that included all samples of the disease phase and adjusted for study and marker gene profiles. In the case of tied ranks, the averaged rank is assigned. SE: standard error.

AD late phase: up-regulated top genes

Rank | Gene | Name | P-value | FDR | β1 | β1 SE
1 | Man2b1 | mannosidase 2, alpha B1 | 3.17E-07 | 0.0016 | 0.2284 | 0.0417
2 | Idh1 | isocitrate dehydrogenase 1 (NADP+), soluble | 4.24E-06 | 0.0063 | 0.233 | 0.0479
3 | Cd68 | CD68 antigen | 2.84E-06 | 0.0058 | 0.4269 | 0.0861
4 | Msmo1 | methylsterol monoxygenase 1 | 4.53E-08 | 0.0005 | 0.2841 | 0.048
5 | Sqle | squalene epoxidase | 3.36E-06 | 0.0058 | 0.2747 | 0.0559
6 | Cidea | cell death-inducing DNA fragmentation factor, alpha subunit-like effector A | 1.23E-06 | 0.0042 | 0.3191 | 0.0618
7 | Nsdhl | NAD(P) dependent steroid dehydrogenase-like | 4.07E-05 | 0.0325 | 0.2612 | 0.0609
8 | Fdps | farnesyl diphosphate synthetase | 6.26E-05 | 0.0433 | 0.2492 | 0.0597
9 | Wdr82 | WD repeat domain containing 82 | 5.35E-06 | 0.0069 | 0.2533 | 0.0465
10 | Trem2 | triggering receptor expressed on myeloid cells 2 | 1.05E-04 | 0.0537 | 0.3969 | 0.0982
11 | Reep1 | receptor accessory protein 1 | 6.92E-05 | 0.0448 | 0.1133 | 0.0273
12 | Slc22a4 | solute carrier family 22 (organic cation transporter), member 4 | 1.74E-04 | 0.0596 | 0.2392 | 0.0613
13 | Slc14a1 | solute carrier family 14 (urea transporter), member 1 | 9.81E-05 | 0.0537 | 0.1982 | 0.0488
14 | Twsg1 | twisted gastrulation BMP signaling modulator 1 | 2.32E-04 | 0.0634 | 0.1317 | 0.0345
15 | Tpst1 | protein-tyrosine sulfotransferase 1 | 4.28E-04 | 0.0715 | 0.1077 | 0.0296
16 | Brix1 | BRX1, biogenesis of ribosomes | 3.34E-04 | 0.0676 | 0.1521 | 0.041
17 | Tyrobp | TYRO protein tyrosine kinase binding protein | 2.29E-04 | 0.0634 | 0.2988 | 0.0783
18 | Fahd1 | fumarylacetoacetate hydrolase domain containing 1 | 3.85E-04 | 0.0687 | 0.1838 | 0.0501
19 | 0610007P14Rik | RIKEN cDNA 0610007P14 gene | 2.72E-04 | 0.0672 | 0.1151 | 0.0305
20 | Rab6a | RAB6A, member RAS oncogene family | 4.87E-04 | 0.0764 | 0.1744 | 0.0484
21 | Syap1 | synapse associated protein 1 | 3.42E-04 | 0.0676 | 0.0997 | 0.0269
22 | Fbxo21 | F-box protein 21 | 6.36E-04 | 0.0879 | 0.1169 | 0.0332
23 | Kctd6 | potassium channel tetramerisation domain containing 6 | 1.23E-03 | 0.1263 | 0.1042 | 0.0314
24 | Gtf2e1 | general transcription factor II E, polypeptide 1 (alpha subunit) | 1.17E-03 | 0.1232 | 0.1184 | 0.0355
25 | Ufsp1 | UFM1-specific peptidase 1 | 4.44E-04 | 0.0731 | 0.1261 | 0.0347
26 | Polr2h | polymerase (RNA) II (DNA directed) polypeptide H | 1.31E-03 | 0.1295 | 0.1103 | 0.0334
27 | Amph | amphiphysin | 1.54E-04 | 0.0596 | 0.1359 | 0.0345
28 | Cntn1 | contactin 1 | 9.16E-04 | 0.1164 | 0.1452 | 0.0426
29 | C1qa | complement component 1, q subcomponent, alpha polypeptide | 3.10E-04 | 0.0676 | 0.2389 | 0.064
30 | Arcn1 | archain 1 | 2.96E-04 | 0.0676 | 0.244 | 0.0622
31 | Stambpl1 | STAM binding protein like 1 | 5.48E-04 | 0.0834 | 0.107 | 0.03
32 | Ttc39c | tetratricopeptide repeat domain 39C | 1.95E-04 | 0.0596 | 0.2062 | 0.0533
33 | Zswim7 | zinc finger SWIM-type containing 7 | 1.82E-03 | 0.1449 | 0.2038 | 0.0601
34 | Mvk | mevalonate kinase | 5.55E-04 | 0.0834 | 0.2306 | 0.062
35 | Hmgcs1 | 3-hydroxy-3-methylglutaryl-Coenzyme A synthase 1 | 3.71E-04 | 0.0676 | 0.1471 | 0.0398
36 | Ostm1 | osteopetrosis associated transmembrane protein 1 | 8.10E-04 | 0.1049 | 0.0853 | 0.0247
37 | Nipsnap3b | nipsnap homolog 3B (C. elegans) | 1.64E-03 | 0.1406 | 0.1466 | 0.0454
38 | Smyd4 | SET and MYND domain containing 4 | 5.31E-04 | 0.0821 | 0.3059 | 0.0855
39 | Cpne7 | copine VII | 2.88E-03 | 0.1655 | 0.288 | 0.0944
40 | Fam210b | family with sequence similarity 210, member B | 3.49E-03 | 0.1818 | 0.1481 | 0.0496
41 | Telo2 | telomere maintenance 2 | 3.22E-03 | 0.1753 | 0.1995 | 0.0661
42 | Fam13c | family with sequence similarity 13, member C | 2.49E-03 | 0.1603 | 0.1254 | 0.0405
43 | Zfp655 | zinc finger protein 655 | 9.90E-04 | 0.1181 | 0.0809 | 0.0239
44 | Zbtb8b | zinc finger and BTB domain containing 8b | 1.25E-04 | 0.0587 | 0.1745 | 0.0437
45 | Phospho2 | phosphatase, orphan 2 | 2.70E-03 | 0.1613 | 0.1244 | 0.0405
46 | Med7 | mediator complex subunit 7 | 9.38E-04 | 0.1172 | 0.1187 | 0.0348
47 | Msh6 | mutS homolog 6 | 2.60E-03 | 0.1603 | 0.0904 | 0.0293
48 | Fam155a | family with sequence similarity 155, member A | 4.20E-04 | 0.0715 | 0.11 | 0.0302
49 | Glrx | glutaredoxin | 2.35E-03 | 0.1554 | 0.1044 | 0.0335
50 | Spryd7 | SPRY domain containing 7 | 3.40E-03 | 0.1779 | 0.2081 | 0.0695


Table 3.5. Top 50 down-regulated genes for AD mouse models in the late phase after marker gene profiles correction. P-values and the fixed-effect coefficient of disease state (β1) are calculated from the linear mixed-effects model that included all samples of the disease phase and adjusted for study and marker gene profiles. In the case of tied ranks, the averaged rank is assigned. SE: standard error.

AD late phase: down-regulated top genes

Rank | Gene | Name | P-value | FDR | β1 | β1 SE
1 | Inppl1 | inositol polyphosphate phosphatase-like 1 | 7.86E-06 | 0.0091 | -0.1551 | 0.0329
2 | Vars2 | valyl-tRNA synthetase 2, mitochondrial | 1.06E-05 | 0.011 | -0.2622 | 0.0565
3 | Map3k5 | mitogen-activated protein kinase kinase kinase 5 | 5.09E-05 | 0.0377 | -0.274 | 0.0608
4 | Gucy1a3 | guanylate cyclase 1, soluble, alpha 3 | 8.41E-05 | 0.0513 | -0.2699 | 0.0658
5 | Sema5b | sema domain, seven thrombospondin repeats (type 1 and type 1-like), transmembrane domain (TM) and short cytoplasmic domain, (semaphorin) 5B | 1.94E-04 | 0.0596 | -0.2149 | 0.0556
6 | Camkk2 | calcium/calmodulin-dependent protein kinase kinase 2, beta | 3.51E-04 | 0.0676 | -0.2235 | 0.0604
7 | Crybb1 | crystallin, beta B1 | 1.78E-04 | 0.0596 | -0.1706 | 0.0438
8 | Sppl2b | signal peptide peptidase like 2B | 2.51E-04 | 0.0634 | -0.1217 | 0.0321
9 | Meis2 | Meis homeobox 2 | 3.62E-04 | 0.0676 | -0.268 | 0.0727
10 | Sfxn1 | sideroflexin 1 | 2.50E-04 | 0.0634 | -0.094 | 0.0248
11 | Matn4 | matrilin 4 | 5.87E-04 | 0.0856 | -0.1627 | 0.0459
12 | Atrnl1 | attractin like 1 | 1.09E-04 | 0.0537 | -0.3929 | 0.0915
13 | Galt | galactose-1-phosphate uridyl transferase | 5.67E-04 | 0.0839 | -0.1355 | 0.0381
14 | Smagp | small cell adhesion glycoprotein | 1.77E-04 | 0.0596 | -0.1804 | 0.0462
15 | Fzd9 | frizzled class receptor 9 | 3.37E-04 | 0.0676 | -0.2851 | 0.0712
16 | Mfge8 | milk fat globule-EGF factor 8 protein | 2.39E-04 | 0.0634 | -0.1632 | 0.0429
17 | Ang | angiogenin, ribonuclease, RNase A family, 5 | 3.08E-04 | 0.0676 | -0.2007 | 0.0535
18 | Arhgef19 | Rho guanine nucleotide exchange factor (GEF) 19 | 6.17E-04 | 0.0876 | -0.1302 | 0.0368
19 | Cox8a | cytochrome c oxidase subunit VIIIa | 4.14E-04 | 0.0715 | -0.1093 | 0.0298
20 | Slc1a3 | solute carrier family 1 (glial high affinity glutamate transporter), member 3 | 1.95E-04 | 0.0596 | -0.1146 | 0.0296
21 | Pdyn | prodynorphin | 1.02E-04 | 0.0537 | -0.2716 | 0.0672
22 | Extl3 | exostoses (multiple)-like 3 | 7.92E-04 | 0.1039 | -0.1061 | 0.0307
23 | Il17d | interleukin 17D | 3.47E-04 | 0.0676 | -0.2309 | 0.0624
24 | Ptprd | protein tyrosine phosphatase, receptor type, D | 1.13E-03 | 0.1224 | -0.1224 | 0.0366
25 | Fam120a | family with sequence similarity 120, member A | 5.95E-04 | 0.0856 | -0.2096 | 0.0551
26 | Atp1a2 | ATPase, Na+/K+ transporting, alpha 2 polypeptide | 4.54E-04 | 0.0734 | -0.166 | 0.0458
27 | Cadm4 | cell adhesion molecule 4 | 1.30E-03 | 0.1295 | -0.3197 | 0.0927
28 | Fam163a | family with sequence similarity 163, member A | 1.91E-03 | 0.1458 | -0.2477 | 0.0778
29 | Vcp | valosin containing protein | 2.34E-03 | 0.1554 | -0.1013 | 0.0324
30 | Trmt1 | tRNA methyltransferase 1 | 1.36E-03 | 0.1313 | -0.1408 | 0.0428
31 | Nrxn3 | neurexin III | 3.69E-04 | 0.0676 | -0.1736 | 0.0471
32 | Parp14 | poly (ADP-ribose) polymerase family, member 14 | 2.28E-03 | 0.1554 | -0.1913 | 0.0612
33 | Fstl4 | follistatin-like 4 | 2.35E-03 | 0.1554 | -0.1849 | 0.0593
34 | Phactr1 | phosphatase and actin regulator 1 | 3.60E-03 | 0.1833 | -0.142 | 0.0477
35 | Stt3b | STT3, subunit of the oligosaccharyltransferase complex, homolog B (S. cerevisiae) | 9.92E-04 | 0.1181 | -0.0706 | 0.0208
36 | Gpt | glutamic pyruvic transaminase, soluble | 2.89E-03 | 0.1655 | -0.0961 | 0.0315
37 | Bub1b | BUB1B, mitotic checkpoint serine/threonine kinase | 5.30E-03 | 0.2032 | -0.1877 | 0.0659
38 | Uaca | uveal autoantigen with coiled-coil domains and ankyrin repeats | 2.33E-03 | 0.1554 | -0.1847 | 0.0592
39 | Ddx17 | DEAD (Asp-Glu-Ala-Asp) box polypeptide 17 | 2.20E-03 | 0.1554 | -0.0756 | 0.0241
40 | Dus1l | dihydrouridine synthase 1-like (S. cerevisiae) | 2.71E-03 | 0.1613 | -0.1041 | 0.0339
41 | Pramef8 | PRAME family member 8 | 2.21E-03 | 0.1554 | -0.1342 | 0.0428
42 | Tap2 | transporter 2, ATP-binding cassette, sub-family B (MDR/TAP) | 2.99E-03 | 0.1705 | -0.175 | 0.0576
43 | Cdh4 | cadherin 4 | 1.41E-03 | 0.1337 | -0.228 | 0.0695
44 | Dot1l | DOT1-like, histone H3 methyltransferase (S. cerevisiae) | 2.18E-03 | 0.1554 | -0.1774 | 0.0565
45 | Lrrk1 | leucine-rich repeat kinase 1 | 2.00E-03 | 0.1493 | -0.1681 | 0.053
46 | Tmem248 | transmembrane protein 248 | 6.78E-04 | 0.0925 | -0.2133 | 0.058
47 | Tjap1 | tight junction associated protein 1 | 2.79E-03 | 0.165 | -0.1534 | 0.0501
48 | Hspb11 | heat shock protein family B (small), member 11 | 2.80E-03 | 0.165 | -0.2485 | 0.0787
49 | Slc12a6 | solute carrier family 12, member 6 | 1.03E-03 | 0.1181 | -0.1144 | 0.0337
50 | Ift27 | intraflagellar transport 27 | 6.99E-04 | 0.0942 | -0.1397 | 0.04


Table 3.6. Top 50 up-regulated genes for HD mouse models in the early phase after marker gene profiles correction. P-values and the fixed-effect coefficient of disease state (β1) are calculated from the linear mixed-effects model that included all samples of the disease phase and adjusted for study and marker gene profiles. In the case of tied ranks, the averaged rank is assigned. SE: standard error.

HD early phase: up-regulated top genes

| Rank | Gene | Name | P-value | FDR | β1 | β1 SE |
|---|---|---|---|---|---|---|
| 1 | Prkcq | protein kinase C, theta | 2.37E-08 | 0.0003 | 0.1332 | 0.0203 |
| 2 | Tmem59l | transmembrane protein 59-like | 3.12E-06 | 0.0158 | 0.3497 | 0.0661 |
| 3 | Scmh1 | sex comb on midleg homolog 1 | 3.69E-06 | 0.0158 | 0.1295 | 0.025 |
| 4 | Sema3e | sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3E | 2.14E-05 | 0.0235 | 0.1428 | 0.0306 |
| 5 | Dusp26 | dual specificity phosphatase 26 (putative) | 3.89E-05 | 0.0291 | 0.1513 | 0.0334 |
| 6 | Myadm | myeloid-associated differentiation marker | 1.63E-05 | 0.0235 | 0.1365 | 0.0288 |
| 7 | Olfml3 | olfactomedin-like 3 | 1.03E-04 | 0.0422 | 0.2501 | 0.0591 |
| 8 | Uba7 | ubiquitin-like modifier activating enzyme 7 | 4.21E-05 | 0.0291 | 0.0806 | 0.0181 |
| 9 | Lmo7 | LIM domain only 7 | 6.78E-05 | 0.0356 | 0.1937 | 0.0432 |
| 10 | Serpine2 | serine (or cysteine) peptidase inhibitor, clade E, member 2 | 7.62E-05 | 0.0371 | 0.1068 | 0.024 |
| 11 | Klf3 | Kruppel-like factor 3 (basic) | 1.70E-04 | 0.0513 | 0.1616 | 0.0388 |
| 12 | Rbfox3 | RNA binding protein, fox-1 homolog (C. elegans) 3 | 3.27E-05 | 0.0291 | 0.1449 | 0.0316 |
| 13 | Ipo13 | importin 13 | 6.04E-05 | 0.0348 | 0.086 | 0.0195 |
| 14 | Ctf1 | cardiotrophin 1 | 1.64E-04 | 0.0513 | 0.1197 | 0.0295 |
| 15 | Arsb | arylsulfatase B | 4.12E-05 | 0.0291 | 0.1235 | 0.0273 |
| 16 | Inpp5d | inositol polyphosphate-5-phosphatase D | 7.10E-05 | 0.0359 | 0.12 | 0.0278 |
| 17 | Abhd14a | abhydrolase domain containing 14A | 8.75E-05 | 0.0398 | 0.2583 | 0.0608 |
| 18 | Rab11b | RAB11B, member RAS oncogene family | 1.23E-04 | 0.0472 | 0.0944 | 0.0228 |
| 19 | Sema5b | sema domain, seven thrombospondin repeats (type 1 and type 1-like), transmembrane domain (TM) and short cytoplasmic domain, (semaphorin) 5B | 2.37E-04 | 0.0574 | 0.1653 | 0.042 |
| 20 | Enpp6 | ectonucleotide pyrophosphatase/phosphodiesterase 6 | 4.69E-05 | 0.0308 | 0.2174 | 0.0486 |
| 21 | Lrrtm3 | leucine rich repeat transmembrane neuronal 3 | 2.16E-04 | 0.0574 | 0.1348 | 0.0337 |
| 22 | Cd82 | CD82 antigen | 3.77E-04 | 0.0701 | 0.125 | 0.033 |
| 23 | Gltp | glycolipid transfer protein | 8.79E-05 | 0.0398 | 0.1499 | 0.0353 |
| 24 | Tmem119 | transmembrane protein 119 | 3.89E-04 | 0.0701 | 0.1251 | 0.0328 |
| 25 | Igsf21 | immunoglobulin superfamily, member 21 | 2.70E-04 | 0.0599 | 0.1666 | 0.0424 |
| 26 | Rasl10b | RAS-like, family 10, member B | 2.78E-04 | 0.0599 | 0.1617 | 0.0413 |
| 27 | Plxnb3 | plexin B3 | 1.61E-05 | 0.0235 | 0.1527 | 0.0318 |
| 28 | Epor | erythropoietin receptor | 2.75E-04 | 0.0599 | 0.1126 | 0.0289 |
| 29 | Exd2 | exonuclease 3'-5' domain containing 2 | 3.88E-04 | 0.0701 | 0.1038 | 0.0272 |
| 30 | Zmiz1 | zinc finger, MIZ-type containing 1 | 7.82E-04 | 0.0839 | 0.0973 | 0.0272 |
| 31 | Rxra | retinoid X receptor alpha | 5.45E-04 | 0.0753 | 0.1427 | 0.0388 |
| 32 | Wnt2 | wingless-type MMTV integration site family, member 2 | 4.16E-04 | 0.0738 | 0.1671 | 0.0441 |
| 33 | Wdr81 | WD repeat domain 81 | 6.68E-04 | 0.0775 | 0.0905 | 0.0244 |
| 34 | Dxo | decapping exoribonuclease | 1.07E-03 | 0.0885 | 0.0695 | 0.0201 |
| 35 | U2af2 | U2 small nuclear ribonucleoprotein auxiliary factor (U2AF) 2 | 8.38E-04 | 0.0846 | 0.0968 | 0.0274 |
| 36 | Deptor | DEP domain containing MTOR-interacting protein | 3.40E-04 | 0.0687 | 0.1952 | 0.0507 |
| 37 | Kctd20 | potassium channel tetramerisation domain containing 20 | 2.35E-04 | 0.0574 | 0.1003 | 0.0252 |
| 38 | Magi1 | membrane associated guanylate kinase, WW and PDZ domain containing 1 | 5.07E-04 | 0.0747 | 0.1839 | 0.0494 |
| 39 | Hhip | Hedgehog-interacting protein | 4.29E-04 | 0.0742 | 0.174 | 0.0461 |
| 40 | Dpf1 | D4, zinc and double PHD fingers family 1 | 4.92E-04 | 0.0747 | 0.1843 | 0.0494 |
| 41 | Ihh | Indian hedgehog | 3.00E-03 | 0.1317 | 0.1618 | 0.0521 |
| 42.5 | Fscn1 | fascin actin-bundling protein 1 | 6.16E-04 | 0.0775 | 0.1227 | 0.0337 |
| 42.5 | Nono | non-POU-domain-containing, octamer binding protein | 5.99E-05 | 0.0348 | 0.1404 | 0.0322 |
| 44 | Zyx | zyxin | 4.87E-04 | 0.0747 | 0.1136 | 0.0304 |
| 45 | Hlx | H2.0-like homeobox | 2.12E-03 | 0.1139 | 0.1337 | 0.0415 |
| 46 | Lrrc4c | leucine rich repeat containing 4C | 1.96E-04 | 0.0548 | 0.1962 | 0.0487 |
| 47 | Gigyf2 | GRB10 interacting GYF protein 2 | 4.84E-04 | 0.0747 | 0.085 | 0.0227 |
| 48 | Pitpnm3 | PITPNM family member 3 | 7.06E-04 | 0.0799 | 0.1469 | 0.0406 |
| 49 | L3mbtl2 | l(3)mbt-like 2 (Drosophila) | 1.43E-04 | 0.0506 | 0.0865 | 0.0209 |
| 50 | Chst15 | carbohydrate (N-acetylgalactosamine 4-sulfate 6-O) sulfotransferase 15 | 4.64E-04 | 0.0747 | 0.1026 | 0.0273 |
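The fractional ranks in the table (for example, the two genes sharing rank 42.5) follow from the tie rule stated in the caption: genes with equal ranking scores receive the mean of the positions they jointly occupy. The following is a minimal sketch of that rule in Python. The function name `average_ranks` and the example scores are hypothetical, and the score on which the thesis actually ranks genes is not specified here.

```python
def average_ranks(scores):
    """Rank scores in ascending order (1 = smallest).

    Tied scores share the mean of the positions they occupy.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of values tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Positions are 1-based, so the run covers pos + 1 .. end + 1.
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# Hypothetical ranking scores; the second and third values are tied.
example_scores = [0.10, 0.30, 0.30, 0.70]
example_ranks = average_ranks(example_scores)
print(example_ranks)
```

The two tied values would occupy positions 2 and 3, so each is assigned rank 2.5, which is how a table can list the same fractional rank twice and then skip the next integer.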


Table 3.7. Top 50 down-regulated genes for HD mouse models in the early phase after marker gene profiles correction. P-values and the fixed-effect coefficient of disease state (β1) are calculated from the linear mixed-effects model that included all samples of the disease phase and adjusted for study and marker gene profiles. In the case of tied ranks, the averaged rank is assigned. SE: standard error.

HD early phase: down-regulated top genes

| Rank | Gene | Name | P-value | FDR | β1 | β1 SE |
|---|---|---|---|---|---|---|
| 1 | Pcnxl4 | pecanex homolog 4 | 2.11E-05 | 0.0235 | -0.1356 | 0.0287 |
| 2 | Coa3 | cytochrome C oxidase assembly factor 3 | 2.94E-05 | 0.0291 | -0.1276 | 0.0279 |
| 3 | Yipf4 | Yip1 domain family, member 4 | 9.96E-06 | 0.0235 | -0.2116 | 0.0433 |
| 4 | Ndufa12 | NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 12 | 1.92E-05 | 0.0235 | -0.1896 | 0.04 |
| 5 | Atraid | all-trans retinoic acid induced differentiation factor | 4.81E-06 | 0.0158 | -0.2188 | 0.0429 |
| 6 | Tctex1d2 | Tctex1 domain containing 2 | 6.10E-05 | 0.0348 | -0.1446 | 0.0329 |
| 7 | Ndufs3 | NADH dehydrogenase (ubiquinone) Fe-S protein 3 | 3.59E-05 | 0.0291 | -0.1125 | 0.0249 |
| 8 | Slc1a2 | solute carrier family 1 (glial high affinity glutamate transporter), member 2 | 1.77E-05 | 0.0235 | -0.1391 | 0.0295 |
| 9 | Invs | inversin | 3.60E-05 | 0.0291 | -0.3328 | 0.0738 |
| 10 | Blzf1 | basic leucine zipper nuclear factor 1 | 1.69E-04 | 0.0513 | -0.1193 | 0.0286 |
| 11 | Riok1 | RIO kinase 1 (yeast) | 9.77E-05 | 0.0422 | -0.1222 | 0.029 |
| 12 | Pnp | purine-nucleoside phosphorylase | 1.57E-04 | 0.0513 | -0.187 | 0.0456 |
| 13 | Stx3 | syntaxin 3 | 1.27E-05 | 0.0235 | -0.1961 | 0.0407 |
| 14 | Arpc3 | actin related protein 2/3 complex, subunit 3 | 2.35E-04 | 0.0574 | -0.1034 | 0.0262 |
| 15 | Ptpmt1 | protein tyrosine phosphatase, mitochondrial 1 | 1.63E-04 | 0.0513 | -0.1293 | 0.0316 |
| 16 | Sfxn5 | sideroflexin 5 | 2.49E-04 | 0.0583 | -0.1365 | 0.0345 |
| 17 | Fam107a | family with sequence similarity 107, member A | 1.60E-04 | 0.0513 | -0.2356 | 0.0575 |
| 18 | 1700019D03Rik | RIKEN cDNA 1700019D03 gene | 2.40E-04 | 0.0574 | -0.2224 | 0.0561 |
| 19 | Prkrip1 | Prkr interacting protein 1 (IL11 inducible) | 2.04E-04 | 0.0557 | -0.1078 | 0.0268 |
| 20 | Cdc7 | cell division cycle 7 (S. cerevisiae) | 4.77E-04 | 0.0747 | -0.1176 | 0.0316 |
| 21 | Pfkfb2 | 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 2 | 2.72E-04 | 0.0599 | -0.1043 | 0.0268 |
| 22 | Zfp146 | zinc finger protein 146 | 1.03E-04 | 0.0422 | -0.1822 | 0.042 |
| 23 | Clpx | caseinolytic mitochondrial matrix peptidase chaperone subunit | 8.01E-04 | 0.084 | -0.1246 | 0.0349 |
| 24 | Insig1 | insulin induced gene 1 | 1.16E-04 | 0.0463 | -0.1928 | 0.046 |
| 25 | Acot13 | acyl-CoA thioesterase 13 | 4.93E-04 | 0.0747 | -0.1146 | 0.0309 |
| 26 | Pglyrp1 | peptidoglycan recognition protein 1 | 6.51E-04 | 0.0775 | -0.1331 | 0.0368 |
| 27 | Specc1 | sperm antigen with calponin homology and coiled-coil domains 1 | 5.97E-04 | 0.0775 | -0.1204 | 0.0322 |
| 28.5 | Idh3g | isocitrate dehydrogenase 3 (NAD+), gamma | 1.26E-04 | 0.0472 | -0.0849 | 0.0205 |
| 28.5 | Uqcrq | ubiquinol-cytochrome c reductase, complex III subunit VII | 4.98E-04 | 0.0747 | -0.1032 | 0.0271 |
| 30 | Smim19 | small integral membrane protein 19 | 4.26E-04 | 0.0742 | -0.1029 | 0.0272 |
| 31.5 | Chchd7 | coiled-coil-helix-coiled-coil-helix domain containing 7 | 5.20E-04 | 0.0747 | -0.1477 | 0.04 |
| 31.5 | Mzt1 | mitotic spindle organizing protein 1 | 3.88E-04 | 0.0701 | -0.1985 | 0.0521 |
| 33 | Snrnp25 | small nuclear ribonucleoprotein 25 (U11/U12) | 1.72E-04 | 0.0513 | -0.1199 | 0.0297 |
| 34 | Mreg | melanoregulin | 3.22E-04 | 0.0675 | -0.0924 | 0.0239 |
| 35 | Uba5 | ubiquitin-like modifier activating enzyme 5 | 3.24E-04 | 0.0675 | -0.1969 | 0.0513 |
| 36 | Alkbh3 | alkB homolog 3, alpha-ketoglutarate-dependent dioxygenase | 6.49E-04 | 0.0775 | -0.0983 | 0.027 |
| 37 | Rbks | ribokinase | 7.70E-04 | 0.0839 | -0.0923 | 0.0259 |
| 38 | Cds2 | CDP-diacylglycerol synthase (phosphatidate cytidylyltransferase) 2 | 6.21E-04 | 0.0775 | -0.0739 | 0.0203 |
| 39 | Arl5b | ADP-ribosylation factor-like 5B | 5.55E-04 | 0.0758 | -0.1272 | 0.0344 |
| 40 | Igfbp3 | insulin-like growth factor binding protein 3 | 5.29E-04 | 0.0747 | -0.1169 | 0.0317 |
| 41 | Acly | ATP citrate lyase | 3.90E-04 | 0.0701 | -0.1148 | 0.0301 |
| 42 | Ahi1 | Abelson helper integration site 1 | 7.47E-04 | 0.0831 | -0.1525 | 0.0427 |
| 43 | Deb1 | differentially expressed in B16F10 1 | 4.38E-04 | 0.0747 | -0.0953 | 0.0255 |
| 44 | Rpl10a | ribosomal protein L10A | 3.69E-04 | 0.0701 | -0.1151 | 0.0294 |
| 45 | Ralgapa1 | Ral GTPase activating protein, alpha subunit 1 | 1.36E-03 | 0.097 | -0.1947 | 0.0573 |
| 46 | Cldn5 | claudin 5 | 2.29E-04 | 0.0574 | -0.2275 | 0.0576 |
| 47 | 1500012F01Rik | zinc finger, NFX1-type containing 1, antisense RNA 1 | 6.77E-04 | 0.0775 | -0.122 | 0.0336 |
| 48 | Gas5 | growth arrest specific 5 | 5.81E-04 | 0.0775 | -0.336 | 0.0919 |
| 49 | Entpd5 | ectonucleoside triphosphate diphosphohydrolase 5 | 6.67E-04 | 0.0775 | -0.1008 | 0.0279 |
| 50 | Car2 | carbonic anhydrase 2 | 1.42E-04 | 0.0506 | -0.2292 | 0.0559 |


Table 3.8. Top 50 up-regulated genes for HD mouse models in the late phase after marker gene profiles correction. P-values and the fixed-effect coefficient of disease state (β1) are calculated from the linear mixed-effects model that included all samples of the disease phase and adjusted for study and marker gene profiles. In the case of tied ranks, the averaged rank is assigned. SE: standard error.

HD late phase: up-regulated top genes

| Rank | Gene | Name | P-value | FDR | β1 | β1 SE |
|---|---|---|---|---|---|---|
| 1 | Doc2b | double C2, beta | 2.67E-11 | 0 | 0.3815 | 0.0507 |
| 2 | Lrrn3 | leucine rich repeat protein 3, neuronal | 1.63E-11 | 0 | 0.4242 | 0.0557 |
| 3 | Isl1 | ISL1 transcription factor, LIM/homeodomain | 5.08E-11 | 0 | 0.4225 | 0.0572 |
| 4 | Cbx8 | chromobox 8 | 6.75E-10 | 0 | 0.4239 | 0.062 |
| 5 | Smoc1 | SPARC related modular calcium binding 1 | 6.95E-11 | 0 | 0.6008 | 0.0821 |
| 6 | Klhl13 | kelch-like 13 | 6.51E-09 | 0 | 0.4019 | 0.0632 |
| 7 | Nagk | N-acetylglucosamine kinase | 1.76E-09 | 0 | 0.2843 | 0.0429 |
| 8 | P2ry1 | purinergic receptor P2Y, G-protein coupled 1 | 2.50E-10 | 0 | 0.454 | 0.0644 |
| 9 | Robo1 | roundabout guidance receptor 1 | 1.56E-08 | 0 | 0.3393 | 0.0551 |
| 10 | Meis1 | Meis homeobox 1 | 5.89E-09 | 0 | 0.4798 | 0.0753 |
| 11 | Sap130 | Sin3A associated protein | 2.13E-08 | 0 | 0.267 | 0.0438 |
| 12 | Pcdhb22 | protocadherin beta 22 | 3.08E-08 | 0 | 0.4765 | 0.0793 |
| 13 | Fam196b | family with sequence similarity 196, member B | 3.48E-08 | 0 | 0.4795 | 0.0788 |
| 14 | Sertad4 | SERTA domain containing 4 | 4.97E-09 | 0 | 0.3839 | 0.0599 |
| 15 | Tac1 | tachykinin 1 | 1.23E-09 | 0 | 0.4127 | 0.0614 |
| 16 | Wbp5 | WW domain binding protein 5 | 1.11E-08 | 0 | 0.2324 | 0.0372 |
| 17 | Psme1 | proteasome (prosome, macropain) activator subunit 1 (PA28 alpha) | 2.96E-08 | 0 | 0.3097 | 0.0515 |
| 18 | Hexim1 | hexamethylene bis-acetamide inducible 1 | 1.48E-07 | 0 | 0.3734 | 0.0659 |
| 19 | Dcc | deleted in colorectal carcinoma | 3.87E-08 | 0 | 0.347 | 0.0574 |
| 20 | Prkch | protein kinase C, eta | 2.35E-07 | 0 | 0.4606 | 0.0829 |
| 21 | Jade1 | jade family PHD finger 1 | 1.18E-07 | 0 | 0.2315 | 0.0406 |
| 22 | Slc27a2 | solute carrier family 27 (fatty acid transporter), member 2 | 3.22E-07 | 0.0001 | 0.3624 | 0.0662 |
| 23 | Aff1 | AF4/FMR2 family, member 1 | 1.31E-07 | 0 | 0.3629 | 0.0638 |
| 24.5 | Msrb2 | methionine sulfoxide reductase B2 | 2.83E-08 | 0 | 0.3383 | 0.0561 |
| 24.5 | Zfp667 | zinc finger protein 667 | 2.20E-08 | 0 | 0.3963 | 0.0646 |
| 26 | Tle4 | transducin-like enhancer of split 4 | 3.30E-08 | 0 | 0.2903 | 0.0484 |
| 27 | Tmub2 | transmembrane and ubiquitin-like domain containing 2 | 1.17E-06 | 0.0001 | 0.1978 | 0.0378 |
| 28 | Dlg3 | discs, large homolog 3 (Drosophila) | 2.74E-07 | 0 | 0.1681 | 0.0305 |
| 29 | Abcb4 | ATP-binding cassette, sub-family B (MDR/TAP), member 4 | 1.08E-06 | 0.0001 | 0.3022 | 0.0582 |
| 30 | B3galt5 | UDP-Gal:betaGlcNAc beta 1,3-galactosyltransferase, polypeptide 5 | 1.47E-07 | 0 | 0.3138 | 0.0554 |
| 31 | Tmub1 | transmembrane and ubiquitin-like domain containing 1 | 7.61E-07 | 0.0001 | 0.313 | 0.0593 |
| 32 | Sgcb | sarcoglycan, beta (dystrophin-associated glycoprotein) | 6.55E-07 | 0.0001 | 0.2305 | 0.0434 |
| 33 | Tbca | tubulin cofactor A | 2.01E-07 | 0 | 0.1849 | 0.0331 |
| 34 | Lpl | lipoprotein lipase | 5.23E-07 | 0.0001 | 0.3534 | 0.0658 |
| 35 | Dcbld1 | discoidin, CUB and LCCL domain containing 1 | 3.19E-07 | 0.0001 | 0.2357 | 0.043 |
| 36 | Spin2c | spindlin family, member 2C | 7.28E-07 | 0.0001 | 0.3957 | 0.0745 |
| 37 | Arpc1a | actin related protein 2/3 complex, subunit 1A | 2.58E-07 | 0 | 0.1999 | 0.0362 |
| 38 | Pcdh10 | protocadherin 10 | 1.10E-06 | 0.0001 | 0.3192 | 0.0614 |
| 39 | Wt1 | Wilms tumor 1 homolog | 1.78E-06 | 0.0002 | 0.8033 | 0.1584 |
| 40 | Lgalsl | lectin, galactoside binding-like | 1.46E-06 | 0.0001 | 0.211 | 0.0412 |
| 41 | Tbrg1 | transforming growth factor beta regulated gene 1 | 2.94E-06 | 0.0002 | 0.4088 | 0.0824 |
| 42 | Gcnt2 | glucosaminyl (N-acetyl) transferase 2, I-branching enzyme | 3.18E-07 | 0.0001 | 0.2617 | 0.0477 |
| 43 | Noa1 | nitric oxide associated 1 | 2.49E-06 | 0.0002 | 0.2623 | 0.0525 |
| 44 | Slc35d3 | solute carrier family 35, member D3 | 1.10E-06 | 0.0001 | 0.3298 | 0.0636 |
| 45 | Krtcap2 | keratinocyte associated protein 2 | 2.55E-06 | 0.0002 | 0.2173 | 0.0436 |
| 46 | Triqk | triple QxxK/R motif containing | 2.95E-06 | 0.0002 | 0.2797 | 0.0565 |
| 47 | Rab9b | RAB9B, member RAS oncogene family | 1.63E-06 | 0.0002 | 0.2574 | 0.0505 |
| 48 | Sub1 | SUB1 homolog (S. cerevisiae) | 4.69E-07 | 0.0001 | 0.2438 | 0.0452 |
| 49 | Slitrk5 | SLIT and NTRK-like family, member 5 | 1.82E-06 | 0.0002 | 0.2191 | 0.0432 |
| 50 | Trib2 | tribbles pseudokinase 2 | 4.15E-07 | 0.0001 | 0.3077 | 0.0568 |


Table 3.9. Top 50 down-regulated genes for HD mouse models in the late phase after marker gene profiles correction. P-values and the fixed-effect coefficient of disease state (β1) are calculated from the linear mixed-effects model that included all samples of the disease phase and adjusted for study and marker gene profiles. In the case of tied ranks, the averaged rank is assigned. SE: standard error.

HD late phase: down-regulated top genes

| Rank | Gene | Name | P-value | FDR | β1 | β1 SE |
|---|---|---|---|---|---|---|
| 1 | Ddit4l | DNA-damage-inducible transcript 4-like | 3.67E-13 | 0 | -0.8432 | 0.1002 |
| 2 | Fa2h | fatty acid 2-hydroxylase | 9.93E-11 | 0 | -0.4009 | 0.0552 |
| 3 | Nrep | neuronal regeneration related protein | 5.41E-10 | 0 | -0.4735 | 0.0686 |
| 4 | Elavl4 | ELAV (embryonic lethal, abnormal vision, Drosophila)-like 4 (Hu antigen D) | 5.40E-11 | 0 | -0.6985 | 0.0947 |
| 5 | Rgs14 | regulator of G-protein signaling 14 | 1.15E-09 | 0 | -0.4247 | 0.0631 |
| 6 | Kcnc2 | potassium voltage gated channel, Shaw-related subfamily, member 2 | 2.48E-09 | 0 | -0.9837 | 0.1475 |
| 7 | Chgb | chromogranin B | 7.09E-09 | 0 | -0.4102 | 0.0647 |
| 8 | Il33 | interleukin 33 | 1.05E-10 | 0 | -0.4881 | 0.0674 |
| 9 | Cpne9 | copine family member IX | 6.54E-09 | 0 | -0.8065 | 0.1267 |
| 10 | Tmem116 | transmembrane protein 116 | 3.62E-09 | 0 | -0.3996 | 0.0617 |
| 11 | Slmap | sarcolemma associated protein | 2.70E-08 | 0 | -0.1927 | 0.0319 |
| 12 | Sorcs3 | sortilin-related VPS10 domain containing receptor 3 | 3.58E-08 | 0 | -0.3619 | 0.0605 |
| 13 | Penk | preproenkephalin | 1.32E-08 | 0 | -0.3622 | 0.0576 |
| 14.5 | Adcy8 | adenylate cyclase 8 | 1.79E-08 | 0 | -0.4908 | 0.0799 |
| 14.5 | Efhd1 | EF hand domain containing 1 | 5.09E-08 | 0 | -0.3469 | 0.0588 |
| 16 | Chil1 | chitinase-like 1 | 1.36E-08 | 0 | -0.4032 | 0.065 |
| 17 | Tspan2 | tetraspanin 2 | 7.56E-08 | 0 | -0.3844 | 0.0662 |
| 18 | Syt13 | synaptotagmin XIII | 1.45E-07 | 0 | -0.6568 | 0.1161 |
| 19 | Camkk1 | calcium/calmodulin-dependent protein kinase kinase 1, alpha | 2.49E-08 | 0 | -0.3064 | 0.0506 |
| 20 | Plk5 | polo-like kinase 5 | 4.39E-10 | 0 | -0.5617 | 0.0795 |
| 21 | Elovl1 | elongation of very long chain fatty acids (FEN1/Elo2, SUR4/Elo3, yeast)-like 1 | 7.89E-08 | 0 | -0.3214 | 0.0554 |
| 22 | Gpcpd1 | glycerophosphocholine phosphodiesterase 1 | 6.61E-08 | 0 | -0.3621 | 0.0619 |
| 23 | Mal | myelin and lymphocyte protein, T cell differentiation protein | 1.40E-07 | 0 | -0.3442 | 0.0607 |
| 24 | Oxr1 | oxidation resistance 1 | 3.89E-09 | 0 | -0.5216 | 0.0807 |
| 25 | Ssbp3 | single-stranded DNA binding protein 3 | 2.56E-07 | 0 | -0.2393 | 0.0432 |
| 26 | Ano8 | anoctamin 8 | 1.51E-07 | 0 | -0.4574 | 0.0793 |
| 27 | Tango2 | transport and golgi organization 2 | 7.44E-08 | 0 | -0.2316 | 0.0398 |
| 28 | Rgs11 | regulator of G-protein signaling 11 | 1.94E-07 | 0 | -0.5627 | 0.0992 |
| 29 | Arel1 | apoptosis resistant E3 ubiquitin protein ligase 1 | 5.51E-07 | 0.0001 | -0.1664 | 0.0311 |
| 30 | Clmn | calmin | 4.90E-07 | 0.0001 | -0.3263 | 0.0607 |
| 31 | Plxna4 | plexin A4 | 3.18E-07 | 0.0001 | -0.3678 | 0.0671 |
| 32 | Unc5b | unc-5 netrin receptor B | 3.00E-07 | 0.0001 | -0.232 | 0.0422 |
| 33 | Dusp18 | dual specificity phosphatase 18 | 1.75E-07 | 0 | -0.4506 | 0.0801 |
| 34 | Map3k12 | mitogen-activated protein kinase kinase kinase 12 | 1.80E-07 | 0 | -0.2077 | 0.037 |
| 35 | Ndst4 | N-deacetylase/N-sulfotransferase (heparin glucosaminyl) 4 | 9.72E-09 | 0 | -0.9642 | 0.1537 |
| 36 | Ugt8a | UDP galactosyltransferase 8A | 1.13E-07 | 0 | -0.3131 | 0.0548 |
| 37 | Prss23 | protease, serine 23 | 9.21E-07 | 0.0001 | -0.4843 | 0.0926 |
| 38 | Fads1 | fatty acid desaturase 1 | 6.09E-07 | 0.0001 | -0.1646 | 0.0309 |
| 39 | Olfm1 | olfactomedin 1 | 2.43E-07 | 0 | -0.7562 | 0.1361 |
| 40 | Adrbk2 | adrenergic receptor kinase, beta 2 | 1.07E-06 | 0.0001 | -0.4639 | 0.0893 |
| 41 | Nrip3 | nuclear receptor interacting protein 3 | 9.61E-07 | 0.0001 | -0.5077 | 0.0972 |
| 42 | St3gal1 | ST3 beta-galactoside alpha-2,3-sialyltransferase 1 | 2.53E-08 | 0 | -0.5205 | 0.0859 |
| 43 | Pls3 | plastin 3 (T-isoform) | 6.39E-07 | 0.0001 | -0.3103 | 0.0583 |
| 44 | Fam129b | family with sequence similarity 129, member B | 5.97E-07 | 0.0001 | -0.2335 | 0.0437 |
| 45 | Coro2a | coronin, actin binding protein 2A | 3.51E-07 | 0.0001 | -0.5353 | 0.098 |
| 46 | Zmynd8 | zinc finger, MYND-type containing 8 | 3.35E-07 | 0.0001 | -0.2948 | 0.0539 |
| 47 | C1ql3 | C1q-like 3 | 6.96E-07 | 0.0001 | -0.9124 | 0.1723 |
| 48 | Pmp22 | peripheral myelin protein 22 | 2.75E-07 | 0 | -0.398 | 0.0721 |
| 49 | Fbxo2 | F-box protein 2 | 4.64E-07 | 0.0001 | -0.2915 | 0.054 |
| 50.5 | Crtac1 | cartilage acidic protein 1 | 6.09E-07 | 0.0001 | -0.3918 | 0.0733 |
| 50.5 | Atp1b2 | ATPase, Na+/K+ transporting, beta 2 polypeptide | 7.80E-07 | 0.0001 | -0.2922 | 0.0554 |


Table 3.10. Comparison between top genes and top hits reported in the original studies for the late AD phase. Top genes are the top 20 up-regulated and top 20 down-regulated genes identified in this study after marker gene profiles correction. Total reported genes are the top differentially expressed genes reported in the original studies. The study of data set GSE53480 [112] did not report top genes. --: Genes are not included in the original study.

✓: Overlapped genes with same change of direction.

Top

Top Genes

Genes

GSE14499 GSE15056 GSE1556 GSE50521 GSE52022 GSE63617 GSE64398 Total Studies GSE14499 GSE15056 GSE1556 GSE50521 GSE52022 GSE63617 GSE64398 Total Studies

Man2b1 ✓ 1 Inppl1 - Idh1 - Vars2 -- - Cd68 ✓ 1 Map3k5 -- - Msmo1 - Gucy1a3 -- - Sqle ✓ 1 Sema5b - Cidea - Camkk2 -- -

Nsdhl ✓ 1 Crybb1 -- -

Fdps ✓ ✓ 2 Sppl2b - Wdr82 -- -- - Meis2 -

Trem2 -- ✓ ✓ 2 Sfxn1 - regulated

Reep1 - - Matn4 -

regulated -

(Late AD) (Late Slc22a4 -- - Atrnl1 -- -- - Up

Slc14a1 -- ✓ 1 down Galt - Twsg1 - Smagp -- - Tpst1 - Fzd9 -- -- - Brix1 - Mfge8 - Tyrobp ✓ 1 Ang -- - Fahd1 - Arhgef19 -- - 0610007P14Rik - Cox8a -- -- - Rab6a ✓ 1 Slc1a3 -- - Total reported genes 57 8 64 88 8 11 13 84 0 24 0 2 1 11


Table 3.11. Comparison between top genes and top hits reported in the original studies for the early HD phase. Top genes are the top 20 up-regulated and top 20 down-regulated genes after marker gene profiles correction. Total reported genes are the top differentially expressed genes reported in the original studies. Kuhn et al. (2007) [72] report top genes from the meta-analysis of the early phase samples of GSE7958, GSE9375, GSE9857, and GSE10202. Top hits from study GSE50379 [66] are derived from mGluR5 knockout mice and are therefore not comparable. --: Genes are not included in the original study.

✓: Overlapped genes with same change of direction.

Top

Top Genes

Genes

GSE32417 GSE64386 GSE9038 al. et Kuhn Total Studies GSE32417 GSE64386 GSE9038 al. et Kuhn Total Studies

Prkcq ✓ 1 Pcnxl4 - Tmem59l ✓ 1 Coa3 - Scmh1 - Yipf4 - Sema3e - Ndufa12 - Dusp26 - Atraid - Myadm ✓ 1 Tctex1d2 -

Olfml3 - Ndufs3 ✓ 1

Uba7 - Slc1a2 ✓ 1

Lmo7 -- - Invs -

Serpine2 - Blzf1 -- - regulated

Klf3 -- - - Riok1 -

regulated regulated -

(early HD) (early Rbfox3 - Pnp - Up

Ipo13 - Down Stx3 - Ctf1 - Arpc3 - Arsb - Ptpmt1 - Inpp5d - Sfxn5 - Abhd14a - Fam107a - Rab11b - 1700019D03Rik - Sema5b - Prkrip1 - Enpp6 ✓ 1 Cdc7 - Total reported genes 16 206 101 50 24 435 210 50


Table 3.12. Comparison between top genes and top hits reported in the original studies for the late HD phase. Top genes are the top 20 up-regulated and top 20 down-regulated genes after marker gene profiles correction. Total reported genes are the top differentially expressed genes reported in the original studies. Kuhn et al. (2007) [72] report top genes from the meta-analysis of the late phase samples of GSE7958, GSE9857, and GSE10202. The study of data set GSE62210 [114] reports 585 up-regulated and 484 down-regulated genes (FDR < 0.05); however, the list of gene names is not available. --: Genes are not included in the original study. ✓: Overlapped genes with same change of direction.

✓-: Overlapped genes with opposite change of direction.

Top Top

Genes Genes

GSE19676 GSE25232 GSE26317 GSE32417 GSE48104 GSE64386 al. et Kuhn Total Studies GSE19676 GSE25232 GSE26317 GSE32417 GSE48104 GSE64386 al. et Kuhn Total Studies

Doc2b ✓- 1 Ddit4l ✓ ✓ ✓ ✓ 4 Lrrn3 ✓ ✓ 2 Fa2h ✓ ✓ 2 Isl1 ✓ 1 Nrep - Cbx8 ✓ ✓ ✓ 3 Elavl4 - Smoc1 ✓ ✓ ✓ ✓ ✓ 5 Rgs14 ✓ ✓ ✓ 3 Klhl13 - Kcnc2 -- -- -

Nagk ✓ ✓ ✓ ✓ 4 Chgb ✓ ✓ 2

P2ry1 - Il33 ✓ 1

Robo1 ✓ 1 Cpne9 -

Meis1 ✓ 1 Tmem116 - regulated

Sap130 - - Slmap ✓ ✓ ✓ 3

regulated regulated -

(Late HD) (Late Pcdhb22 - Sorcs3 ✓ 1 Up

Fam196b -- ✓ 1 Down Penk -- -- ✓ 1 Sertad4 - Adcy8 - Tac1 - Efhd1 ✓ 1 Wbp5 ✓ ✓ 2 Chil1 - Psme1 ✓ ✓ ✓ ✓ 4 Tspan2 ✓ 1 Hexim1 ✓ 1 Syt13 - Dcc -- -- - Camkk1 ✓ ✓ 2 Prkch - Plk5 -- ✓ -- 1 Total reported 25 116 417 23 148 466 50 15 167 498 31 275 656 50 genes 66

3.4 Cross-disease comparison revealed similarities in the late phase of AD and HD.

The pathological similarities between AD and HD, such as abnormal accumulation and aggregation of misfolded proteins, suggest that cross-disease mechanisms and pathways may exist. To identify cross-disease transcriptional alterations and potential mechanisms, I compared the gene rankings after marker gene profiles correction and the functional enrichment results of the two disorders. At the gene level, the Spearman’s correlation coefficient between the gene rankings of AD and HD was 0.01 in the early phase and 0.09 in the late phase. Among the top 200 ranked dysregulated genes, no genes overlapped with a concordant direction of change between the two diseases in either the early or the late phase. These results indicate that the association between the gene rankings of AD and HD was very low, especially in the early phase. While no enriched GO terms overlapped in the early phase, a few GO terms were significantly enriched among the top down-regulated genes of both diseases in the late phase, including “G-protein coupled receptor signaling pathway” and “regulation of MAPK cascade”. However, none of the top down-regulated genes annotated with these two GO terms overlapped across diseases. Based on my results, it is inconclusive whether AD and HD mouse models share mechanisms in the early phase. The cross-disease similarities in the late phase could be effects of neurodegeneration in AD and HD.
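The gene-level comparison described in this section combines two simple measures: a Spearman correlation between two gene rankings and the overlap between the top-N genes of each ranking. The sketch below illustrates both on toy data. The gene identifiers, rank values, and function names are hypothetical and are not taken from the thesis. Because the inputs are already ranks without ties, the Spearman coefficient is computed as the Pearson correlation of the rank vectors.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_from_ranks(rank_a, rank_b):
    """Spearman correlation of two {gene: rank} mappings over their shared genes."""
    genes = sorted(set(rank_a) & set(rank_b))
    return pearson([rank_a[g] for g in genes], [rank_b[g] for g in genes])


def top_n_overlap(rank_a, rank_b, n):
    """Genes ranked within the top n (rank <= n) in both rankings."""
    top_a = {g for g, r in rank_a.items() if r <= n}
    top_b = {g for g, r in rank_b.items() if r <= n}
    return top_a & top_b


# Toy rankings for five hypothetical genes (1 = most strongly dysregulated).
ad_rank = {"g1": 1, "g2": 2, "g3": 3, "g4": 4, "g5": 5}
hd_rank = {"g1": 5, "g2": 3, "g3": 1, "g4": 2, "g5": 4}

rho = spearman_from_ranks(ad_rank, hd_rank)
shared_top2 = top_n_overlap(ad_rank, hd_rank, 2)
print(round(rho, 3), shared_top2)
```

In a real analysis, up-regulated and down-regulated rankings would be compared separately so that an overlap also implies the same direction of change.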


Chapter 4: Discussion and Conclusion

My meta-analysis of gene expression in mouse models of AD and HD identified expression changes shared across mouse models in the early and late phases of each disease. These transcriptional changes were largely disease phase-specific and were not shared between AD and HD. The consistent early changes may reveal early disease mechanisms. Changes in the late phase were stronger than the early changes but were more closely associated with predicted changes in cellular populations, as might be expected if neurodegeneration was taking place. In this chapter, I discuss my findings in AD and HD mouse models, the differences and similarities between AD and HD, limitations, and future work.

4.1 Consistent transcriptomic alterations were identified across different mouse models for each disorder.

Through the meta-analysis of gene expression in 12 AD mouse models and 9 HD mouse models from 23 independent studies, consistent transcriptomic alterations were identified across mouse models for each disorder, despite the variability among these models. By categorizing samples into early and late disease phases, shared gene expression changes specific to each disease phase were revealed. The results showed subtle but biologically interpretable changes shared across mouse models in the early disease phase, while changes in the late disease phase had relatively stronger signals. The top-ranked genes in the early phase were not always affected in the late phase, and vice versa, indicating phase-specific expression changes. The results from the original studies are often inconsistent, even though some genes are repeatedly reported in several of them. The inconsistency may stem from the experimental designs, mouse models and other factors discussed in Section 1.8. Unfortunately, the studies of AD mouse models in the early phase did not report gene hit lists, so a comparison with my results was not possible.
For the late AD phase and both phases of HD, the top-ranked genes overlapped with the genes reported in the original studies, which suggests that the meta-analysis captured signals detected in the individual studies (Table 3.10, Table 3.11, Table 3.12). Although some genes in my top list had been reported as top hits in some but not all studies, there were consistent signals in the studies that did not report them. For example, 2 out of 7 studies in the late AD phase reported the AD risk gene Trem2 in their hit lists (Table 3.10), yet consistent changes in the same direction were also observed in the other studies (Figure 4.1). Similarly, one of my top down-regulated genes in the late HD phase, Ddit4l, was reported in 4 out of 8 studies, and the remaining studies also showed down-regulation (Figure 4.2). The original studies used different methods and thresholds to prioritize genes, and therefore may not have reported all of the genes that showed signals. In addition, novel genes that had not been reported in the original studies were also identified, which could be a result of the increased statistical power of the meta-analysis. Examples include the top up-regulated gene Msmo1 in the late AD phase and the top down-regulated gene Nrep in the late HD phase (Figure 4.3, Figure 4.4).
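The comparison with the original studies summarized in Tables 3.10 to 3.12 amounts to counting, for each top gene, how many studies reported it with the same or the opposite direction of change. The sketch below shows one way to express that tally. The study labels, gene names, and hit lists are hypothetical placeholders, not the hit lists of the actual data sets.

```python
def count_reporting_studies(gene, direction, reported_hits):
    """Count studies reporting `gene` with the same or opposite direction.

    reported_hits maps a study label to {gene: "up" or "down"}.
    Returns (concordant, discordant).
    """
    concordant = 0
    discordant = 0
    for hits in reported_hits.values():
        if gene not in hits:
            continue  # this study did not report the gene as a top hit
        if hits[gene] == direction:
            concordant += 1
        else:
            discordant += 1
    return concordant, discordant


# Hypothetical hit lists for four studies.
reported_hits = {
    "study_A": {"GeneX": "up", "GeneY": "down"},
    "study_B": {"GeneX": "up"},
    "study_C": {"GeneX": "down"},
    "study_D": {},
}

genex_counts = count_reporting_studies("GeneX", "up", reported_hits)
print(genex_counts)
```

A gene absent from a hit list is simply not counted, which mirrors the point made above: an unreported gene may still show a consistent signal below the reporting threshold of that study.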

Figure 4.1. Expression of Trem2 in the late AD phase before and after marker gene profiles correction. Control samples (WT) are marked in red and AD mouse models in blue. Expression values after marker gene profiles (MGP) correction are corrected for study and MGP. Studies (data sets) highlighted by a red box reported the gene as a top hit in the original publications.


Figure 4.2. Expression of Ddit4l in the late HD phase before and after marker gene profiles correction. Control samples (WT) are marked in red and HD mouse models in blue. Expression values after marker gene profiles (MGP) correction are corrected for study and MGP. Studies (data sets) highlighted by a red box reported the gene as a top hit in the original publications.

Figure 4.3. Expression of Msmo1 in the late AD phase before and after marker gene profiles correction. Control samples (WT) are marked in red and AD mouse models in blue. Expression values after marker gene profiles (MGP) correction are corrected for study and MGP.

Figure 4.4. Expression of Nrep in the late HD phase before and after marker gene profiles correction. Control samples (WT) are marked in red and HD mouse models in blue. Expression values after marker gene profiles (MGP) correction are corrected for study and MGP.

4.2 Applying cell population proportion correction revealed transcriptional changes of cell-type specific regulatory events.

As NDs progress, a process generally characterized by neuronal loss and neuroinflammation, changes in cell population proportions in the brain become more prominent [26], [68]. These cellular composition changes contribute to the gene expression changes observed in bulk tissue transcriptome profiles, which represent the average gene expression across all cell types [71]. To better understand cell-type specific transcriptional changes that contribute to disease pathophysiology, I estimated cellular composition changes in mouse models and control samples by estimating marker gene profiles of neurons and glial cells. The marker gene profile estimates were largely consistent with known cellular composition changes in humans and mouse models of AD and HD. The estimates were then used in LMMs to adjust for expression changes due to cellular composition changes.

The cell-type proportion changes estimated by marker gene profiles in AD mouse models were consistent with previous mouse and human studies [68], [73], [123], and indicated neuronal loss and gliosis in the hippocampus in the late phase (Figure 3.1, Figure 3.2). The lack of significant proportion changes of neurons and glial cells before the onset of cognitive impairment in AD mouse models (Figure 3.1, Figure 3.2) also agreed with a previous study, which concluded that cognitive impairment, neuronal loss and gliosis occur concurrently in the hippocampus of the AD mouse model J20 [48]. However, I found a signature consistent with a decreased proportion of MSNs in the early HD phase, before HD mouse models developed motor symptoms (Figure 3.3). This result is consistent with the significantly reduced volume of the dorsal striatum observed in preclinical mHTT carriers [124].
The estimated increase of oligodendrocytes in the late HD phase was as expected [125]. Interestingly, the marker gene profiles suggested a small yet significant decrease in the expression of astrocyte markers in the late HD phase (FDR < 0.05). Astrogliosis has been observed in both HD mouse models and HD patients [125], [126]. However, those observations were based on increased numbers of glial fibrillary acidic protein (GFAP) positive astrocytes [125], [126]. GFAP is a marker for reactive astrocytes but is not expressed by all astrocytes [127]; therefore, it is not an effective marker for the overall astrocyte population. It is possible that reactive astrocytes were increased in the striatum of HD mouse models while other astrocyte subtype populations were decreased. On the other hand, the astrocyte markers used for marker gene profiling are derived from normal astrocytes [75]. Mutant HTT (mHTT) affects normal astrocyte functions and is known to alter gene expression, for example by decreasing the expression of Slc1a3 and Slc1a2 (genes encoding glutamate transporters) [126]. Some of these genes were defined as marker genes for profiling (e.g. Slc1a3 and Slc1a2); thus, the expression patterns of a subset of the astrocyte markers could have been altered in HD mouse models and influenced the estimation. Further research is needed to determine the cell population changes of astrocyte subtypes in HD and to develop disease-specific astrocyte markers.

The profile estimates suggested decreased expression of cholinergic neuron markers in the late HD phase (Figure 3.3). Cholinergic neurons are believed to be less susceptible to cell loss than MSNs [128]. Smith et al. (2006) found that brain samples of HD patients and mouse models in the late phase had decreased expression of a cholinergic neuron marker, Slc18a3 (a.k.a. VAChT), without loss of cholinergic neurons [129].
However, in the marker gene profile estimation, the majority of the cholinergic neuron markers showed decreased expression in the late HD phase. These marker genes, which were derived from normally functioning cells, may not precisely identify cell populations that undergo dramatic changes in cell morphology and function. It is unclear whether the expression changes were caused by loss of normal function without cell loss, by cell loss, or by a combination of cellular dysfunction and cell loss. Further investigation and HD-specific cell-type markers are needed.

The high prevalence of cell-type markers among the top-ranked genes before marker gene profile correction was consistent with the significant marker gene profile changes of neurons and glial cells in both AD and HD (Table 3.1). Given the significantly increased marker gene profiles of microglia in the late AD phase, and the decreased MSN marker gene profiles in both phases of HD, it is not surprising that many microglia markers were among the top up-regulated genes in late AD, and MSN markers among the top down-regulated genes in both early and late HD, before marker gene profile correction (Table 3.1). After applying marker gene profile correction, the number of DE genes and the number of top-ranked cell-type markers were greatly reduced (Figure 3.5, Figure 3.8, Table 3.1). Some of the top-ranked marker genes after correction were not differentially expressed (FDR > 0.05) before correction, such as the down-regulated astrocyte markers Atp1a2 and Slc1a3 in the late AD phase and the up-regulated MSN markers (Chst15, Lmo7, Rbfox3) in the early HD phase. One interesting case was the MSN marker Rgs8, which was significantly down-regulated before correction and became a top-ranked up-regulated gene after correction in the late HD phase (Table 3.1).
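The Rgs8 case can be illustrated with a toy calculation. The sketch below is not the LMM used in this thesis: it is a plain fixed-effects least-squares fit on made-up numbers, in which a hypothetical MSN marker is expressed at a higher level per cell in disease samples while the MSN marker gene profile (the proportion proxy) is lower. All values, variable names and helper functions are assumptions for illustration only.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y):
    """Residuals of y after regressing out x (with an intercept)."""
    intercept, slope = simple_ols(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical design: 0 = control, 1 = disease model.
group = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical MSN marker gene profile per sample (lower in disease).
msn_profile = [1.0, 1.1, 0.9, 1.0, 0.5, 0.6, 0.4, 0.5]
# Simulated log2 bulk expression of one MSN marker: it tracks the profile
# and carries a true within-cell up-regulation of +0.3 in disease.
TRUE_WITHIN_CELL_EFFECT = 0.3
expression = [
    5.0 + 2.0 * m + TRUE_WITHIN_CELL_EFFECT * g for m, g in zip(msn_profile, group)
]

# Without adjustment the group effect is the difference in group means.
_, unadjusted_effect = simple_ols(group, expression)

# With adjustment: the group coefficient in expression ~ group + msn_profile,
# obtained by residualizing both variables on the profile (Frisch-Waugh-Lovell).
_, adjusted_effect = simple_ols(
    residuals(msn_profile, group), residuals(msn_profile, expression)
)

print(f"unadjusted group effect: {unadjusted_effect:+.2f}")
print(f"adjusted group effect:   {adjusted_effect:+.2f}")
```

On these numbers the unadjusted effect is about -0.7 and the adjusted effect is about +0.3, so the same gene appears down-regulated in bulk tissue and up-regulated once the proportion proxy is included as a covariate.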
The changes in gene rankings and the reduced number of DE genes after marker gene profile correction implied that the gene expression changes, especially in the late disease phase, were driven to a substantial extent by changes in cell-type proportions. Adjusting for marker gene profiles in LMMs can reveal cell-type specific transcriptional changes. Some markers remained top-ranked with the same direction of regulation even after correction; for example, the microglia markers Cd68 and Tyrobp and the astrocyte marker Slc14a1 remained among the top up-regulated genes in the late AD phase (Table 3.1). Similarly, in HD models, the astrocyte marker Fads1 remained among the top down-regulated genes in the late HD phase (Table 3.1). The dysregulation of these marker genes cannot be fully explained by cell-type population changes, and may therefore indicate changes at the level of transcriptional regulation within certain cell types that contribute to disease pathophysiology.

4.3 Shared gene expression changes and implications in AD mouse models

In AD mouse models, my meta-analysis of the early phase revealed up-regulation of genes in cholesterol biosynthesis and the classical complement cascade. The changes were subtle in the early phase compared to the late phase. However, the results had some biological coherence, as suggested by the GO enrichment analysis, and may link cholesterol levels to AD-initiating mechanisms.

The enrichment of cholesterol biosynthesis genes in both the early and late phases suggested chronic dysregulation of cholesterol biosynthesis. Up-regulation of genes in cholesterol biosynthesis has also been observed in an AD amyloid transgenic mouse model, APP23 [130]. Several lines of evidence have linked cholesterol to Aβ generation and deposition [131], [132]. In mouse and cell culture studies, decreased brain cholesterol levels can reduce Aβ abundance [132], [133]. Cleavage of APP by β-secretase and γ-secretase (the amyloidogenic pathway) mainly occurs in lipid rafts of the plasma membrane, whereas α-secretase of the non-amyloidogenic pathway tends to localize at non-lipid-raft sites [134]. Lipid rafts have a high concentration of cholesterol [134]. Increased cholesterol levels enhance localization of APP, β-secretase and γ-secretase to the lipid rafts, and subsequently promote Aβ production [134], [135]. However, the mechanisms that initiate the increased expression of cholesterol biosynthesis genes in mouse models are still unclear. The majority of samples analyzed here were from amyloid transgenic models, which express human transgenes (APP, PSEN1, PSEN2) with known AD-associated mutations that promote APP processing through the amyloidogenic pathway. Expression of these transgenes could play a role in promoting the expression of cholesterol biosynthesis genes, leading to increased cholesterol levels that can accelerate Aβ production. Nevertheless, my results add evidence that cholesterol may play an important role in AD initiation and progression in mouse models.

Complement pathway genes C1qa, C1qb and C1qc were among the top-ranked up-regulated genes in the early phase, suggesting up-regulation of the C1q pathway early in disease progression (Table 3.2).
Up-regulation of the C1q pathway, which initiates the classical complement cascade, has been linked to early synaptic loss before Aβ accumulation; it is also a response to injury in mouse models and could be neuroprotective against misfolded proteins [80], [136]. C1q pathway genes were still differentially expressed (up-regulated) in the late phase; however, they were no longer among the top-ranked genes after marker gene profile correction. This may indicate that, in the late phase, cellular composition changes contributed more than within-cell transcriptional changes to the expression changes of C1q pathway genes, which in turn affected their ranking after marker gene profile correction.


In the late disease phase, the GO term “regulation of neurogenesis” was enriched among the top down-regulated genes. This result was consistent with previous mouse studies [50], [109] and reflects neuronal dysfunction in the AD mouse brain. Interestingly, GO terms related to inflammation and the immune system were not enriched among the up-regulated genes. This contrasts with other studies, which reported up-regulation of genes related to inflammation and the immune system [50], [81], [82]. The apparent discrepancy can be explained by a combination of cell-type proportion effects and gene function annotations. Without marker gene profile correction, my results did indicate up-regulation of genes in immune response pathways. However, closer inspection revealed that some of the microglia markers are annotated with the GO terms “immune response” and “inflammatory response”. This confound makes it difficult to separate cell-type proportion changes from immune system changes. On the other hand, the dysregulation of a few astrocyte and microglia markers could not be fully explained by cell-type population changes, and these markers remained among the top-ranked dysregulated genes after marker gene profile correction (Table 3.1). For example, Cd68, a marker of microglial activation that is correlated with Aβ42 load [137], was up-regulated in the late phase. Another microglia marker, the TYRO protein tyrosine kinase-binding protein gene (Tyrobp), and its receptor Trem2 were among the top up-regulated genes. Kobayashi et al. (2016) demonstrated that TYROBP binds to TREM2 and promotes microglial activation [138]. These results may provide some evidence of microglial activation at the cell-type specific level in AD mouse models.

The meta-analysis of gene expression across multiple AD mouse models identified a few known AD risk genes, such as Trem2, among the top-ranked genes in the late disease phase.
As expected, the expression of App, Psen1 and Psen2 was not altered, since the microarray probesets detect the endogenous murine copies but not the human transgenes.

In conclusion, early gene expression changes indicated up-regulation of genes in cholesterol biosynthesis and the complement pathway in AD mouse models, which may link cholesterol and the classical complement cascade to AD pathogenesis. Altered expression of genes involved in neurogenesis was detected only in the late phase. The early changes are more likely to reveal disease pathogenesis, while changes in the late phase are likely a consequence of, and a response to, neuronal damage.
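The confound between cell-type markers and GO annotations discussed in this section can be made concrete with a simple over-representation test. The sketch below uses a one-sided hypergeometric test on invented counts; it is not necessarily the enrichment method used in this thesis, and the gene counts and the function name are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(n_universe, n_term, n_top, n_hits):
    """P(X >= n_hits) when n_top genes are drawn without replacement from a
    universe of n_universe genes, n_term of which carry the GO term."""
    total = comb(n_universe, n_top)
    upper = min(n_term, n_top)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_top - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / total


N_UNIVERSE = 10_000  # genes tested (hypothetical)
N_TERM = 200         # genes annotated with "immune response" (hypothetical)
N_TOP = 100          # size of the top up-regulated gene list (hypothetical)

# Before correction: suppose 10 of the top 100 genes carry the term,
# 7 of them being microglia markers that rank highly because of gliosis.
p_before = hypergeom_upper_tail(N_UNIVERSE, N_TERM, N_TOP, 10)

# After correction: suppose those markers drop out of the top list and only
# 3 annotated genes remain among the top 100.
p_after = hypergeom_upper_tail(N_UNIVERSE, N_TERM, N_TOP, 3)

print(f"p before correction: {p_before:.2e}")
print(f"p after correction:  {p_after:.2f}")
```

With these invented counts the term is strongly over-represented before correction and unremarkable afterwards, even though no statement about immune signaling within cells has changed. This is why enrichment of terms shared with cell-type markers is hard to interpret in bulk tissue.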


4.4 Shared gene expression changes and implications in HD mouse models

Overall, transcriptional changes in HD mouse models were stronger than in AD mouse models. Significant down-regulation of mitochondrial and related genes suggested that early MSN loss may be due to a malfunction in energy supply to neurons. In the late phase, the large proportion of DE genes was possibly a consequential response to mHTT toxicity.

Across HD mouse models in the early phase, genes encoding subunits of mitochondrial respiratory chain complexes and other mitochondria-related genes were consistently down-regulated, which suggested that mitochondrial dysfunction may play a critical role in the onset of HD. Compared to other cell types, neurons depend heavily on mitochondria for adenosine triphosphate (ATP) production and are especially susceptible to mitochondrial malfunction [117]. Therefore, mitochondrial dysfunction in the early phase could contribute to the loss of MSNs early in disease progression. Interestingly, these mitochondrial and related genes were not differentially expressed in the late phase. Disease phase-specific changes in mitochondrial complex II subunit expression have been observed in the HD mouse models R6/2 and R6/1, where expression of the related genes is decreased in the early phase and shows no significant change in the late phase [139]. The identification of changes in mitochondrial gene expression specific to the early phase may help in understanding the initiation of MSN loss in HD mouse models.

Consistent with results from human samples [70], a large proportion (13.7%) of genes were significantly dysregulated across different models during the late HD phase (Figure 3.8), suggesting that brain functions could be severely affected in the striatum. Many GO terms, such as “neuron projection development”, “G-protein coupled receptor signaling pathway” and “regulation of MAPK cascade”, were significantly enriched among the top down-regulated genes.
HTT is known to interact with many proteins, and many of these interactions are altered by mHTT [26], [29]. Mutant HTT fragments can bind to transcription factors and genomic DNA directly to alter transcription in the R6/2 mouse model [140]. However, restoring some of the dysregulated genes to normal levels does not have an impact on HD phenotypes in mouse models [84]. Rather than being causative factors responsible for disease progression, these transcriptional dysregulations could be a consequential response to the combination of accumulated mHTT toxicity and significant MSN loss.

Marker gene profile correction adjusted the rankings of previously top-ranked neuron and glial cell markers and removed many of the marker genes from the top hits (Table 3.1). The results suggested that the marker gene expression changes were driven to a substantial extent by cellular composition changes. For example, an MSN marker gene, regulator of G-protein signaling 8 (Rgs8), was significantly down-regulated before marker gene profile correction but was found to be up-regulated and among the top-ranked genes after correction in the late phase. The estimated loss of MSNs could contribute significantly to the apparent decrease in Rgs8 expression in the bulk tissue, whereas marker gene profile correction could reveal the expression changes in the remaining MSNs. RGS proteins interact with G-proteins and other proteins to regulate neurotransmitter release and synaptic plasticity [141]. The up-regulation of Rgs8 could be a compensatory mechanism to improve neuronal function in the remaining MSNs. However, the expression changes of a few marker genes could not be fully explained by composition changes and may reveal changes at the cell-type specific level. An astrocyte marker, solute carrier family 1 member 2 (Slc1a2), remained a top down-regulated gene after correction in the early phase.
This gene encodes a glutamate transporter, whose down-regulation may lead to neuronal damage due to insufficient glutamate clearance [126].

Marker gene profile correction also adjusted the rankings of non-marker genes that were differentially expressed before correction. For example, my results showed that Cnr1 and Adora2a were down-regulated before marker gene profile correction. These two genes, which encode neurotransmitter receptors, have been reported as down-regulated in previous studies of post-mortem HD patient brains and HD mouse models [84]. However, after marker gene profile correction, Cnr1 and Adora2a were no longer significantly dysregulated, suggesting that the expression changes of these two genes were driven to a substantial extent by cellular composition changes.

In conclusion, my results suggested that down-regulation of mitochondrial and related genes in the early phase may link a malfunction in energy supply to early neuronal loss, whereas changes in the late phase could be a consequential response to mHTT toxicity and neurodegeneration. Further investigation of the dysregulated mitochondrial and related genes in the early phase may reveal mechanisms that initiate neuronal loss during the early disease phase.

4.5 Cross-disease commonalities in the late disease phase

The cross-disease comparison between AD and HD at the gene level did not provide evidence of shared mechanisms during early disease progression. The lack of overlap between AD and HD in the early phase does not prove that there is no commonality in disease mechanisms; any commonalities may simply have been undetectable using the data at hand. For example, the samples came from two distinct brain tissues, the hippocampus of AD mouse models and the striatum of HD mouse models, and may therefore exhibit tissue-specific gene expression patterns that obscure commonalities. Similarities could also exist in molecular species other than mRNA.
Alternatively, it could be that these two disorders do not share common mechanisms and pathways during early disease progression.

In contrast to the early phase, there were some similarities between AD and HD in the late phase. A few GO terms enriched among the top-ranked down-regulated genes overlapped between AD and HD in the late phase, including “regulation of MAPK cascade” and “G-protein coupled receptor signaling pathway”. MAPK signaling pathways contribute to the regulation of neuronal apoptosis [142]. The enrichment of down-regulated genes annotated with “regulation of MAPK cascade” may suggest shared compensatory mechanisms to counteract apoptosis in AD and HD. G-protein coupled receptors (GPCRs) mediate cellular responses to neurotransmitters, hormones and other signals [143]. GPCRs are involved in APP processing by regulating α-, β- and γ-secretases, and Aβ may affect the functions of GPCRs [144]. In HD, reduced expression of GPCRs has been observed in human and mouse samples, and mHTT may play a role in the transcriptional inhibition of certain GPCRs [145]. Modification of the GPCR signaling pathway could be a potential shared mechanism of AD and HD in response to toxic protein aggregates.

Mitochondrial dysfunction has been suggested as a common cross-disease mechanism of AD and HD [31]. Expression changes in genes related to mitochondrial complexes I and IV were identified in the early phase of HD. However, the analysis was not able to detect differential expression of mitochondria-related genes in AD mouse models. Rhein et al. (2009) found reduced protein and activity levels of mitochondrial complexes I and IV, which lead to mitochondrial defects in an AD mouse model that develops Aβ plaques and NFTs [146]. It is possible that these changes occur mainly at the protein and activity level and are not distinguishable at the gene expression level, in which case transcriptomic data are not sufficient to capture them.
The analysis did not identify dysregulated genes related to autophagy, another proposed cross-disease mechanism of AD and HD. Autophagy is transiently induced, and changes in autophagy protein levels occur primarily in oligodendrocytes [33]. Since the microarrays analyzed here measure only the mRNA expression levels of a bulk tissue sample at a particular moment, it is possible that these autophagy events were not captured or were not distinguishable in bulk tissue. Overall, the cross-disease and cross-tissue common changes in the late phase may indicate common mechanisms in response to neuronal loss and toxic protein aggregates.

4.6 Limitations and future work

My meta-analysis revealed consistent and disease phase-specific transcriptional changes in the early phase of AD and HD, which were biologically interpretable and may link to pathogenesis. However, it has some limitations and generates opportunities for future work.

Both cellular composition changes and cell-type specific changes contribute to bulk tissue gene expression changes, and brain samples of AD and HD mouse models are known to have cell population changes [71]. Without marker gene profile correction, misinterpretation can occur when cellular composition changes are treated as transcriptional changes due to cell-type specific regulatory events. Direct cell-type measurements of the samples are often not available for validation, and cell counts may not be accurate due to RNA degradation in specific cell types; computational approaches are therefore often employed to estimate cell population proportions [74]. Even though multiple markers were used to estimate cell-type proportions, it is still possible that the expression changes of some marker genes were underestimated due to the correction. In addition, AD and HD progression can change the expression of some marker genes at the cell-type level, and markers may not be able to identify cells that are morphologically and functionally altered.
These limitations would influence the cell population estimation. Disease-specific cell-type markers are needed for more accurate estimation.

One limitation of gene expression profiling is that it measures alterations in RNA expression and cannot detect other types of modification that do not affect transcript levels, such as post-transcriptional modification and changes in protein activity [86], [147]. For example, genome-wide association studies identified susceptibility loci in CD33, CLU and PICALM for AD [23]–[25]; however, these genes did not show expression changes in my analysis of AD mouse models. Cowan et al. (2008) reported increased calpain activity in an HD mouse model in the early phase, but my results did not show RNA expression changes of calpains. These findings are not necessarily in conflict, since the changes may not involve alterations at the gene expression level.

I used data from a wide range of mouse models that are commonly used in AD and HD research. These mouse models recapitulate some changes observed in humans. However, they often represent an accelerated version of the human disease. HD mouse models often have much longer CAG repeats (> 70 repeats) than HD patients (~40 repeats), and transgenic AD mouse models are based on known mutations in familial AD cases, which account for only a small portion of all AD cases [27], [35]. Caution must be used in translating results from mouse studies to humans.

The identification of shared gene expression changes in the early phase increases our understanding of disease initiation and progression, and in my opinion these genes are good candidates for investigating early disease mechanisms. Future work can explore genomic alterations at different time points in the early disease phase, especially in HD mouse samples before neuronal loss.
Functional validation of the candidate genes involved in the early changes is necessary to understand their roles and potentially causative mechanisms in AD and HD.

4.7 Conclusion

My meta-analysis of gene expression in mouse models revealed consistent and disease phase-specific transcriptional changes, despite the considerable heterogeneity of the mouse models of AD and HD. The changes in the early phase were biologically interpretable and may link to pathogenesis. The prioritized top-ranked genes in the early disease phase are candidate genes for studying mechanisms of disease initiation and are worth further investigation. Cross-disease common changes occurred only in the late phase and may indicate common mechanisms in response to neuronal loss and toxic protein aggregates in AD and HD.


Bibliography [1] S. Ghavami et al., “Autophagy and apoptosis dysfunction in neurodegenerative disorders,” Prog. Neurobiol., vol. 112, pp. 24–49, Jan. 2014. [2] S. Brooks, G. Higgs, N. Janghra, L. Jones, and S. B. Dunnett, “Longitudinal analysis of the behavioural phenotype in YAC128 (C57BL/6J) Huntington’s disease transgenic mice,” Brain Res. Bull., vol. 88, no. 2–3, pp. 113–120, Jun. 2012. [3] C. L. Masters, R. Bateman, K. Blennow, C. C. Rowe, R. A. Sperling, and J. L. Cummings, “Alzheimer’s disease,” Nat. Rev. Dis. Primer, vol. 1, p. 15073, Oct. 2015. [4] C. Reitz, C. Brayne, and R. Mayeux, “Epidemiology of Alzheimer disease,” Nat. Rev. Neurol., vol. 7, no. 3, pp. 137–152, Mar. 2011. [5] A. Omoumi, A. Fok, T. Greenwood, A. D. Sadovnick, H. H. Feldman, and G.-Y. R. Hsiung, “Evaluation of late-onset Alzheimer disease genetic susceptibility risks in a Canadian population,” Neurobiol. Aging, vol. 35, no. 4, p. 936.e5-12, Apr. 2014. [6] C. Takizawa, P. L. Thompson, A. van Walsem, C. Faure, and W. C. Maier, “Epidemiological and economic burden of Alzheimer’s disease: a systematic literature review of data across Europe and the United States of America,” J. Alzheimers Dis. JAD, vol. 43, no. 4, pp. 1271–1284, 2015. [7] X. Sun, K. Bromley-Brits, and W. Song, “Regulation of β-site APP-cleaving enzyme 1 gene expression and its role in Alzheimer’s Disease,” J. Neurochem., vol. 120, pp. 62–70, Jan. 2012. [8] S. Zhang, M. Zhang, F. Cai, and W. Song, “Biological function of and its role in AD pathogenesis,” Transl. Neurodegener., vol. 2, p. 15, Jul. 2013. [9] D. J. Selkoe, “Alzheimer’s disease: genes, proteins, and therapy,” Physiol. Rev., vol. 81, no. 2, pp. 741–766, Apr. 2001. [10] O. Philipson, A. Lord, A. Gumucio, P. O’Callaghan, L. Lannfelt, and L. N. G. Nilsson, “Animal models of amyloid-β-related pathologies in Alzheimer’s disease,” FEBS J., vol. 277, no. 6, pp. 1389–1409, Mar. 2010. [11] M. B. 
Mahoney et al., “Presenilin-Based Genetic Screens in Drosophila melanogaster Identify Novel Notch Pathway Modifiers,” Genetics, vol. 172, no. 4, pp. 2309–2324, Apr. 2006. [12] C. Conde and A. Cáceres, “Microtubule assembly, organization and dynamics in axons and dendrites,” Nat. Rev. Neurosci., vol. 10, no. 5, pp. 319–332, May 2009. [13] R. A. Quintanilla, R. von Bernhardi, J. A. Godoy, N. C. Inestrosa, and G. V. W. Johnson, “Phosphorylated tau potentiates Aβ-induced mitochondrial damage in mature neurons,” Neurobiol. Dis., vol. 71, pp. 260–269, Nov. 2014. [14] K. T. S. Wirz, S. Keitel, D. F. Swaab, J. Verhaagen, and K. Bossers, “Early molecular changes in Alzheimer disease: can we catch the disease in its presymptomatic phase?,” J. Alzheimers Dis. JAD, vol. 38, no. 4, pp. 719–740, 2014. [15] C. Van Cauwenberghe, C. Van Broeckhoven, and K. Sleegers, “The genetic landscape of Alzheimer disease: clinical implications and perspectives,” Genet. Med., vol. 18, no. 5, pp. 421–430, May 2016. [16] Y. Yang and W. Song, “Molecular links between Alzheimer’s disease and diabetes mellitus,” Neuroscience, vol. 250, pp. 140–150, Oct. 2013. [17] M. Ries and M. Sastre, “Mechanisms of Aβ Clearance and Degradation by Glial Cells,” Front. Aging Neurosci., vol. 8, Jul. 2016.


[18] J. C. Breitner et al., “APOE-epsilon4 count predicts age when prevalence of AD increases, then declines: the Cache County Study,” Neurology, vol. 53, no. 2, pp. 321–331, Jul. 1999. [19] G. Bu, “Apolipoprotein E and its receptors in Alzheimer’s disease: pathways, pathogenesis and therapy,” Nat. Rev. Neurosci., vol. 10, no. 5, pp. 333–344, May 2009. [20] V. Leoni, A. Solomon, and M. Kivipelto, “Links between ApoE, brain cholesterol metabolism, tau and amyloid β-peptide in patients with cognitive impairment,” Biochem. Soc. Trans., vol. 38, no. 4, pp. 1021–1025, Aug. 2010. [21] Y.-W. A. Huang, B. Zhou, M. Wernig, and T. C. Südhof, “ApoE2, ApoE3, and ApoE4 Differentially Stimulate APP Transcription and Aβ Secretion,” Cell, vol. 168, no. 3, p. 427–441.e21, Jan. 2017. [22] R. Guerreiro et al., “TREM2 variants in Alzheimer’s disease,” N. Engl. J. Med., vol. 368, no. 2, pp. 117–127, Jan. 2013. [23] A. Griciuc et al., “Alzheimer’s disease risk gene CD33 inhibits microglial uptake of amyloid beta,” Neuron, vol. 78, no. 4, pp. 631–643, May 2013. [24] P. Hollingworth et al., “Common variants at ABCA7, MS4A6A/MS4A4E, EPHA1, CD33 and CD2AP are associated with Alzheimer’s disease,” Nat. Genet., vol. 43, no. 5, pp. 429– 435, May 2011. [25] G. Jun et al., “Meta-analysis confirms CR1, CLU, and PICALM as alzheimer disease risk loci and reveals interactions with APOE genotypes,” Arch. Neurol., vol. 67, no. 12, pp. 1473–1484, Dec. 2010. [26] D. Bano, F. Zanetti, Y. Mende, and P. Nicotera, “Neurodegenerative processes in Huntington’s disease,” Cell Death Dis., vol. 2, no. 11, p. e228, Nov. 2011. [27] G. P. Bates et al., “Huntington disease,” Nat. Rev. Dis. Primer, p. 15005, Apr. 2015. [28] J. F. Gusella, M. E. MacDonald, and J.-M. Lee, “Genetic modifiers of Huntington’s disease,” Mov. Disord. Off. J. Mov. Disord. Soc., vol. 29, no. 11, pp. 1359–1365, Sep. 2014. [29] A. J. Milnerwood and L. A. 
Raymond, “Early synaptic pathophysiology in neurodegeneration: insights from Huntington’s disease,” Trends Neurosci., vol. 33, no. 11, pp. 513–523, Nov. 2010. [30] D. E. Ehrnhoefer, B. K. Y. Wong, and M. R. Hayden, “Convergent pathogenic pathways in Alzheimer’s and Huntington disease: Shared targets for drug development,” Nat. Rev. Drug Discov., vol. 10, no. 11, pp. 853–867, Oct. 2011. [31] L. Naia, I. L. Ferreira, E. Ferreiro, and A. C. Rego, “Mitochondrial Ca2+ handling in Huntington’s and Alzheimer’s diseases – Role of ER-mitochondria crosstalk,” Biochem. Biophys. Res. Commun., 2016. [32] R. A. Nixon, “The role of autophagy in neurodegenerative disease,” Nat. Med., vol. 19, no. 8, pp. 983–997, Aug. 2013. [33] D. D. O. Martin, S. Ladha, D. E. Ehrnhoefer, and M. R. Hayden, “Autophagy in Huntington disease and huntingtin in autophagy,” Trends Neurosci., vol. 38, no. 1, pp. 26– 35, Jan. 2015. [34] C. Pennanen et al., “Hippocampus and entorhinal cortex in mild cognitive impairment and early AD,” Neurobiol. Aging, vol. 25, no. 3, pp. 303–310, Mar. 2004. [35] S. J. Webster, A. D. Bachstetter, P. T. Nelson, F. A. Schmitt, and L. J. Van Eldik, “Using mice to model Alzheimer’s dementia: an overview of the clinical disease and the preclinical behavioral changes in 10 mouse models,” Front. Genet., vol. 5, p. 88, 2014.


[36] S. J. Tabrizi et al., “Biological and clinical changes in premanifest and early stage Huntington’s disease in the TRACK-HD study: the 12-month longitudinal analysis,” Lancet Neurol., vol. 10, no. 1, pp. 31–42, Jan. 2011. [37] H. Maurin et al., “Early structural and functional defects in synapses and myelinated axons in stratum lacunosum moleculare in two preclinical models for tauopathy,” PloS One, vol. 9, no. 2, p. e87605, 2014. [38] K. Santacruz et al., “Tau suppression in a neurodegenerative mouse model improves memory function,” Science, vol. 309, no. 5733, pp. 476–481, Jul. 2005. [39] C. S. von Koch et al., “Generation of APLP2 KO mice and early postnatal lethality in APLP2/APP double KO mice,” Neurobiol. Aging, vol. 18, no. 6, pp. 661–669, Dec. 1997. [40] S. Capsoni, G. Ugolini, A. Comparini, F. Ruberti, N. Berardi, and A. Cattaneo, “Alzheimer-like neurodegeneration in aged antinerve growth factor transgenic mice,” Proc. Natl. Acad. Sci. U. S. A., vol. 97, no. 12, pp. 6826–6831, Jun. 2000. [41] S. Ramaswamy, J. L. McBride, and J. H. Kordower, “Animal Models of Huntington’s Disease,” ILAR J., vol. 48, no. 4, pp. 356–373, Jan. 2007. [42] L. Mangiarini et al., “Exon 1 of the HD Gene with an Expanded CAG Repeat Is Sufficient to Cause a Progressive Neurological Phenotype in Transgenic Mice,” Cell, vol. 87, no. 3, pp. 493–506, Nov. 1996. [43] C. Cepeda, D. M. Cummings, V. M. André, S. M. Holley, and M. S. Levine, “Genetic mouse models of Huntington’s disease: focus on electrophysiological mechanisms,” ASN NEURO, vol. 2, no. 2, Apr. 2010. [44] K. Hsiao et al., “Correlative memory deficits, Abeta elevation, and amyloid plaques in transgenic mice,” Science, vol. 274, no. 5284, pp. 99–102, Oct. 1996. [45] T. D. Stein, N. J. Anders, C. DeCarli, S. L. Chan, M. P. Mattson, and J. A. 
Johnson, “Neutralization of transthyretin reverses the neuroprotective effects of secreted amyloid precursor protein (APP) in APPSW mice resulting in tau phosphorylation and loss of hippocampal neurons: support for the amyloid hypothesis,” J. Neurosci. Off. J. Soc. Neurosci., vol. 24, no. 35, pp. 7707–7717, Sep. 2004. [46] S. A. Frautschy et al., “Microglial response to amyloid plaques in APPsw transgenic mice,” Am. J. Pathol., vol. 152, no. 1, pp. 307–317, Jan. 1998. [47] J. J. Palop, J. Chin, and L. Mucke, “A network dysfunction perspective on neurodegenerative diseases,” Nature, vol. 443, no. 7113, pp. 768–773, Oct. 2006. [48] A. L. Wright et al., “Neuroinflammation and neuronal loss precede Aβ plaque deposition in the hAPP-J20 mouse model of Alzheimer’s disease,” PloS One, vol. 8, no. 4, p. e59586, 2013. [49] H. Oakley et al., “Intraneuronal β-Amyloid Aggregates, Neurodegeneration, and Neuron Loss in Transgenic Mice with Five Familial Alzheimer’s Disease Mutations: Potential Factors in Amyloid Plaque Formation,” J. Neurosci., vol. 26, no. 40, pp. 10129–10140, Oct. 2006. [50] M. Matarin et al., “A genome-wide gene-expression analysis and database in transgenic mice during development of amyloid or tau pathology,” Cell Rep., vol. 10, no. 4, pp. 633– 644, Feb. 2015. [51] J. C. Richardson et al., “Ultrastructural and behavioural changes precede amyloid deposition in a transgenic model of Alzheimer’s disease,” Neuroscience, vol. 122, no. 1, pp. 213–228, Nov. 2003.


[52] M. Matarin et al., “A Genome-wide Gene-Expression Analysis and Database in Transgenic Mice during Development of Amyloid or Tau Pathology,” Cell Rep., vol. 10, no. 4, pp. 633–644, Feb. 2015. [53] D. R. Howlett et al., “Abeta deposition and related pathology in an APP x PS1 transgenic mouse model of Alzheimer’s disease,” Histol. Histopathol., vol. 23, no. 1, pp. 67–76, Jan. 2008. [54] J. H. Caldwell, M. Klevanski, M. Saar, and U. C. Müller, “Roles of the amyloid precursor protein family in the peripheral nervous system,” Mech. Dev., vol. 130, no. 6–8, pp. 433– 446, Jun. 2013. [55] I. Arisi et al., “Gene expression biomarkers in the brain of a mouse model for Alzheimer’s disease: mining of microarray data by logic classification and feature selection,” J. Alzheimers Dis. JAD, vol. 24, no. 4, pp. 721–738, 2011. [56] J. R. Naranjo et al., “Activating transcription factor 6 derepression mediates neuroprotection in Huntington disease,” J. Clin. Invest., vol. 126, no. 2, pp. 627–638, Feb. 2016. [57] E. A. Thomas et al., “In vivo cell-autonomous transcriptional abnormalities revealed in mice expressing mutant huntingtin in striatal but not cortical neurons,” Hum. Mol. Genet., vol. 20, no. 6, pp. 1049–1060, Mar. 2011. [58] E. A. Thomas et al., “The HDAC inhibitor 4b ameliorates the disease phenotype and transcriptional abnormalities in Huntington’s disease transgenic mice,” Proc. Natl. Acad. Sci. U. S. A., vol. 105, no. 40, pp. 15564–15569, Oct. 2008. [59] X. Gu et al., “N17 Modifies Mutant Huntingtin Nuclear Pathogenesis and Severity of Disease in HD BAC Transgenic Mice,” Neuron, vol. 85, no. 4, pp. 726–741, Feb. 2015. [60] E. J. Slow et al., “Selective striatal neuronal loss in a YAC128 mouse model of Huntington disease,” Hum. Mol. Genet., vol. 12, no. 13, pp. 1555–1567, Jul. 2003. [61] T. B. Brown, A. I. Bogush, and M. E. 
Ehrlich, “Neocortical expression of mutant huntingtin is not required for alterations in striatal gene expression or motor dysfunction in a transgenic mouse,” Hum. Mol. Genet., vol. 17, no. 20, pp. 3095–3104, Oct. 2008. [62] P. F. Shelbourne et al., “A Huntington’s Disease CAG Expansion at the Murine Hdh Locus Is Unstable and Associated with Behavioural Abnormalities in Mice,” Hum. Mol. Genet., vol. 8, no. 5, pp. 763–774, May 1999. [63] C.-H. Lin et al., “Neurological abnormalities in a knock-in mouse model of Huntington’s disease,” Hum. Mol. Genet., vol. 10, no. 2, pp. 137–144, Jan. 2001. [64] V. C. Wheeler et al., “Long glutamine tracts cause nuclear localization of a novel form of huntingtin in medium spiny striatal neurons in HdhQ92 and HdhQ111 knock-in mice,” Hum. Mol. Genet., vol. 9, no. 4, pp. 503–513, Mar. 2000. [65] E. Fossale et al., “Differential effects of the Huntington’s disease CAG mutation in striatum and cerebellum are quantitative not qualitative,” Hum. Mol. Genet., vol. 20, no. 21, pp. 4258–4267, Nov. 2011. [66] F. M. Ribeiro et al., “Metabotropic glutamate receptor 5 knockout promotes motor and biochemical alterations in a mouse model of Huntington’s disease,” Hum. Mol. Genet., vol. 23, no. 8, pp. 2030–2042, Apr. 2014. [67] S. Brooks, G. Higgs, L. Jones, and S. B. Dunnett, “Longitudinal analysis of the behavioural phenotype in Hdh(CAG)150 Huntington’s disease knock-in mice,” Brain Res. Bull., vol. 88, no. 2–3, pp. 182–188, Jun. 2012.

[68] A. Serrano-Pozo, M. P. Frosch, E. Masliah, and B. T. Hyman, “Neuropathological Alterations in Alzheimer Disease,” Cold Spring Harb. Perspect. Med., vol. 1, no. 1, p. a006189, Sep. 2011. [69] S. Saxena and P. Caroni, “Selective Neuronal Vulnerability in Neurodegenerative Diseases: from Stressor Thresholds to Degeneration,” Neuron, vol. 71, no. 1, pp. 35–48, Jul. 2011. [70] A. Hodges et al., “Regional and cellular gene expression changes in human Huntington’s disease brain,” Hum. Mol. Genet., vol. 15, no. 6, pp. 965–977, Mar. 2006. [71] K. Srinivasan et al., “Untangling the brain’s neuroinflammatory and neurodegenerative transcriptional responses,” Nat. Commun., vol. 7, p. 11295, Apr. 2016. [72] A. Kuhn et al., “Mutant huntingtin’s effects on striatal gene expression in mice recapitulate changes observed in human Huntington’s disease brain and do not differ with mutant huntingtin length or wild-type huntingtin dosage,” Hum. Mol. Genet., vol. 16, no. 15, pp. 1845–1861, Aug. 2007. [73] M. Hokama et al., “Altered Expression of Diabetes-Related Genes in Alzheimer’s Disease Brains: The Hisayama Study,” Cereb. Cortex, vol. 24, no. 9, pp. 2476–2488, Sep. 2014. [74] M. Chikina, E. Zaslavsky, and S. C. Sealfon, “CellCODE: A robust latent variable approach to differential expression analysis for heterogeneous cell populations,” Bioinforma. Oxf. Engl., Jan. 2015. [75] B. O. Mancarci et al., “NeuroExpresso: A cross-laboratory database of brain cell-type expression profiles with applications to marker gene identification and bulk brain tissue transcriptome interpretation,” bioRxiv, p. 89219, Nov. 2016. [76] M. Schena, D. Shalon, R. W. Davis, and P. O. Brown, “Quantitative monitoring of gene expression patterns with a complementary DNA microarray,” Science, vol. 270, no. 5235, pp. 467–470, Oct. 1995. [77] T. Barrett et al., “NCBI GEO: archive for functional genomics data sets—update,” Nucleic Acids Res., vol. 41, no. D1, pp. D991–D995, Jan. 2013. [78] N. 
Kolesnikov et al., “ArrayExpress update--simplifying data submissions,” Nucleic Acids Res., vol. 43, no. Database issue, pp. D1113-1116, Jan. 2015. [79] M. D’Onofrio et al., “Early inflammation and immune response mRNAs in the brain of AD11 anti-NGF mice,” Neurobiol. Aging, vol. 32, no. 6, pp. 1007–1022, Jun. 2011. [80] S. Hong et al., “Complement and microglia mediate early synapse loss in Alzheimer mouse models,” Science, vol. 352, no. 6286, pp. 712–716, May 2016. [81] C. A. Saura, A. Parra-Damas, and L. Enriquez-Barreto, “Gene expression parallels synaptic excitability and plasticity changes in Alzheimer’s disease,” Front. Cell. Neurosci., vol. 9, Aug. 2015. [82] K. Paesler et al., “Limited effects of an eIF2αS51A allele on neurological impairments in the 5xFAD mouse model of Alzheimer’s disease,” Neural Plast., vol. 2015, p. 825157, 2015. [83] P. Giles, L. Elliston, G. V. Higgs, S. P. Brooks, and S. B. Dunnett, “Longitudinal analysis of gene expression and behaviour in the HdhQ150 mouse model of Huntington’s disease,” Brain Res. Bull., vol. 88, no. 2–3, pp. 199–209, Jun. 2012. [84] T. Seredenina and R. Luthi-Carter, “What have we learned from gene expression profiles in Huntington’s disease?,” Neurobiol. Dis., vol. 45, no. 1, pp. 83–98, Jan. 2012.

[85] A. Ramasamy, A. Mondry, C. C. Holmes, and D. G. Altman, “Key Issues in Conducting a Meta-Analysis of Gene Expression Microarray Datasets,” PLoS Med., vol. 5, no. 9, Sep. 2008. [86] J. Feichtinger, G. G. Thallinger, R. J. McFarlane, and L. D. Larcombe, “Microarray Meta- Analysis: From Data to Expression to Biological Relationships,” in Computational Medicine, Z. Trajanoski, Ed. Springer Vienna, 2012, pp. 59–77. [87] F. Hong, R. Breitling, C. W. McEntee, B. S. Wittner, J. L. Nemhauser, and J. Chory, “RankProd: a bioconductor package for detecting differentially expressed genes in meta- analysis,” Bioinformatics, vol. 22, no. 22, pp. 2825–2827, Nov. 2006. [88] U. Siangphoe and K. J. Archer, “Estimation of random effects and identifying heterogeneous genes in meta-analysis of gene expression studies,” Brief. Bioinform., Jun. 2016. [89] C. Y. Demirkale, D. Nettleton, and T. Maiti, “Linear mixed model selection for false discovery rate control in microarray data analysis,” Biometrics, vol. 66, no. 2, pp. 621– 629, Jun. 2010. [90] J. M. Winkler and H. S. Fox, “Transcriptome meta-analysis reveals a central role for sex steroids in the degeneration of hippocampal neurons in Alzheimer’s disease,” BMC Syst. Biol., vol. 7, no. 1, p. 51, Jun. 2013. [91] R Core Team, “R: A language and environment for statistical computing,” R Found. Stat. Comput. Vienna Austria, 2016. [92] A. Zoubarev et al., “Gemma: a resource for the reuse, sharing and meta-analysis of expression profiling data,” Bioinformatics, vol. 28, no. 17, pp. 2272–2273, Sep. 2012. [93] K. Becanovic et al., “Transcriptional changes in Huntington disease identified using genome-wide expression profiling and cross-platform analysis,” Hum. Mol. Genet., vol. 19, no. 8, pp. 1438–1452, Apr. 2010. [94] S. Rogic, A. Wong, and P. Pavlidis, “Meta-Analysis of Gene Expression Patterns in Animal Models of Prenatal Alcohol Exposure Suggests Role for Protein Synthesis Inhibition and Chromatin Remodeling,” Alcohol. Clin. Exp. 
Res., vol. 40, no. 4, pp. 717– 727, Apr. 2016. [95] C. Miller, “simpleaffy: Very simple high level analysis of Affymetrix data.,” Bioconductor. [Online]. Available: http://bioinformatics.picr.man.ac.uk/simpleaffy. [Accessed: 29-Mar-2017]. [96] J. Brettschneider, F. Collin, B. M. Bolstad, and T. P. Speed, “Quality assessment for short oligonucleotide microarray data,” ArXiv07100178 Stat, Sep. 2007. [97] L. Gautier, L. Cope, B. M. Bolstad, and R. A. Irizarry, “affy--analysis of Affymetrix GeneChip data at the probe level,” Bioinforma. Oxf. Engl., vol. 20, no. 3, pp. 307–315, Feb. 2004. [98] M. E. Ritchie et al., “limma powers differential expression analyses for RNA-sequencing and microarray studies,” Nucleic Acids Res., Jan. 2015. [99] W. E. Johnson, C. Li, and A. Rabinovic, “Adjusting batch effects in microarray expression data using empirical Bayes methods,” Biostatistics, vol. 8, no. 1, pp. 118–127, Jan. 2007. [100] L. Toker, M. Feng, and P. Pavlidis, “Whose sample is it anyway? Widespread misannotation of samples in transcriptomics studies,” F1000Research, vol. 5, Sep. 2016. [101] Y. Benjamini and Y. Hochberg, “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing,” J. R. Stat. Soc. Ser. B Methodol., vol. 57, no. 1, pp. 289–300, 1995.

[102] D. Bates, M. Mächler, B. Bolker, and S. Walker, “Fitting Linear Mixed-Effects Models using lme4,” ArXiv14065823 Stat, Jun. 2014. [103] J. M. Chambers and T. J. Hastie, Statistical Models in S. Boca Raton, Fla.: Chapman and Hall/CRC, 1991. [104] J. Gillis, M. Mistry, and P. Pavlidis, “Gene function analysis in complex data sets using ErmineJ,” Nat. Protoc., vol. 5, no. 6, pp. 1148–1159, Jun. 2010. [105] The Reference Genome Group of the Gene Ontology Consortium, “The Gene Ontology’s Reference Genome Project: A Unified Framework for Functional Annotation across Species,” PLoS Comput Biol, vol. 5, no. 7, p. e1000431, Jul. 2009. [106] S. Ballouz, P. Pavlidis, and J. Gillis, “Using predictive specificity to determine when gene set analysis is biologically meaningful,” Nucleic Acids Res., p. gkw957, Oct. 2016. [107] R. J. Kleiman et al., “Dendritic spine density deficits in the hippocampal CA1 region of young Tg2576 mice are ameliorated with the PDE9A inhibitor PF-04447943,” Alzheimers Dement. J. Alzheimers Assoc., vol. 6, no. 4, pp. S563–S564, Jul. 2010. [108] S. Pereson et al., “Progranulin expression correlates with dense-core amyloid plaque burden in Alzheimer disease mouse models,” J. Pathol., vol. 219, no. 2, pp. 173–181, Oct. 2009. [109] H. Noh, C. Park, S. Park, Y. S. Lee, S. Y. Cho, and H. Seo, “Prediction of miRNA-mRNA associations in Alzheimer’s disease mice using network topology,” BMC Genomics, vol. 15, p. 644, 2014. [110] A. H. Nagahara et al., “Neuroprotective effects of brain-derived neurotrophic factor in rodent and primate models of Alzheimer’s disease,” Nat. Med., vol. 15, no. 3, pp. 331– 337, Mar. 2009. [111] D. M. Cummings et al., “First effects of rising amyloid-β in transgenic mouse brain: synaptic transmission and gene expression,” Brain J. Neurol., vol. 138, no. Pt 7, pp. 1992– 2004, Jul. 2015. [112] V. A. Polito et al., “Selective clearance of aberrant tau proteins and rescue of neurotoxicity by transcription factor EB,” EMBO Mol. 
Med., vol. 6, no. 9, pp. 1142–1160, Sep. 2014. [113] H. Li et al., “Soluble amyloid precursor protein (APP) regulates transthyretin and Klotho gene expression without rescuing the essential function of APP,” Proc. Natl. Acad. Sci. U. S. A., vol. 107, no. 40, pp. 17362–17367, Oct. 2010. [114] M. Kurosawa et al., “Depletion of p62 reduces nuclear inclusions and paradoxically ameliorates disease phenotypes in Huntington’s model mice,” Hum. Mol. Genet., vol. 24, no. 4, pp. 1092–1105, Feb. 2015. [115] E. Ostergaard et al., “Respiratory chain complex I deficiency due to NDUFA12 mutations as a new cause of Leigh syndrome,” J. Med. Genet., vol. 48, no. 11, pp. 737–740, Nov. 2011. [116] E. Ostergaard et al., “Mutations in COA3 cause isolated complex IV deficiency associated with neuropathy, exercise intolerance, obesity, and short stature,” J. Med. Genet., vol. 52, no. 3, pp. 203–207, Mar. 2015. [117] M. J. Baker, T. Tatsuta, and T. Langer, “Quality Control of Mitochondrial Proteostasis,” Cold Spring Harb. Perspect. Biol., vol. 3, no. 7, Jul. 2011. [118] D. J. Pagliarini et al., “Involvement of a mitochondrial phosphatase in the regulation of ATP production and insulin secretion in pancreatic beta cells,” Mol. Cell, vol. 19, no. 2, pp. 197–207, Jul. 2005.

[119] T. Bhuin and J. K. Roy, “Rab11 in Disease Progression,” Int. J. Mol. Cell. Med., vol. 4, no. 1, pp. 1–8, 2015. [120] A. J. Groffen et al., “Doc2b is a High Affinity Ca2+ Sensor for Spontaneous Neurotransmitter Release,” Science, vol. 327, no. 5973, pp. 1614–1618, Mar. 2010. [121] L. A. Ehrman et al., “The LIM homeobox gene Isl1 is required for the correct development of the striatonigral pathway in the mouse,” Proc. Natl. Acad. Sci. U. S. A., vol. 110, no. 42, pp. E4026-4035, Oct. 2013. [122] C. P. Blomeley, L. A. Kehoe, and E. Bracci, “Substance P Mediates Excitatory Interactions between Striatal Projection Neurons,” J. Neurosci., vol. 29, no. 15, pp. 4953– 4963, Apr. 2009. [123] C. Schmitz et al., “Hippocampal Neuron Loss Exceeds Amyloid Plaque Load in a Transgenic Mouse Model of Alzheimer’s Disease,” Am. J. Pathol., vol. 164, no. 4, pp. 1495–1502, Apr. 2004. [124] E. H. Aylward et al., “Onset and rate of striatal atrophy in preclinical Huntington disease,” Neurology, vol. 63, no. 1, pp. 66–72, Jul. 2004. [125] R. H. Myers et al., “Decreased Neuronal and Increased Oligodendroglial Densities in Huntington’s Disease Caudate Nucleus,” J. Neuropathol. Exp. Neurol., vol. 50, no. 6, pp. 729–742, Nov. 1991. [126] M. Faideau et al., “In vivo expression of polyglutamine-expanded huntingtin by mouse striatal astrocytes impairs glutamate transport: a correlation with Huntington’s disease subjects,” Hum. Mol. Genet., vol. 19, no. 15, pp. 3053–3067, Aug. 2010. [127] A. H. P. Jansen et al., “Frequency of nuclear mutant huntingtin inclusion formation in neurons and glia is cell-type-specific,” Glia, vol. 65, no. 1, pp. 50–61, Jan. 2017. [128] A. A. Rikani et al., “The mechanism of degeneration of striatal neuronal subtypes in Huntington disease,” Ann. Neurosci., vol. 21, no. 3, pp. 112–114, Jul. 2014. [129] R. Smith et al., “Cholinergic neuronal defect without cell loss in Huntington’s disease,” Hum. Mol. Genet., vol. 15, no. 21, pp. 3119–3131, Nov. 2006. [130] V. 
Tseveleki et al., “Comparative gene expression analysis in mouse models for multiple sclerosis, Alzheimer’s disease and stroke for identifying commonly regulated and disease- specific gene changes,” Genomics, vol. 96, no. 2, pp. 82–91, Aug. 2010. [131] L. Puglielli, R. E. Tanzi, and D. M. Kovacs, “Alzheimer’s disease: the cholesterol connection,” Nat. Neurosci., vol. 6, no. 4, pp. 345–351, Apr. 2003. [132] W. G. Wood, L. Li, W. E. Müller, and G. P. Eckert, “Cholesterol as a causative factor in Alzheimer’s disease: a debatable hypothesis,” J. Neurochem., vol. 129, no. 4, pp. 559– 572, May 2014. [133] G. Di Paolo and T.-W. Kim, “Linking Lipids to Alzheimer’s Disease: Cholesterol and Beyond,” Nat. Rev. Neurosci., vol. 12, no. 5, pp. 284–296, May 2011. [134] Y. Kim, C. Kim, H. Y. Jang, and I. Mook-Jung, “Inhibition of Cholesterol Biosynthesis Reduces γ-Secretase Activity and Amyloid-β Generation,” J. Alzheimers Dis., vol. 51, no. 4, pp. 1057–1068, Jan. 2016. [135] C. Marquer et al., “Local cholesterol increase triggers amyloid precursor protein-Bace1 clustering in lipid rafts and rapid endocytosis,” FASEB J. Off. Publ. Fed. Am. Soc. Exp. Biol., vol. 25, no. 4, pp. 1295–1305, Apr. 2011. [136] M. E. Benoit, M. X. Hernandez, M. L. Dinh, F. Benavente, O. Vasquez, and A. J. Tenner, “C1q-induced LRP1B and GPR6 proteins expressed early in Alzheimer disease mouse
models, are essential for the C1q-mediated protection against amyloid-β neurotoxicity,” J. Biol. Chem., vol. 288, no. 1, pp. 654–665, Jan. 2013. [137] E. Zotova, C. Holmes, D. Johnston, J. W. Neal, J. a. R. Nicoll, and D. Boche, “Microglial alterations in human Alzheimer’s disease following Aβ42 immunization,” Neuropathol. Appl. Neurobiol., vol. 37, no. 5, pp. 513–524, Aug. 2011. [138] M. Kobayashi, H. Konishi, A. Sayo, T. Takai, and H. Kiyama, “TREM2/DAP12 Signal Elicits Proinflammatory Response in Microglia and Exacerbates Neuropathic Pain,” J. Neurosci. Off. J. Soc. Neurosci., vol. 36, no. 43, pp. 11138–11150, Oct. 2016. [139] M. Damiano, L. Galvan, N. Déglon, and E. Brouillet, “Mitochondria in Huntington’s disease,” Biochim. Biophys. Acta BBA - Mol. Basis Dis., vol. 1802, no. 1, pp. 52–61, Jan. 2010. [140] C. L. Benn et al., “Huntingtin modulates transcription, occupies gene promoters in vivo and binds directly to DNA in a polyglutamine-dependent manner,” J. Neurosci. Off. J. Soc. Neurosci., vol. 28, no. 42, pp. 10720–10733, Oct. 2008. [141] K. J. Gerber, K. E. Squires, and J. R. Hepler, “Roles for Regulator of G Protein Signaling Proteins in Synaptic Signaling and Plasticity,” Mol. Pharmacol., vol. 89, no. 2, pp. 273– 286, Feb. 2016. [142] E. K. Kim and E.-J. Choi, “Pathological roles of MAPK signaling pathways in human diseases,” Biochim. Biophys. Acta BBA - Mol. Basis Dis., vol. 1802, no. 4, pp. 396–405, Apr. 2010. [143] D. M. Rosenbaum, S. G. F. Rasmussen, and B. K. Kobilka, “The structure and function of G-protein-coupled receptors,” Nature, vol. 459, no. 7245, pp. 356–363, May 2009. [144] A. Thathiah and B. De Strooper, “The role of G protein-coupled receptors in the pathology of Alzheimer’s disease,” Nat. Rev. Neurosci., vol. 12, no. 2, pp. 73–87, Feb. 2011. [145] M. J. Dowie, E. L. Scotter, E. Molinari, and M. Glass, “The therapeutic potential of G- protein coupled receptors in Huntington’s disease,” Pharmacol. Ther., vol. 128, no. 2, pp. 305–323, Nov. 
2010. [146] V. Rhein et al., “Amyloid-β and tau synergistically impair the oxidative phosphorylation system in triple transgenic Alzheimer’s disease mice,” Proc. Natl. Acad. Sci., vol. 106, no. 47, pp. 20057–20062, Nov. 2009. [147] H. Hong, Q. Hong, J. Liu, W. Tong, and L. Shi, “Estimating relative noise to signal in DNA microarray data,” Int. J. Bioinforma. Res. Appl., vol. 9, no. 5, pp. 433–448, 2013.
