Washington University School of Medicine Digital Commons@Becker

ICTS Faculty Publications Institute of Clinical and Translational Sciences

2008 Variations in the transcriptome of Alzheimer's disease reveal molecular networks involved in cardiovascular diseases Monika Ray Washington University in St Louis

Jianhua Ruan University of Texas at San Antonio

Weixiong Zhang Washington University School of Medicine in St. Louis

Follow this and additional works at: https://digitalcommons.wustl.edu/icts_facpubs Part of the Medicine and Health Sciences Commons

Recommended Citation Ray, Monika; Ruan, Jianhua; and Zhang, Weixiong, "Variations in the transcriptome of Alzheimer's disease reveal molecular networks involved in cardiovascular diseases". Genome Biology, R148. 2008. Paper 98. https://digitalcommons.wustl.edu/icts_facpubs/98

This Article is brought to you for free and open access by the Institute of Clinical and Translational Sciences at Digital Commons@Becker. It has been accepted for inclusion in ICTS Faculty Publications by an authorized administrator of Digital Commons@Becker. For more information, please contact [email protected]. Open Access Research2008RayetVolume al. 9, Issue 10, Article R148 Variations in the transcriptome of Alzheimer's disease reveal molecular networks involved in cardiovascular diseases Monika Ray¤*, Jianhua Ruan¤† and Weixiong Zhang*‡

Addresses: *Washington University School of Engineering, Department of Computer Science and Engineering, 1 Brookings Drive, Saint Louis, Missouri 63130, USA. †University of Texas at San Antonio, Department of Computer Science, One UTSA Circle, San Antonio, Texas 78249, USA. ‡Washington University School of Medicine, Department of Genetics, 660 S. Euclid Ave, Saint Louis, Missouri 63110, USA.

¤ These authors contributed equally to this work.

Correspondence: Weixiong Zhang. Email: [email protected]

Published: 8 October 2008 Received: 2 May 2008 Revised: 23 August 2008 Genome Biology 2008, 9:R148 (doi:10.1186/gb-2008-9-10-r148) Accepted: 8 October 2008 The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2008/9/10/R148

© 2008 Ray et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Alzheimer's

Analysis link of microarray to cardiovascular data reveals disease extensive links between Alzheimer’s disease and cardiovascular diseases.

Abstract

Background: Because of its polygenic nature, Alzheimer's disease is believed to be caused not by defects in single , but rather by variations in a large number of genes and their complex interactions. A systems biology approach, such as the generation of a network of co-expressed genes and the identification of functional modules and cis-regulatory elements, to extract insights and knowledge from microarray data will lead to a better understanding of complex diseases such as Alzheimer's disease. In this study, we perform a series of analyses using co-expression networks, cis-regulatory elements, and functions of co-expressed modules to analyze single-cell gene expression data from normal and Alzheimer's disease-affected subjects.

Results: We identified six co-expressed gene modules, each of which represented a biological process perturbed in Alzheimer's disease. Alzheimer's disease-related genes, such as APOE, A2M, PON2 and MAP4, and cardiovascular disease-associated genes, including COMT, CBS and WNK1, all congregated in a single module. Some of the disease-related genes were hub genes while many of them were directly connected to one or more hub genes. Further investigation of this disease- associated module revealed cis-regulatory elements that match to the binding sites of transcription factors involved in Alzheimer's disease and cardiovascular disease.

Conclusion: Our results show the extensive links between Alzheimer's disease and cardiovascular disease at the co-expression and co-regulation levels, providing further evidence for the hypothesis that cardiovascular disease and Alzheimer's disease are linked. Our results support the notion that diseases in which the same set of biochemical pathways are affected may tend to co-occur with each other.

Background mon form of dementia. Due to its polygenic nature, AD is Late-onset Alzheimer's disease (AD) is a complex progressive believed to be caused not by defects in single genes, but rather neurodegenerative disorder of the brain and is the most com- by variations in a large number of genes and their complex

Genome Biology 2008, 9:R148 http://genomebiology.com/2008/9/10/R148 Genome Biology 2008, Volume 9, Issue 10, Article R148 Ray et al. R148.2

interactions that ultimately contribute to the broad spectrum that was designed to optimize a modularity function and of disease phenotypes. Similar to other neurodegenerative automatically identify the appropriate number of modules. diseases, AD has not yielded to conventional strategies for The cis-regulatory elements discovered in the promoter elucidating the genetic mechanisms and genetic risk factors. regions of disease related genes provide further insights into Therefore, a systems biology approach, such as the one that the possible transcriptional regulation of the genes involved was successfully employed by Chen and colleagues [1], is an in AD and their connection to CVDs, stroke and diabetes. effective alternative for analyzing complex diseases. Moreover, the single cell dataset [6] used in this study is less noisy compared to the mixed cell microarray data that were Most studies on AD first select a set of differentially expressed analyzed by Miller et al. Additionally, the single cell expres- genes on which further analysis is performed. However, com- sion data are from the entorhinal cortex, a region of the brain paring lists of genes from various AD studies is not efficient known to be the germinal site of AD and, therefore, represent without new methods being developed, which sometimes can the early stage of AD (incipient AD). Most importantly, unlike become data specific. Therefore, organizing genes into mod- multiple studies comparing AD and ageing [5,7,8], to the best ules or a modular approach that is based on criteria such as of our knowledge, our study is the first that has identified co-expression or co-regulation helps in comparing results links between CVDs, AD/neurodegenerative diseases and across studies and obtaining a global overview of the disease diabetes using a transcriptome-based systems biology pathogenesis. In this paper, we perform a transcriptome- approach. However, despite the differences in objectives, data based study by combining the analysis of co-expressed gene and methods in the study by Miller et al. and in our study, networks and the identification of functional modules and there was a significant overlap in the results obtained. This cis-regulatory elements in differentially expressed genes to indicates that the results reported here represent phenomena elucidate the biological processes involved in AD [2-4]. We that are generalizable. We have established interesting links first construct modules of highly correlated genes (that is, between the two studies, thereby highlighting the commonal- those with high similarity in their expression profiles), and ities between AD, ageing, and CVDs. We believe that analyses then identify statistically significant regulatory cis-elements such as ours and that by Miller et al. are the pieces of a puzzle (motifs) present in the genes. The analysis follows the proce- that illustrates the underlying mechanisms involved in AD dure shown in Figure 1. and the manner in which AD links to other conditions/dis- eases. The present work unveiled 1,663 genes that are differentially expressed in AD. A co-expression network method [2,3] was applied to these genes, resulting in 6 modules of co-expressed Results and discussion genes with each module representing key biological processes Significance analysis of microarrays (SAM) [9] identified perturbed in AD. Within the 6 modules, we identified 107 1,663 differentially expressed genes between AD samples and highly connected ('hub') genes. Functional annotation of controls at a false discovery rate of 0.1% (see Materials and these genes based on their association to human diseases methods). The enriched biological processes for 1,663 genes resulted in the identification of 18 disease-related cardiovas- are shown in Additional data file 1. Many processes known to cular diseases (CVDs), AD/neurodegenerative diseases, be affected in AD were enriched in the list of 1,663 transcripts. stroke and diabetes) transcripts aggregating in one module Principal components analysis [10] is an unsupervised classi- (referred to as the disease associated module). While some of fication method in which the data are segregated into classes. these 18 genes were hub genes, many of them directly con- When principal components analysis was applied to a matrix nected to one or more hub genes. Furthermore, a genome- consisting of the expression of 1,663 differentially expressed wide motif analysis [4] of the genes in the disease-associated genes and 33 subjects (10 normal and 20 AD affected), an module revealed several cis-regulatory elements that optimal separation of subjects into two groups was observed matched to the binding sites of transcription factors involved (Figure 2). The axes in Figure 2 correspond to the principal in diseases that are known to co-occur with AD. The final components (PCs), with the first PC accounting for 45.5% of result was a set of co-expressed and co-regulated modules the variance and the second PC accounting for 14.9% of the describing the higher level characteristics linking AD and variance. This demonstrated that the samples are distin- CVDs. guishable based on the expression profiles of these 1,663 genes. This implies that the samples in this dataset are well Recently, Miller et al. [5] used a systems biology approach to characterized and the information content in these differen- identify the commonalities between AD and ageing. Our work tially expressed genes is high. is significantly different from that by Miller et al. as we use a different co-expression network building method to generate Modular organization of significant genes via co- modules of co-expressed genes and then identify cis-regula- expression networks tory motifs within a module. Such a combination of The co-expression network method (CoExp) [2,3] was approaches has not been previously applied to study AD. Our applied to the set of 1,663 genes and resulted in 6 clusters/ co-expression network method [2,3] is a spectral algorithm modules (see Materials and methods; a figure showing the

Genome Biology 2008, 9:R148 http://genomebiology.com/2008/9/10/R148 Genome Biology 2008, Volume 9, Issue 10, Article R148 Ray et al. R148.3

Single cell microarray expression data

Use SAM to identify differentially expressed genes

Co-expression network tool Build co-expression networks Identify functional modules Identify hub genes

Use EASE to Check for genes associated identify with Alzheimer’s disease and enriched GO other human diseases categories

WordSpy Identify significant cis-regulatory elements in disease associated genes

FigureSteps taken 1 to analyze Alzheimer's disease using laser capture microdissected microarray data Steps taken to analyze Alzheimer's disease using laser capture microdissected microarray data. Sequence of steps taken to analyze incipient Alzheimer's disease from single cell expression data. We apply co-expression network analysis, EASE and WordSpy (motif finding method) in an integrated manner to study Alzheimer's disease and reveal connections to other conditions such as cardiovascular diseases and diabetes. entire network and modules is provided in Additional data stress response, and so on, while the other modules consist of file 4). Figure 3 shows the adjacency matrix of the co-expres- genes involved in negative regulation of metabolism, protein sion network and Figure 4 illustrates the Pearson correlation transport, sodium ion transport, and so on. Table 1 shows the coefficient (degree of similarity) between the 1,663 genes top enriched biological processes (p < 0.05) in organized into modules. The effect of CoExp applied to all all six modules. 15,827 genes (that is, no differentially expressed gene selec- tion performed) is shown in Additional data file 5. As can be noted from Table 1, many processes linked to AD, such as immune response, inflammatory response, cell devel- The two big red blocks of genes in Figure 4 represent two opment and differentiation (due to a large number of cancer groups of anti-correlated expression patterns. The upper red related genes), and so on are upregulated in incipient AD block refers to modules 1 and 2, while the lower red block rep- [11,12]. Processes related to actin are downregulated in AD resents modules 3, 4, 5 and 6. Transcripts in modules 3, 4, 5 [13]. Table 2 shows the significant Kyoto Encyclopedia of and 6 were downregulated and those in modules 1 and 2 were Genes and Genomes (KEGG) pathways represented by the upregulated. Modules 1 and 2 contain transcripts involved in genes in each module. Although there was no over-repre- cell differentiation, neuron development, immune response, sented KEGG pathway in module 5, several genes involved in

Genome Biology 2008, 9:R148 http://genomebiology.com/2008/9/10/R148 Genome Biology 2008, Volume 9, Issue 10, Article R148 Ray et al. R148.4

UnsupervisedFigure 2 classification by principal component analysis Unsupervised classification by principal component analysis. Principal component analysis was used to classify the 33 samples. The blue spheres refer to controls and the red correspond to affected subjects. This demonstrated that the samples were distinguishable based on the expression profiles of 1,663 differentially expressed genes. the negative regulation of metabolism, actin filament depo- the identification of cis-regulatory elements from the promot- lymerization, glucose metabolism, and lipid biosynthesis ers of genes. were present. Modules 2, 3, 4, 5 and 6 represent processes previously associated with AD in multiple studies [11-13]. Module 1 is associated with cardiovascular diseases and Module 5 contains processes related to glucose metabolism diabetes and recent work has shown decreased expression of energy EASE [15] uses the Genetic Association Database [16] and metabolism genes [14]. Our results further confirm this Online Mendelian Inheritance in Man to determine the asso- observation. Based on the results obtained thus far, each ciation of genes with various diseases/conditions [17-19] (see module is representative of some biological processes: mod- Materials and methods). When EASE was used to perform ule 1 represents protein synthesis; module 2 is linked to phos- functional annotation clustering based on the genes' associa- pholipid degradation; module 3 is associated with signaling tion with human disorders/diseases, module 1 contained 18 systems; module 4 represents neuron development; and disease-associated genes (Table 3). This prompted an in- modules 5 and 6 are associated with metabolism. depth examination of module 1 for our downstream analysis. Modules 2-6 did not have a significant enrichment for any The modular organization of genes led to the following inves- human disease. tigative steps: the identification of genes associated with human diseases; the identification of hub/highly connected These results provide new evidence supporting the hypothe- genes; the examination of the expression level of brain sis that there may be a strong association between CVD and derived neurotrophic factor (BDNF) in the AD subjects; and the incidence of AD [20-22]. There also has been a growing body of evidence for a link between AD and diabetes [23-25],

Genome Biology 2008, 9:R148 http://genomebiology.com/2008/9/10/R148 Genome Biology 2008, Volume 9, Issue 10, Article R148 Ray et al. R148.5

AdjacencyFigure 3 matrix of co-expression network Adjacency matrix of co-expression network. The adjacency matrix representation of the co-expression network. Modules are labeled c1, c2, c3, c4, c5 and c6. The dots refer to the intra- and inter-module edges between the genes. The graphical representation of this matrix is in Additional data file 4. with many research groups and news articles reporting that clustering of these conditions in epidemiological studies. Fur- AD may be another form of diabetes. While there are many thermore, as there are many transcripts common to these dis- transcripts in Table 3 common to the different conditions, eases/conditions, it is plausible that similar/common there are a few that are unique to a specific disease/condition, biochemical pathways are active in these seemingly different such as those encoding kinase deficient protein (WNK1), conditions. Common pathogenetic mechanisms in AD and timp metallopeptidase inhibitor 1 (TIMP1) and cystathio- CVD can suggest a causal link between CVD and AD [21,22], nine-beta-synthase (CBS), which are specific to CVD. Pterin- a hypothesis that is still controversial and under a lot of 4 alpha-carbinolamine dehydratase/dimerization cofactor debate. of hepatocyte nuclear factor 1 alpha (tcf1) 2 (or PCBD2), timp metallopeptidase inhibitor 3 (TIMP3), solute carrier Transcripts in the modules are linked to each other based on family 2 member 1 (SLC2A1) and major histocompatibility their expression similarity. 'Hub genes' are highly connected complex, class II, dq beta 1 (HLA-DQB1) are specific to diabe- nodes/transcripts in the network and are likely to play impor- tes. Von willebrand factor (VWF), alpha-2-macroglobulin tant roles in biological processes. Hub genes tend to be con- (A2M), apolipoprotein e (APOE), paraoxonase 2 (PON2), served across species and, hence, make excellent candidates and serpin peptidase inhibitor, clade a (alpha-1 antiprotein- for disease association studies in humans [27]. ase, antitrypsin), member 3 (SERPINA3) are common to most of the conditions. Archacki and colleagues have We defined hub genes to be those with 40 or more links/con- reported a list of 56 genes that are associated with coronary nections. Please refer to Additional data file 6 for the estima- artery disease [26]. Many genes from this list were also tion of hub genes. We identified 107 hub genes. The complete present in our list of 1,663 genes and present in module 1 list of hub genes, their module locations, and the number of (data not shown). links is in Additional data file 2. The hub genes included those encoding general transcription factor iiic, polypeptide 1, The hypothesis behind co-expression network analysis is that alpha 220 kda (GTF3C1), which is involved in RNA polymer- genes that are co-expressed are also co-regulated. Therefore, ase III-mediated transcription, microtubule-associated pro- since the genes specific to certain diseases and those that are tein 4 (MAP4), which promotes microtubule stability and common to all the diseases all resided in the same module, affects cell growth [28], and proprotein convertase subtili- they may be co-regulated. This could be the reason for the sin/kexin type 2 (PC2), which is responsible for the process-

Genome Biology 2008, 9:R148 http://genomebiology.com/2008/9/10/R148 Genome Biology 2008, Volume 9, Issue 10, Article R148 Ray et al. R148.6

c1 c2 c3 c4 c5 c6 1

0.8 200

0.6

400

0.4

600

0.2

800 0 Gene ID

1000 −0.2

−0.4 1200

−0.6

1400

−0.8

1600

−1 200 400 600 800 1000 1200 1400 1600 Pearson correlation Gene ID coefficient

PearsonFigure 4correlation coefficient between 1,663 genes Pearson correlation coefficient between 1,663 genes. This figure shows the strength of correlation between pairs of genes. The genes are organized by modules - c1, c2, c3, c4, c5 and c6. The top leftmost red block on the diagonal corresponds to module c1 and the bottom rightmost red block on the same diagonal refers to module c6. Modules c1 and c2 contain upregulated genes and modules c3 through c6 comprise downregulated genes. ing of neuropeptide precursors. Some of these hub genes - The total number of hub genes in each module along with the PC2, paraoxonase 2 (PON2) and peroxiredoxin 6 (PRDX6) - minimum and maximum number of links is shown in Table 4. have been implicated in late-onset AD [29-31]. Module 1 had the maximum number of hub genes. The tran- script with the largest number of links in module 1 is MAP4, Since module 1 has the disease associated genes, the hub with 63 connections. MAP4 is directly linked to other disease/ genes in this module may provide new information regarding condition associated genes such as VWF and WNK1. AD, CVD and diabetes. We identified 22 hub genes with a Increased expression of semaphorin 3b (SEMA3B; sema- number of links ranging from 42 to 63 in module 1 (for the phorin pathway) inhibits axonal elongation [32] and has been complete list of the 22 hub genes, see Additional data file 2). implicated in AD [32]. MAP4 is also connected to SEMA3B.

Genome Biology 2008, 9:R148 http://genomebiology.com/2008/9/10/R148 Genome Biology 2008, Volume 9, Issue 10, Article R148 Ray et al. R148.7

Table 1 Table 2

Top Gene Ontology biological processes in each module Statistically significant KEGG pathways

Module Activity Ease score Module KEGG pathway Ease score

Module 1 Protein biosynthesis 7.14E-06 Module 1 Ribosome 8.16E-07 Cell development 2.37E-05 Translation 3.41E-14 Cell differentiation 4.88E-05 Macromolecule biosynthesis 8.56E-05 Module 2 Phospholipid degradation 0.013 Cellular nerve ensheathment 1.11E-04 Neuron development 2.22E-04 Module 3 Signal transduction 0.002 Regulation of action potential 4.37E-04 Phosphatidylinositol signaling system 0.005

Module 2 Response to other organism 0.004 Module 4 Neuron development 2.22E-04 Immune response 0.014 Defense response 0.020 Module 6 Nucleotide metabolism 0.036 Response to stress 0.029 Statistically significant (p < 0.05) KEGG pathways present in the Protein kinase cascade 0.030 modules of the co-expression network. Integrin-mediated signalling pathway 0.030 Myeloid cell differentiation 0.040 Table 5 shows the number of links of the disease associated JAK-STAT cascade 0.042 genes and the number of hub genes they are linked with. Fig- ure 5 is a sub-network in module 1 that shows the disease- associated genes and all their links within module1. Although Module 3 Homophilic cell adhesion 2.58E-11 not all the disease-associated genes were hub genes, most of Cell-cell adhesion 2.74E-09 them were directly linked to one or more hub genes, which Nervous system development 3.44E-09 implies that they may play a key role via hub genes. Ion transport 0.007 Gamma-aminobutyric acid signalling pathway 0.009 PON2, MAP4 and atpase Na+/K+ transporting, alpha 2 (+) Secretory pathway 0.019 polypeptide (ATP1A2) are encoded by disease-associated Small GTPase mediated signal transduction 0.028 genes that are also hub genes. The overexpression of MAP4 Sodium ion transport 0.036 results in the inhibition of organelle motility and trafficking [33] and can also lead to changes in cell growth [28]. ATP1A2 Module 4 Cellular physiological process 6.91E-05 is a subunit of an integral membrane protein that is responsi- Transcription from RNA polymerase II 0.008 ble for establishing and maintaining the electrochemical gra- promoter dients of sodium and potassium ions across the plasma Protein transport 0.014 membrane [34]. These gradients are essential for osmoregu- Post-chaperonin tubulin folding pathway 0.019 lation, for sodium-coupled transport of a variety of molecules, Ubiquitin cycle 0.037 and for electrical excitability of nerve and muscle [34]. While the downregulation of ATP1A2 has been linked to migraine- Module 5 Negative regulation of metabolism 0.011 related conditions [35], the effects of its upregulation have Actin filament depolymerization 0.025 not been documented. PON2 has been implicated in AD [30] Barbed-end actin filament capping 0.025 and CVDs (Table 3). Negative regulation of actin filament 0.025 depolymerization Decreased levels of brain-derived neurotrophic factor Negative regulation of protein metabolism 0.025 BDNF is well known for its trophic functions and has been implicated in synaptic modulation, and the induction of long- Module 6 Protein transport 0.008 term potentiation [36,37]. Increased levels of BDNF are nec- Cell organization and biogenesis 0.011 essary for the survival of neurons. Decreased levels of BDNF Membrane fusion 0.028 have been linked to AD and depression [38-40]. Recently, low levels of BDNF has also been associated with diabetes [41]. RNA processing 0.029 RNA splicing 0.042 BDNF goes through post-translational modification, that is, it Statistically significant (p < 0.05) biological processes present in each of is converted into mature BDNF, by plasminogen [42]. The the six modules of the co-expression network. neurotrophic tyrosine kinase receptor type 2 (NTRK2/TrkB) is a receptor for BDNF [43].

Genome Biology 2008, 9:R148 http://genomebiology.com/2008/9/10/R148 Genome Biology 2008, Volume 9, Issue 10, Article R148 Ray et al. R148.8

Table 3 Table 5

Functional annotation clustering by disease of genes Number of links of the 18 disease-associated genes

Disease/condition Genes Gene Number of links Number of hub genes it is connected to

Neurodegeneration VWF, A2M, APOE, FTL, PON2, COMT, MAP4, TF, VWF 16 2 SERPINA3, ATP1A2, AGT A2M 17 3 Myocardial infarction A2M, APOE, PON2, SERPINA3 APOE 18 3 Alzheimer's disease A2M, APOE, SERPINA3, PON2 FTL 18 3 Cardiovascular VWF, A2M, APOE, PON2, COMT, WNK1, CBS, PON2 51 8 SERPINA3, TIMP1 COMT 17 0 Coronary artery APOE, PON2, COMT, SERPINA3 disease MAP4 63 5 TF 16 3 Type 2 diabetes VWF, A2M, APOE, PCBD2, HLA-DQB1(HLA- SERPINA3 18 3 DQB2), TIMP3, SLC2A1, AGT ATP1A2 45 7 Functional annotation clustering of genes in module 1 based on their AGT 27 5 association to human conditions/diseases. TIMP1 14 3 BDNF was not present in our list of 1,663 significant genes. WNK1 17 2 However, TrkB and serpin peptidase inhibitor, clade e CBS 16 3 (nexin, plasminogen activator inhibitor type 1), member 2 PCBD2 16 0 (SERPINE2) were present in the set of 1,663 genes and HLA- 15 2 located in module 1. Plasminogen activator inhibitor type 1 DQB1/ HLA- (PAI-1) proteins inhibit plasminogen activators [44]. There- DQB1 fore, if the level of PAI-1 is high in the AD affected samples, SLC2A1 14 4 plasminogen activators are being inhibited, resulting in TIMP3 14 0 decreased levels of mature BDNF. Interestingly, the expres- sion levels of TrkB and PAI-1 were elevated in the AD sam- Number of links of the 18 disease associated genes from module 1 and ples. However, TrkB is downregulated following the binding the number of connections they have with other hub genes. of BDNF [45]. Therefore, due to an increased level of PAI-1, sion between controls and affected samples. Microarrays are mature BDNF could not be produced, which in turn could not not sensitive enough to detect genes with low expression lev- bind to TrkB. By this reasoning, it can be concluded that high els, especially when the difference in expression is small levels of TrkB and PAI-1 imply decreased levels of BDNF, (which can be expected in subjects with incipient AD) [46- which is detrimental for the survival of neuronal populations. 49]. The fact that the selected significant genes, such as TrkB This probably leads to neuronal death in this cohort of AD and SERPINE2, could lead to the correct conclusion regard- affected subjects. ing the level of BDNF expression in AD affected samples high- lights the merits of this kind of analysis of the transcriptome In order to verify our conclusion regarding the expression when handling genes with low expression levels. Although level of BDNF in the AD patients in our dataset, we examined modules 1 and 2 have upregulated genes, genes associated the expression level of BDNF in the controls and AD affected with BDNF are located only in module 1. This further empha- samples. We found BDNF to be decreased by 1.07 in the AD sizes the importance of module 1. affected samples. BDNF was not selected to be a significant

Table 4 Comparison to the study by Miller et al. on ageing and AD Hub genes Miller et al. [5] identified 558 transcripts that were common to AD and ageing. We found more overlapping genes between Module Number of hubs Range of links our study and their study than expected by chance (p = 3.3 × Module 1 22 42-63 10-10). There were 94 genes overlapping between 1,663 signif- Module 2 17 41-56 icant genes from our study and 558 genes identified by Miller Module 3 15 40-68 et al. Of these 94 genes, 48 were present in module 1 (greater -10 Module 4 14 40-65 than expected by chance; p = 9.2 × 10 ). This indicates that Module 5 20 40-73 module 1 contains the majority of genes that have been linked to ageing and AD. Of the 48 genes that overlapped between Module 6 19 40-81 558 AD-ageing common genes and genes in module 1, WNK1 Number of hub genes and their range of connections/links in each and MAP4 were present. module.

Genome Biology 2008, 9:R148 http://genomebiology.com/2008/9/10/R148 Genome Biology 2008, Volume 9, Issue 10, Article R148 Ray et al. R148.9

Sub-networkFigure 5 in module 1 illustrating the 18 disease associated genes and their connections Sub-network in module 1 illustrating the 18 disease associated genes and their connections. This sub-network shows the 18 disease associated genes (colored yellow) and the genes that they are connected to within module 1. The hub genes are represented as triangle nodes. Disease genes MAP4, PON2 and ATP1A2 were also hub genes. Only the hub genes that connect to disease genes are shown here. Module 1 consists of 22 hub genes in total.

Furthermore, 9 genes (DAAM2, EPM2AIP1, GFAP, that may be enriched in the upstream promoter sequences of GORASP2, MAP4, NFKBIA, PRDX6, TSC22D4 and the genes in module 1 (see Materials and methods). The group UBE2D2) overlapped between 558 AD-ageing genes and the of genes in module 1 that shares a motif will be a set that is co- 107 hub genes identified in our study, 5 of which resided in expressed and coregulated. module 1. These results further highlight the significance of module 1 and it can be concluded that module 1 represents The complete set of cis-regulatory elements enriched in mod- common biochemical pathways that may be affected in all ule 1 is in Additional data file 3. A total of 89 motifs were AD, ageing, and CVD. enriched in module 1 with a p-value < 0.001, and their target genes were co-expressed with an average correlation coeffi- Cis-regulatory elements and co-regulated genes cient >0.4 and Z-score >2 (see Materials and methods). Of Cis-regulatory elements/motifs are regulatory elements in the 89 motifs, 36 matched to 26 known transcription factor the promoter region of genes to which transcription factors binding sites (TFBS) in JASPAR [50] with a matching score bind, thus regulating transcription. If a group of genes shares ≥0.8 (Table 6). Table 6 shows the number of genes within the same cis-regulatory motif, then the transcription factor module 1 whose promoter region contains a motif that that binds to the motif may regulate the group of genes. Co- matched to the TFBS of a known transcription factor. expressed modules represent genes that may be co-expressed in the cell and be a part of the same biochemical pathways. Transcription factors such as growth factor independent From our analyses thus far, we concluded that the genes con- (Gfi), peroxiredoxin 2 (Prx2/PRDX2), SP1, CAAT-enhancer tained in module 1 is of great importance. Therefore, we used binding protein (C/EBP), RelA (p65), runt box 1 (Runx1), WordSpy [4] to identify the cis-regulatory elements/motifs ELK-1, upstream stimulatory factor 1 (USF1), Rel, and TATA

Genome Biology 2008, 9:R148 http://genomebiology.com/2008/9/10/R148 Genome Biology 2008, Volume 9, Issue 10, Article R148 Ray et al. R148.10

Table 6 tors are important in development [75,76]. An extremely high number of genes were mapped to Hand1-TCF3 since cell Twenty-six transcription factors with known functions whose cis- development and differentiation is upregulated in AD [11,12]. regulatory elements were identified in the genes in the co-expres- sion network In summary, the fact that transcription factors that partici- Transcription factors Number of target genes pate in other human conditions have their binding motifs enriched in the set of significant genes associated with AD ABI4 9 adds significance to the hypothesis that many biochemical Arnt-Ahr 93 pathways common to AD and CVD are active, resulting in ARR10 6 these diseases/conditions co-occurring. Broad-complex 3 10 CEBP 20 Gfi 8 Conclusion HAND1-TCF3 279 In this study, we present an integrative systems biology Mycn 11 approach to study a complex disease such as AD. Along with Myf 8 identifying modules that illuminate higher-order properties Prx2/PRDX2 17 of the transcriptome, we identified a module that contained RELA, REL 10 many genes known to play prominent roles in CVDs and AD. RUNX1 4 We believe that this module highlights important pathophys- Snail 49 iological properties that connect AD, CVD and ageing. We SP1 47 identified several cis-regulatory elements, some of which mapped to the binding sites of known transcription factors TBP 6 involved in neurodegenerative and CVDs as well as diabetes E74A 16 and stroke. Furthermore, since microarrays are not sensitive ELK1 16 to genes with very slight differences in expression from con- SPIB 16 trols, we illustrate how other genes can be used to deduce the Hunchback 6 expression difference of such genes. This is especially critical MAX 11 while comparing groups that are very similar to each other. USF1 11 ZNF42 5-13 27 Although we highlight the contributions of a new module and NFIL3 5 network building method to the field of AD, this paper also Agamous 8 illustrated the commonalities between the study by Miller et GAMYB 6 al. [5] and our study in spite of the differences in methodology and data. This suggests the reproducible and generalizable The 26 transcription factors and the number of target genes in module 1 that have a motif in their promoters that match to the binding sites of quality of the results based on gene expression data from well the known transcription factor. characterized samples. Additionally, a modular approach, where genes are organized into modules based on co-expres- box binding protein (TBP) have been implicated in neurode- sion or co-regulation, is an efficient method for studying generative diseases (such as AD, Parkinson's, and Schizo- human diseases and comparing results from multiple studies. phrenia) [51-64], diabetes [65], stroke and CVDs [66,67]. There are 139 genes in module 1 that contain motifs that The link between CVDs, diabetes and AD is a topic of growing matched the TFBS of the known transcription factors associ- interest. The presence of perturbed genes and cis-regulatory ated with these diseases. elements related to CVDs and AD in a single module provides strong evidence to the hypotheses connecting these two con- Arnt-Ahr dimer transcription factor activates genes crucial in ditions. Interestingly, this module also contained the maxi- the response to hypoxia and hypoglycaemia [68,69]. mum number of genes (and hub genes) related to ageing. Our Hypoglycaemia and hypoxia have been known to play patho- results support the notion that diseases in which the same set physiological roles in the complications of diabetes and AD of biochemical pathways are affected may tend to co-occur [70-73]. It is well known that hypoxia has major effects on the with each other. This could be the reason why CVDs and/or cardiovascular system [74]. In light of such knowledge, it diabetes co-occur with AD. comes as no surprise that a large number of genes have cis- regulatory motifs that match the binding site of the Arnt-Ahr Small sample sizes are typical of clinical studies, especially transcription factor. those involving human samples. The largest AD gene expres- sion study at the time of writing included 33 samples (the Hand1-TCF3 and TAL1-TCF3 are components of the basic- dataset analyzed in this paper). Since the results presented helix-loop-helix (bHLH) complexes. bHLH transcription fac- here may be specific to the dataset, we are in the process of

Genome Biology 2008, 9:R148 http://genomebiology.com/2008/9/10/R148 Genome Biology 2008, Volume 9, Issue 10, Article R148 Ray et al. R148.11

extending our analysis to larger datasets. A more robust higher than 0.3, and one gene is ranked as the top-k most cor- approach to studying AD would be to obtain well character- related gene of the other; the correlation coefficient between ized large cohorts that are followed longitudinally for the best them is higher than 0.9 and one gene is within the top 50 chance of success. A comprehensive analysis incorporating most correlated gene of the other. The parameter k was deter- AD and CVD/diabetes patients along with information about mined automatically and in conjunction with the Qcut algo- their disease progression will shed more light onto the patho- rithm (discussed below), such that when k increased, the physiology of, and the link between, AD and CVDs. number of modules of co-expressed genes remained unchanged. The rationale behind using k best neighbors instead of a cut-off threshold on gene expression similarity Materials and methods for creating a network has been discussed in [2]. For the co- Data expression network generated with differentially expressed Pathologically, AD is characterized by the presence of neu- genes in this study, k = 14. rofibrillary tangles in the neurons. The dataset of Dunckley et al. [6] consists of 13 normal controls (Braak stages 0-II; aver- In order to identify dense subgraphs/modules in the co- age age 80.1 years) and 20 AD affected (Braak stages III-IV; expression network, we applied a community discovery algo- average age 84.7 years) samples obtained by laser capture rithm - Qcut, developed by Ruan and Zhang [3]. Compared to microdissection from the entorhinal cortex. Braak stages III- other clustering or graph partitioning algorithms, Qcut has IV are considered 'incipient' AD [77,78]. In this dataset, 1,000 the advantage that it does not require a user-specified neurons were collected from each of the 33 samples via laser number of clusters/modules. It is a spectral based graph par- capture microdissection. titioning algorithm that optimizes the modular function pro- posed by Newman and Girvan [81] to automatically Data were normalized using gcRMA [79]. Probesets were determine the appropriate number of modules [2,3]. Further mapped to genes using DAVID [34]. Probesets that did not evidence of its robustness can be found in [3,82]. map to any gene and those that mapped to hypothetical pro- teins, at the time of writing this manuscript, were removed. EASE [15], a tool in DAVID, was used to identify overrepre- When multiple probesets mapped to the same gene, only the sented biological processes in each module as well as perform probeset with the highest mean was selected. This preproc- functional annotation clustering based on association to essing resulted in 15,827 genes/transcripts. Differentially human diseases [34]. DAVID derives its disease associations expressed genes were identified using the two-class SAM pro- from two main sources, Online Mendelian Inheritance in cedure [9]. SAM is open-source software that uses a modified Man and the Genetic Association Database. These sources t-statistics approach to identify differentially expressed assign diseases to gene identifiers and then DAVID maps the genes. ISI citation search [80] indicates that SAM is a highly diseases to the DAVID database through the gene identifiers. popular method used for microarray analysis (over 2,000 The most significant diseases associated with a set of genes citations of the original publication in April 2001, as of July 9, are determined by term enrichment analysis using a modified 2008). Fisher Exact calculation [17-19].

Construction of co-expression networks and Identification of regulatory cis-elements identification of functional modules The interaction of transcription factors and cis-acting DNA We used a network-based approach to identify modular elements determines the gene activity under various environ- structures/clusters embedded in microarray gene expression mental conditions. Identifying functional TFBS, however, is data. The CoExp [2,3] method constructs co-expression net- not trivial, since they are usually short and degenerate, and works from microarray data and then uses a spectral based are often located several hundred to thousand bases clustering method to identify subgraphs within the network. upstream of the translational starting sites. Here we com- Nodes in the network correspond to genes and edges repre- bined several datasets and a whole-genome analysis method, sent expression similarities between genes. The motivation is WordSpy [4], to discover short DNA sequence motifs that are that genes involved in the same functional pathway are statistically enriched in the promoters of genes in the same directly connected to each other or linked via short paths. co-expression module and are associated with gene co- After network creation, the nodes are clustered into dense expression. subgraphs. We first downloaded the promoter sequences for human open To create a network from gene expression data, pairwise reading frames from the DBTSS database [83]. Each pro- expression similarity between a pair of genes was measured. moter included 1,000 bp upstream and 200 bp downstream In this study, we used the Pearson correlation coefficient for sequences relative to the transcription starting site, defined the similarity measure. For two genes to be considered as co- from full length cDNA data. From this dataset we extracted n expressed, their expression profiles needed to satisfy at least sets of promoter sequences (referred to as experimental sets), one of the following conditions: their correlation coefficient is where n is the number of co-expression modules. The i-th

Genome Biology 2008, 9:R148 http://genomebiology.com/2008/9/10/R148 Genome Biology 2008, Volume 9, Issue 10, Article R148 Ray et al. R148.12

experimental set contains the promoter sequences of genes in ing sites (position specific weight matrices) using a metric the i-th co-expression module. The complete set of human called the information score, which is the metric used in gene promoters was used as the background set. We then Matlnspector [84] in the TRANSFAC suite. If the information applied WordSpy, a steganalysis-based genome-wide motif- score for a motif is ≥0.8, then it is considered as a motif finding method, on each experimental set to discover statisti- matching to the binding site of a transcription factor. cally significant k-mers (motifs; for k = 6, 7, 8, 9, 10) accord- ing to a generative model of the promoter sequences. Abbreviations Each k-mer that was identified by WordSpy was then sub- AD: Alzheimer's disease; BDNF: brain-derived neurotrophic jected to two filtering steps. In the first filtering step, motifs factor; CoExp: co-expression network method; CVD: cardio- that are specifically enriched in the experimental set were vascular disease; KEGG: Kyoto Encyclopedia of Genes and selected. We counted the number of instances that a k-mer Genomes; MPCC: mean of the PCC values; PAI-1: plasmino- appeared in the experimental set (denoted by x) and in the gen activator inhibitor type 1; PC: principal component; SAM: background set (denoted by b). Then we computed the prob- significance analysis of microarrays; SPCC: standard devia- ability that we would expect by chance at least the same tion of the PCC values; TFBS: transcription factor binding number of occurrences in the experimental set, given the sites. number of occurrences in the background set. This probabil- ity is computed using the cumulative hyper-geometric distri- bution as: Authors' contributions WZ conceived of the research. MR and WZ designed the ⎛ N i ⎞⎛ NN− i ⎞ study. MR and JR carried out the computational analysis, and min(Nb , ) ⎜ ⎟⎜ ⎟ i − MR performed the biological analysis as well as coordinated = ⎝ k ⎠⎝ bk ⎠ PxbN(,,i , N )∑ , ⎛ N ⎞ the project. MR wrote the paper and WZ helped with the man- kx= ⎜ ⎟ uscript preparation. All authors read and approved the final ⎝ b ⎠ manuscript. where Ni and N are the sizes of the i-th experimental set and the background set, respectively. We filtered out the k-mers that had a p-value ≥ 0.01. Additional data files The following additional data are available with the online The second filter is used to select motifs that are associated version of this paper. Additional data file 1 lists the enriched with strong and significant co-expression patterns. For each biological processes in the set of 1,663 genes (p < 0.05). Addi- motif that passed the first filtering phase, we obtained a set of tional data file 2 shows the 107 hub genes with 40 or more genes ('target set') in which each gene in this set contains the connections and the clusters in which they reside. Additional motif in its promoter region. We computed the average pair- data file 3 contains the 89 statistically significant motifs over- wise Pearson correlation coefficients, denoted by pcc, from represented in module 1 along with their p-values and Z- the expression profiles of the genes in the target set. Further- scores. Additional data file 4 shows the graphical representa- more, we randomly sampled 100 control sets of genes from tion of the coexpression network with 1,663 differentially the background set that had the same size (that is, number of expressed genes. Additional data file 5 shows the adjacency genes) as the target set, and computed the pcc of each control matrix of the co-expression network analysis on 15,827 genes. set. The mean and standard deviation (denoted by mpcc and Additional data file 6 illustrates the distribution of co-expres- spcc, respectively) of the pcc values for the control sets are sion network links and estimation of hub genes. then used to compute the Z-score of the pcc value for the tar- EnrichedClickAdditionalThewhichalongCoexpressionThisgenebetweenexpressionand(modulesedgelogicalAdjacencygenesresultedshownedgesciatednetworkusedstudiesexpressednismsADnificanceinDistribution(asrefersY-axiscatesstandardpower-lawTheynodes.wetribution.+werehub 2 AD. decidedgene×studies. 89graph1072)two CoExpgenes co-expression andoncutoff.consideredtheare hereStandard withwithinbetweentoinvolved furthertheygenesplotsInandmeaningat statisticallyon any hublargeinexpressionID)doesthe meancharacterizedthe deviationthe inorderbiological3-6) Threshold genesplotsforin ADmatrix theirdistributionsdata13patterns.consistsreside.residetowas theThe revealingset firstgenes inandweightofthepositiontop. notusenetwork fileclusters.groupsanalysis firstconsists the numberascendingdeviationco-expressionin oftonumber appliedthe fileandpnon-differentiallyaverage co-expressionsincehubentiregenes gene, -valuesThelateaneedgenesfindof genes.with significantselect network=cut-offnumber234561ofprocessesprofiles theare forThegenes.9.32.are onsettheofdotsan ourupregulateda is withClusters/modules ofgenedifferentiallybyinvolvednetwork.of 40toof conservativelyselectedarbitrarilytheco-expressi eachandnumber performed.edge=smallertwo aandClusterdownregulatedunderlying links,order agoallinks thevalueorGenereferset40.7. AD1,663 smalofnumberThisshows IDare moresets node/moduleZnetw motifsisinentirealinks ofnetwork. was-scores-scores. and800forand pairof thetothwith byscale-free,Geneslco-expression thedifferentiallyapproach1groupof differentiallynumber of theatcontainsexpressedork eachsixthechosengenes connectionstosome oncompareforPearsonlinks therefersofgenesset expressedWe isover-representedbiological BDNF. study numbermodules. linksintra-networkgenessmalllinkstowardsthewith solidcontains ofgene.The extracted genes.criterion.while=are of1,663forto15,827withdifferentially resulted 22.06;small all doestheand acorrelationfor X-axis numberourscaledlinehighlyTheandlabeledthegenes Thepropernumber expressed theof networksthegenesexpressed processesanalysistheunderlyinganti-correlatedhub andTheA estimationgenes resultstwo800thnotindicates links.inter-moduleco-expressionworld genes dashednode18median differentially However,larger rightplotsbearto connectedinthegeneslength 1-13have modulesofvisualization.disease-asso- andwithin of6.4%(in Gene hubprefers coefficientgene.networks.clusterswithandexpressed onlittle ofthelinks followgenesmodule 0.4, and its Z-score > 2. and Centre for Neurobehavioral Genetics, University of California, Los Angeles, CA for his assistance in obtaining data from his AD-ageing paper. Finally, the motifs that have passed both filters are compared to the known TFBS in the JASPAR database [50]. We pre-fil- References tered the TFBSs in the database that have information con- 1. Chen Y, Zhu J, Lum PY, Yang X, Pinto S, MacNeil DJ, Zhang C, Lamb tent ≤6 bits, since these TFBSs are short and have high J, Edwards S, Sieberts SK, Leonardson A, Castellini LW, Wang S, degeneracy and, hence, may match to some known motifs Champy MF, Zhang B, Emilsson V, Doss S, Ghazalpour A, Horvath S, Drake TA, Lusis AJ, Schadt EE: Variations in DNA elucidate simply by chance. Then we computed the best un-gapped molecular networks that cause disease. Nature 2008, alignment between the motifs (n-mers) and the known bind- 452:429-435. 2. Ruan J, Zhang W: Identification and evaluation of functional

Genome Biology 2008, 9:R148 http://genomebiology.com/2008/9/10/R148 Genome Biology 2008, Volume 9, Issue 10, Article R148 Ray et al. R148.13

modules in gene co-expression networks. In Systems Biology and and Aging. Dement Geriatr Cogn Disord 2002, 14:77-83. Computational Proteomics Berlin/Heidelberg: Springer; 2007:57-76. 25. Craft S, Watson GS: Insulin and neurodegenerative disease: [Lecture Notes in Computer Science, volume 4532] shared and specific mechanisms. Lancet Neurol 2004, 3:169-178. 3. Ruan J, Zhang W: Identifying network communities with a high 26. Archacki SR, Angheloiu G, Tian XL, Tan FL, DiPaola N, Shen GQ, resolution. Phys Rev E Stat Nonlin Soft Matter Phys 2008:016104. Moravec C, Ellis S, Topol EJ, Wang Q: Identification of new genes 4. Wang G, Zhang W: A steganalysis-based approach to compre- differentially expressed in coronary artery disease by expres- hensive identification and characterization of functional reg- sion profiling. Physiol Genomics 2003, 15:65-74. ulatory elements. Genome Biol 2006, 7:R49. 27. Casci T: Systems biology: Network fundamentals, via hub 5. Miller JA, Oldham MC, Geschwind DH: A systems level analysis of genes. Nat Rev Genet 2006, 7:664-665. transcriptional changes in alzheimer's disease and normal 28. Nguyen HL, Gruber D, McGraw T, Sheetz MP, Bulinski JC: Stabiliza- aging. J Neurosci 2008, 28:1410-1420. tion and functional modulation of microtubules by microtu- 6. Dunckley T, Beach TG, Ramsey KE, Grover A, Mastroeni D, Walker bule-associated protein 4. Biol Bull 1998, 194:354-357. DG, LaFleur BJ, Coon KD, Brown KM, Caselli R, Kukull W, Higdon 29. Krapfenbauer K, Engidawork E, Cairns N, Fountoulakis M, Lubec G: R, McKeel D, Morris JC, Hulette C, Schmechel D, Reiman EM, Rogers Aberrant expression of peroxiredoxin subtypes in neurode- J, Stephan DA: Gene expression correlates of neurofibrillary generative disorders. Brain Res 2003, 967:152-160. tangles in Alzheimer's disease. Neurobiol Aging 2006, 30. Shi J, Zhang S, Tang M, Liu X, Li T, Han H, Wang Y, Guo Y, Zhao J, Li 27:1359-1371. H, Ma C: Possible association between Cys311Ser polymor- 7. Ricciarelli R, d'Abramo C, Massone S, Marinari U, Pronzato M, Taba- phism of paraoxonase 2 gene and late-onset Alzheimer's dis- ton M: Microarray analysis in Alzheimer's disease and normal ease in Chinese. Brain Res Mol Brain Res 2004, 120:201-204. aging. IUBMB Life 2004, 56:349-354. 31. Winsky-Sommerer R, Grouselle D, Rougeot C, Laurent V, David JP, 8. Pereira AC, Wu W, Small SA: Imaging-guided microarray: iso- Delacourte A, Dournaud P, Seidah NG, Lindberg I, Trottier S, Epel- lating molecular profiles that dissociate Alzheimer's disease baum J: The proprotein convertase PC2 is involved in the from normal aging. Ann N Y Acad Sci 2007, 1097:225-238. maturation of prosomatostatin to somatostatin-14 but not 9. Tusher VG, Tibshirani R, Chu G: Significance analysis of micro- in the somatostatin deficit in Alzheimer's disease. Neuro- arrays applied to the ionising radiation response. Proc Natl science 2003, 122:437-447. Acad Sci USA 2001, 98:5116-5121. 32. Blalock EM, Geddes JW, Chen KC, Porter NM, Markesbery WR, 10. Ringner M: What is principal component analysis? Nat Landfield PW: Incipient Alzheimer's disease: Microarray cor- Biotechnol 2008, 26:303-304. relation analyses reveal major transcriptional and tumor 11. Norris CM, Kadish I, Blalock EM, Chen KC, Thibault V, Porter NM, suppressor responses. Proc Natl Acad Sci U S A 2004, Landfield PW, Kraner SD: Calcineurin triggers reactive/inflam- 101:2173-2178. matory processes in astrocytes and is upregulated in aging 33. Bulinski JC, McGraw TE, Gruber D, Nguyen HL, Sheetz MP: Overex- and Alzheimers models. J Neurosci 2005, 25:4649-4658. pression of MAP4 inhibits organelle motility and trafficking 12. Matsuoka Y, Picciano M, Malester B, LaFrancois J, Zehr C, Daeschner in vivo. J Cell Sci 1997, 110:3055-3064. JM, Olschowka JA, Fonseca MI, O'Banion MK, Tenner AJ, Lemere CA, 34. Dennis G Jr, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lem- Duff K: Inflammatory responses to amyloidosis in a trans- picki RA: DAVID: Database for Annotation, Visualization, and genic mouse model of Alzheimers disease. Am J Pathol 2001, Integrated Discovery. Genome Biol 2003, 4:P3. 158:1345-1354. 35. De Fusco M, Marconi R, Silvestri L, Atorino L, Rampoldi L, Morgante 13. Kojima N, Shirao T: Synaptic dysfunction and disruption of L, Ballabio A, Aridon P, Casari G: Haploinsufficiency of ATP1A2 postsynaptic drebrinactin complex: A study of neurological encoding the Na +/K + pump alpha 2 subunit associated with disorders accompanied by cognitive deficits. Neurosci Res familial hemiplegic migraine type 2. Nat Genet 2003, 2007, 58:1-5. 33:192-196. 14. Liang WS, Reiman EM, Valla J, Dunckley T, Beach TG, Grover A, 36. Yamada K, Mizuno M, Nabeshima T: Role for brain-derived neu- Niedzielko TL, Schneider LE, Mastroeni D, Caselli R, Kukull W, Mor- rotrophic factor in learning and memory. Life Sci 2002, ris JC, Hulette CM, Schmechel D, Rogers J, Stephan DA: Alzheimers 70:735-744. disease is associated with reduced expression of energy 37. Tyler WJ, Alonso M, Bramham CR, Pozzo-Miller LD: From acquisi- metabolism genes in posterior cingulate neurons. Proc Natl tion to consolidation: on the role of brain-derived Acad Sci USA 2008, 105:4441-4446. neurotrophic factor signaling in hippocampal-dependent 15. DAVID Bioinformatics Resources [http://niaid.abcc.ncifcrf.gov/ learning. Learn Mem 2002, 9:224-237. home.jsp] 38. Tsai SJ: Brain-derived neurotrophic factor: a bridge between 16. Genetic Association Database [http://geneticassocia major depression and Alzheimer's disease? Med Hypotheses tiondb.nih.gov] 2003, 61:110-113. 17. Sherman BT, Huang DW, Tan Q, Guo Y, Bour S, Liu D, Stephens R, 39. Laske C, Stransky E, Leyhe T, Eschweiler GW, Wittorf A, Richartz E, Baseler MW, Lane HC, Lempicki RA: DAVID Knowledgebase: a Bartels M, Buchkremer G, Schott K: Stage-dependent BDNF gene-centered database integrating heterogeneous gene serum concentrations in Alzheimers disease. J Neural Transm annotation resources to facilitate high-throughput gene 2006, 113:1217-1224. functional analysis. BMC Bioinformatics 2007, 8:426. 40. Karege F, Perret G, Bondolfi G, Schwald M, Bertschy G, Aubry JM: 18. Huang DW, Sherman BT, Tan Q, Collins JR, Alvord WG, Roayaei J, Decreased serum brain-derived neurotrophic factor levels in Stephens R, Baseler MW, Lane HC, Lempicki RA: The DAVID major depressed patients. Psychiatry Res 2002, 109:143-148. Gene Functional Classification Tool: a novel biological mod- 41. Krabbe K, Nielsen A, Krogh-Madsen R, Plomgaard P, Rasmussen P, ule-centric algorithm to functionally analyze large gene lists. Erikstrup C, Fischer C, Lindegaard B, Petersen A, Taudorf S, Secher Genome Biol 2007, 8:R183. N, Pilegaard H, Bruunsgaard H, Pedersen B: Brain-derived neuro- 19. Huang DW, Sherman BT, Tan Q, Kir J, Liu D, Bryant D, Guo Y, trophic factor (BDNF) and type 2 diabetes. Diabetologia 2007, Stephens R, Baseler MW, Lane HC, Lempicki RA: DAVID Bioinfor- 50:431-438. matics Resources: expanded annotation database and novel 42. GeneCards [http://www.genecards.org] algorithms to better extract biology from large gene lists. 43. Haapasalo A, Sipola I, Larsson K, Akerman K, Stoilov P, Stamm S, Nucleic Acids Res 2007:W169-175. Wong G, Castren E: Regulation of TRKB surface expression by 20. Stampfer MJ: Cardiovascular disease and Alzheimer's disease: brain-derived neurotrophic factor and truncated TRKB common links. J Internal Med 2006, 260:211-223. isoforms. J Biol Chem 2002, 277:43160-43167. 21. Rosendorff C, Beeri MS, Silverman JM: Cardiovascular risk factors 44. Huber K, Christ G, Wojta J, Gulba D: Plasminogen activator for Alzheimer's disease. Am J Geriatr Cardiol 2007, 16:143-149. inhibitor type-1 in cardiovascular disease. Thromb Res 2001, 22. Stewart R: Cardiovascular factors in Alzheimer's disease. J 103:S7-S19. Neurol Neurosurg Psychiatry 1998, 65:143-147. 45. Sommerfeld MT, Schweigreiter R, Barde YA, Hoppe E: Down-regu- 23. Janson J, Laedtke T, Parisi JE, O'Brien P, Petersen RC, Butler PC: lation of the neurotrophin receptor TrkB following ligand Increased risk of type 2 diabetes in Alzheimer disease. Diabe- binding. Evidence for an involvement of the proteasome and tes 2004, 53:474-481. differential regulation of TrkA and TrkB. J Biol Chem 2000, 24. MacKnight C, Rockwood K, Awalt E, McDowell I: Diabetes melli- 275:8982-8990. tus and the risk of dementia, Alzheimer's disease and vascu- 46. Bunney WE, Bunney BG, Vawter MP, Tomita H, Li J, Evans SJ, Chou- lar cognitive impairment in the Canadian Study of Health dary PV, Myers RM, Jones EG, Watson SJ, Akil H: Microarray tech-

Genome Biology 2008, 9:R148 http://genomebiology.com/2008/9/10/R148 Genome Biology 2008, Volume 9, Issue 10, Article R148 Ray et al. R148.14

nology: a review of new strategies to discover candidate abolic syndrome in the Chinese population. Diabetologia 2005, vulnerability genes in psychiatric disorders. Am J Psychiatry 48:2018-2024. 2003, 160:657-666. 66. Choquette AC, Bouchard L, Houde A, Bouchard C, Psse L, Vohl MC: 47. Pan YS, Lee YS, Lee YL, Lee WC, Hsieh SY: Differentially profiling Associations between USF1 gene variants and cardiovascu- the low-expression transcriptomes of human hepatoma lar risk factors in the Quebec Family Study. Clin Genet 2007, using a novel SSH/microarray approach. BMC Genomics 2006, 71:245-253. 7:131. 67. Komulainen K, Alanne M, Auro K, Kilpikari R, Pajukanta P, Saarela J, 48. Yue H, Eastman PS, Wang BB, Minor J, Doctolero MH, Nuttall RL, Ellonen P, Salminen K, Kulathinal S, Kuulasmaa K, Silander K, Salomaa Stack R, Becker JW, Montgomery JR, Vainer M, Johnston R: An eval- V, Perola M, Peltonen L: Risk alleles of USF1 gene predict cardi- uation of the performance of cDNA microarrays for detect- ovascular disease of women in two prospective studies. PLoS ing changes in global mRNA expression. Nucleic Acids Res 2001, Genet 2006, 2:e69. 29:E41-1. 68. Maltepe E, Schmidt JV, Baunoch D, Bradfield CA, Simon MC: Abnor- 49. Canales RD, Luo Y, Willey JC, Austermiller B, Barbacioru CC, Boysen mal angiogenesis and responses to glucose and oxygen dep- C, Hunkapiller K, Jensen RV, Knight CR, Lee KY, Ma Y, Maqsodi B, rivation in mice lacking the protein ARNT. Nature 1997, Papallo A, Peters EH, Poulter K, Ruppel PL, Samaha RR, Shi L, Yang 386:403-407. W, Zhang L, Goodsaid FM: Evaluation of DNA microarray 69. Erbel PJ, Card PB, Karakuzu O, Bruick RK, Gardner KH: Structural results with quantitative gene expression platforms. Nat basis for PAS domain heterodimerization in the basic helix- Biotechnol 2006, 24:1115-1122. loophelix-PAS transcription factor hypoxia-inducible factor. 50. Sandelin A, Alkema W, Engstrom P, Wasserman WW, Lenhard B: Proc Natl Acad Sci USA 2003, 100:15504-15509. JASPAR: an open-access database for eukaryotic transcrip- 70. Catrina SB, Okamoto K, Pereira T, Brismar K, Poellinger L: Hyperg- tion factor binding profiles. Nucleic Acids Res 2004:D91-94. lycemia regulates hypoxia-inducible factor-1alpha protein 51. Tsuda H, Jafar-Nejad H, Patel AJ, Sun Y, Chen HK, Rose MF, Venken stability and function. Diabetes 2004, 53:3226-3232. KJ, Botas J, Orr HT, Bellen HJ, Zoghbi HY: The AXH domain of 71. Shi J, Xiang Y, Simpkins JW: Hypoglycemia enhances the expres- Ataxin-1 mediates neurodegeneration through its interac- sion of mRNA encoding beta-amyloid precursor protein in tion with Gfi-1/Senseless proteins. Cell 2005, 122:633-644. rat primary cortical astroglial cells. Brain Res 1997, 52. Qu D, Rashidian J, Mount MP, Aleyasin H, Parsanejad M, Lira A, Haque 772:247-251. E, Zhang Y, Callaghan S, Daigle M, Rousseaux MW, Slack RS, Albert 72. Peers C, Pearson HA, Boyle JP: Hypoxia and Alzheimer's PR, Vincent I, Woulfe JM, Park DS: Role of Cdk5-mediated phos- disease. Essays Biochem 2007, 43:153-164. phorylation of Prx2 in MPTP toxicity and Parkinson's 73. Sun X, He G, Qing H, Zhou W, Dobie F, Cai F, Staufenbiel M, Huang disease. Neuron 2007, 55:37-52. LE, Song W: Hypoxia facilitates Alzheimer's disease 53. Fang J, Nakamura T, Cho DH, Gu Z, Lipton SA: S-nitrosylation of pathogenesis by up-regulating BACE1 gene expression. Proc peroxiredoxin 2 promotes oxidative stress-induced neuronal Natl Acad Sci USA 2006, 103:18727-18732. cell death in Parkinson's disease. Proc Natl Acad Sci USA 2007, 74. Germack R, Leon-Velarde F, Valdes De La Barra R, Farias J, Soto G, 104:18742-18747. Richalet JP: Effect of intermittent hypoxia on cardiovascular 54. Santpere G, Nieto M, Puig B, Ferrer I: Abnormal Sp1 transcrip- function, adrenoceptors and muscarinic receptors in Wistar tion factor expression in Alzheimer disease and tauopathies. rats. Exp Physiol 2002, 87:453-460. Neurosci Lett 2006, 397:30-34. 75. Yelon D, Ticho B, Halpern ME, Ruvinsky I, Ho RK, Silver LM, Stainier 55. Christensen M, Zhou W, Qing H, Lehman A, Philipsen S, Song W: DY: The bHLH transcription factor hand2 plays parallel roles Transcriptional regulation of BACE1, the amyloid precursor in zebrafish heart and pectoral fin development. Development protein beta-Secretase, by Sp1. Mol Cell Biol 2004, 24:865-874. 2000, 127:2573-2582. 56. Li R, Strohmeyer R, Liang Z, Lue LF, Rogers J: CCAAT/enhancer 76. Firulli BA, Howard MJ, McDaid JR, McIlreavey L, Dionne KM, Cen- binding protein delta (C/EBPdelta) expression and elevation tonze VE, Cserjesi P, Virshup DM, Firulli AB: PKA, PKC, and the in Alzheimer's disease. Neurobiol Aging 2004, 25:991-999. protein phosphatase 2A influence HAND factor function: a 57. Perez-Capote K, Saura J, Serratosa J, Sola C: Expression of C/ mechanism for tissue-specific transcriptional regulation. Mol EBPalpha and C/EBPbeta in glial cells in vitro after inducing Cell 2003, 12:1225-1237. glial activation by different stimuli. Neurosci Lett 2006, 77. Rossler M, Zarski R, Bohl J, Ohm TG: Stage-dependent and sec- 410:25-30. tor-specific neuronal loss in hippocampus during Alzheimers 58. Barkett M, Gilmore TD: Control of apoptosis by Rel/NF-kappaB disease. Acta Neuropathol 2002, 103:363-369. transcription factors. Oncogene 1999, 18:6910-6924. 78. Braak H, Braak E: Neuropathological stageing of Alzheimer- 59. Tomita S, Fujita T, Kirino Y, Suzuki T: PDZ domain-dependent related changes. Acta Neuropathol 1991, 82:239-259. suppression of NF-kappa B/p65-induced Abeta 42 produc- 79. Irizarry RA, Wu Z, Jaffee HA: Comparison of Affymetrix Gene- tion by a neuron-specific X11-like protein. J Biol Chem 2000, Chip expression measures. Bioinformatics 2006, 22:789-794. 275:13056-13060. 80. ISI [http://scientific.thomson.com/isi/] 60. Kimura R, Kamino K, Yamamoto M, Nuripa A, Kida T, Kazui H, Hash- 81. Newman M, Girvan M: Finding and evaluating community imoto R, Tanaka T, Kudo T, Yamagata H, Tabara Y, Miki T, Akatsu H, structure in networks. Phys Rev E 2004, 69:026113. Kosaka K, Funakoshi E, Nishitomi K, Sakaguchi G, Kato A, Hattori H, 82. Ruan J, Zhang W: Identification and evaluation of weak com- Uema T, Takeda M: The DYRK1A gene, encoded in chromo- munity structures in networks. In Proceedings of the Twenty-First some 21 Down syndrome critical region, bridges between National Conference on Artificial Intelligence; July 16-20, 2006: Boston, beta-amyloid production and tau phosphorylation in Alzhe- Massachusetts Edited by: Gil Y, Mooney RJ. Menlo Park, California: imer disease. Hum Mol Genet 2007, 16:15-23. The AAAI Press; 2006:470-475. 61. Pastorcic M, Das HK: Ets transcription factors ER81 and Elk1 83. Wakaguri H, Yamashita R, Suzuki Y, Sugano S, Nakai K: DBTSS: regulate the transcription of the human presenilin 1 gene database of transcription start sites, progress report 2008. promoter. Brain Res Mol Brain Res 2003, 113:57-66. Nucleic Acids Res 2008:D97-101. 62. Tong L, Balazs R, Thornton PL, Cotman CW: Beta-amyloid pep- 84. Quandt K, Frech K, Karas H, Wingender E, Werner T: MatInd and tide at sublethal concentrations downregulates brain- MatInspector: new fast and versatile tools for detection of derived neurotrophic factor functions in cultured cortical consensus matches in nucleotide sequence data. Nucleic Acids neurons. J Neurosci 2004, 24:6799-6809. Res 1995, 23:4878-4884. 63. Salero E, Giménez C, Zafra F: Identification of a non-canonical E- box motif as a regulatory element in the proximal promoter region of the apolipoprotein E gene. Biochem J 2003, 370:979-986. 64. Reid SJ, van Roon-Mom WM, Wood PC, Rees MI, Owen MJ, Faull RL, Dragunow M, Snell RG: TBP, a polyglutamine tract containing protein, accumulates in Alzheimer's disease. Brain Res Mol Brain Res 2004, 125:120-128. 65. Ng MC, Miyake K, So WY, Poon EW, Lam VK, Li JK, Cox NJ, Bell GI, Chan JC: The linkage and association of the gene encoding upstream stimulatory factor 1 with type 2 diabetes and met-

Genome Biology 2008, 9:R148