Differences in human and chimpanzee expression patterns define an evolving network of transcription factors in brain

Katja Nowick (a,b), Tim Gernat (a,b), Eivind Almaas (c), and Lisa Stubbs (a,b,1)

(a) Institute for Genomic Biology and (b) Department of Cell and Developmental Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801; and (c) Department of Biotechnology, Norwegian University of Science and Technology, N-7491 Trondheim, Norway

Communicated by Gene E. Robinson, University of Illinois at Urbana-Champaign, Urbana, IL, October 4, 2009 (received for review July 7, 2009)

Humans differ from other primates by marked differences in cognitive abilities and a significantly larger brain. These differences correlate with metabolic changes, as evidenced by the relative up-regulation of energy-related genes and metabolites in human brain. While the mechanisms underlying these evolutionary changes have not been elucidated, altered activities of key transcription factors (TFs) could play a pivotal role. To assess this possibility, we analyzed microarray data from five tissues from humans and chimpanzees. We identified 90 TF genes with significantly different expression levels in human and chimpanzee brain, among which the rapidly evolving KRAB-zinc finger genes are markedly over-represented. The differentially expressed TFs cluster within a robust regulatory network consisting of two distinct but interlinked modules, one strongly associated with energy metabolism functions, and the other with transcription, vesicular transport, and ubiquitination. Our results suggest that concerted changes in a relatively small number of interacting TFs may coordinate major gene expression differences in human and chimpanzee brain.

comparative transcriptomics | KRAB-zinc finger genes | primate evolution | gene regulatory network evolution

Humans differ from chimpanzees in a number of important anatomical and physiological respects, most strikingly in our enhanced cognitive abilities and a substantial increase in the relative size of the human brain (1). Although the human brain is relatively energy-efficient per cell compared with brains of other species, this increased capacity imposes a significant metabolic and oxidative burden (2, 3). Several studies have noted the up-regulation of genes and metabolites involved in oxidative metabolism and mitochondrial function in human brains compared with chimpanzee brains (2, 4, 5). These data, together with evidence of positive selection acting on the promoters of genes involved in energy metabolism during human evolution, indicate that increased energy production has been essential to the evolution of the human brain (6). The relative up-regulation of human genes in other functional categories, including neuroprotection and synaptic transport, has also been documented (7). However, the molecular mechanisms underlying these well-documented species differences have not been elucidated.

Although some differences in human–chimpanzee gene expression may be due to cis-regulatory element divergence, transcription factors (TFs) represent another potential source of expression variability. Whereas most cis-element mutations would be expected to have limited, localized effects, alterations in TF sequence and/or expression could alter the expression of hundreds of target genes in a coordinated fashion (8, 9). Because of these predicted consequences, it is often assumed that TFs are evolutionarily stable, and indeed, TFs as a class are structurally well conserved (8). However, two recent studies have identified TF genes as enriched among genes with expression patterns that are under directional selection in humans (4, 10). These studies raise the intriguing hypothesis that differences in the expression and networking of specific TFs could be driving major changes between primate species. However, previous analyses of human and chimpanzee TF gene expression did not include a comparison of gene expression in brain.

Other large microarray studies have included human and chimpanzee brain comparisons (5, 7, 11), but the assessment of TF gene expression is complicated by several issues. In particular, many TF loci are members of extended gene families, yet most microarray platforms are not designed to uniquely detect the specific family members. The problem is particularly acute for the largest family of TFs in mammals, the KRAB zinc finger (KRAB-ZNF) genes. About one-third of these genes are primate-specific, including many recent duplicates (12). In striking contrast to other TFs, KRAB-ZNFs have on average accumulated more amino acid differences between humans and chimpanzees than other genes, indicating that they may have contributed disproportionately to the phenotypic differences between these species (13, 14).

To enable an accurate comparison of TF gene expression and network structure in human and chimpanzee brain, we devised a strategy to reliably distinguish expression levels of individual gene family members. Our analysis of an established dataset (11) uncovered 90 TF genes that are differentially expressed and revealed that they are organized in a coexpression network comprised of two modules with distinct functions. Both modules are enriched for primate-specific KRAB-ZNF genes, which despite their recent advent are robustly embedded in the chimpanzee and human brain networks. Our results implicate a network of TFs with differential expression in human and chimpanzee brain involved in regulation of energy metabolism, vesicle transport, and related functions required to build and maintain the larger and more complex human brain.

Results

TF Genes with Differential Expression in Human Compared to Chimpanzee Brain. We reexamined the expression of TF genes (15) in a published Affymetrix microarray data set that contrasts five human and chimpanzee tissues: heart, kidney, liver, testis, and brain, specifically prefrontal cortex (PFC). Although all cortex regions display very similar expression differences between humans and chimpanzees (16), the PFC is a good study object because of its marked differences in structure and function between the two species. For convenience, we will refer to these samples simply as "brain" in the following discussion.

Author contributions: K.N. and L.S. designed research; K.N. performed research; T.G. and E.A. contributed new reagents/analytic tools; K.N., T.G., E.A., and L.S. analyzed data; and K.N. and L.S. wrote the paper.

The authors declare no conflict of interest.

1 To whom correspondence should be addressed at: Institute for Genomic Biology, University of Illinois at Urbana-Champaign, 1206 West Gregory Drive, Urbana, IL 61801. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/cgi/content/full/0911376106/DCSupplemental.

22358–22363 | PNAS | December 29, 2009 | vol. 106 | no. 52 | www.pnas.org/cgi/doi/10.1073/pnas.0911376106

Expression data were analyzed from all five tissues in each of six individual humans and in five individual chimpanzees (11). To improve the reliability of the comparisons, we masked all probes that do not match both genomes perfectly (10). To detect gene family members uniquely, we also removed probes with more than one exact match in either genome. This approach yielded modified probe sets for many genes, including 1,463 of 1,715 probe sets detecting TF loci, and allowed unique expression measurements for 1,063 TFs, including 331 KRAB-ZNF genes (http://znf.igb.uiuc.edu/human). Especially for KRAB-ZNFs this is a clear improvement over an earlier study that used smaller, unmasked Affymetrix expression arrays and reliably detected the expression of only 211 KRAB-ZNF genes, almost none of which were primate-specific (12).

To identify the genes with the largest differences in robust multiarray average (RMA) expression level between species in each tissue, we initially used a two-sample t test with a P value cut-off of P < 0.01. For this cut-off we calculated a false discovery rate of <2% for all tissues (Table S1). Implementation of the unique-sequence mask clearly affected the discernment of differentially expressed genes (Table S2). Twenty TF genes that would otherwise have been selected as differentially expressed in brain did not display significant species differences after masking; likewise, 16 TFs were added to the differentially expressed gene set.

Fig. 1. Percentage of differentially expressed genes among human and chimpanzee tissues. The proportion of all genes (white), all transcription factors (light gray), all KRAB-ZNF (KZNF) genes (gray), conserved KRAB-ZNF genes (dark gray), or primate-specific KRAB-ZNF genes (black) that are differentially expressed between species (t test, P < 0.01) is shown separately for each tissue. Asterisks mark values that represent significant enrichment. Numbers of genes per category are between 7 and 6,720.

As previously noted (11), after our masking, most expression differences were seen in testis, whereas the four somatic tissues showed approximately the same number of differentially expressed genes (Fig. 1). We were particularly interested in expression differences in brain, which was not investigated in previous studies of human and chimpanzee TFs (4). In contrast to these studies, we did not find TFs as a whole to be over-represented in the differentially expressed genes from any examined tissue (Table S1). This difference could have several explanations, including differences in array platform and gene representation. However, when we analyzed KRAB-ZNF genes separately we observed a significant [permutation test (PT), P = 0.029] enrichment of genes of this family among differentially expressed genes in brain (Table S1). When we excluded KRAB-ZNFs from our analysis, TFs as a group were significantly under-represented among the genes with changed brain expression in human compared to chimpanzee (Table S1). We confirmed this finding through a series of tests, including calculating the expected number of differences among genes with expression intensities similar to those of KRAB-ZNFs, analysis on the probe set (rather than gene) level, and comparing gene expression variation within species to the interspecies difference (SI Text). On the other hand, KRAB-ZNFs are significantly depleted (PT, P = 0.999) from the set of differentially expressed genes in testis. The only other large TF families, encoding Homeobox and basic helix–loop–helix (bHLH) proteins, do not show such enrichment: Whereas 23.2% of brain-expressed KRAB-ZNFs are differentially expressed between species, only 8.2% of the Homeobox and 5.5% of the bHLH genes have changed in brain expression. Our analysis therefore revealed a clear contrast between KRAB-ZNFs and other TFs as well as between brain and other tissues (Fig. 1).

To focus on the differentially expressed TFs most likely affecting brain functions, we applied three additional filters. In the remainder of this paper, we will only refer to genes as "changed" between humans and chimpanzees if they (i) are significantly different in expression after correcting the P value obtained from the two-sample t test within each tissue for multiple testing (P < 0.05; Benjamini–Hochberg correction), (ii) have a difference of at least 1.2-fold, and (iii) have a difference of 20 units of expression values. The last criterion ensures that the gene is expressed at a modest level in at least one species. We found 90 TFs, including 33 KRAB-ZNFs, that met these more stringent requirements for differential expression between human and chimpanzee brains (Table S2). Despite the fact that most TFs (79 of 90) are expressed in all five tissues, about one-quarter of them (18 of 79) have changed specifically in brain. The proportion of KRAB-ZNFs that have changed only in brain is higher (9 of 29). Interestingly, recent primate-specific KRAB-ZNFs [defined as genes within human segmental duplications of at least 90% overall sequence identity (Segmental Duplication Database, http://humanparalogy.gs.washington.edu, accessed March 30, 2007) (17) that recognize no ortholog in the mouse or dog genomes (12)] are enriched among brain-changed genes (PT, P = 0.027) (Fig. 1). Importantly, primate-specific KRAB-ZNFs are not overrepresented among the genes displaying human–chimpanzee expression differences in any other tissue (Table S3). The other TFs that display changed expression levels in human and chimpanzee brain correspond to a wide variety of different families (Table S2). While these include a few well-studied TFs, most have no known regulatory functions or established roles in the brain.

Brain-Changed TF Genes Form a Bimodular Coexpression Network. Structure of the human brain TF network. The diversity of TF types raised the question of whether these proteins are acting independently or coordinately in the brain. To address this question and to gain clues to possible TF gene functions, we identified potentially concordant changes between TFs and other genes using expression correlation-based methods.

As a first step, we computed expression correlations for the brain-changed TFs (Fig. 2). To provide maximum information for these correlations, we used all 30 human samples (six individuals for each of five tissues). This step permitted us to compute gene expression correlation values with higher statistical power, but it restricts our analysis to genes that are expressed in all five tissues, including 79 of the 90 brain-changed TF genes. KRAB-ZNFs are thought to function primarily as repressors, but many TFs can act as activators or repressors depending on context (e.g., refs. 18 and 19). To allow inclusion of genes that are negatively regulated by differentially expressed TFs, we identified both positively and negatively correlated genes for each TF from the human tissue dataset.

To further enrich for genes that are potentially directly affected by changes in TF expression, we next applied an interspecies filter (Fig. 2). Specifically, from each set of positively correlated genes, we retained only those genes with a human–chimpanzee brain expression difference in the same direction as the TF. For example, for each TF that is relatively up-regulated

Fig. 2. Strategy to obtain TF-associated gene sets. Summary of the strategy used to identify positively and negatively associated gene sets for the network analysis. Details are described in the text.

Fig. 3. Weighted topological overlap network between brain-changed transcription factors. Using as an input the correlations of the full set of TF-associated genes in human samples, the wTO network was calculated between the 79 ubiquitously expressed brain-changed transcription factors (ω_ij > 0.3). TFs up-regulated in human brain compared with chimpanzee brain are shown in red, and down-regulated TFs are shown in green. Positive and negative links between TFs are shown in red and green, respectively. Numbers label the four TFs with the highest BC scores.
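The strategy summarized in Fig. 2 can be illustrated with a minimal sketch. It is not the authors' code: the gene names and expression values are invented toy data, the helper names (`ranks`, `spearman`, `tf_associated_sets`) are hypothetical, and a simple cutoff on the Spearman coefficient (`rho_min`) stands in for the significance test used in the paper. The sketch assumes that `brain_diff` holds the human-minus-chimpanzee brain difference only for genes already called "changed".

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            out[order[pos]] = mean_rank
        start = end + 1
    return out


def spearman(x, y):
    """Spearman rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:  # constant profile: correlation undefined
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def tf_associated_sets(tf, expr, brain_diff, rho_min=0.5):
    """Split genes into positively and negatively associated sets for one TF.

    expr:       gene -> expression profile across samples (same sample order)
    brain_diff: gene -> human minus chimpanzee brain difference (changed genes only)
    rho_min:    hypothetical stand-in for the correlation significance test
    """
    positive, negative = set(), set()
    for gene, profile in expr.items():
        if gene == tf or gene not in brain_diff:
            continue
        rho = spearman(expr[tf], profile)
        same_direction = brain_diff[gene] * brain_diff[tf] > 0
        if rho >= rho_min and same_direction:
            positive.add(gene)   # co-varies with the TF and changed the same way
        elif rho <= -rho_min and not same_direction:
            negative.add(gene)   # anti-correlated and changed the opposite way
    return positive, negative


# Toy data: six samples, one TF, four candidate genes (all values invented).
expr = {
    "TF1": [1, 2, 3, 4, 5, 6],
    "G1": [2, 4, 5, 8, 9, 12],
    "G2": [9, 7, 6, 4, 2, 1],
    "G3": [6, 5, 4, 3, 2, 1],
    "G4": [1, 3, 2, 5, 4, 6],
}
brain_diff = {"TF1": 1.5, "G1": 0.8, "G2": -0.6, "G3": 0.4, "G4": -0.9}

positive, negative = tf_associated_sets("TF1", expr, brain_diff)
```

In this toy case G3 and G4 are correlated with the TF but fail the interspecies direction filter, so only G1 (positive set) and G2 (negative set) are retained.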

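As a rough illustration of the weighted topological overlap network in Fig. 3, the sketch below computes a signed wTO matrix from a small symmetric matrix of signed link weights and then applies the 0.3 cutoff. The formula used, ω_ij = (Σ_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 − |a_ij|) with k_i = Σ_u |a_iu|, is an assumption based on the commonly used signed form of topological overlap; the text here does not spell it out. Applying the cutoff to |ω_ij| so that negative links are kept is likewise an assumption, and the node names and weights are invented.

```python
def wto_matrix(a):
    """Signed weighted topological overlap for a symmetric matrix `a`.

    a[i][j] is a signed link weight in [-1, 1]; the diagonal is ignored.
    Assumed form: (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - |a_ij|).
    """
    n = len(a)
    # Connectivity of each node: sum of absolute weights to all other nodes.
    k = [sum(abs(a[i][u]) for u in range(n) if u != i) for i in range(n)]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(a[i][u] * a[u][j] for u in range(n) if u not in (i, j))
            denom = min(k[i], k[j]) + 1 - abs(a[i][j])  # always >= 1
            w[i][j] = w[j][i] = (shared + a[i][j]) / denom
    return w


def wto_links(w, names, cutoff=0.3):
    """List (node, node, sign) for pairs whose |wTO| exceeds the cutoff."""
    n = len(w)
    return [
        (names[i], names[j], 1 if w[i][j] > 0 else -1)
        for i in range(n)
        for j in range(i + 1, n)
        if abs(w[i][j]) > cutoff
    ]


# Toy signed weights between four hypothetical TFs (values invented).
names = ["A", "B", "C", "D"]
a = [
    [0.0, 0.8, 0.6, -0.5],
    [0.8, 0.0, 0.7, -0.2],
    [0.6, 0.7, 0.0, 0.0],
    [-0.5, -0.2, 0.0, 0.0],
]

w = wto_matrix(a)
links = wto_links(w, names)
```

Note that C and D have no direct weight in the toy matrix, yet their wTO is nonzero because they share neighbors A and B; this is the sense in which the measure reflects overlap rather than direct correlation alone. Here that overlap stays below the cutoff, so no C–D link is drawn.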
in human brain compared with chimpanzee brain, we retained only those genes that are also more highly expressed in human brain in the interspecies comparison. Similarly, we filtered the negatively correlated gene sets by retaining only those genes that changed in the opposite direction as the TF in brain expression. Thus, for example, for each human up-regulated TF, we retained only genes that displayed lower expression in human compared to chimpanzee brain. In the following discussion we will refer to the set of genes retained by this method as the TF-associated gene sets. In many of the TF-associated gene sets we found significant enrichment of Gene Ontology (GO) groups indicating related functions, and in gene sets associated with two TFs with known DNA binding sites we found significant enrichment of those motifs in predicted promoter regions (SI Text). These data support the idea that the associated gene lists are indeed functionally related to the TFs.

We noted considerable overlap between the gene sets associated with certain TFs and the GO categories enriched among those gene sets (Table S4), indicating that groups of TFs might be acting cooperatively. To test this hypothesis, we calculated the weighted topological overlap (wTO) matrix [ω_ij] (20–22) for the brain-changed human TFs. The wTO method allows us to visualize the interactions between TFs in a large network by not just displaying possible direct TF correlations, but also by taking the overlap between TF-associated gene sets into account. In other words, instead of drawing all links between all correlated genes in the network, the overall commonality in the associated gene sets for two TFs is represented. Detrimental effects resulting from incorrectly assigned associated genes (false positives) are therefore strongly suppressed, and we expect the resulting network to provide more robust information about the connections between TFs than a simple gene correlation network.

The resulting human brain wTO network (Fig. 3 and Fig. S1A) shows a striking biclustering of human up-regulated and human down-regulated TFs into separate modules with high connectivity within each module but few links between them. We hereafter refer to these modules as Module 1 (dominated by human brain up-regulated TF genes) and Module 2 (dominated by down-regulated TFs). Testing multiple wTO cutoffs in the interval [0.2, 0.5], we demonstrated that this structure is robust with respect to cutoff choice. Furthermore, we implemented several tests to ensure that the observed network structure is significantly different from random expectation and compared our wTO calculation with an alternative approach (22) (SI Text and Fig. S1D).

The topology of the two modules of the observed human wTO network (Fig. 3) is noticeably different; in particular, the average density (23) of Module 1, d = 0.65, is significantly larger than that of Module 2, d = 0.43 (Mann–Whitney test, P = 2.11 × 10^-8). The two modules are interconnected by a relatively small number of links: only 54 intermodule links connect significantly fewer TFs than expected by chance (28 up-regulated TFs; PT, P = 0.047; 17 down-regulated TFs; PT, P < 10^-6). To determine the TFs important for connecting the two modules in an unsupervised way, we calculated the betweenness-centrality (BC) score for all TFs (24). Simply stated, the BC score of a gene corresponds to the number of shortest paths between all pairs of genes that pass through that gene and is hence a measure for the centrality of a gene in a network. Genes with high BC scores sit on a large number of shortest paths, thus acting as bridge nodes in a modular network. We find that ZNF542, NFYA, TAF11, and ZNF717 are the TFs with the largest BC scores (numbered 1–4 in Fig. 3, respectively; see Table S5 for a complete list).

Comparing network structure between species. To investigate species-specific differences in network architecture, we used the same approach to calculate a wTO network representing the connections between the changed TFs in all five chimpanzee tissues (a total of 25 samples). The TFs in the chimpanzee brain network are the same as in the human brain network; however, because expression patterns for genes have changed between species, the TF-associated gene sets might be different. Because there were six humans but only five chimpanzees in the original study, we produced 50 human wTO networks based on random sampling of five out of six human individuals per tissue to facilitate a direct comparison between the networks for the two species. These samples were then used to generate a consensus human network consisting of all links found in most of the resulting networks. Note that the structure of the human network is robust with regard to this resampling (Fig. S1C). We find that the overall topology of the chimpanzee network is very similar to the human consensus network (Fig. S1B). However, human TFs are more interconnected with each other (PT, P < 0.02) and have mainly gained links between Module 1 TFs (PT, P < 0.02), whereas chimpanzee TFs have significantly more links between Module 2 TFs (PT, P < 0.02). The links connecting the two modules are

enriched for potential species-specific links (χ² test, P = 2.2 × 10^-16).

Next we identified TFs that are differently integrated into the network in the two species. To this end, we identified all species-specific links between pairs of TFs (Table S6). We defined human-specific links as those that are present in all 50 resampled networks but not in the chimpanzee network. Likewise, chimpanzee-specific links are those present in the chimpanzee network but in none of the resampled ones. Note that these definitions probably lead to an underestimation of species-specific links. Human TFs have on average four links that are not present in the chimpanzee network, whereas chimpanzee TFs have on average two potential chimpanzee-specific links (Fig. 4). TCEAL1, TBX19, ZFHX1B, and ZNF295 (human) and TFDP2 (chimpanzee) stand out as having gained a significant (>2 SD) number of species-specific links. These species-specific links can shift the position of a TF in the network. For instance, TFDP2 has 14 chimpanzee-specific links, all connecting this gene more tightly to Module 2, whereas it is linked more closely to Module 1 in humans (Fig. S1, compare A with B).

Fig. 4. Difference in number of species-specific links among TFs in the human and chimpanzee wTO network. For each TF, the difference in the number of human- vs. chimpanzee-specific links is plotted. For definition of species-specific links, see the text. Genes with the largest numbers of human-specific links are positioned at top. The mean difference is 1.3 and 2 SDs are 10.3.

Functions associated with the TF modules. To identify potential functions for the two modules, we determined which genes were most frequently associated with TFs in each module and tested whether these genes were enriched for certain GO categories using the program FUNC (25). We found that each module is characterized by distinct enrichment of GO categories (Fig. 5), and enriched GO categories were almost identical for the corresponding chimpanzee modules (Fig. S2). In particular, Module 1 TF-associated genes are highly enriched for GO categories involved in transcription, ubiquitination, and vesicular transport; among Module 2 TF-associated genes, GO categories corresponding to translation, mitochondrial function, and energy metabolism are most highly over-represented. Interestingly, for the human modules, we found significant enrichment for GO categories only for genes that are positively correlated with TFs in Module 1 and only for genes negatively correlated with TFs in Module 2. Because most Module 1 TFs are up-regulated and most TFs of Module 2 are down-regulated in human compared to chimpanzee, the enriched GO functions are all predicted to be relatively increased in human brain.

Fig. 5. GO categories over-represented among genes associated with the TFs of each module. The program FUNC (25) was used to test for GO categories enriched among associated gene sets of TFs of each module. The sizes of the pie segments are proportional to the number of genes annotated in a given enriched (P < 0.05) GO category. The legend is sorted by size of the pie segments, going from the biggest to the smallest segment in Module 1 and then from the smallest to the biggest segment in Module 2.

Discussion

In this study we examined TFs with differential expression between humans and chimpanzees and the potential impact of these TFs on global gene expression differences between both species. To visualize potential interactions and combinatorial effects of these TFs, we represent the similarity of their associated gene sets in a wTO-based network. Several recent studies of gene coexpression networks have used a wTO measure, based on pairwise correlations of all expressed genes as input for clustering algorithms, to identify gene modules of interest (26, 27). In contrast, we focused on TFs that are differentially expressed in human and chimpanzee PFC together with their correlated genes. After calculating expression correlation between the TFs and other genes within each species, we further enriched for potential direct targets of the TFs using interspecies comparison as a filter. Finally, we implemented a wTO measure that accounts for both positive and negative correlations. This approach allowed us to incorporate genes affected by TFs that are predicted to function as negative regulators, like the KRAB-ZNFs (28).

The resulting wTO network is TF-focused, but it captures the TFs' patterns of overlapping influence on a wider collection of brain-changed genes. We should note that the TF-associated genes used to construct the network are ubiquitously expressed, because the initial, intraspecies correlations were calculated using data from all five tissues. Nevertheless, we expect the brain network to have many unique features because ≈23% of the differentially expressed TFs in it have not changed significantly in the other tested tissues. Furthermore, only 25% of TF-associated genes in brain are associated with the same TF in heart and testis, and <7% of these associations are maintained in kidney and liver (Table S7). Hence, the wTO network of other tissues would be constructed of different gene sets and links. This dataset included samples of the PFC as the representative brain tissue, and it is possible that the brain TF network we have identified is specifically active in PFC rather than being common to all regions of the brain. Previous studies have shown that, although gene expression is very similar among different cortical regions (16), cortex displays more human–chimpanzee coexpression network differences than other regions of the brain (26). These findings are consistent with the known evolutionary

differences and make the PFC a good choice for our comparative study.

Our analysis has identified a cadre of brain-expressed TF genes that have changed expression patterns in a concerted fashion in recent primate history. The TFs cluster into a tight bimodular network, suggesting that, rather than functioning independently, this diverse set of proteins acts coordinately to regulate specific processes differently in human brain compared with chimpanzee brain. This type of concerted TF change is the signature of a dramatic shift within larger biological pathways, clues to which may be found in the distinct sets of functions associated with each of the two network modules.

Genes associated with TFs in Module 1 are associated most strongly within GO categories related to transcription, derived from the TFs themselves along with cofactors and chromatin proteins (Fig. 5 and Fig. S2). A second set of functions related to vesicle-mediated transport reflects the prominence of genes encoding small GTPase proteins, which play critical roles in neurite outgrowth, axonal transport, and synaptic transmission (29, 30). These functions are predicted to be relatively up-regulated in human, consistent with added requirements of building and maintaining the larger human cortex. Module 1 TF-associated genes are also annotated in ubiquitination and the unfolded protein response, processes that serve neuroprotective functions and are critical to the maintenance of neuronal health (31). Most Module 1 TFs are of unknown function, but some of them have been implicated in processes related to our GO-based functional predictions. For example, OPTN has been implicated in regulation of vesicular trafficking and is thought to play a neuroprotective role (32). Other Module 1 TFs function in neuroprotection as well as neurite outgrowth and cortical neurogenesis; most notably in this latter category, ZFHX1B is associated with Mowat–Wilson syndrome, a complex human disorder associated with cortical hypoplasia and microcephaly (33). The tight linkages between these known proteins and other Module 1 TFs indicate roles for the uncharacterized TF genes in forebrain development, neurite outgrowth, and neuroprotection.

Module 2 is associated overwhelmingly with pathways linked to energy metabolism (Fig. 5). The overall up-regulation of energy metabolism genes and metabolites in human brain has also been noted in previous studies (2, 4, 5, 7). While higher metabolic rates may provide additional fuel for the larger human brain, they also yield increased levels of reactive oxygen species (ROS), which can lead to neuronal death (34). It might therefore be predicted that peroxisomal functions, which play a critical role in ROS metabolism and the management of oxidative stress (35), would be salient in Module 2. Our results indicate that ROS-producing and -metabolizing functions are indeed coordinately regulated and point to a module of TFs that may be involved in this important functional coupling. A few well-studied Module 2 TFs provide consistent functional links. For instance, homeobox protein PKNOX1 is involved in the regulation of genes associated with mitochondrial membrane permeability and organelle homeostasis (36), and AHR is involved in the regulation of mitochondrial membrane potential and production of ROS (37).

It is difficult to infer TF hierarchies from the data presented here. However, certain TFs stand out for their position in the network structure. For example, ZNF717, a KRAB-ZNF of unknown function, serves as a bridge node between the two portions of the network, tied by positive links to Module 2 and negative links to Module 1 (Fig. 3, labeled as number 4). It is tempting to speculate that these negatively linked TFs constitute direct repressive targets of ZNF717, a hypothesis that can now be tested experimentally. Like several other KRAB-ZNFs in this network, ZNF717 is a primate-specific duplicate, with an ortholog in chimpanzee and orangutan but not in rhesus macaque (as determined by reciprocal BLAST and investigation of syntenic genomic regions); its predicted role as a connector between the two modules would therefore represent a very recent evolutionary change. In addition, some TFs stand out for striking species-specific differences in the number and identity of their network links. Given the dramatic differences in human and chimpanzee brain size, it is intriguing to note that ZFHX1B, mentioned above for its association with human microcephaly, is one of the genes with the highest differences in connectivity between species (Fig. 4).

The primate brain network we have identified contains deeply conserved genes, but our analysis indicates that recently evolved KRAB-ZNF genes play a disproportionate role. Furthermore, the links connecting the two modules are enriched for species-specific links, suggesting that the cross-talk between the two modules is still evolving. This argues that the observed human brain network is a very recent evolutionary invention. To test this hypothesis, and to investigate how the human network formed from a putative ancestral network and how newly evolved genes have been incorporated, it will be important to analyze expression data from additional primate species, for instance orangutan and rhesus macaque, in future studies.

Although several brain-changed KRAB-ZNFs have been implicated in brain development or human neurological disorders (38–40), most of these proteins are presently of unknown function. Data presented here provide clues linking them to potential target genes and regulatory partners in well-studied functional pathways. These data also suggest concrete approaches to test these hypotheses in future studies. For example, siRNA knockdown and ChIP experiments with ZNF717 and other "bridge" genes can test the relationship between these TFs, their associated gene sets, and other TFs in the network. However, any gene in this brain-changed TF network can play an important functional role. Given the fact that the functions of most of these brain-expressed TFs are unknown, new data regarding their expression patterns and interactions in specific cell types, their target genes, and their DNA binding activities will further our understanding of the differences between primate species and will provide new clues to the basis for phenotypic differences between the human and chimpanzee brain.

Materials and Methods

Microarray Data Preprocessing and Analysis. We downloaded CEL files for HG U133 Plus 2.0 arrays from a previous study; tissue samples were taken from apparently healthy male individuals of various ages from each species, all of whom had died a sudden death (11). Note that the donors of the tissues are not the same individuals across the five tissues. We aligned probe sequences against the hg18 and panTro2 genomes and removed all probes with DNA sequence differences between species, similar to the approach previously described (11). Next, probes with multiple matches in the hg18 or panTro2 genome were removed. Probe sets with fewer than four remaining probes were discarded from further analysis, leaving 47,506 probe sets with on average 8.08 (median 8) probes per set. The numbers of probes per probe set remaining for all probe sets, for TFs, or for KRAB-ZNF genes were not significantly different from each other (Kruskal–Wallis test, P > 0.22; all pairwise comparisons). RMA (41) expression values and MAS5 detection P values were calculated with the Bioconductor "affy" package (42). Probe sets were considered to be "expressed" at a detection value of P < 0.05. For genes with multiple expressed probe sets, a mean expression intensity was calculated.

FDR was calculated for unadjusted P < 0.01 (two-sample t test) using a PT (based on random assignment of species labels 1,000 times) and comparing the median number of significant genes in the permuted datasets with the observed number. To test for over-representation of TFs and KRAB-ZNFs among differentially expressed genes, we used PTs (based on drawing random genes 1,000 times). P values were calculated based on the number of times that the number of significant genes in a permuted dataset was equal to or larger than the observed number.

TF-Associated Gene Sets and wTO Calculation for TF Network. We defined genes as associated with a TF if they were significantly positively or negatively (P < 0.05) correlated with the TF in a Spearman rank correlation test performed across all five tissues and all six human individuals (30 samples) and were differentially expressed between human and chimpanzee brains in the predicted direction (see Structure of the human brain TF network). We then

22362 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0911376106 Nowick et al. Downloaded by guest on September 24, 2021 calculated the weighted topological overlap (20–22) matrix starting from the members of TF associated gene sets into account when assigning a wTO score ϭ ϭ ʦ Ϫ ϭ adjacency matrix A [aij], with aij Corr(i,j) [ 1,1] and aii 0, representing to each TF pair. We used a cutoff of ␻ij Ͼ 0.3 for our network analysis (see the 79 differentially expressed TFs and the 1,273 members of their associated SI Text for a discussion of this cut-off choice). gene sets and containing a total of 29,361 significant (P Ͻ 0.05) pairwise correlation scores. Note that the wTO also incorporates the correlations of two Enrichment of GO Groups. To relate GO categories to a module in the wTO TFs’ associated gene sets. Further note that previously published expressions network, we counted how often a gene was a member of a gene set associated for the wTO (e.g., ref. 20) are only appropriate for positive adjacency matrices. with a TF of Module 1 or Module 2. Next, FUNC (25) was performed with the Here we have extended the range of validity of the wTO to also include the following settings: Wilcoxon test, cut-off of five genes per group; GO Anno- case a ʦ [Ϫ1,1] when a Յ 0fa a Յ 0 for all u (and conversely a Ն 0fa a ij ij iu uj ij iu uj tation from April 2008; and GO categories with P Ͻ 0.05 before and after Ն 0 for all u): refinement are reported. ͸ a a ϩ a u iu uj ij ACKNOWLEDGMENTS. We thank Mehmet Somel, Janet Kelso and Michael ␻ij ϭ , min͑ki,kj͒ ϩ 1 Ϫ ͯaijͯ Lachmann for help with the implementation of the mask and Elbert Branscomb, Aron Branscomb, and David Clayton for critical reviews of the where the weighted connectivity of a node i is ki ϭ͚j ͯaijͯ. Thus, when manuscript. This work was supported by National Institute of General Medical generating the wTO network by calculating ␻ij between the 79 TFs, we take all Sciences Grant GM078368 (to L.S.).
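As an illustration of the signed wTO defined in Materials and Methods, the following minimal sketch evaluates the formula on a toy adjacency matrix. It is not the code used in this study: the function name `weighted_topological_overlap` and the three-node matrix `toy` are hypothetical, and the sketch assumes a symmetric matrix with zero diagonal whose entries already satisfy the sign condition stated above.

```python
def weighted_topological_overlap(a):
    """Signed wTO for a symmetric adjacency matrix `a` (list of lists).

    Assumes a[i][j] lies in [-1, 1] and a[i][i] == 0, as in the text.
    """
    n = len(a)
    # Weighted connectivity of each node: k_i = sum_j |a_ij|.
    k = [sum(abs(x) for x in row) for row in a]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # the diagonal is left at zero in this sketch
            # Correlations with shared neighbours: sum_u a_iu * a_uj.
            shared = sum(a[i][u] * a[u][j] for u in range(n))
            # Since min(k_i, k_j) >= |a_ij|, the denominator is at least 1.
            denom = min(k[i], k[j]) + 1.0 - abs(a[i][j])
            w[i][j] = (shared + a[i][j]) / denom
    return w


# Hypothetical toy network: nodes 0 and 1 are negatively linked, and their
# shared neighbour (node 2) contributes a product of the same sign.
toy = [
    [0.0, -0.8, 0.5],
    [-0.8, 0.0, -0.6],
    [0.5, -0.6, 0.0],
]
toy_wto = weighted_topological_overlap(toy)
```

For the pair (0, 1) the numerator is 0.5 × (−0.6) + (−0.8) = −1.1 and the denominator is min(1.3, 1.4) + 1 − 0.8 = 1.5, so the negative direct link is reinforced by the shared neighbour rather than cancelled.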

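The over-representation test described in Materials and Methods (drawing random genes 1,000 times and counting how often the drawn set contains at least as many category members as observed) can be sketched in the same way. This is only a sketch under stated assumptions: the function name `overrepresentation_p` and all of its parameters are hypothetical, and dividing the count by the number of draws to obtain an empirical P value is an assumption rather than a detail given in the text.

```python
import random


def overrepresentation_p(all_genes, category, n_significant, observed,
                         n_draws=1000, seed=0):
    """Empirical P value for over-representation of `category` genes.

    all_genes     : list of all gene identifiers that were tested
    category      : set of identifiers in the category of interest (e.g., TFs)
    n_significant : number of differentially expressed genes
    observed      : number of category genes among the significant genes
    """
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    at_least_observed = 0
    for _ in range(n_draws):
        # Draw a random gene set of the same size as the significant set.
        drawn = rng.sample(all_genes, n_significant)
        count = sum(1 for gene in drawn if gene in category)
        if count >= observed:
            at_least_observed += 1
    # Assumption: report the fraction of draws reaching the observed count.
    return at_least_observed / n_draws
```

With `n_draws=1000` the smallest nonzero value this sketch can return is 0.001, so a result of 0 should be read as P < 0.001 rather than as exactly zero.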
1. Varki A, Altheide TK (2005) Comparing the human and chimpanzee genomes: Searching for needles in a haystack. Genome Res 15:1746–1758.
2. Khaitovich P, et al. (2008) Metabolic changes in schizophrenia and human brain evolution. Genome Biol 9:R124.
3. Sokoloff L (1960) Quantitative measurements of cerebral blood flow in man. Methods Med Res 8:253–261.
4. Blekhman R, Oshlack A, Chabot AE, Smyth GK, Gilad Y (2008) Gene regulation in primates evolves under tissue-specific selection pressures. PLoS Genet 4:e1000271.
5. Uddin M, et al. (2004) Sister grouping of chimpanzees and humans as revealed by genome-wide phylogenetic analysis of brain gene expression profiles. Proc Natl Acad Sci USA 101:2957–2962.
6. Haygood R, Fedrigo O, Hanson B, Yokoyama KD, Wray GA (2007) Promoter regions of many neural- and nutrition-related genes have experienced positive selection during human evolution. Nat Genet 39:1140–1144.
7. Caceres M, et al. (2003) Elevated gene expression levels distinguish human from non-human primate brains. Proc Natl Acad Sci USA 100:13030–13035.
8. Ranz JM, Machado CA (2006) Uncovering evolutionary patterns of gene expression using microarrays. Trends Ecol Evol 21:29–37.
9. Wray GA, et al. (2003) The evolution of transcriptional regulation in eukaryotes. Mol Biol Evol 20:1377–1419.
10. Gilad Y, Oshlack A, Smyth GK, Speed TP, White KP (2006) Expression profiling in primates reveals a rapid evolution of human transcription factors. Nature 440:242–245.
11. Khaitovich P, et al. (2005) Parallel patterns of evolution in the genomes and transcriptomes of humans and chimpanzees. Science 309:1850–1854.
12. Huntley S, et al. (2006) A comprehensive catalog of human KRAB-associated zinc finger genes: Insights into the evolutionary history of a large family of transcriptional repressors. Genome Res 16:669–677.
13. Bustamante CD, et al. (2005) Natural selection on protein-coding genes in the human genome. Nature 437:1153–1157.
14. Nielsen R, et al. (2005) A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol 3:e170.
15. Messina DN, Glasscock J, Gish W, Lovett M (2004) An ORFeome-based analysis of human transcription factor genes and the construction of a microarray to interrogate their expression. Genome Res 14:2041–2047.
16. Khaitovich P, et al. (2004) Regional patterns of gene expression in human and chimpanzee brains. Genome Res 14:1462–1473.
17. Cheng Z, et al. (2005) A genome-wide comparison of recent chimpanzee and human segmental duplications. Nature 437:88–93.
18. Ceribelli M, et al. (2008) The histone-like NF-Y is a bifunctional transcription factor. Mol Cell Biol 28:2047–2058.
19. He Y, Casaccia-Bonnefil P (2008) The Yin and Yang of YY1 in the nervous system. J Neurochem 106:1493–1502.
20. Carlson MR, et al. (2006) Gene connectivity, function, and sequence conservation: Predictions from modular yeast co-expression networks. BMC Genomics 7:40.
21. Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL (2002) Hierarchical organization of modularity in metabolic networks. Science 297:1551–1555.
22. Zhang B, Horvath S (2005) A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol 4:17.
23. Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4:e1000117.
24. Newman ME (2001) Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys Rev E Stat Nonlin Soft Matter Phys 64:016132.
25. Prufer K, et al. (2007) FUNC: A package for detecting significant associations between gene sets and ontological annotations. BMC Bioinformatics 8:41.
26. Oldham MC, Horvath S, Geschwind DH (2006) Conservation and evolution of gene coexpression networks in human and chimpanzee brains. Proc Natl Acad Sci USA 103:17973–17978.
27. Oldham MC, et al. (2008) Functional organization of the transcriptome in human brain. Nat Neurosci 11:1271–1282.
28. Schultz DC, Friedman JR, Rauscher FJ, III (2001) Targeting histone deacetylase complexes via KRAB-zinc finger proteins: The PHD and bromodomains of KAP-1 form a cooperative unit that recruits a novel isoform of the Mi-2alpha subunit of NuRD. Genes Dev 15:428–443.
29. Linseman DA, Loucks FA (2008) Diverse roles of Rho family GTPases in neuronal development, survival, and death. Front Biosci 13:657–676.
30. Ng EL, Tang BL (2008) Rab GTPases and their roles in brain neurons and glia. Brain Res Rev 58:236–246.
31. Jenner P (2003) Oxidative stress in Parkinson's disease. Ann Neurol 53(Suppl 3):S26–S36, and discussion (2003) 53(Suppl 3):S36–S38.
32. Anborgh PH, et al. (2005) Inhibition of metabotropic glutamate receptor signaling by the huntingtin-binding protein optineurin. J Biol Chem 280:34840–34848.
33. Mowat DR, Wilson MJ, Goossens M (2003) Mowat–Wilson syndrome. J Med Genet 40:305–310.
34. Masters CJ, Crane DI (1995) On the role of the peroxisome in ontogeny, ageing and degenerative disease. Mech Ageing Dev 80:69–83.
35. Schrader M, Yoon Y (2007) Mitochondria and peroxisomes: Are the 'big brother' and the 'little sister' closer than assumed? BioEssays 29:1105–1114.
36. Micali N, Ferrai C, Fernandez-Diaz LC, Blasi F, Crippa MP (2009) Prep1 directly regulates the intrinsic apoptotic pathway by controlling Bcl-XL levels. Mol Cell Biol 29:1143–1151.
37. Fisher MT, Nagarkatti M, Nagarkatti PS (2005) Aryl hydrocarbon receptor-dependent induction of loss of mitochondrial membrane potential in epididymal spermatozoa by 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). Toxicol Lett 157:99–107.
38. Lugtenberg D, et al. (2006) ZNF674: A new kruppel-associated box-containing zinc-finger gene involved in nonsyndromic X-linked mental retardation. Am J Hum Genet 78:265–278.
39. Poot M, et al. (2007) Dandy-Walker complex in a boy with a 5 Mb deletion of region 1q44 due to a paternal t(1;20)(q44;q13.33). Am J Med Genet 143:1038–1044.
40. Tentler D, et al. (2003) A candidate region for Asperger syndrome defined by two 17p breakpoints. Eur J Hum Genet 11:189–195.
41. Bolstad BM, Irizarry RA, Astrand M, Speed TP (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19:185–193.
42. Gentleman RC, et al. (2004) Bioconductor: Open software development for computational biology and bioinformatics. Genome Biol 5:R80.

Nowick et al. PNAS | December 29, 2009 | vol. 106 | no. 52 | 22363