Meta-Analysis of Transcriptomic Datasets Identifies Genes Enriched
Total Page:16
File Type:pdf, Size:1020Kb
9860–9873 Nucleic Acids Research, 2017, Vol. 45, No. 17 Published online 18 August 2017 doi: 10.1093/nar/gkx714 Meta-analysis of transcriptomic datasets identifies genes enriched in the mammalian circadian pacemaker Laurence A. Brown1, John Williams2, Lewis Taylor1, Ross J. Thomson1,PatrickM.Nolan2, Russell G. Foster1,* and Stuart N. Peirson1,* 1Sleep and Circadian Neuroscience Institute (SCNi), Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX1 3RE, UK and 2MRC Harwell Institute, Harwell Campus, Oxfordshire OX11 0RD, UK Received September 07, 2016; Revised July 27, 2017; Editorial Decision July 30, 2017; Accepted August 04, 2017 ABSTRACT of an endogenous biological clock. In mammals, the site of the master circadian pacemaker is the suprachiasmatic The master circadian pacemaker in mammals is lo- nuclei (SCN) in the anterior hypothalamus (1). The SCN cated in the suprachiasmatic nuclei (SCN) which reg- receives light information from the retina via the retinohy- ulate physiology and behaviour, as well as coordi- pothalamic tract, synchronizing (entraining) SCN rhythms nating peripheral clocks throughout the body. In- to the external environment. SCN lesions result in loss of vestigating the function of the SCN has often fo- physiological and behavioural rhythms, and transplanta- cused on the identification of rhythmically expressed tion of foetal SCN can restore rhythmicity with a period genes. However, not all genes critical for SCN func- consistent with that of the donor tissue (2). The circa- tion are rhythmically expressed. An alternative strat- dian clock is the product of an intracellular transcriptional- egy is to characterize those genes that are selec- translational feedback loop (TTFL), comprised of a num- tively enriched in the SCN. Here, we examined the ber of so-called ‘clock genes’. Whilst clock genes are ex- pressed in tissues throughout the body, the coordinated transcriptome of the SCN and whole brain (WB) of function of the circadian system depends upon neural, hor- mice using meta-analysis of publicly deposited data monal and behavioural output of the SCN pacemaker (3– across a range of microarray platforms and RNA- 5). Seq data. A total of 79 microarrays were used (24 The identification of mammalian clock genes has crit- SCN and 55 WB samples, 4 different microarray plat- ically depended upon the application of forward and re- forms), alongside 17 RNA-Seq data files (7 SCN and verse genetics. Large scale mutagenesis projects have been 10 WB). 31 684 MGI gene symbols had data for at invaluable in identifying novel mouse mutants with circa- least one platform. Meta-analysis using a random ef- dian phenotypes. Mapping the underlying mutation has led fects model for weighting individual effect sizes (de- to the identification of a number of key clock genes, includ- rived from differential expression between relevant ing Clock (6), Fbxl3 (7), Fbxl21 (8)andZfhx3 (9). By con- SCN and WB samples) reliably detected known SCN trast, reverse genetics has been used to identify homologs of Drosophila clock genes (identified by forward genetics), in- markers. SCN-enriched transcripts identified in this cluding Period 1–3 (10–12)andCryptochrome 1–2 (13,14)as study provide novel insights into SCN function, in- well as proteins that interact with known clock components, cluding identifying genes which may play key roles such as Arntl (Bmal1, (15)). A key feature of many of the in SCN physiology or provide SCN-specific drivers. core clock genes is that they are rhythmically expressed over the 24-h cycle. Therefore, studies to identify new clock genes INTRODUCTION have often taken the approach of identifying transcripts that are rhythmically expressed in the SCN or other periph- Life on Earth has evolved under a predictably changing cy- eral clocks, under either entrained light/dark conditions or cle of light and darkness and, as a result, virtually all or- over one or more circadian cycles under constant conditions ganisms demonstrate striking changes in physiology and be- (16). These studies have been successful in identifying novel haviour over the 24-h day. These rhythms are not simply clock genes as well as genes important for SCN function a response to the changing environment, but persist under (17,18). However, not all clock genes are rhythmically ex- constant conditions––providing evidence for the existence *To whom correspondence should be addressed. Tel: +44 0 1865 618 674; Email: [email protected] Correspondence may also be addressed to Russell G. Foster. Tel: +44 0 1865 618 661; Email: [email protected] C The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Nucleic Acids Research, 2017, Vol. 45, No. 17 9861 pressed, and rather than changing in abundance, some com- numbers of transcripts. The methods for drawing together ponents of the TTFL may show rhythmic post-translational a series of experiments with differing biological and techni- modification or simply depend upon interactions with other cal variance are numerous, and depend partly on the quality rhythmic components. Moreover, many genes involved in and detail of the available data. Where only lists of signifi- key functions of the SCN are not elements of the TTFL. cant genes are available, methods such as vote-counting pro- For example, a number of neuropeptides and their receptors vide the most straightforward approach to meta-analysis play key roles in SCN physiology (such as vasoactive intesti- (30). Venn-diagrams are often used as a simple form of vote nal polypeptide, VIP (19)andVIPR2(20)), as do GABA counting, focusing on genes that pass both a fold-change receptors (21). criteria and one or more statistical tests. A problem of this An alternative strategy to identify mechanisms critical for approach is that the pre-processing methods and inclusion SCN function is to characterize those genes that are selec- criteria used by the original authors to construct each gene tively enriched in the SCN. The most straightforward way of list are unlikely to be comparable across all data sources identifying SCN-enriched genes is to directly compare the (16). Alternative approaches make use of original raw data transcriptome of SCN against the whole brain. Despite the by combining either rank-orders of genes (31)orP-values value of this approach, only a few such studies exist - focus- as measures of significance (32). Finally, weighting the con- ing on multiple brain regions (22), or specifically on a subset tribution of each data point for a gene by the inverse of of genes (23). Another advantage of this approach is that it the variance of that data provides a way of comparing data enables SCN-specific genes to be identified, providing criti- across studies, assuming that lower variance is an indicator cal tools for conditional transgenesis. As many clock genes of reproducibility (33). These methods are discussed in de- are expressed throughout the body, studies of constitutive tail by Ramasamy et al.(34), which provides the basis for knockout mouse models can be problematic due to develop- our analysis. mental effects or differing roles of these genes in other target Here, we describe a meta-analysis of expression data from tissues/organs. As a result, many researchers are looking to SCN and whole brain across a range of microarray tech- the use of conditional knockouts, for example, using Cre- nologies and incorporating recent RNA-Seq data (Figure lox technology (24,25). The identification of genes that are 1). We use the term ‘platform’ in the rest of the paper to specifically and highly enriched in the SCN may not only refer to each different type of microarray technology (e.g. reveal new biological insights regarding SCN physiology, Mouse Exon ST1.0 arrays) or RNA-Sequencing. This meta- but also has the potential to produce SCN-specific drivers, analysis enables us to reliably identify transcripts enriched which would be of benefit to the circadian community. in the SCN. These transcripts provide novel insights into the Studies using transcriptomic approaches face several function of the master circadian pacemaker, providing new well-characterized challenges (26). Biological and technical candidates that may play key roles in SCN function, as well variance result in a degree of noise in any study, but when as providing new targets for SCN-specific gene targeting. making comparisons between thousands of genes, false pos- itive rates become a major problem. Using P < 0.05, com- MATERIALS AND METHODS paring the expression of ∼20 000 transcripts would be ex- Data-mining to identify SCN and WB datasets for pected to give 1000 false positives––genes that would ap- Affymetrix platforms pear significant even though they are unchanged. As such, false discovery rates are corrected to account for this issue. Analysis of publicly available datasets revealed that a large However, due to the cost of running transcriptomic exper- part of the apparent variation in any comparison was at- iments, sample sizes are often limited. This results in re- tributable to differences in the different array platforms. For duced power––that is the ability to identify real differences this reason samples from a particular microarray platform where they exist, which can result in biologically relevant could only be used if both SCN and whole brain (WB) data findings being missed. As such, transcriptomic studies are were available for that platform. In addition, investigations often a statistical balancing act––resulting in a list of candi- were limited to the Affymetrix microarray platforms (by far date genes which may contain false positives, and may be the most prominent platforms with suitable data) and we missing real genes of interest.