Systems-Epigenomics Inference of Transcription Factor Activity
Chen et al. Genome Biology (2017) 18:236
DOI 10.1186/s13059-017-1366-0

RESEARCH Open Access

Systems-epigenomics inference of transcription factor activity implicates aryl-hydrocarbon-receptor inactivation as a key event in lung cancer development

Yuting Chen1†, Martin Widschwendter2 and Andrew E. Teschendorff1,2,3*†

Abstract

Background: Diverse molecular alterations associated with smoking in normal and precursor lung cancer cells have been reported, yet their role in lung cancer etiology remains unclear. A prominent example is hypomethylation of the aryl hydrocarbon-receptor repressor (AHRR) locus, which is observed in blood and squamous epithelial cells of smokers, but not in lung cancer.

Results: Using a novel systems-epigenomics algorithm, called SEPIRA, which leverages the power of a large RNA-sequencing expression compendium to infer regulatory activity from messenger RNA expression or DNA methylation (DNAm) profiles, we infer the landscape of binding activity of lung-specific transcription factors (TFs) in lung carcinogenesis. We show that lung-specific TFs become preferentially inactivated in lung cancer and precursor lung cancer lesions and further demonstrate that these results can be derived using only DNAm data. We identify subsets of TFs which become inactivated in precursor cells. Among these regulatory factors, we identify AHR, the aryl hydrocarbon-receptor which controls a healthy immune response in the lung epithelium and whose repressor, AHRR, has recently been implicated in smoking-mediated lung cancer. In addition, we identify FOXJ1, a TF which promotes growth of airway cilia and effective clearance of the lung airway epithelium from carcinogens.

Conclusions: We identify TFs, such as AHR, which become inactivated in the earliest stages of lung cancer and which, unlike AHRR hypomethylation, are also inactivated in lung cancer itself.
The novel systems-epigenomics algorithm SEPIRA will be useful to the wider epigenome-wide association study community as a means of inferring regulatory activity.

Keywords: Smoking, Cancer, EWAS, Transcription factor, Regulatory network, DNA methylation, Gene expression, Causality, AHRR

* Correspondence: [email protected]; [email protected]
†Equal contributors
1CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, 320 Yue Yang Road, Shanghai 200031, China
2Department of Women’s Cancer, University College London, 74 Huntley Street, London WC1E 6AU, UK
Full list of author information is available at the end of the article

© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Elucidating the mechanisms of early carcinogenesis is important, not only for improving our understanding of cancer, but also for devising and implementing risk prediction and preventive action strategies [1, 2]. To this end, many studies have begun to map molecular alterations associated with major cancer risk factors in normal or precursor cancer cells [3–9]. Smoking is of particular interest since it is a potent risk factor for many cancers, especially lung cancer.

Many previous efforts have identified molecular changes in normal or cancer cells exposed to smoke carcinogens. For instance, studies of the somatic mutation landscape of a wide range of different cancer types have unraveled a somatic mutational signature that is associated with smoking exposure [4, 10]. Other studies comparing gene expression levels in the normal lung tissue adjacent to cancer in smokers vs non-smokers have identified smoking-associated gene-expression signatures [9, 11]. Epigenome-wide association studies (EWAS) conducted in blood [8, 12–14] and buccal tissue [6] have identified highly reproducible smoking-associated differentially methylated CpGs (smkDMCs) [15]. A recent EWAS in buccal cells, a source of tissue enriched for squamous epithelial cells, also showed how many of the smkDMCs mapping to promoters anti-correlate with corresponding gene expression changes in the normal lung tissue of smokers [6]. More recent studies have shown that many of the top-ranked smkDMCs (including CpGs mapping to the aryl hydrocarbon-receptor repressor [AHRR] locus) predict the future risk of lung cancer and all-cause mortality [16–22]. Some studies have even suggested that hypomethylation at the AHRR locus (and other top-ranked smkDMCs) may be causally involved in mediating the risk of smoking on lung cancer [16]. However, the biological mechanism(s) linking hypomethylation of the AHRR and other top-ranked smkDMCs to lung cancer risk remain elusive. In fact, the AHR pathway is mostly known as a toxin-response pathway, suggesting that the DNA methylation (DNAm) changes observed at the AHRR locus may merely reflect a response to smoke toxins without necessarily being causally involved [6, 23]. Consistent with this, many of the top-ranked hypomethylated smkDMCs, including those mapping to the AHRR locus, do not exhibit hypomethylation in lung cancer [6], suggesting that cells carrying these DNAm alterations are not selected for during cancer progression. Thus, the role of the AHR pathway in lung cancer etiology is unclear.

Here we decided to approach this paradox from a systems-epigenomics perspective. Instead of performing single-CpG site association analysis, as is customary in EWAS, we here aimed to derive a dynamic landscape of regulatory activity of transcription factors (TFs) in lung carcinogenesis. Our rationale to focus on TFs is threefold. First, several recent studies have shown that inactivation of tissue-specific TFs in cancer is under positive selection [24–26]. Blocks in differentiation, often mediated by inactivation of tissue-specific TFs, are believed to be an early event which precedes uncontrolled cell growth [27–29]. Second, cancer risk single nucleotide polymorphisms (SNPs) often map to non-coding regulatory regions, including enhancers, suggesting that the risk effect may be mediated through disruption of TF binding [30]. Third, DNAm patterns offer great promise as a means of inferring tissue-specific TFs via TF binding activity [31, 32].

In order to infer regulatory activity of TFs, we devised a novel algorithm called SEPIRA (Systems EPigenomics Inference of Regulatory Activity), which aims to infer sample-specific [...] leverages the power of a large RNA-sequencing (RNA-seq) expression compendium encompassing thousands of samples from many different tissue types, while also adjusting for cell-type heterogeneity. Although several methods for inferring TF binding activity from gene expression data exist [33–41], SEPIRA is also able to infer regulatory activity purely from the patterns of promoter DNAm change at a key set of high-quality targets. We note that computational tools to infer regulatory activity from DNAm profiles have not been extensively applied or validated [36, 37, 40]. We posited that a powerful tool for inferring regulatory activity from DNAm profiles would be particularly valuable for identifying early causal pathways in carcinogenesis, as TF binding sites are often observed to become hypermethylated in response to a wide range of different cancer risk factors, including smoking and age, which may cause, or be a reflection of, differential binding activity [6, 31, 32, 42].

Importantly, using SEPIRA, we are here able to shed new light on the potential role of the AHR/AHRR pathway in lung cancer etiology, linking its inactivation to an altered immune response in the lung epithelium, while also identifying other regulatory pathways (e.g. FOXJ1/HIF3A) which become inactivated in smoking-associated lung cancer, in precursor lung cancer lesions, and in normal cells exposed to smoke carcinogens. Specifically, our work points towards inactivation of the AHR pathway as the more fundamental event underlying smoking-mediated lung carcinogenesis, instead of AHRR hypomethylation, which is not observed in lung cancer. The unbiased discovery of the AHR pathway as well as pathways involved in hypoxia (HIF3A) and mucosa-mediated clearance of lung airways (FOXJ1) demonstrates the ability of SEPIRA to identify early and potentially causal pathways in lung cancer development. As such, SEPIRA constitutes a novel approach which opens up the inference of TF binding activity to EWAS and cancer epigenome studies.

Results

Overall rationale and strategy

We developed SEPIRA, a novel systems-epigenomics computational method that would allow us to estimate TF binding activity in any given sample. Briefly, the algorithm begins by constructing a tissue-specific TF regulatory network consisting of: (1) TFs that are significantly more expressed in that tissue (compared to other tissues); and (2) a list of high-quality downstream gene targets (Fig. 1a). This network, as well as a regression-based method to infer TF activity from this network, is then validated in independent datasets, consisting of either gene expression or promoter DNAm