Integrative Network-Based Approach Identifies Key Genetic Elements In
Hamed et al. BMC Genomics 2015, 16(Suppl 5):S2
www.biomedcentral.com/1471-2164/16/S5/S2

RESEARCH Open Access

Integrative network-based approach identifies key genetic elements in breast invasive carcinoma

Mohamed Hamed, Christian Spaniol, Alexander Zapp, Volkhard Helms*

From X-meeting 2014 - International Conference on the Brazilian Association for Bioinformatics and Computational Biology
Belo Horizonte, Brazil. 28-30 October 2014

Abstract

Background: Breast cancer is a genetically heterogeneous type of cancer that belongs to the most prevalent types with a high mortality rate. Treatment and prognosis of breast cancer would profit largely from a correct classification and identification of genetic key drivers and major determinants driving the tumorigenesis process. In the light of the availability of tumor genomic and epigenomic data from different sources and experiments, new integrative approaches are needed to boost the probability of identifying such genetic key drivers. We present here an integrative network-based approach that is able to associate regulatory network interactions with the development of breast carcinoma by integrating information from gene expression, DNA methylation, miRNA expression, and somatic mutation datasets.

Results: Our results showed strong association between regulatory elements from different data sources in terms of the mutual regulatory influence and genomic proximity. By analyzing different types of regulatory interactions, TF-gene, miRNA-mRNA, and proximity analysis of somatic variants, we identified 106 genes, 68 miRNAs, and 9 mutations that are candidate drivers of oncogenic processes in breast cancer. Moreover, we unraveled regulatory interactions among these key drivers and the other elements in the breast cancer network. Intriguingly, about one third of the identified driver genes are targeted by known anti-cancer drugs and the majority of the identified key miRNAs are implicated in cancerogenesis of multiple organs.
Also, the identified driver mutations likely cause damaging effects on protein functions. The constructed gene network and the identified key drivers were compared to well-established network-based methods.

Conclusion: The integrated molecular analysis enabled by the presented network-based approach substantially expands our knowledge base of prospective genomic drivers of genes, miRNAs, and mutations. For a good part of the identified key drivers there exists solid evidence for involvement in the development of breast carcinomas. Our approach also unraveled the complex regulatory interactions comprising the identified key drivers. These genomic drivers could be further investigated in the wet lab as potential candidates for new drug targets. This integrative approach can be applied in a similar fashion to other cancer types, complex diseases, or for studying cellular differentiation processes.

* Correspondence: [email protected]
Center for Bioinformatics, Saarland University, 66041 Saarbrucken, Germany

© 2015 Hamed et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Breast cancer is one of the most common and predominant cancer types that affects millions of cases and causes thousands of deaths every year [1,2]. With an individual probability of 12% to develop breast cancer, it is the most frequently diagnosed cancer type among women and accounts for the second-highest number of fatalities (15%) of female cancer patients besides lung cancer [3]. Due to its complexity and heterogeneity [4], the molecular mechanism and regulatory patterns underlying breast carcinoma have not been completely unraveled so far.

Treatment and prognosis of cancer development rely largely on a correct classification of the histological grade and identification of the major determinants driving the tumorigenesis process. To better address this, many studies have attempted to build predictive models by analyzing and integrating heterogeneous data sources. For example, Cava et al. presented an effective discrimination of breast cancer types based on a support vector machine classifier combining copy number variations, SNP data, and the expression values of miRNAs and mRNAs [5]. Also, miRNA-mRNA interactions were combined with transcription factor (TF)-gene interactions to unravel the combinatorial molecular regulations that facilitate progression of colorectal and breast cancer [6,7]. Along the same lines, the integration of gene expression data with protein interaction networks into integrated weighted networks has already proven fruitful in a variety of applications within cancer genomics [8-23]. In general, the combination of microarray studies with mathematical models such as network theory makes it possible to define relationships between genes and to discover interacting networks and pathways, improving the understanding of complex diseases [24].

In recent years, novel network-based approaches have been introduced to improve the understanding of complex human diseases from multiple perspectives. For instance, differential network analysis attempts to better characterize disease phenotypes under two different conditions by studying the changes in the related network interaction patterns [8,9,17,18,25-29]. In cancer genomics, the differential network approach was able to identify essential gene modules that lead to crucial novel biological insights and significant implications for understanding tumorigenesis [9,17,18].

In the light of the recent availability of tumor genomic data and the complexity of the related high throughput analysis, new integrative approaches are needed to boost the probability of successfully identifying the associated genetic key drivers, the causal regulators, the related mutations, biomarkers, and their molecular interactions that potentially drive tumorigenesis. To this end, this study presents an integrative network-based approach based on whole-genome gene expression profiling, DNA methylome, miRNA expression, and genomic mutations of breast cancer samples from the TCGA portal [30]. Based on this, we constructed a gene regulatory network that conforms to the conditions of such biological data and we identified network modules of dysregulated genes. Each module turned out to have distinct functional categories, cellular pathways, as well as oncogene and tumor suppressor specificity. We also extracted breast cancer specific subnetworks from the human genome regulatory interactome induced by the dysregulated miRNAs and the dysregulated mRNAs. Furthermore, we demonstrated a strong association between the different genetic molecules in terms of the interchangeable regulatory effect and genomic proximity. Then, we identified putative genetic key drivers/determinants from genes, miRNAs, and somatic mutations that could possibly drive the oncogenic processes in breast cancer. Our findings are strongly supported by independent evidence. For instance, the protein products of about one third of the identified driver genes are known binding targets of anti-breast cancer drugs, and most of the identified key miRNAs are implicated in cancerogenesis of multiple organs. Moreover, all the identified driver mutations are predicted to cause damaging effects on structures and functions of the related proteins. The rest of the identified driver molecules represent novel potential candidates for new drug targets and further experimental research is warranted to confirm these findings.

Methods

Datasets and pre-processing

Data on gene expression, DNA methylation, miRNA expression, and somatic mutations for normal and breast invasive carcinoma samples were collected during May 2014 from The Cancer Genome Atlas (TCGA) [1,30] data portal. All datasets were obtained in level three (log2 transformed and normalized) except the somatic mutations (level two). For consistency, we only considered samples that were common between all four datasets. This yielded a total of 151 samples consisting of 131 tumor samples and 20 normal samples (Additional file S1). For both gene expression and methylation datasets, all probes containing NA values or that were annotated to unknown or multiple genes were removed. Also, probe values were merged by computing the mean of all probes related to a single gene within a single sample as previously described in [31].

inferred at least twice in the three runs (edge confidence level ≥ 66.6%). As candidate set of directed interactions, we consid-

From the DNA methylation data, we kept only those probes representing CpG sites in the promoter regions of genes. For this, we used the transcription start sites (TSS) for all human genes from the Eukaryotic Promoter Database EPD [32]. Promoter regions were defined as an interval of ±2 kb around the TSS as described in [33]. Then we selected only those CpG sites whose genomic coordinates are contained in that interval. The final sizes of the