Systematic Tracking of Disrupted Modules Identifies Significant Genes and Pathways in Hepatocellular Carcinoma
ONCOLOGY LETTERS 12: 3285-3295, 2016

Systematic tracking of disrupted modules identifies significant genes and pathways in hepatocellular carcinoma

MENG-HUI ZHANG1, QIN-HAI SHEN2, ZHAO-MIN QIN3, QIAO-LING WANG4 and XI CHEN5

1Department of General Surgery, The Fourth Hospital of Jinan, Jinan, Shandong 250031; Departments of 2Medicine and 3Nursing, Shandong Medical College, Jinan, Shandong 250002; 4Department of Ophthalmology, The Second Hospital of Jinan, Jinan, Shandong 250022; 5Department of Ophthalmology, The Ninth Hospital of Chongqing, Chongqing 400700, P.R. China

Received May 6, 2015; Accepted July 12, 2016

DOI: 10.3892/ol.2016.5039

Abstract. The objective of the present study is to identify significant genes and pathways associated with hepatocellular carcinoma (HCC) by systematically tracking the dysregulated modules of re-weighted protein-protein interaction (PPI) networks. Firstly, normal and HCC PPI networks were inferred and re-weighted based on Pearson correlation coefficient. Next, modules in the PPI networks were explored by a clique-merging algorithm, and disrupted modules were identified utilizing a maximum weight bipartite matching in non-increasing order. Then, the gene compositions of the disrupted modules were studied and compared with differentially expressed (DE) genes, and pathway enrichment analysis for these genes was performed based on Expression Analysis Systematic Explorer. Finally, validations of significant genes in HCC were conducted using reverse transcription-quantitative polymerase chain reaction (RT-qPCR) analysis. The present study evaluated 394 disrupted module pairs, which comprised 236 dysregulated genes. When the dysregulated genes were compared with 211 DE genes, a total of 26 common genes [including phospholipase C beta 1, cytochrome P450 (CYP) 2C8 and CYP2B6] were obtained. Furthermore, 6 of these 26 common genes were validated by RT-qPCR. Pathway enrichment analysis of dysregulated genes demonstrated that neuroactive ligand-receptor interaction, purine and drug metabolism, and metabolism of xenobiotics mediated by CYP were significantly disrupted pathways. In conclusion, the present study greatly improved the understanding of HCC in a systematic manner and provided potential biomarkers for early detection and novel therapeutic methods.

Introduction

Hepatocellular carcinoma (HCC) is the fifth most common cancer worldwide and the third leading cause of cancer-associated mortality (1), which explains the importance of identifying novel early diagnostic markers and therapeutic targets (2). HCC primarily develops from cirrhosis caused by chronic infection with hepatitis B virus or hepatitis C virus (HCV), alcoholic injury, and, to a lesser extent, from genetically determined disorders (3). However, the heterogeneity of HCC presents unique challenges in identifying biomarkers and exploring molecular pathogenesis in this disease (4).

Identifying genes that are differentially expressed (DE), exhibit similar expression profiles to known disease genes, are 'central' or 'reachable' in disease molecular networks, or display disease associations according to the literature is the main method to evaluate biomarkers (5). In addition, a crucial distinguishing factor of cancer genes is their involvement in core mechanisms responsible for genome stability and cell proliferation (such as DNA damage repair and cell cycle), and the fact that these genes function as highly synergetic or coordinated groups (6). Therefore, critical to implicating genes in cancer is the identification of core modules, including pathways and complexes, that are dysregulated in cancer.

Beyond straightforward scoring genes in a gene regulatory network, it is crucial to study the behavior of modules across specific conditions in a controlled manner in order to understand the disease mechanisms and to identify biomarkers (7).
For example, Zhang et al (8) have identified tightly connected gene co-expression sub-networks across 30 cancer networks in various cell lines, and have tracked aberrant modules as frequent sub-networks appearing across these cancers. However, studying multiple cancers simultaneously makes it challenging to discern clearly the intricate underlying mechanisms.

Furthermore, it is important to effectively integrate omics data into such an analysis. For example, Magger et al (9) combined protein-protein interaction (PPI) and gene expression data to construct tissue-specific PPI networks for 60 tissues, and used them to prioritize disease genes. A few significant genes may not be identifiable through their own behavior, but their changes are quantifiable when considered in conjunction with other genes (known as modules) (10). Therefore, a systematic tracking of gene and module behavior across specific conditions in a controlled manner is required. In addition, since a number of human genes have not yet been assigned to definitive pathways, scoring pathways based on module analysis has become a more reliable approach than individual gene analysis.

Therefore, the present study systematically tracked the disrupted modules of re-weighted PPI networks to identify significant DE genes and pathways between normal controls and HCC patients, in order to reveal potential biomarkers for HCC. To achieve this, normal and HCC PPI networks were firstly inferred based on Pearson correlation coefficient (PCC). Next, the modules in the PPI networks were explored based on a clique-merging algorithm, and disrupted modules were identified by matching normal and HCC modules. Subsequently, the gene compositions of the disrupted modules were studied and compared with DE genes, and pathway enrichment analysis was performed for these genes. Finally, dysregulated genes of HCC were validated utilizing reverse transcription-quantitative polymerase chain reaction (RT-qPCR) analysis.

Correspondence to: Dr Qin-Hai Shen, Department of Medicine, Shandong Medical College, 5460 South Second Ring Road, Jinan, Shandong 250002, P.R. China

Key words: hepatocellular carcinoma, modules, protein-protein interaction network, dysregulated gene, pathway, reverse transcription-quantitative polymerase chain reaction

Materials and methods

Inferring normal and HCC PPI networks

Human PPI network construction. A dataset of literature-curated human PPIs from the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING; string-db.org/), comprising 16,730 genes and 1,048,576 interactions, was utilized (11). For STRING analysis, self-loops and proteins without expression value were removed. The remaining largest connected component with score >0.75 was kept as the selected PPI network, which consisted of 9,273 genes and 58,617 interactions.

Gene expression dataset and dataset preprocess. The microarray expression profiles of E-GEOD-14520 (12,13) from the ArrayExpress database (www.ebi.ac.uk/arrayexpress/) were selected for the study. In E-GEOD-14520, there were a total of 488 samples, which were processed on two platforms. To eliminate batch effects, only samples processed on the GeneChip® Human Genome U133A 2.0 Array (Affymetrix, Inc., Santa Clara, CA, USA) were recruited in the present study, which consisted of 123 samples.

The gene expression profiles were preprocessed with standard methods, including background correction via 'robust multiarray average (rma)' (14), 'quantiles' (15), 'mas' (16) and 'medianpolish'

where $a = s - \mu - \sigma^{2}\alpha$ and $b = \sigma$. Of note, $\phi$ and $\Phi$ were the standard normal distribution density and distribution function, respectively. Mismatch (MM) probe intensities were corrected by 'mas'.

Normalization was performed through a quantiles-based algorithm (15). Specifically, the transformation $x'_i = F^{-1}[G(x_i)]$ was used, where G was estimated by the empirical distribution of each array and F was estimated using the empirical distribution of the averaged sample quantiles. Using the 'mas' method to conduct PM/MM correction (16), an ideal mismatch was subtracted from PM. The ideal MM would always be less than the corresponding PM, and thus, it could be safely subtracted without the risk of achieving negative values.

The summarization method was 'medianpolish' (14). A multichip linear model was fit to the data from each probe set. In particular, for a probe set k with $i = 1, \dots, I_k$ probes and data from $j = 1, \dots, J$ arrays, the following model was fitted:

$$\log_2\left(PM_{ij}^{(k)}\right) = \alpha_i^{(k)} + \beta_j^{(k)} + \varepsilon_{ij}^{(k)}$$

where $\alpha_i$ was a probe effect and $\beta_j$ was the log2 expression value.

Next, the data were screened by the feature filter method of the genefilter package version 1.54.2 (bioconductor.org/packages/release/bioc/html/genefilter.html). The gene expression value for each gene was obtained, and the number of genes with multiple probes was determined to be 12,493.

Re-weighting gene interactions by PCC. In the present study, PCC was selected to re-weight gene interactions in HCC and normal networks. PCC was a measure of the correlation between two variables, assigning a value between -1 and +1 inclusive, and evaluated the probability of two co-expressed gene pairs (17). The PCC of a pair of genes (X and Y), which encoded the corresponding paired proteins (u and v) interacting in the PPI network, was defined as:

$$PCC(X, Y) = \frac{1}{s - 1} \sum_{i=1}^{s} \left( \frac{g(X, i) - \bar{g}(X)}{s(X)} \right) \left( \frac{g(Y, i) - \bar{g}(Y)}{s(Y)} \right)$$

where s was the number of samples of the gene expression data; g(X,i) or g(Y,i) was the expression level of the gene X or Y in the sample i under a specific condition; ḡ(X) or ḡ(Y) represented the mean expression level of gene X or Y, and s(X)