Identification of Key Candidate Genes and Pathways in Hepatocellular Carcinoma by Integrated Bioinformatical Analysis

Total Page:16

File Type:pdf, Size:1020Kb

Identification of Key Candidate Genes and Pathways in Hepatocellular Carcinoma by Integrated Bioinformatical Analysis 4932 EXPERIMENTAL AND THERAPEUTIC MEDICINE 15: 4932-4942, 2018 Identification of key candidate genes and pathways in hepatocellular carcinoma by integrated bioinformatical analysis TONGHAI XING1, TINGMANG YAN2 and QIANG ZHOU1 Departments of 1General Surgery and 2Urology, Shanghai General Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200080, P.R. China Received October 30, 2017; Accepted March 13, 2018 DOI: 10.3892/etm.2018.6075 Abstract. Hepatocellular carcinoma (HCC) is one of the most high in China (2) on account of hepatitis B (HBV) and C (HCV) common malignant neoplasms worldwide, however the under- infection. However, in the western world, HCC is primarily due lying mechanisms and gene signatures of HCC are unknown. to non-alcoholic fatty liver disease (NAFLD) (3,4). The aggres- In the present study the profile datasets of four cohorts were sive nature of this malignancy combined with the limited integrated to elucidate the pathways and candidate genes therapeutic options result in poor prognosis (1,5). Unfortunately, of HCC. The expression profiles GSE25097, GSE45267, HCC has a high recurrence rate even with successful surgical GSE57957 and GSE62232 were downloaded from the Gene removal because of the insensitivity of HCC to chemotherapy Expression Omnibus database, including 436 HCC and 94 agents and radiotherapy (6). Although there have been exten- normal liver tissues. A total of 185 differentially expressed sive previous studies on the molecular mechanism of HCC genes (DEGs) were identified in HCC, including 92 upregu- formation and progression, the causes of HCC are not yet clear. lated genes and 92 downregulated genes. Gene ontology (GO) Hence, owing to the high morbidity and mortality in HCC, was performed, which revealed that the upregulated DEGs it is very important to determine the underlying molecular were primarily enriched in cell division, mitotic nuclear divi- mechanisms and pathological processes and identify suitable sion, mitotic cytokinesis and G1/S transition of the mitotic molecular biomarkers for early HCC diagnosis. cell cycle. Pathway enrichment was analyzed based on the While HCC is a highly heterogeneous tumor, with different Kyoto Encyclopedia of Genes and Genomes database to genetic compositions and multifaceted molecular pathogen- assess the functional relevance of DEGs. The most significant esis (7,8), it is also a slow process during which plenty of genomic module was selected from protein-protein interactions and alterations accumulate and change the hepatocellular pheno- 15 important hub genes were identified. The sub‑networks of type, leading to cellular intermediates and multiple monoclonal hub genes were involved in cell division, p53 signaling, and that evolve into HCC (9,10). Microarray technology has become T lymphotropic virus type I infection signaling pathways. In an indispensable tool to monitor genome wide expression levels conclusion, the present study revealed that the identified DEG of genes in a given organism. Gene Expression Omnibus (GEO) candidate genes may promote the understanding of the cause and The Cancer Genome Atlas (TCGA) are two public func- and molecular mechanisms underlying the development of tional genomics data repositories accepting data from diverse HCC and that these candidates and signal pathways may be microarray platforms (11). Nowadays, gene expression studies potential targets of clinical therapy for HCC. by microarray have been performed to uncover molecular variations in HCC (12-14). Pathway analysis has shown several Introduction important cellular signaling pathway alterations to be linked with the main pathogenic mechanism (8). A small number Hepatocellular carcinoma (HCC) is one of the most common of hub genes (proteins) were identified as key modulators in malignant neoplasms worldwide (1). Its incidence is particularly HCC metastasis by using protein-protein interaction (PPI) network. However, the most significantly dysregulated genes from previous studies are inconsistent because of the sample heterogeneity in independent studies, small sample size, and different statistical methods. Correspondence to: Dr Qiang Zhou, Department of General In the present study, four original microarray datasets Surgery, Shanghai General Hospital, School of Medicine, Shanghai Jiao Tong University, 100 Haining Road, Shanghai 200080, GSE25097, GSE45267, GSE57957, and GSE62232 P. R. Ch i na were downloaded from the NCBI-Gene Expression E-mail: [email protected] Omnibus database (NCBI-GEO) (available online: https://www.ncbi.nlm.nih.gov/geo/), and included 436 HCC and Key words: hepatocellular carcinoma, bioinformatical analysis, 94 normal liver tissues. The microarray data was normalized differentially expressed genes and preprocessed using the robust multi-array average algorithm (RMA). The differentially expressed genes (DEGs) were obtained using the limma package. Venny was applied to XING et al: IDENTIFICATION OF KEY GENES AND PATHWAYS IN HEPATOCELLULAR CARCINOMA 4933 filter the overlapping DEGs among groups. Gene ontology (GO) (available online: http://string-db.org) was employed to and pathway enrichment analysis were also performed for develop DEGs-encoded proteins and protein-protein interac- annotation and visualization with DAVID and KEGG PATHWAY tion network (PPI). Second, the Cytoscape software (17) was (Available online: http://www.genome.jp/kegg). We used the utilized to construct a protein interaction relationship network STRING online database (Available online: http://string-db.org) and analyze the interaction relationship of the candidate to annotate functional interactions of DEGs. The most significant DEGs encoding proteins in colon cancer. Third, the Network module was selected from PPI, and 15 important hub genes Analyzer plug-in was used to calculate the node degree, i.e., were identified. The sub‑networks of hub genes were involved the numbers of inter‑connections to filter the hub genes of PPI. in cell division, p53 signaling and HTLV-I infection signaling The corresponding proteins in the central nodes might be the pathways. core proteins and key candidate genes that have important physiological regulatory functions. The cBio Cancer Genomics Materials and methods Portal (http://cbioportal.org) is an open platform for exploring multidimensional cancer genomics data by encapsulating Microarray data information and data preprocessing. The molecular profiling data obtained from cancer tissue and cell microarray data from GSE25097, GSE45267, GSE57957, lines into readily understandable genetic, epigenetic, gene and GSE62232 were downloaded from the GEO database expression, and proteomic events. Complex cancer genomic (http://www.ncbi.nlm.nih.gov/geo/). The microarray data of profiles can be easily accessed using the query interface of GSE25097 was based on GPL10687 platforms (Rosetta/Merck the portal enabling researchers to explore and compare genetic Human RSTA Affymetrix 1.0 microarray, Custom CDF; alterations across samples. The underlying data thus obtained Affymetrix Inc., Santa Clara, CA, USA). The microarray data can be linked to clinical outcomes to facilitate novel discovery of GSE45267 was based on GPL570 platforms (HG-U133_ in biological systems. Through use of the portal search func- Plus_2, Affymetrix Human Genome U133 Plus 2.0 Array). tion, identified DEG candidate genes are classified as altered The microarray data of GSE57957 was based on GPL10558 or not altered. The genomics datasets are then presented using platforms (Illumina HumanHT-12 v4.0 expression beadchip). OncoPrint as heatmaps-a visually appealing display of altera- The microarray data of GSE62232 was based on GPL570 plat- tions in gene arrays across tumor samples. forms (HG-U133_Plus_2, Affymetrix Human Genome U133 Plus 2.0 Array). We chose these four datasets for integrated Results analysis, because they represented different racial popula- tions. The microarray data was preprocessed using the RMA Identification of DEGs in HCC. NCBI-GEO is a public func- with the Affy and limma package in Bioconductor (v1.46.1; tional genomics data repository accepting data from diverse http://www.bioconductor.org/). Background correction, microarray platforms, from which gene expression profiles of normalization, and calculation of expression were all included HCC and normal or adjacent liver tissue from the GSE25097, in the preprocessing stage. The microarray data probe was GSE45267, GSE57957, and GSE62232 databases were transformed to gene symbols with Bioconductor Annotation obtained. The microarray data of GSE25097 had 268 HCC Data software packages. If several probes were mapped to one tissues and 6 normal liver tissues (18); the GSE62232 data had gene symbol, the mean value was set as the final expression 81 HCC tissues and 10 normal liver tissues (12); the GSE45267 value of this gene. The DEGs in every group were analyzed data included 48 HCC tissues and 39 healthy liver tissues; and using the limma package. In Bioconductor. The DEG P-values the GSE57957 data had 39 pairs of HCC tissues and matched were calculated using the unpaired or paired Student's t-test paraneoplastic tissue (19). Using P<0.01 and [logFC]>1 as provided by the limma package. P<0.01 and [logFC]>1 were cut-off criterion, we extracted 1072, 1850, 4055, and 1876 DEGs used as cut-off criterion for DEGs. Hierarchical clustering from the expression profile datasets GSE25097, GSE45267, analysis of the DEGs was then performed and visualized GSE57957, and GSE62232, respectively.
Recommended publications
  • Developing Transgenic Models to Induce Late-Life Mortality in the Malaria Vector Anopheles Gambiae
    Developing transgenic models to induce late-life mortality in the malaria vector Anopheles gambiae Silke Fuchs A Thesis submitted in fulfilment of requirements for the degree of Doctor of Philosophy of Imperial College London Imperial College London Natural Sciences Imperial College Road, South Kensington London, SW7 2AZ 1 Declaration of Originality I declare that the intellectual content of this thesis is the product of my own research work. Any ideas or quotations from the work of other people are fully acknowledged in accordance with the standard referencing practices. 2 Abstract A key factor that limits the transmission of malaria is the age of its vector, Anopheles gambiae. This is due to the long extrinsic incubation period (EIP) of the parasite which can last a high proportion of the adult mosquito‟s life. Therefore, only a small percentage of a mosquito population is able to transmit malaria. This bottleneck could be of use in developing novel vector control strategies which are based on artificially shortening the adult life span of mosquitoes through the introduction of suitably tailored transgenic constructs. One strategy was to induce a toxic late-acting amino acid metabolism disease in mosquitoes that kills females after their amino acid rich blood meal. Phenylpyruvate, the toxic accumulation product in human Phenylketonuria disease, was found to cause increased death when fed to mosquitoes. GC-MS analysis showed that similar to PKU patients, phenylpyruvate had been converted to another toxic metabolite phenyllactate. Using RNAi, two enzymes PAH and PPO9 of the phenylalanine pathway were knocked down and their effect on survival, behaviour and melanisation immune response was investigated.
    [Show full text]
  • Genetic Factors Regulating Expression of Dopaminergic Genes
    Genetic Factors Regulating Expression of Dopaminergic Genes DISSERTATION Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University By Elizabeth Stofko Barrie Graduate Program in Integrated Biomedical Science Program The Ohio State University 2014 Dissertation Committee: Professor Wolfgang Sadee, Advisor Professor R. Thomas Boyd Professor Howard Gu Professor Amanda Toland Copyright by Elizabeth Stofko Barrie 2014 Abstract Genetic differences are one of the main contributors to diversity in clinical phenotypes. This project identifies and defines genetic factors affecting dopamine dysregulation focusing on three genes in dopamine signaling: DBH, COMT and SNCA. To study this I measure allelic expression, selecting RNA samples from post-mortem human tissue that are heterozygous for a marker SNP, and quantitate the expression of each allele. Dopamine β-hydroxylase (DBH) encodes an enzyme which converts dopamine to norepinephrine. A promoter SNP, rs1611115 has been associated with low DBH and high dopamine plasma levels; however, underlying mechanisms remain uncertain. I found a tissue-specific effect of rs1611115 in liver, with up to 11 fold differences in allelic and overall mRNA expression (p<0.0004 to 2x10-7 and p<0.0001 respectively), indicating decreased transcription. Interestingly, locus coeruleus and adrenal gland, the main sources of DBH in the body, did not demonstrate this robust effect; only small AEI ratios were detected in these tissues. More frequent than rs1611115 and in linkage disequilibrium with it, a second SNP, rs1108580 was associated with reduced allelic mRNA expression in all tissues tested. This dual mechanism accounts for the previously described genotype effect on DBH plasma levels, with a novel role for liver as an important source of variability in DBH levels.
    [Show full text]
  • Signatures of Reproductive Isolation in Patterns of Single Nucleotide Diversity Across
    Genetics: Published Articles Ahead of Print, published on September 2, 2005 as 10.1534/genetics.105.046193 Signatures of reproductive isolation in patterns of single nucleotide diversity across inbred strains of mice Bret A. Payseur1 and Hopi E. Hoekstra2 1Department of Medical Genetics University of Wisconsin Madison, WI 53706 Email: [email protected] 2Section of Ecology, Behavior and Evolution Division of Biological Sciences University of California, San Diego La Jolla, CA 92093 email: [email protected] Corresponding author: Bret A. Payseur Department of Medical Genetics University of Wisconsin Madison, WI 53706 Fax: 608-262-2976 Email: [email protected] Running head: Reproductive isolation in Mus ABSTRACT Reproductive isolation is often caused by the disruption of genic interactions that evolve in geographically separate populations. Identifying the genomic regions and genes involved in these interactions, known as “Dobzhansky-Muller incompatibilities”, can be challenging, but is facilitated by the wealth of genetic markers now available in model systems. In recent years, the complete genome sequence and thousands of single nucleotide polymorphisms (SNPs) from laboratory mice, which are largely genetic hybrids between Mus musculus and Mus domesticus, have become available. Here, we use these resources to locate genomic regions that may underlie reproductive isolation between these two species. Using genotypes from 332 SNPs that differ between wild- derived strains of M. musculus and M. domesticus, we identified several physically unlinked SNP pairs that show exceptional gametic disequilibrium across the lab strains. Conspecific alleles were associated in a disproportionate number of these cases, consistent with the action of natural selection against hybrid gene combinations. As predicted by the Dobzhansky-Muller model, this bias was differentially attributable to locus pairs for which one hybrid genotype was missing.
    [Show full text]
  • Neonatal Growth Measurements, Umbilical Concentrations, And
    Clemson University TigerPrints All Theses Theses 12-2017 Neonatal Growth Measurements, Umbilical Concentrations, and Differential Gene Expression in the Placenta of Swine in Relation to Placental Efficiency Shanice Katelin Krombeen Clemson University Follow this and additional works at: https://tigerprints.clemson.edu/all_theses Recommended Citation Krombeen, Shanice Katelin, "Neonatal Growth Measurements, Umbilical Concentrations, and Differential Gene Expression in the Placenta of Swine in Relation to Placental Efficiency" (2017). All Theses. 2789. https://tigerprints.clemson.edu/all_theses/2789 This Thesis is brought to you for free and open access by the Theses at TigerPrints. It has been accepted for inclusion in All Theses by an authorized administrator of TigerPrints. For more information, please contact [email protected]. NEONATAL GROWTH MEASUREMENTS, UMBILICAL CONCENTRATIONS, AND DIFFERENTIAL GENE EXPRESSION IN THE PLACENTA OF SWINE IN RELATION TO PLACENTAL EFFICIENCY A Thesis Presented to the Graduate School of Clemson University In Partial Fulfillment of the Requirements for the Degree Master of Science Animal and Veterinary Science by Shanice Katelin Krombeen December 2017 Accepted by: Dr. Tiffany A. Wilmoth, Committee Chair Dr. Julia L. Sharp Dr. Christopher A. Saski i ABSTRACT Placental efficiency (PE) describes the relationship between placental and fetal weights and is defined as fetal weight divided by placental weight. Within pig litters, PE can vary drastically, resulting in similarly sized pigs associated with very different placenta, up to a 50% weight difference. However, the means enabling the smaller placenta to grow a similarly sized littermate is unknown. The main objectives of this study were to (1) determine the concentration of glucose and cortisol in venous umbilical blood of pigs at birth and evaluate if a relationship existed between concentration and PE, and (2) determine the expression level of genes in placental and associated endometrial tissues of high PE and low PE feto-placental units.
    [Show full text]
  • Homo Sapiens, Homo Neanderthalensis and the Denisova Specimen: New Insights on Their Evolutionary Histories Using Whole-Genome Comparisons
    Genetics and Molecular Biology, 35, 4 (suppl), 904-911 (2012) Copyright © 2012, Sociedade Brasileira de Genética. Printed in Brazil www.sbg.org.br Research Article Homo sapiens, Homo neanderthalensis and the Denisova specimen: New insights on their evolutionary histories using whole-genome comparisons Vanessa Rodrigues Paixão-Côrtes, Lucas Henrique Viscardi, Francisco Mauro Salzano, Tábita Hünemeier and Maria Cátira Bortolini Departamento de Genética, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil. Abstract After a brief review of the most recent findings in the study of human evolution, an extensive comparison of the com- plete genomes of our nearest relative, the chimpanzee (Pan troglodytes), of extant Homo sapiens, archaic Homo neanderthalensis and the Denisova specimen were made. The focus was on non-synonymous mutations, which consequently had an impact on protein levels and these changes were classified according to degree of effect. A to- tal of 10,447 non-synonymous substitutions were found in which the derived allele is fixed or nearly fixed in humans as compared to chimpanzee. Their most frequent location was on chromosome 21. Their presence was then searched in the two archaic genomes. Mutations in 381 genes would imply radical amino acid changes, with a frac- tion of these related to olfaction and other important physiological processes. Eight new alleles were identified in the Neanderthal and/or Denisova genetic pools. Four others, possibly affecting cognition, occured both in the sapiens and two other archaic genomes. The selective sweep that gave rise to Homo sapiens could, therefore, have initiated before the modern/archaic human divergence.
    [Show full text]
  • Ubiquitination/Proteasome Pathways Specialized in Cellular Stress
    The Journal of Immunology The Proto-MHC of Placozoans, a Region Specialized in Cellular Stress and Ubiquitination/Proteasome Pathways Jaanus Suurva¨li,* Luc Jouneau,† Dominique The´pot,‡ Simona Grusea,x Pierre Pontarotti,{ Louis Du Pasquier,‖ Sirje Ru€utel€ Boudinot,* and Pierre Boudinot† The MHC is a large genetic region controlling Ag processing and recognition by T lymphocytes in vertebrates. Approximately 40% of its genes are implicated in innate or adaptive immunity. A putative proto-MHC exists in the chordate amphioxus and in the fruit fly, indicating that a core MHC region predated the emergence of the adaptive immune system in vertebrates. In this study, we identify a putative proto-MHC with archetypal markers in the most basal branch of Metazoans—the placozoan Trichoplax adhaerens, indicating that the proto-MHC is much older than previously believed—and present in the common ancestor of bilaterians (contains vertebrates) and placozoans. Our evidence for a T. adhaerens proto-MHC was based on macrosynteny and phylogenetic analyses revealing approximately one third of the multiple marker sets within the human MHC-related paralogy groups have unique counterparts in T. adhaerens, consistent with two successive whole genome duplications during early verte- brate evolution. A genetic ontologic analysis of the proto-MHC markers in T. adhaerens was consistent with its involvement in defense, showing proteins implicated in antiviral immunity, stress response, and ubiquitination/proteasome pathway. Proteasome genes psma, psmb, and psmd are present, whereas the typical markers of adaptive immunity, such as MHC class I and II, are absent. Our results suggest that the proto-MHC was involved in intracellular intrinsic immunity and provide insight into the primordial architecture and functional landscape of this region that later in evolution became associated with numerous genes critical for adaptive immunity in vertebrates.
    [Show full text]
  • Role of Cecr2 and Candidate Modifier Genes of Cecr2 in the Neural Tube Defect Exencephaly
    Role of Cecr2 and Candidate Modifier Genes of Cecr2 in the Neural Tube Defect Exencephaly by Renee Yvonne Marie Leduc A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Molecular Biology and Genetics Department of Biological Sciences University of Alberta © Renee Yvonne Marie Leduc, 2015 Abstract Neurulation, the early developmental process that forms the rudiment of the brain and spinal cord, relies upon the intricate interplay of hundreds of genes in multiple genetic pathways within the appropriate environmental conditions. Neural tube defects (NTDs), common congenital birth defects in humans, arise when the process of neurulation is disrupted. Human NTDs are multifactorial disorders, which means that combinations of many genetic, epigenetic, and environmental factors interact in order for disease to manifest. Our mouse model of NTDs develops the perinatal lethal cranial NTD exencephaly (the murine equivalent of anencephaly in humans) when homozygous for a loss-of-function mutation in the ATP-dependent chromatin remodeling gene Cecr2. Much like in humans, manifestation of the exencephaly phenotype in Cecr2 mutant mice is dependent on multiple factors. Work in this thesis focused on identifying and characterizing the multifactorial nature of exencephaly in both Cecr2 mutant mice and in humans. A previously established incomplete penetrance of exencephaly in Cecr2 homozygous mutant mouse embryos is indicative of genetic and/or environmental changes that are contributing resistance to exencephaly. An updated penetrance analysis that I performed revealed a reduction in exencephaly penetrance, which demonstrated that genetic and/or environmental factors are changing over time. Previous work in our lab has shown that the penetrance of exencephaly in Cecr2 mutant mice is dependent on mouse strain, as Cecr2 mutant BALB/cCrl mice are susceptible to developing exencephaly but Cecr2 mutant FVB/N mice are resistant.
    [Show full text]