Bicluster-Based Identification of Gene Sets Through Multivariate Meta-Analysis (MVMA)

Total Page:16

File Type:pdf, Size:1020Kb

Bicluster-Based Identification of Gene Sets Through Multivariate Meta-Analysis (MVMA) Bicluster-Based Identification of Gene Sets Through Multivariate Meta-Analysis (MVMA) Tim Wu A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2018 Reading Committee: Peter Tarczy-Hornoch, Chair Roger E. Bumgarner Shuai Huang Program Authorized to Offer Degree: Biomedical and Health Informatics © Copyright 2018 Tim Wu University of Washington Abstract Bicluster-Based Identification of Gene Sets Through Multivariate Meta-Analysis (MVMA) Tim Wu Chair of the Supervisory Committee: Peter Tarczy-Hornoch, MD, FACMI Department of Biomedical Informatics and Medical Education Omics technologies are among the most exciting developments in biology and medicine in recent decades. They offer a whole new way of investigating a sample or a patient by taking comprehensive molecular-level snapshots. These snapshots, in the form of massive amount of data, provide important hints about the pathophysiological state of the target. Despite the promises of the omics technologies, their usefulness hinges upon proper translation of the data into knowledge. This dissertation is focused on mining of public gene expression data to discover gene sets that may be parts of biological pathways. It tries to answer these two overall questions: (1) what is the data mining method best suited for finding gene sets? (2) how best to utilize multiple datasets in order to increase statistical strength? Biclustering has been proven to be highly effective for identifying gene sets. Compared to traditional clustering methods, biclustering recognizes a list of genes that are up- or down- regulated under a subset of the conditions, as opposed to the whole spectrum of the conditions. A large number of biclustering algorithms have been applied to analysis of gene expression data. Condition-dependent Correlation Subgroups (CCS), as one of these algorithms, is chosen for the current study. Identifying individual biclusters using CCS is the task of Aim 1. Most public expression datasets have relatively small sample sizes. Making inference on these datasets may be error prone, which motivates the use of multiple datasets to increase the statistical power. This study makes use of multiple related datasets by adapting the approach of meta-analysis. More specifically, a group of biclusters, each coming from a separate dataset, are identified. Meta-analysis is then applied to these biclusters. Hence, the biclusters are analogous to the individual studies in a traditional meta-analysis. The goal is to identify a gene set, through combining the evidence in the individual biclusters. Since each gene in this group of bicluster is modeled as an endpoint (equivalent to outcome in traditional meta-analysis context), and the correlations among the endpoints are taken into consideration, the approach of multivariate meta-analysis (MVMA) is taken. Using MVMA to combine biclusters from separate datasets is the focus of Aim 2. Despite the fact that biclustering has significantly reduced the dimension, analyzing the stack still faces the difficulty of high dimensionality (p) and small number of available datasets (n), which is the well-known p >> n problem. The traditional MVMA methods, either within the Bayesian or the Frequentist framework, are not effective when p is over 50. Since a typical bicluster stack has a dimension in the range of 70 - 150, it renders the traditional methods impractical in the current context. A previous study by Jackson and Riley [1] proposed an interesting two-step procedure for MVMA to tackle the issue of data scarcity. It involves estimation of the between-study covariance matrix as the step 1, following by making inference about the overall effect sizes as the step 2. In step 2, multivariate t rather than normal distribution is used in order to take the uncertainty of the between-study variance estimate into account. Jackson’s method is implemented and tested in the current study. Unfortunately, it is found to be still slow for moderate or high dimensions, mainly because of method of moments (MM) used in step 1. To overcome this constraint, an alternative step 1 method is proposed, which involves using weighted sample covariance matrix, subject to matrix regularization, to approximate the between-study variance/covariance. A series of simulation studies have shown that the improved two-step procedure performs favorably compared to the traditional MVMA methods as well as Jackson’s original routine. Given these results, the new two-step procedure is applied to analysis of real bicluster stacks, which leads to a series of candidate gene sets. The candidate gene sets are then analyzed in Aim 3 by enrichment-based analyses using public pathway knowledge bases. The specific methods used include Over Representation Analysis (ORA), Gene Set Enrichment Analysis (GSEA), and Network Topology-based Analysis (NTA). A key finding is that high-certainty effect size estimates derived from MVMA are often associated with significant enrichment results from the pathway analysis, especially when the size of bicluster stack is big enough. In other words, effect size estimates are predictive of the biological relevance of the gene sets, which is perhaps the most significant result of the current study. Table of Contents Chapter 1 Executive Summary ....................................................................................................... 1 1.1 Overview .......................................................................................................................... 1 1.2 Motivation for this dissertation ........................................................................................ 3 1.3 Research aims and questions ............................................................................................ 4 1.4 Outline of this dissertation ............................................................................................... 6 1.5 Contributions .................................................................................................................... 8 Chapter 2 Background and significance ....................................................................................... 10 2.1 Overview of high-throughput omics technologies ......................................................... 10 2.2 Gene expression analysis and its applications................................................................ 12 2.3 Pathways and gene sets as functional units .................................................................... 13 2.4 Challenges in analysis of public expression data ........................................................... 15 2.5 Commonly used data mining methods and their limitations .......................................... 17 2. 5.1 K-means clustering ..................................................................................................... 17 2.5.2 Hierarchical clustering ................................................................................................ 19 2.5.3 Self-organizing feature map (SOFM) ......................................................................... 21 2.5.4 Limitations of the traditional clustering methods ....................................................... 22 2.6 Biclustering .................................................................................................................... 23 2.6.1 Overview of biclustering ............................................................................................ 23 2.6.2 Diversity of biclustering algorithms ........................................................................... 24 2.6.3 Summary ..................................................................................................................... 28 2.7 Approaches to combining biclusters .............................................................................. 31 2.7.1 Current approaches for utilizing multiple datasets ..................................................... 31 2.7.2 Overview of meta-analysis ......................................................................................... 34 2.7.3 Fixed-effect versus random-effects models ................................................................ 36 2.7.4 Multivariate meta-analysis (MVMA) ......................................................................... 37 2. 7.5 Literature review on the use of meta-analysis for gene expression data ........................ 40 2.8 Pathway analysis of gene sets ........................................................................................ 41 2.10 Overall strategy .............................................................................................................. 44 Chapter 3 Biclustering and stacking of biclusters......................................................................... 46 3.1 Background, motivations, and strategy .......................................................................... 47 3.2 Identification of individual biclusters ............................................................................ 48 3.2.1 Pre-processing and normalization of microarray data ................................................ 48 3.2.2 Selection of biclustering algorithm............................................................................. 49 3.2.3 Parameter tuning and performance of CCS ................................................................ 51 3.2.4 Results on individual biclusters .................................................................................
Recommended publications
  • Whole-Genome Microarray Detects Deletions and Loss of Heterozygosity of Chromosome 3 Occurring Exclusively in Metastasizing Uveal Melanoma
    Anatomy and Pathology Whole-Genome Microarray Detects Deletions and Loss of Heterozygosity of Chromosome 3 Occurring Exclusively in Metastasizing Uveal Melanoma Sarah L. Lake,1 Sarah E. Coupland,1 Azzam F. G. Taktak,2 and Bertil E. Damato3 PURPOSE. To detect deletions and loss of heterozygosity of disease is fatal in 92% of patients within 2 years of diagnosis. chromosome 3 in a rare subset of fatal, disomy 3 uveal mela- Clinical and histopathologic risk factors for UM metastasis noma (UM), undetectable by fluorescence in situ hybridization include large basal tumor diameter (LBD), ciliary body involve- (FISH). ment, epithelioid cytomorphology, extracellular matrix peri- ϩ ETHODS odic acid-Schiff-positive (PAS ) loops, and high mitotic M . Multiplex ligation-dependent probe amplification 3,4 5 (MLPA) with the P027 UM assay was performed on formalin- count. Prescher et al. showed that a nonrandom genetic fixed, paraffin-embedded (FFPE) whole tumor sections from 19 change, monosomy 3, correlates strongly with metastatic death, and the correlation has since been confirmed by several disomy 3 metastasizing UMs. Whole-genome microarray analy- 3,6–10 ses using a single-nucleotide polymorphism microarray (aSNP) groups. Consequently, fluorescence in situ hybridization were performed on frozen tissue samples from four fatal dis- (FISH) detection of chromosome 3 using a centromeric probe omy 3 metastasizing UMs and three disomy 3 tumors with Ͼ5 became routine practice for UM prognostication; however, 5% years’ metastasis-free survival. to 20% of disomy 3 UM patients unexpectedly develop metas- tases.11 Attempts have therefore been made to identify the RESULTS. Two metastasizing UMs that had been classified as minimal region(s) of deletion on chromosome 3.12–15 Despite disomy 3 by FISH analysis of a small tumor sample were found these studies, little progress has been made in defining the key on MLPA analysis to show monosomy 3.
    [Show full text]
  • A Computational Approach for Defining a Signature of Β-Cell Golgi Stress in Diabetes Mellitus
    Page 1 of 781 Diabetes A Computational Approach for Defining a Signature of β-Cell Golgi Stress in Diabetes Mellitus Robert N. Bone1,6,7, Olufunmilola Oyebamiji2, Sayali Talware2, Sharmila Selvaraj2, Preethi Krishnan3,6, Farooq Syed1,6,7, Huanmei Wu2, Carmella Evans-Molina 1,3,4,5,6,7,8* Departments of 1Pediatrics, 3Medicine, 4Anatomy, Cell Biology & Physiology, 5Biochemistry & Molecular Biology, the 6Center for Diabetes & Metabolic Diseases, and the 7Herman B. Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN 46202; 2Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, 46202; 8Roudebush VA Medical Center, Indianapolis, IN 46202. *Corresponding Author(s): Carmella Evans-Molina, MD, PhD ([email protected]) Indiana University School of Medicine, 635 Barnhill Drive, MS 2031A, Indianapolis, IN 46202, Telephone: (317) 274-4145, Fax (317) 274-4107 Running Title: Golgi Stress Response in Diabetes Word Count: 4358 Number of Figures: 6 Keywords: Golgi apparatus stress, Islets, β cell, Type 1 diabetes, Type 2 diabetes 1 Diabetes Publish Ahead of Print, published online August 20, 2020 Diabetes Page 2 of 781 ABSTRACT The Golgi apparatus (GA) is an important site of insulin processing and granule maturation, but whether GA organelle dysfunction and GA stress are present in the diabetic β-cell has not been tested. We utilized an informatics-based approach to develop a transcriptional signature of β-cell GA stress using existing RNA sequencing and microarray datasets generated using human islets from donors with diabetes and islets where type 1(T1D) and type 2 diabetes (T2D) had been modeled ex vivo. To narrow our results to GA-specific genes, we applied a filter set of 1,030 genes accepted as GA associated.
    [Show full text]
  • Intra-Species Differences in Population Size Shape Life History and Genome Evolution
    bioRxiv preprint doi: https://doi.org/10.1101/852368; this version posted December 12, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 1 Intra-Species Differences in Population Size shape Life History and Genome Evolution 2 Authors: David Willemsen1, Rongfeng Cui1, Martin Reichard2, Dario Riccardo Valenzano1,3* 3 Affiliations: 4 1Max Planck Institute for Biology of Ageing, Cologne, Germany. 5 2The Czech Academy of Sciences, Institute of Vertebrate Biology, Brno, Czech Republic. 6 3CECAD, University of Cologne, Cologne, Germany. 7 *Correspondence to: [email protected] 8 Key words: life history, evolution, genome, population genetics, killifish, Nothobranchius 9 furzeri, lifespan, sex chromosome, selection, genetic drift 10 11 1 bioRxiv preprint doi: https://doi.org/10.1101/852368; this version posted December 12, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 12 Abstract 13 The evolutionary forces shaping life history trait divergence within species are largely unknown. 14 Killifish (oviparous Cyprinodontiformes) evolved an annual life cycle as an exceptional 15 adaptation to life in arid savannah environments characterized by seasonal water availability. The 16 turquoise killifish (Nothobranchius furzeri) is the shortest-lived vertebrate known to science and 17 displays differences in lifespan among wild populations, representing an ideal natural experiment 18 in the evolution and diversification of life history.
    [Show full text]
  • Supplementary Table S4. FGA Co-Expressed Gene List in LUAD
    Supplementary Table S4. FGA co-expressed gene list in LUAD tumors Symbol R Locus Description FGG 0.919 4q28 fibrinogen gamma chain FGL1 0.635 8p22 fibrinogen-like 1 SLC7A2 0.536 8p22 solute carrier family 7 (cationic amino acid transporter, y+ system), member 2 DUSP4 0.521 8p12-p11 dual specificity phosphatase 4 HAL 0.51 12q22-q24.1histidine ammonia-lyase PDE4D 0.499 5q12 phosphodiesterase 4D, cAMP-specific FURIN 0.497 15q26.1 furin (paired basic amino acid cleaving enzyme) CPS1 0.49 2q35 carbamoyl-phosphate synthase 1, mitochondrial TESC 0.478 12q24.22 tescalcin INHA 0.465 2q35 inhibin, alpha S100P 0.461 4p16 S100 calcium binding protein P VPS37A 0.447 8p22 vacuolar protein sorting 37 homolog A (S. cerevisiae) SLC16A14 0.447 2q36.3 solute carrier family 16, member 14 PPARGC1A 0.443 4p15.1 peroxisome proliferator-activated receptor gamma, coactivator 1 alpha SIK1 0.435 21q22.3 salt-inducible kinase 1 IRS2 0.434 13q34 insulin receptor substrate 2 RND1 0.433 12q12 Rho family GTPase 1 HGD 0.433 3q13.33 homogentisate 1,2-dioxygenase PTP4A1 0.432 6q12 protein tyrosine phosphatase type IVA, member 1 C8orf4 0.428 8p11.2 chromosome 8 open reading frame 4 DDC 0.427 7p12.2 dopa decarboxylase (aromatic L-amino acid decarboxylase) TACC2 0.427 10q26 transforming, acidic coiled-coil containing protein 2 MUC13 0.422 3q21.2 mucin 13, cell surface associated C5 0.412 9q33-q34 complement component 5 NR4A2 0.412 2q22-q23 nuclear receptor subfamily 4, group A, member 2 EYS 0.411 6q12 eyes shut homolog (Drosophila) GPX2 0.406 14q24.1 glutathione peroxidase
    [Show full text]
  • Supplementary Material
    BMJ Publishing Group Limited (BMJ) disclaims all liability and responsibility arising from any reliance Supplemental material placed on this supplemental material which has been supplied by the author(s) J Neurol Neurosurg Psychiatry Page 1 / 45 SUPPLEMENTARY MATERIAL Appendix A1: Neuropsychological protocol. Appendix A2: Description of the four cases at the transitional stage. Table A1: Clinical status and center proportion in each batch. Table A2: Complete output from EdgeR. Table A3: List of the putative target genes. Table A4: Complete output from DIANA-miRPath v.3. Table A5: Comparison of studies investigating miRNAs from brain samples. Figure A1: Stratified nested cross-validation. Figure A2: Expression heatmap of miRNA signature. Figure A3: Bootstrapped ROC AUC scores. Figure A4: ROC AUC scores with 100 different fold splits. Figure A5: Presymptomatic subjects probability scores. Figure A6: Heatmap of the level of enrichment in KEGG pathways. Kmetzsch V, et al. J Neurol Neurosurg Psychiatry 2021; 92:485–493. doi: 10.1136/jnnp-2020-324647 BMJ Publishing Group Limited (BMJ) disclaims all liability and responsibility arising from any reliance Supplemental material placed on this supplemental material which has been supplied by the author(s) J Neurol Neurosurg Psychiatry Appendix A1. Neuropsychological protocol The PREV-DEMALS cognitive evaluation included standardized neuropsychological tests to investigate all cognitive domains, and in particular frontal lobe functions. The scores were provided previously (Bertrand et al., 2018). Briefly, global cognitive efficiency was evaluated by means of Mini-Mental State Examination (MMSE) and Mattis Dementia Rating Scale (MDRS). Frontal executive functions were assessed with Frontal Assessment Battery (FAB), forward and backward digit spans, Trail Making Test part A and B (TMT-A and TMT-B), Wisconsin Card Sorting Test (WCST), and Symbol-Digit Modalities test.
    [Show full text]
  • Identification of Potential Key Genes and Pathway Linked with Sporadic Creutzfeldt-Jakob Disease Based on Integrated Bioinformatics Analyses
    medRxiv preprint doi: https://doi.org/10.1101/2020.12.21.20248688; this version posted December 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. Identification of potential key genes and pathway linked with sporadic Creutzfeldt-Jakob disease based on integrated bioinformatics analyses Basavaraj Vastrad1, Chanabasayya Vastrad*2 , Iranna Kotturshetti 1. Department of Biochemistry, Basaveshwar College of Pharmacy, Gadag, Karnataka 582103, India. 2. Biostatistics and Bioinformatics, Chanabasava Nilaya, Bharthinagar, Dharwad 580001, Karanataka, India. 3. Department of Ayurveda, Rajiv Gandhi Education Society`s Ayurvedic Medical College, Ron, Karnataka 562209, India. * Chanabasayya Vastrad [email protected] Ph: +919480073398 Chanabasava Nilaya, Bharthinagar, Dharwad 580001 , Karanataka, India NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice. medRxiv preprint doi: https://doi.org/10.1101/2020.12.21.20248688; this version posted December 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. Abstract Sporadic Creutzfeldt-Jakob disease (sCJD) is neurodegenerative disease also called prion disease linked with poor prognosis. The aim of the current study was to illuminate the underlying molecular mechanisms of sCJD. The mRNA microarray dataset GSE124571 was downloaded from the Gene Expression Omnibus database. Differentially expressed genes (DEGs) were screened.
    [Show full text]
  • Mitoxplorer, a Visual Data Mining Platform To
    mitoXplorer, a visual data mining platform to systematically analyze and visualize mitochondrial expression dynamics and mutations Annie Yim, Prasanna Koti, Adrien Bonnard, Fabio Marchiano, Milena Dürrbaum, Cecilia Garcia-Perez, José Villaveces, Salma Gamal, Giovanni Cardone, Fabiana Perocchi, et al. To cite this version: Annie Yim, Prasanna Koti, Adrien Bonnard, Fabio Marchiano, Milena Dürrbaum, et al.. mitoXplorer, a visual data mining platform to systematically analyze and visualize mitochondrial expression dy- namics and mutations. Nucleic Acids Research, Oxford University Press, 2020, 10.1093/nar/gkz1128. hal-02394433 HAL Id: hal-02394433 https://hal-amu.archives-ouvertes.fr/hal-02394433 Submitted on 4 Dec 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution| 4.0 International License Nucleic Acids Research, 2019 1 doi: 10.1093/nar/gkz1128 Downloaded from https://academic.oup.com/nar/advance-article-abstract/doi/10.1093/nar/gkz1128/5651332 by Bibliothèque de l'université la Méditerranée user on 04 December 2019 mitoXplorer, a visual data mining platform to systematically analyze and visualize mitochondrial expression dynamics and mutations Annie Yim1,†, Prasanna Koti1,†, Adrien Bonnard2, Fabio Marchiano3, Milena Durrbaum¨ 1, Cecilia Garcia-Perez4, Jose Villaveces1, Salma Gamal1, Giovanni Cardone1, Fabiana Perocchi4, Zuzana Storchova1,5 and Bianca H.
    [Show full text]
  • A Dissertation Entitled the Androgen Receptor
    A Dissertation entitled The Androgen Receptor as a Transcriptional Co-activator: Implications in the Growth and Progression of Prostate Cancer By Mesfin Gonit Submitted to the Graduate Faculty as partial fulfillment of the requirements for the PhD Degree in Biomedical science Dr. Manohar Ratnam, Committee Chair Dr. Lirim Shemshedini, Committee Member Dr. Robert Trumbly, Committee Member Dr. Edwin Sanchez, Committee Member Dr. Beata Lecka -Czernik, Committee Member Dr. Patricia R. Komuniecki, Dean College of Graduate Studies The University of Toledo August 2011 Copyright 2011, Mesfin Gonit This document is copyrighted material. Under copyright law, no parts of this document may be reproduced without the expressed permission of the author. An Abstract of The Androgen Receptor as a Transcriptional Co-activator: Implications in the Growth and Progression of Prostate Cancer By Mesfin Gonit As partial fulfillment of the requirements for the PhD Degree in Biomedical science The University of Toledo August 2011 Prostate cancer depends on the androgen receptor (AR) for growth and survival even in the absence of androgen. In the classical models of gene activation by AR, ligand activated AR signals through binding to the androgen response elements (AREs) in the target gene promoter/enhancer. In the present study the role of AREs in the androgen- independent transcriptional signaling was investigated using LP50 cells, derived from parental LNCaP cells through extended passage in vitro. LP50 cells reflected the signature gene overexpression profile of advanced clinical prostate tumors. The growth of LP50 cells was profoundly dependent on nuclear localized AR but was independent of androgen. Nevertheless, in these cells AR was unable to bind to AREs in the absence of androgen.
    [Show full text]
  • Newfound Coding Potential of Transcripts Unveils Missing Members Of
    bioRxiv preprint doi: https://doi.org/10.1101/2020.12.02.406710; this version posted December 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 Newfound coding potential of transcripts unveils missing members of 2 human protein communities 3 4 Sebastien Leblanc1,2, Marie A Brunet1,2, Jean-François Jacques1,2, Amina M Lekehal1,2, Andréa 5 Duclos1, Alexia Tremblay1, Alexis Bruggeman-Gascon1, Sondos Samandi1,2, Mylène Brunelle1,2, 6 Alan A Cohen3, Michelle S Scott1, Xavier Roucou1,2,* 7 1Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, 8 Quebec, Canada. 9 2 PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering. 10 3Department of Family Medicine, Université de Sherbrooke, Sherbrooke, Quebec, Canada. 11 12 *Corresponding author: Tel. (819) 821-8000x72240; E-Mail: [email protected] 13 14 15 Running title: Alternative proteins in communities 16 17 Keywords: alternative proteins, protein network, protein-protein interactions, pseudogenes, 18 affinity purification-mass spectrometry 19 20 1 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.02.406710; this version posted December 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 21 Abstract 22 23 Recent proteogenomic approaches have led to the discovery that regions of the transcriptome 24 previously annotated as non-coding regions (i.e.
    [Show full text]
  • Open Data for Differential Network Analysis in Glioma
    International Journal of Molecular Sciences Article Open Data for Differential Network Analysis in Glioma , Claire Jean-Quartier * y , Fleur Jeanquartier y and Andreas Holzinger Holzinger Group HCI-KDD, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Auenbruggerplatz 2/V, 8036 Graz, Austria; [email protected] (F.J.); [email protected] (A.H.) * Correspondence: [email protected] These authors contributed equally to this work. y Received: 27 October 2019; Accepted: 3 January 2020; Published: 15 January 2020 Abstract: The complexity of cancer diseases demands bioinformatic techniques and translational research based on big data and personalized medicine. Open data enables researchers to accelerate cancer studies, save resources and foster collaboration. Several tools and programming approaches are available for analyzing data, including annotation, clustering, comparison and extrapolation, merging, enrichment, functional association and statistics. We exploit openly available data via cancer gene expression analysis, we apply refinement as well as enrichment analysis via gene ontology and conclude with graph-based visualization of involved protein interaction networks as a basis for signaling. The different databases allowed for the construction of huge networks or specified ones consisting of high-confidence interactions only. Several genes associated to glioma were isolated via a network analysis from top hub nodes as well as from an outlier analysis. The latter approach highlights a mitogen-activated protein kinase next to a member of histondeacetylases and a protein phosphatase as genes uncommonly associated with glioma. Cluster analysis from top hub nodes lists several identified glioma-associated gene products to function within protein complexes, including epidermal growth factors as well as cell cycle proteins or RAS proto-oncogenes.
    [Show full text]
  • Supplemental Table S1. Primers for Sybrgreen Quantitative RT-PCR Assays
    Supplemental Table S1. Primers for SYBRGreen quantitative RT-PCR assays. Gene Accession Primer Sequence Length Start Stop Tm GC% GAPDH NM_002046.3 GAPDH F TCCTGTTCGACAGTCAGCCGCA 22 39 60 60.43 59.09 GAPDH R GCGCCCAATACGACCAAATCCGT 23 150 128 60.12 56.52 Exon junction 131/132 (reverse primer) on template NM_002046.3 DNAH6 NM_001370.1 DNAH6 F GGGCCTGGTGCTGCTTTGATGA 22 4690 4711 59.66 59.09% DNAH6 R TAGAGAGCTTTGCCGCTTTGGCG 23 4797 4775 60.06 56.52% Exon junction 4790/4791 (reverse primer) on template NM_001370.1 DNAH7 NM_018897.2 DNAH7 F TGCTGCATGAGCGGGCGATTA 21 9973 9993 59.25 57.14% DNAH7 R AGGAAGCCATGTACAAAGGTTGGCA 25 10073 10049 58.85 48.00% Exon junction 9989/9990 (forward primer) on template NM_018897.2 DNAI1 NM_012144.2 DNAI1 F AACAGATGTGCCTGCAGCTGGG 22 673 694 59.67 59.09 DNAI1 R TCTCGATCCCGGACAGGGTTGT 22 822 801 59.07 59.09 Exon junction 814/815 (reverse primer) on template NM_012144.2 RPGRIP1L NM_015272.2 RPGRIP1L F TCCCAAGGTTTCACAAGAAGGCAGT 25 3118 3142 58.5 48.00% RPGRIP1L R TGCCAAGCTTTGTTCTGCAAGCTGA 25 3238 3214 60.06 48.00% Exon junction 3124/3125 (forward primer) on template NM_015272.2 Supplemental Table S2. Transcripts that differentiate IPF/UIP from controls at 5%FDR Fold- p-value Change Transcript Gene p-value p-value p-value (IPF/UIP (IPF/UIP Cluster ID RefSeq Symbol gene_assignment (Age) (Gender) (Smoking) vs. C) vs. C) NM_001178008 // CBS // cystathionine-beta- 8070632 NM_001178008 CBS synthase // 21q22.3 // 875 /// NM_0000 0.456642 0.314761 0.418564 4.83E-36 -2.23 NM_003013 // SFRP2 // secreted frizzled- 8103254 NM_003013
    [Show full text]
  • Uncovering Adaptive Evolution in the Human Lineage Magdalena Gayà-Vidal1,2 and M Mar Albà1,3*
    Gayà-Vidal and Albà BMC Genomics 2014, 15:599 http://www.biomedcentral.com/1471-2164/15/599 RESEARCH ARTICLE Open Access Uncovering adaptive evolution in the human lineage Magdalena Gayà-Vidal1,2 and M Mar Albà1,3* Abstract Background: The recent increase in human polymorphism data, together with the availability of genome sequences from several primate species, provides an unprecedented opportunity to investigate how natural selection has shaped human evolution. Results: We compared human branch-specific substitutions with variation data in the current human population to measure the impact of adaptive evolution on human protein coding genes. The use of single nucleotide polymorphisms (SNPs) with high derived allele frequencies (DAFs) minimized the influence of segregating slightly deleterious mutations and improved the estimation of the number of adaptive sites. Using DAF ≥ 60% we showed that the proportion of adaptive substitutions is 0.2% in the complete gene set. However, the percentage rose to 40% when we focused on genes that are specifically accelerated in the human branch with respect to the chimpanzee branch, or on genes that show signatures of adaptive selection at the codon level by the maximum likelihood based branch-site test. In general, neural genes are enriched in positive selection signatures. Genes with multiple lines of evidence of positive selection include taxilin beta, which is involved in motor nerve regeneration and syntabulin, and is required for the formation of new presynaptic boutons. Conclusions: We combined several methods to detect adaptive evolution in human coding sequences at a genome-wide level. The use of variation data, in addition to sequence divergence information, uncovered previously undetected positive selection signatures in neural genes.
    [Show full text]