CSE642 Final Version

Eindhoven University of Technology
MASTER
Dimensionality reduction of gene expression data
Arts, S.
Award date: 2018

Disclaimer
This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners, and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.
• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain.

Eindhoven University of Technology
MASTER THESIS
Dimensionality Reduction of Gene Expression Data
Author: S. (Sako) Arts
Daily Supervisor: dr. V. (Vlado) Menkovski
Graduation Committee: dr. V. (Vlado) Menkovski, dr. D.C. (Decebal) Mocanu, dr. N. (Nikolay) Yakovets
May 16, 2018
v1.0

Abstract
The focus of this thesis is dimensionality reduction of gene expression data. I propose and test a framework that deploys linear prediction algorithms to select a reduced set of genes relevant to a specified case.

Abstract
In cancer research, there is a great need to automate parts of the diagnostic process, mainly to reduce cost and to make diagnosis faster and more accurate.
Gene expression data of tumor samples is known to contain much information about the disease, information that can help with diagnosing and even curing the patient. Data mining methods are an obvious candidate to aid in this automation and have therefore been deployed on gene expression data in a number of research papers. However, all these researchers face the same problem: a limited number of samples and a large number of features. Because the human genome contains many genes and only a few tumor samples have been processed for their gene expression data, the data set has many more features than samples. This makes any type of data analysis hard, because models easily overfit data with these properties. Thus, there is a need for a way to effectively reduce the dimensionality of the samples without removing the information of interest, so that the reduced set can be analyzed more effectively. In this thesis, I propose a framework that is capable of reducing the dimensionality of gene expression samples for case-specific purposes. I explore multiple types of dimensionality reduction, from basic statistical methods to novel deep learning algorithms. This research concludes by suggesting a combination of multiple linear prediction algorithms for case-specific feature selection. Because the selections made by these algorithms suffer from problems with stability and robustness, I designed a framework aimed at improving these properties of the resulting selection. The framework combines several of these algorithms with cross-validation folds to arrive at a sufficiently stable set of features that can be used for further analysis. Apart from this main result, the framework produces metrics that indicate the quality of the selection, and it performs additional genetic analysis and plotting relevant for field experts.
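The aggregation step described above combines the selections of several linear models over several cross-validation folds into one stable gene set. It can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation.

The sketch makes three assumptions:
• Each (model, fold) run has already produced a set of selected genes, for example the genes with non-zero coefficients in an L1-regularized model.
• A gene is kept when it is selected in at least a chosen fraction of runs.
• The mean pairwise Jaccard similarity between runs serves as one possible stability metric.

The function names, the threshold, and the toy gene identifiers are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def selection_frequencies(selections):
    """Fraction of (model, fold) runs in which each gene was selected."""
    counts = Counter(gene for selected in selections for gene in set(selected))
    return {gene: count / len(selections) for gene, count in counts.items()}


def stable_genes(selections, threshold=0.5):
    """Genes selected in at least `threshold` of all runs, most frequent first.

    The majority-vote threshold is an illustrative assumption.
    """
    freqs = selection_frequencies(selections)
    kept = [gene for gene, freq in freqs.items() if freq >= threshold]
    return sorted(kept, key=lambda gene: (-freqs[gene], gene))


def mean_jaccard(selections):
    """Mean pairwise Jaccard similarity between runs (1.0 = identical selections).

    Used here as one hypothetical stability metric for the whole selection.
    """
    sets = [set(selected) for selected in selections]
    scores = [
        len(a & b) / len(a | b) if (a | b) else 1.0
        for a, b in combinations(sets, 2)
    ]
    return sum(scores) / len(scores) if scores else 1.0


# Hypothetical selections: 2 linear models x 3 folds, toy gene identifiers.
runs = [
    {"g1", "g2", "g3"},  # model A, fold 1
    {"g1", "g2", "g4"},  # model A, fold 2
    {"g1", "g3"},        # model A, fold 3
    {"g1", "g2"},        # model B, fold 1
    {"g1", "g2", "g5"},  # model B, fold 2
    {"g2", "g3"},        # model B, fold 3
]
print(stable_genes(runs))       # ['g1', 'g2', 'g3']
print(stable_genes(runs, 0.6))  # ['g1', 'g2']
print(round(mean_jaccard(runs), 2))
```

Raising the threshold trades recall for stability. In the toy data, a threshold of 0.6 drops g3, which appears in only half of the runs. Genes picked up by a single model in a single fold (g4, g5) are discarded at either threshold.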
Besides proposing the framework and arguing for its setup and validity, this thesis tests the implemented framework on several medically relevant use cases and presents and analyzes the results of these tests. These results show the effectiveness of the framework on certain use cases, as well as the limits of the gene expression data. They demonstrate that the framework is a solution to the stated problem and that it could add value for medical professionals in their daily practice.

Acknowledgments
Before diving into the full scientific extent of this thesis, I would like to reserve some space to thank the people who helped me reach this goal of acquiring my Master of Science title. First and foremost, I would like to thank my daily supervisor Vlado Menkovski. Without his support and insights, I would not have been able to finish a project of this magnitude. I would like to thank him for sparking my interest in deep learning, a field in which I am now pursuing a career. I would like to thank him for introducing me to the right people, enabling me to complete two successful internships during my master's; during these internships I learned most of my skills in this field. I would like to thank him for providing the atmosphere and infrastructure within the TU/e's deep learning community that helped me thrive in this field. And lastly, I would like to thank him for putting up with me all this time: I have been bothering him for over a year now, and while I know he is a busy man, that didn't stop me from dropping in at his office whenever it suited me. I would like to thank Decebal Mocanu and Nikolay Yakovets for being part of my graduation committee and plowing through this stack of paper. I'm not much of a reader myself, so I am extra grateful for this effort. I would like to thank the guys and girls of the deep learning community at the TU/e.
We spent many AA meetings together. I learned a lot about the field by listening to their struggles and achievements, and I'm very grateful for their feedback on mine. I would also like to thank my fellow Data Science pioneers, Jos, Stefan, and Puck, for the many projects and assignments we collaborated on over the course of my master's. I also extend my gratitude to Nevenka and my colleagues at Philips New York, not only for their expertise in the field of genomics, without which this project would not have been possible, but also for their warmth and kindness during my stay in the US. I really enjoyed myself in the office and at the activities outside the office that they invited me to. I really hope our paths will cross once more in the future. A special shout-out to Steven and Andrea for letting me stay at Andrea's place the times I didn't make it to the last train out of NYC. Many thanks to all the friends I made at GEWIS, especially those of B.O.O.M. and the BAr Committee, with whom I drank many a beer and expect to drink many more in the future. These people made my years as a student the time of my life, and I look back on many great memories of activities partially financed by the many kind companies giving large sums of money to GEWIS. A special thanks to my fellow board members of the 33rd board of GEWIS. It could not have been easy to deal with me as chairman, especially on some early Friday mornings. Last, but definitely not least, I would like to thank my girlfriend Jet, my parents, Jan and Annelies, and my sisters Tika and Nadi, for their continuous support during my studies. They always provided me with a safe haven to return to and a place to take the necessary rest. Special thanks to my parents for providing me with many freshly ironed dress shirts. Oh, and let's not forget to thank Daan for being Daan, I guess.
Contents
1 Introduction
2 Domain and Data
  2.1 Domain
  2.2 Data Extraction
  2.3 Data Source
3 Background
  3.1 Dimensionality Reduction
  3.2 Feature Extraction
  3.3 Feature Selection
4 Proposed Methodology
  4.1 Approach to Dimensionality Reduction
  4.2 Robustness
  4.3 Framework
5 Implementation
  5.1 Data Retrieval, Processing and Enrichment
  5.2 Experimental Structure
6 Results
  6.1 Research Results
  6.2 Framework Results
7 Conclusion
  7.1 Method of Reduction
  7.2 Experimental Results
  7.3 Future Work
8 References
Appendices
  A Result Interpretations
  B Selected Genes
  C Large Tables
  D Sacred notebook
  E Code listings

Chapter 1 Introduction
Within the field of medicine, cancer is clearly one of the most researched diseases. This interest is driven by the number of deaths it causes and by the difficulty of preventing it or predicting when and where it will appear. Data mining is an obvious candidate to help with this prediction and diagnosis, owing to its ability to find previously unknown relations with little or no real knowledge of a process's inner workings. As more diagnostic and demographic data of cancer patients becomes available, both the number of applicable data mining algorithms and their effectiveness are growing.
The genetic information in this data is especially interesting, since a human's genome as a whole is known to give rise to all sorts of phenotypes, which should therefore be derivable from this genetic data. However, there are very few phenotypes whose direct relation to a set of genes is known, and even fewer for which the cause and effect can be explained. One of the main problems that makes genetic data difficult to analyze is its dimensionality. This is caused by two inherent properties of this type of data.
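A small synthetic experiment makes concrete how having far more features than samples invites overfitting. This illustration was added here and is not taken from the thesis.

The sketch draws 20 samples with 500 random "gene" features and labels that are pure noise. Because the features outnumber the samples, the minimum-norm least-squares solution w = X^T (X X^T)^-1 y fits every training label exactly. On fresh samples, however, its accuracy stays near chance.

The sample sizes, the random seed, and the helper names are arbitrary assumptions. Pure Python is used only to keep the example dependency-free.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def min_norm_fit(X, y):
    """Minimum-norm least-squares weights w = X^T (X X^T)^-1 y (features > samples)."""
    gram = [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]
    alpha = solve(gram, y)
    return [sum(alpha[i] * X[i][j] for i in range(len(X))) for j in range(len(X[0]))]


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


rng = random.Random(0)
n_samples, n_genes = 20, 500  # hypothetical sizes: many more features than samples
X_train = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_samples)]
y_train = [rng.choice([-1.0, 1.0]) for _ in range(n_samples)]  # labels are pure noise
w = min_norm_fit(X_train, y_train)

# Training labels are reproduced (almost) exactly ...
train_err = max(abs(predict(w, x) - y) for x, y in zip(X_train, y_train))
# ... but on new random samples the sign of the prediction is a coin flip.
X_test = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(200)]
y_test = [rng.choice([-1.0, 1.0]) for _ in range(200)]
test_acc = sum((predict(w, x) > 0) == (y > 0) for x, y in zip(X_test, y_test)) / len(y_test)
print(f"max train residual: {train_err:.2e}, test accuracy: {test_acc:.2f}")
```

The zero training residual says nothing about generalization. This is the overfitting risk that motivates reducing the number of features before any further analysis.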