Applying Machine Learning to Breast Cancer Gene Expression Data to Predict Survival Likelihood Pegah Tavangar Thesis Submitted T

Total Page:16

File Type:pdf, Size:1020Kb

Applying Machine Learning to Breast Cancer Gene Expression Data to Predict Survival Likelihood Pegah Tavangar Thesis Submitted T Applying Machine Learning to Breast Cancer Gene Expression Data to Predict Survival Likelihood Pegah Tavangar Thesis submitted to the University of Ottawa in partial Fulfillment of the requirements for the Master of Science in Chemistry and Biomolecular Sciences Department of Chemistry and Biomolecular Sciences Faculty of Science University of Ottawa © Pegah Tavangar, Ottawa, Canada, 2020 Abstract Analyzing the expression level of thousands of genes will provide additional information beneficial in improving cancer therapy or synthesizing a new drug. In this project, the expression of 48807 genes from primary human breast tumors cells was analyzed. Humans cannot make sense of such a large volume of gene expression data from each person. Therefore, we used Machine Learning as an automated system that can learn from the data and be able to predict results from the data. This project presents the use of Machine Learning to predict the likelihood of survival in breast cancer patients using gene expression profiling. Machine Learning techniques, such as Logistic Regression, Support Vector Machines, Random Forest, and different Feature Selection techniques were used to find essential genes that lead to breast cancer or help a patient to live longer. This project describes the evaluation of different Machine Learning algorithms to classify breast cancer tumors into two groups of high and low survival. ii Acknowledgments I would like to thank Dr. Jonathan Lee for providing me the opportunity to work with him on an exciting project. I would like to recognize the invaluable counsel that you all provided during my research. It was my honor to work with some other professors in the Faculty of Medicine, such as Dr. Mathieu Lavallée-Adam and Dr. Theodore Perkins. Thank you for the support, advice, and friendship over the years. Their valuable guidance and advice over the past years significantly shaped my work. Finally, I would like to thank all my friends and family who have supported me throughout my graduate studies. iii Table of contents: Contents Abstract ......................................................................................................................................................... ii Acknowledgments ........................................................................................................................................ iii Table of contents: ........................................................................................................................................ iv List of Abbreviations .................................................................................................................................... vi List of Figures .............................................................................................................................................. vii List of Tables ................................................................................................................................................ ix 1: Introduction .............................................................................................................................................. 1 1.1: Cancer .................................................................................................................................................... 1 1.1.1: Breast Cancer ...................................................................................................................................... 2 1.1.2: Illumina bead chip: .............................................................................................................................. 5 1.1.3: Machine Learning ............................................................................................................................... 8 1.1.4: Supervised and Unsupervised Learning .............................................................................................. 9 1:2: Machine learning algorithms ............................................................................................................... 11 1.2.1: Logistic Regression ............................................................................................................................ 11 1.2.2: Loss Function .................................................................................................................................... 11 1.2.3: Random Forest .................................................................................................................................. 12 1.2.4: Support Vector Machines ................................................................................................................. 14 1.2.5: Linear Kernel ..................................................................................................................................... 18 1.2.6: Polynomial kernel ............................................................................................................................. 18 1.2.7: Gaussian kernel ................................................................................................................................. 18 1.2.8: Radial Basis Function (RBF) ............................................................................................................... 19 1.2.9: Sigmoid kernel .................................................................................................................................. 19 1.2.10: Clustering ........................................................................................................................................ 20 1.3: Evaluation Metrics ............................................................................................................................... 21 1.3.1: Accuracy ............................................................................................................................................ 21 1.3.2: Area Under The Curve ....................................................................................................................... 21 1.4: Cross-Validation ................................................................................................................................... 22 1.5: Feature Selection ................................................................................................................................. 25 1.5.1: Recursive Feature Elimination .......................................................................................................... 25 1.5.2: Least Absolute Shrinkage and Selection Operator (Lasso) ............................................................... 26 2: Materials and Methods ....................................................................................................................... 28 iv 2.1: Dataset preparation ......................................................................................................................... 28 2.2: Validation Test ................................................................................................................................. 30 2.3: Logistic Regression performance ..................................................................................................... 30 2.3.1: Logistic Regression Cost Function ................................................................................................. 33 2.4: Scikit-learn ....................................................................................................................................... 34 2.5: Random Forest as a Feature Selection method ............................................................................... 35 2.5.1: Gini Impurity and Entropy ............................................................................................................. 35 2.5.2: Random Forest Hyperparameter tuning using Scikit learn library ............................................... 36 2.6: Support Vector Machine Cost Function ........................................................................................... 39 2.6.1: Tuning Support Vector Machine parameters ............................................................................... 43 2.7: Feature selection ............................................................................................................................. 43 2.7.1: Recursive Feature Elimination ...................................................................................................... 43 2.7.2: Regularization parameter in Lasso................................................................................................ 44 2.8: Evaluation Metrics ........................................................................................................................... 44 3. Results: ............................................................................................................................................ 45 3.1: Logistic Regression ....................................................................................................................... 48 3.2: Random Forest ............................................................................................................................. 49 3.3: SVM .............................................................................................................................................. 50 3.4: Recursive Feature Elimination ....................................................................................................
Recommended publications
  • Doctorat De L'université De Toulouse
    HÈSE En vue de l'obtention du DOCTORAT DE L’UNIVERSITÉ DE TOULOUSE Délivré par l'Université Toulouse III - Paul Sabatier Discipline : Biochimie Présentée et soutenue par Violette GAUTIER Le 18 décembre 2012 Développement de méthodes quantitatives sans marquage pour l’étude protéomique des cellules endothéliales JURY Dr. Philippe Marin, IGF/CNRS (Montpellier) – Rapporteur Dr Thierry Rabilloud, iRTSV/LCBM/CNRS (Grenoble) - Rapporteur Dr Sarah Cianferani-Sanglier, IPHC/CNRS (Strasbourg) - Examinatrice Dr. Bernard Monsarrat, IPBS/CNRS (Toulouse) - Directeur de thèse Dr. Anne Gonzalez de Peredo, IPBS/CNRS (Toulouse) – Co-directrice de thèse Pr. Jean-François Arnal, Université Paul Sabatier (Toulouse) - Président Ecole doctorale : Biologie - Santé - Biotechnologies Unité de recherche : Institut de Pharmacologie et de Biologie Structurale (UMR5089, CNRS) Directeurs de Thèse : Bernard Monsarrat, Anne Gonzalez de Peredo Rapporteurs : Philippe Marin, Thierry Rabilloud RESUME La compréhension du fonctionnement des systèmes biologiques, dont les protéines sont les principaux effecteurs, est un défi majeur en biologie. La protéomique est aujourd’hui l’outil incontournable pour l’étude des protéines. Au cours de ma thèse, j’ai donc utilisé différentes approches protéomiques pour répondre à plusieurs questions biologiques autour des cellules endothéliales, concernant l’étude de mécanismes fonctionnels de protéines d’intérêt ainsi que des processus inflammatoires au sein de ces cellules. Ces différentes études ont nécessité la mise en place et l’optimisation de méthodes de quantification sans marquage (« label free ») essentielles à la fois pour la caractérisation de complexes protéiques et pour l’analyse de protéomes entiers. Cette thèse décrit ainsi dans un premier temps l’utilisation de telles approches pour l’analyse de complexes immunopurifiés dans laquelle un enjeu important consiste souvent à discriminer de façon non ambiguë les composants bona fide du complexe par rapport aux contaminants non-spécifiques.
    [Show full text]
  • Supplementary Information
    Supplementary Information This text file includes: Supplementary Methods Supplementary Figure 1-13, 15-30 Supplementary Table 1-8, 16, 20-21, 23, 25-37, 40-41 1 1. Samples, DNA extraction and genome sequencing 1.1 Ethical statements and sample storage The ethical statements of collecting and processing tissue samples for each species are listed as follows: Myotis myotis: All procedures were carried out in accordance with the ethical guidelines and permits (AREC-13-38-Teeling) delivered by the University College Dublin and the Préfet du Morbihan, awarded to Emma Teeling and Sébastien Puechmaille respectively. A single M. myotis individual was humanely sacrificed given that she had lethal injuries, and dissected. Rhinolophus ferrumequinum: All the procedures were conducted under the license (Natural England 2016-25216-SCI-SCI) issued to Gareth Jones. The individual bat died unexpectedly and suddenly during sampling and was dissected immediately. Pipistrellus kuhlii: The sampling procedure was carried out following all the applicable national guidelines for the care and use of animals. Sampling was done in accordance with all the relevant wildlife legislation and approved by the Ministry of Environment (Ministero della Tutela del Territorio e del Mare, Aut.Prot. N˚: 13040, 26/03/2014). Molossus molossus: All sampling methods were approved by the Ministerio de Ambiente de Panamá (SE/A-29-18) and by the Institutional Animal Care and Use Committee of the Smithsonian Tropical Research Institute (2017-0815-2020). Phyllostomus discolor: P. discolor bats originated from a breeding colony in the Department Biology II of the Ludwig-Maximilians-University in Munich. Approval to keep and breed the bats was issued by the Munich district veterinary office.
    [Show full text]
  • Product Datasheet INAVA Overexpression
    Product Datasheet INAVA Overexpression Lysate NBP2-06832 Unit Size: 0.1 mg Store at -80C. Avoid freeze-thaw cycles. Protocols, Publications, Related Products, Reviews, Research Tools and Images at: www.novusbio.com/NBP2-06832 Updated 3/17/2020 v.20.1 Earn rewards for product reviews and publications. Submit a publication at www.novusbio.com/publications Submit a review at www.novusbio.com/reviews/destination/NBP2-06832 Page 1 of 2 v.20.1 Updated 3/17/2020 NBP2-06832 INAVA Overexpression Lysate Product Information Unit Size 0.1 mg Concentration The exact concentration of the protein of interest cannot be determined for overexpression lysates. Please contact technical support for more information. Storage Store at -80C. Avoid freeze-thaw cycles. Buffer RIPA buffer Target Molecular Weight 72.7 kDa Product Description Description Transient overexpression lysate of chromosome 1 open reading frame 106 (C1orf106), transcript variant 1 The lysate was created in HEK293T cells, using Plasmid ID RC215863 and based on accession number NM_018265. The protein contains a C-MYC/DDK Tag. Gene ID 55765 Gene Symbol C1ORF106 Species Human Notes HEK293T cells in 10-cm dishes were transiently transfected with a non-lipid polymer transfection reagent specially designed and manufactured for large volume DNA transfection. Transfected cells were cultured for 48hrs before collection. The cells were lysed in modified RIPA buffer (25mM Tris-HCl pH7.6, 150mM NaCl, 1% NP-40, 1mM EDTA, 1xProteinase inhibitor cocktail mix, 1mM PMSF and 1mM Na3VO4, and then centrifuged to clarify the lysate. Protein concentration was measured by BCA protein assay kit.This product is manufactured by and sold under license from OriGene Technologies and its use is limited solely for research purposes.
    [Show full text]
  • Identification of New Substrates and Physiological Relevance
    Université de Montréal The Multifaceted Proprotein Convertases PC7 and Furin: Identification of New Substrates and Physiological Relevance Par Stéphanie Duval Biologie Moléculaire, Faculté de médecine Thèse présentée en vue de l’obtention du grade de Philosophiae doctor (Ph.D) en Biologie moléculaire, option médecine cellulaire et moléculaire Avril 2020 © Stéphanie Duval, 2020 Résumé Les proprotéines convertases (PCs) sont responsables de la maturation de plusieurs protéines précurseurs et sont impliquées dans divers processus biologiques importants. Durant les 30 dernières années, plusieurs études sur les PCs se sont traduites en succès cliniques, toutefois les fonctions spécifiques de PC7 demeurent obscures. Afin de comprendre PC7 et d’identifier de nouveaux substrats, nous avons généré une analyse protéomique des protéines sécrétées dans les cellules HuH7. Cette analyse nous a permis d’identifier deux protéines transmembranaires de fonctions inconnues: CASC4 et GPP130/GOLIM4. Au cours de cette thèse, nous nous sommes aussi intéressé au rôle de PC7 dans les troubles comportementaux, grâce à un substrat connu, BDNF. Dans le chapitre premier, je présenterai une revue de la littérature portant entre autres sur les PCs. Dans le chapitre II, l’étude de CASC4 nous a permis de démontrer que cette protéine est clivée au site KR66↓NS par PC7 et Furin dans des compartiments cellulaires acides. Comme CASC4 a été rapporté dans des études de cancer du sein, nous avons généré des cellules MDA- MB-231 exprimant CASC4 de type sauvage et avons démontré une diminution significative de la migration et de l’invasion cellulaire. Ce phénotype est causé notamment par une augmentation du nombre de complexes d’adhésion focale et peut être contrecarré par la surexpression d’une protéine CASC4 mutante ayant un site de clivage optimale par PC7/Furin ou encore en exprimant une protéine contenant uniquement le domaine clivé N-terminal.
    [Show full text]
  • 9-Azido Analogs of Three Sialic Acid Forms for Metabolic Remodeling Of
    Supporting Information 9-Azido Analogs of Three Sialic Acid Forms for Metabolic Remodeling of Cell-Surface Sialoglycans Bo Cheng,†,‡ Lu Dong,†,§ Yuntao Zhu,†,‡ Rongbing Huang,†,‡ Yuting Sun,†,‖ Qiancheng You,†,‡ Qitao Song,†,§ James C. Paton, ∇ Adrienne W. Paton,∇ and Xing Chen*,†,‡,§,⊥,# †College of Chemistry and Molecular Engineering, ‡Beijing National Laboratory for Molecular Sciences, §Peking−Tsinghua Center for Life Sciences,‖Academy for Advanced Interdisciplinary Studies, ⊥Synthetic and Functional Biomolecules Center, and #Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing 100871, China ∇Research Centre for Infectious Diseases, Department of Molecular and Biomedical Science, University of Adelaide, Adelaide SA 5005, Australia Page S1 Table of Contents: Scheme S1.……………………………………………………….........……………. S3 Figure S1……………………………………………………..………..……………. S3 Figure S2……………………………………………………..………..…………… S4 Figure S3……………………………………………………..………..…………… S4 Figure S4……………………………………………………..………..…………… S5 Figure S5……………………………………………………..………..…………… S6 Figure S6……………………………………………………..………..…………….S7 Figure S7……………………………………………………..………..…………….S8 Figure S8……………………………………………………..………..…………….S9 Experimental Procedures……………………………….…........…………....S10-S27 Table S1………………………………………………..………..…………….S28-S48 Supporting Reference……………………………………………….......………...S49 Page S2 Scheme S1. Synthesis of 9AzNeu5Gc Figure S1: a, b, c, d) Representative scatter plots (FSC vs. SSC) and histograms of flow cytometry analysis
    [Show full text]
  • Transcriptomic and Proteomic Profiling Provides Insight Into
    BASIC RESEARCH www.jasn.org Transcriptomic and Proteomic Profiling Provides Insight into Mesangial Cell Function in IgA Nephropathy † † ‡ Peidi Liu,* Emelie Lassén,* Viji Nair, Celine C. Berthier, Miyuki Suguro, Carina Sihlbom,§ † | † Matthias Kretzler, Christer Betsholtz, ¶ Börje Haraldsson,* Wenjun Ju, Kerstin Ebefors,* and Jenny Nyström* *Department of Physiology, Institute of Neuroscience and Physiology, §Proteomics Core Facility at University of Gothenburg, University of Gothenburg, Gothenburg, Sweden; †Division of Nephrology, Department of Internal Medicine and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan; ‡Division of Molecular Medicine, Aichi Cancer Center Research Institute, Nagoya, Japan; |Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden; and ¶Integrated Cardio Metabolic Centre, Karolinska Institutet Novum, Huddinge, Sweden ABSTRACT IgA nephropathy (IgAN), the most common GN worldwide, is characterized by circulating galactose-deficient IgA (gd-IgA) that forms immune complexes. The immune complexes are deposited in the glomerular mesangium, leading to inflammation and loss of renal function, but the complete pathophysiology of the disease is not understood. Using an integrated global transcriptomic and proteomic profiling approach, we investigated the role of the mesangium in the onset and progression of IgAN. Global gene expression was investigated by microarray analysis of the glomerular compartment of renal biopsy specimens from patients with IgAN (n=19) and controls (n=22). Using curated glomerular cell type–specific genes from the published literature, we found differential expression of a much higher percentage of mesangial cell–positive standard genes than podocyte-positive standard genes in IgAN. Principal coordinate analysis of expression data revealed clear separation of patient and control samples on the basis of mesangial but not podocyte cell–positive standard genes.
    [Show full text]
  • Investigating a Microrna-499-5P Network During Cardiac Development
    Investigating a microRNA-499-5p network during cardiac development Thesis for a PhD degree Submitted to University of East Anglia by Johannes Gottfried Wittig This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with the author and that use of any information derived therefrom must be in accordance with current UK Copyright Law. In addition, any quotation or extract must include full attribution. Principal Investigator: Prof. Andrea Münsterberg Submission Date: 10.05.2019 Declaration of own work Declaration of own work I, Johannes Wittig, confirm that the work for the report with the title: “Investigating a microRNA-499-5p network during cardiac development” was undertaken by myself and that no help was provided from other sources than those allowed. All sections of the report that use quotes or describe an argument or development investigated by other scientist have been referenced, including all secondary literature used, to show that this material has been adopted to support my report. Place/Date Signature II Acknowledgements Acknowledgements I am very happy that I had the chance to be part of the Münsterberg-lab for my PhD research, therefore I would very much like to thank Andrea Münsterberg for offering me this great position in her lab. I especially want to thank her for her patience with me in all the moments where I was impatient and complained about slow progress. I also would like to say thank you for the incredible freedom I had during my PhD work and the support she gave me in the lab but also the understanding for all my non-science related activities.
    [Show full text]
  • Mouse Inava Conditional Knockout Project (CRISPR/Cas9)
    https://www.alphaknockout.com Mouse Inava Conditional Knockout Project (CRISPR/Cas9) Objective: To create a Inava conditional knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering. Strategy summary: The Inava gene (NCBI Reference Sequence: NM_028872.3 ; Ensembl: ENSMUSG00000041605 ) is located on Mouse chromosome 1. 10 exons are identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 10 (Transcript: ENSMUST00000120339). Exon 2 will be selected as conditional knockout region (cKO region). Deletion of this region should result in the loss of function of the Mouse Inava gene. To engineer the targeting vector, homologous arms and cKO region will be generated by PCR using BAC clone RP23-100N3 as template. Cas9, gRNA and targeting vector will be co-injected into fertilized eggs for cKO Mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Exon 2 starts from about 10.04% of the coding region. The knockout of Exon 2 will result in frameshift of the gene. The size of intron 1 for 5'-loxP site insertion: 6240 bp, and the size of intron 2 for 3'-loxP site insertion: 609 bp. The size of effective cKO region: ~649 bp. The cKO region does not have any other known gene. Page 1 of 8 https://www.alphaknockout.com Overview of the Targeting Strategy Wildtype allele gRNA region 5' gRNA region 3' 1 2 3 4 5 10 Targeting vector Targeted allele Constitutive KO allele (After Cre recombination) Legends Exon of mouse Inava Homology arm cKO region loxP site Page 2 of 8 https://www.alphaknockout.com Overview of the Dot Plot Window size: 10 bp Forward Reverse Complement Sequence 12 Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats.
    [Show full text]
  • Regulation of the Glutamate/Glutamine Cycle by Nitric Oxide in the Central Nervous System
    University of Pennsylvania ScholarlyCommons Publicly Accessible Penn Dissertations 2015 Regulation of the Glutamate/glutamine Cycle by Nitric Oxide in the Central Nervous System Karthik Anderson Raju University of Pennsylvania, [email protected] Follow this and additional works at: https://repository.upenn.edu/edissertations Part of the Biochemistry Commons, Biology Commons, and the Neuroscience and Neurobiology Commons Recommended Citation Raju, Karthik Anderson, "Regulation of the Glutamate/glutamine Cycle by Nitric Oxide in the Central Nervous System" (2015). Publicly Accessible Penn Dissertations. 1962. https://repository.upenn.edu/edissertations/1962 This paper is posted at ScholarlyCommons. https://repository.upenn.edu/edissertations/1962 For more information, please contact [email protected]. Regulation of the Glutamate/glutamine Cycle by Nitric Oxide in the Central Nervous System Abstract Nitric oxide (˙NO) is a critical contributor to glutamatergic neurotransmission in the central nervous system (CNS). Much of its influence is due ot the ability of this molecule to regulate protein structure and function through its posttranslational modification of cysteine esidues,r a process known as S- nitrosylation. However, little is known about the extent of this modification and its associated functional effects in the brain under physiological conditions. We employed mass spectrometry (MS)-based methodologies to interrogate the S-nitrosocysteine proteome in wild-type (WT), neuronal nitric oxide synthase-deficient (nNOS-/-),
    [Show full text]
  • INAVA Mouse Monoclonal Antibody [Clone ID: OTI6A6] – TA807187
    OriGene Technologies, Inc. 9620 Medical Center Drive, Ste 200 Rockville, MD 20850, US Phone: +1-888-267-4436 [email protected] EU: [email protected] CN: [email protected] Product datasheet for TA807187 INAVA Mouse Monoclonal Antibody [Clone ID: OTI6A6] Product data: Product Type: Primary Antibodies Clone Name: OTI6A6 Applications: IHC, WB Recommend Dilution: WB 1:2000, IHC 1:150 Reactivity: Human Host: Mouse Isotype: IgG1 Clonality: Monoclonal Immunogen: Human recombinant protein fragment corresponding to amino acids 1-270 of human C1ORF106(NP_060735) produced in E.coli. Formulation: PBS (PH 7.3) containing 1% BSA, 50% glycerol and 0.02% sodium azide. Concentration: 1 mg/ml Purification: Purified from mouse ascites fluids or tissue culture supernatant by affinity chromatography (protein A/G) Predicted Protein Size: 72.7 kDa Gene Name: chromosome 1 open reading frame 106 Database Link: NP_060735 Entrez Gene 55765 Human Synonyms: FLJ10901; MGC125608 This product is to be used for laboratory only. Not for diagnostic or therapeutic use. View online » ©2020 OriGene Technologies, Inc., 9620 Medical Center Drive, Ste 200, Rockville, MD 20850, US 1 / 3 INAVA Mouse Monoclonal Antibody [Clone ID: OTI6A6] – TA807187 Product images: HEK293T cells were transfected with the pCMV6- ENTRY control (Left lane) or pCMV6-ENTRY C1orf106 ([RC215863], Right lane) cDNA for 48 hrs and lysed. Equivalent amounts of cell lysates (5 ug per lane) were separated by SDS-PAGE and immunoblotted with anti-C1orf106 (1:2000). Positive lysates [LY413177] (100ug) and [LC413177] (20ug) can be purchased separately from OriGene. Immunohistochemical staining of paraffin- embedded Carcinoma of Human lung tissue using anti-C1orf106 mouse monoclonal antibody.
    [Show full text]
  • The Proteomic Analysis of Breast Cell Line Exosomes Reveals Disease Patterns and Potential Biomarkers Yousef Risha1, Zoran Minic2, Shahrokh M
    www.nature.com/scientificreports OPEN The proteomic analysis of breast cell line exosomes reveals disease patterns and potential biomarkers Yousef Risha1, Zoran Minic2, Shahrokh M. Ghobadloo3 & Maxim V. Berezovski 1,2,3* Cancer cells release small extracellular vesicles, exosomes, that have been shown to contribute to various aspects of cancer development and progression. Diferential analysis of exosomal proteomes from cancerous and non-tumorigenic breast cell lines can provide valuable information related to breast cancer progression and metastasis. Moreover, such a comparison can be explored to fnd potentially new protein biomarkers for early disease detection. In this study, exosomal proteomes of MDA-MB-231, a metastatic breast cancer cell line, and MCF-10A, a non-cancerous epithelial breast cell line, were identifed by nano-liquid chromatography coupled to tandem mass spectrometry. We also tested three exosomes isolation methods (ExoQuick, Ultracentrifugation (UC), and Ultrafltration– Ultracentrifugation) and detergents (n-dodecyl β-d-maltoside, Triton X-100, and Digitonin) for solubilization of exosomal proteins and enhanced detection by mass spectrometry. A total of 1,107 exosomal proteins were identifed in both cell lines, 726 of which were unique to the MDA-MB-231 breast cancer cell line. Among them, 87 proteins were predicted to be relevant to breast cancer and 16 proteins to cancer metastasis. Three exosomal membrane/surface proteins, glucose transporter 1 (GLUT-1), glypican 1 (GPC-1), and disintegrin and metalloproteinase domain-containing protein 10 (ADAM10), were identifed as potential breast cancer biomarkers and validated with Western blotting and high-resolution fow cytometry. We demonstrated that exosomes are a rich source of breast cancer-related proteins and surface biomarkers that may be used for disease diagnosis and prognosis.
    [Show full text]
  • Sequencing of Over 100,000 Individuals Identifies Multiple Genes and Rare Variants Associated with Crohns Disease Susceptibility
    medRxiv preprint doi: https://doi.org/10.1101/2021.06.15.21258641; this version posted June 21, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license . Sequencing of over 100,000 individuals identifies multiple genes and rare variants associated with Crohns disease susceptibility Aleksejs Sazonovs1*, Christine R. Stevens2,3,4*, Guhan R. Venkataraman5*, Kai Yuan3,4*, Brandon Avila3,6, Maria T. Abreu7, Tariq Ahmad8, Matthieu Allez9, Gil Atzmon10,11, Aris Baras12, Jeffrey C. Barrett13, Nir Barzilai11,14, Laurent Beaugerie15, Ashley Beecham16,17, Charles Bernstein18, Alain Bitton19, Bernd Bokemeyer20, Andrew Chan21, Daniel Chung22, Isabelle Cleynen23, Jacques Cosnes24, David J. Cutler25,26,27, Allan Daly28, Oriana M. Damas7, Lisa W. Datta29, Noor Dawany30, Marcella Devoto30,31,32, Angela Dobes33, Sheila Dodge34, Eva Ellinghaus35, Laura Fachal1, Martti Farkkila36, William Faubion37, Manuel Ferreira12, Denis Franchimont38, Stacey B. Gabriel34, Michel Georges39, Kyle Gettler40, Mamta Giri40, Benjamin Glaser41, Siegfried Goerg42, Philippe Goyette43, Daniel Graham44,45,46, Eija Hämäläinen47, Talin Haritunians48, Graham A. Heap8, Mikko Hiltunen49, Marc Hoeppner50, Julie E. Horowitz51, Peter Irving52,53, Vivek Iyer28, Chaim Jalas54, Judith Kelsen30, Hamed Khalili21, Barbara S. Kirschner55, Kimmo Kontula56, Jukka T. Koskela47, Subra Kugathasan26, Juozas Kupcinskas57, Christopher A. Lamb58,59, Matthias Laudes42, Debby Laukens60, Adam P. Levine61, James Lewis33,62, Claire Liefferinckx38, Britt-Sabina Loescher35, Edouard Louis63, John Mansfield58,59, Sandra May35, Jacob L. McCauley16,17, Emebet Mengesha48, Myriam Mni39, Paul Moayyedi64, Christopher J. Moran22, Rodney Newberry65, Sirimon O’Charoen66, David T.
    [Show full text]