Machine Learning Approaches for Breast Cancer Survivability Prediction

Total Page:16

File Type:pdf, Size:1020Kb

Machine Learning Approaches for Breast Cancer Survivability Prediction University of Windsor Scholarship at UWindsor Electronic Theses and Dissertations Theses, Dissertations, and Major Papers 7-7-2020 Machine Learning Approaches for Breast Cancer Survivability Prediction Quang Huy Pham University of Windsor Follow this and additional works at: https://scholar.uwindsor.ca/etd Recommended Citation Pham, Quang Huy, "Machine Learning Approaches for Breast Cancer Survivability Prediction" (2020). Electronic Theses and Dissertations. 8387. https://scholar.uwindsor.ca/etd/8387 This online database contains the full-text of PhD dissertations and Masters’ theses of University of Windsor students from 1954 forward. These documents are made available for personal study and research purposes only, in accordance with the Canadian Copyright Act and the Creative Commons license—CC BY-NC-ND (Attribution, Non-Commercial, No Derivative Works). Under this license, works must always be attributed to the copyright holder (original author), cannot be used for any commercial purposes, and may not be altered. Any other use would require the permission of the copyright holder. Students may inquire about withdrawing their dissertation and/or thesis from this database. For additional inquiries, please contact the repository administrator via email ([email protected]) or by telephone at 519-253-3000ext. 3208. Machine Learning Approaches for Breast Cancer Survivability Prediction by Pham Quang Huy A Dissertation Submitted to the Faculty of Graduate Studies through the School of Computer Science in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy at the University of Windsor Windsor, Ontario, Canada 2020 c Pham Quang Huy 2020 Machine Learning Approaches for Breast Cancer Survivability Prediction by Pham Quang Huy APPROVED BY: S. Houghten, External Examiner Brock University L. Porter Department of Biological Sciences A. Mukhopadhyay School of Computer Science P. Zadeh School of Computer Science A. Ngom, Advisor School of Computer Science L. Rueda, Co-Advisor School of Computer Science April 17, 2020 Declaration of Co-authorship / Previous Publication I. Co-Authorship I hereby declare that this thesis incorporates the outcome of the joint researches, as follows: Chapter 2 is based on a journal paper and a conference poster, co-authored with Eliseos Mucaki, Katherina Baranova, Iman Rezaeian, Dimo Angelov, and Dr. Peter Rogan, under the supervision of Dr. Alioune Ngom and Dr. Luis Rueda. Dr. Peter Rogan, Dr. Alioune Ngom and Dr. Luis Rueda designed the methodology and oversaw the project. SVM feature selection with MATLAB was automated by Dimo Angelov. Eliseos Mucaki and Katherina Baranova selected the initial gene signatures, and performed processing of the METABRIC data using SVM methods. Eliseos Mucaki and Iman Rezaeian performed the preprocessing of the METABRIC dataset using Ramdom Forest. Iman Rezaeian and I designed feature selection and classification modules using WEKA, in which I took the major tasks in conducting the extensive experiments to acquire the results shown in Tables 2, 3, 4, and 5 of the chapter. They are used for about one-third of the discussions in the paper. Dr. Peter Rogan and Eliseos Mucaki wrote the manuscript. All authors contributed in reviewing the manuscript. Chapter 3 is based on three conference papers and a conference poster. For the conference papers, I conducted the key ideas, primary contributions, experimental designs, data analysis, and writing. Dr. Alioune Ngom and Dr. Luis Rueda iii suggested the experimental datasets, and participated in discussing the ideas and provided feedback on the refinement of ideas and results, and reviewing of the manuscript. For the poster, I conducted the key ideas, primary contributions, experimental designs, data analysis, and writing. Mucaki and Rezaeian performed data preprocessing for some of the METABRIC datasets. Dr. Peter Rogan, Dr. Alioune Ngom and Dr. Luis Rueda participated in discussing the ideas and provided feedback on the refinement of ideas and editing of the manuscript. Chapter 4 was co-authored with Ashraf Abou Tabl and Abedalrhman Alkhateeb, under the supervision of Dr. Alioune Ngom and Dr. Luis Rueda. Ashraf Abou Tabl and Abedalrhman Alkhateeb equally contributed in applying the method and verifying the results. Ashraf Abou Tabl, Abedalrhman Alkhateeb, Dr. Alioune Ngom, and Dr. Luis Rueda all participated in discussing the ideas. I participated in an experimental setting, prepared pathway data for analyzing, and searched for gene biomarker evidence based on pathway data, the literature, and online tools. All authors contributed in writing and reviewing the manuscript. Chapter 5 is based on a conference paper, a conference poster, a conference ab- stract, and a paper in preparation. The conference paper was co-author with Jurko Guba, Mousa Gawanmeh, and Dr. Lisa Porter, under the supervision of Dr. Alioune Ngom. Dr. Alioune Ngom and I initiated the ideas and discussed the results. The primary contributions, experimental designs, data analysis, and the writing were performed by the author. Jurko Guba contributed to the data analysis and graphing results. Dr. Lisa Porter and Mousa Gawanmeh contributed iv to the biological analysis. Dr. Alioune Ngom, Dr. Lisa Porter, and Mousa Gawanmeh contributed in editing the manuscript as well. The poster was co-authored with Jurko Guba, and Dr. Lisa Porter, under the supervision of Dr. Alioune Ngom, and Dr. Luis Rueda. Dr. Alioune Ngom and I initiated the study. Dr. Alioune Ngom and Dr. Luis Rueda participated in discussing the ideas and provided feedback on the refinement of ideas and results. I performed data processing, experimental designs, and writing. Jurko Guba, Dr. Lisa Porter, and I participated in data analysis. Dr. Lisa Porter, Dr. Alioune Ngom and Dr. Luis Rueda participated in reviewing the manuscript. The conference abstract was co-authored with William Klassen, under the super- vision of Dr. Alioune Ngom, and Dr. Luis Rueda. Dr. Alioune Ngom, Dr. Luis Rueda, and I initiated the study. I performed data processing, data analysis, and writing. William Klassen conducted the experiments. All authors have contributed in discussing the ideas and reviewing the manuscript. The paper in preparation is co-authored with Mousa Gawanmeh, under the su- pervision of Dr. Alioune Ngom and Dr. Luis Rueda. Dr. Alioune Ngom, Dr. Luis Rueda, and I participated in discussing the ideas and results. The primary contributions and ex- perimental designs were performed by the author. Mousa Gawanmeh and I analyzed the results and prepared the manuscript. All authors participated in reviewing the manuscript. Chapter 6 is based on a conference paper and a paper in preparation under the supervision of Dr. Alioune Ngom and Dr. Luis Rueda. I performed the primary contributions, experimental designs, data analysis, and writing. Dr. Alioune Ngom and v Dr. Luis Rueda participated in discussing the ideas and results and provided feedback on the refinement of ideas and editing of the manuscript. Chapter 7 was co-authored with Tran, D., Duong, N. B., and Dr. Fournier-Viger, P., under the supervision of Dr. Alioune Ngom. I performed the key ideas, primary contributions, experimental designs, and most of the coding. Tran, D. and I conducted a literature survey, data analysis, and writing. Duong, N. B. performed data processing and graphing. Dr. Alioune Ngom and Dr. Fournier-Viger participated in discussing the ideas and provided feedback on the refinement of ideas and editing of the manuscript. Appendix A was co-authored with Mangalakumar Naveen and Abed Alkhateeb, under the supervision of Dr. Alioune Ngom and Dr. Luis Rueda. Dr. Alioune Ngom, Dr. Luis Rueda, and Abed Alkhateeb issued the key ideas. Abed Alkhateeb and Mangalaku- mar Naveen performed experimental designs and data analysis. Mangalakumar Naveen performed graphing and writing. I contributed in discussing the ideas, verifying the cod- ing, and searching for biological insight. All authors participated in discussing the result and reviewing the manuscript. The papers in the corresponding chapters were edited with some extensions or grammatical error corrections while the original structures, ideas, main results, and dis- cussions have been preserved. I am aware of the University of Windsor Senate Policy on Authorship and I certify that I have properly acknowledged the contribution of other researchers to my thesis, and have obtained written permission from each of the co-author(s) to include the above material(s) in my thesis. vi I certify that, with the above qualification, this thesis, and the research to which it refers, is the product of my own. II. Previous Publications This thesis includes 4 original papers that have been previously published/sub- mitted for publication in peer reviewed journals, as follows: Dissertation Publication title Publication chapter status published Mucaki, E. J., Baranova, K., Pham, H. Q., Rezaeian, Chapter 2 (journal) I., Angelov, D., Ngom, A., Rueda, L., and Rogan, P. K. (2016). Predicting outcomes of hormone and chemother- apy in the molecular taxonomy of breast Cancer interna- tional consortium (METABRIC) study by biochemically- inspired machine learning. F1000Research, 5. published Iman Rezaeian, Eliseos Mucaki, Katherina Baranova, (poster) Huy Pham Quang, Dimo Angelov, Lucian Ilie, Alioune Ngom, Luis Rueda and Peter Rogan. Predicting patient outcomes of hormone therapy in the METABRIC breast cancer study. The GLBIO/CCBC Great Lakes Bioinfor- matics and the Canadian Computational Biology Con- ference
Recommended publications
  • Do the Hominid-Specific Regions of X–Y Homology Contain
    Do the Hominid-Specific Regions of X–Y Homology Contain Candidate Genes Potentially Involved in a Critical Event Linked to Speciation? CAROLE A. SARGENT, PATRICIA BLANCO & NABEEL A. AFFARA Summary. It has been postulated that the critical events leading to major differences between humans and the great apes (such as lan- guage and lateralisation of the brain) are associated with major changes on sex chromosomes. Regions of homology between the human sex chromosomes have arisen at different points during mammalian evolution. The two largest blocks are specific to hominids, having appeared at some time after the divergence of humans and chimpanzees. These are the second pairing region found at the telomeres of the sex chromosome long arms and a region of homology between Xq21.3 (X chromosome long arm) and Yp11 (Y chromosome short arm). Questions arise as to whether (1) these regions of the sex chromosomes contain func- tional genes and (2) these genes might be candidates for the differ- ences in cognitive function that distinguish modern humans from their ancestors. Furthermore, divergence between functional sequences on the X and the Y,and the alteration of the immediate environment around a locus by rearrangements, may allow for the acquisition and evolution of genes conferring more subtle sexually dimorphic variation. INTRODUCTION THE MAMMALIAN SEX chromosomes have evolved from an ancestral pair of autosomes (Ohno, 1967). The dominant male-determining gene evolved on the proto-Y, and became genetically sequestered from the rest of the genome through the suppression of recombination over most of the length of the proto-Y with the proto-X.
    [Show full text]
  • A Genetic Screen for Human Genes Suppressing FUS Induced Toxicity
    G3: Genes|Genomes|Genetics Early Online, published on April 10, 2020 as doi:10.1534/g3.120.401164 1 A Genetic Screen for Human Genes Suppressing FUS Induced Toxicity in Yeast 2 3 Elliott Hayden,*,1 Shuzhen Chen,*,1 Abagail Chumley,* Chenyi Xia,† Quan Zhong,*,2 and Shulin Ju*,2 4 5 *Department of Biological Sciences, Wright State University, Dayton, OH 45435 6 †School of Basic Medicine, Shanghai University of Traditional Medicine, Shanghai, China 201203 7 8 1These authors contributed equally to this work. 9 2Corresponding author: E-mail: [email protected]; [email protected] 10 11 ABSTRACT 12 FUS is a nucleic acid binding protein that, when mutated, cause a subset of familial amyotrophic lateral 13 sclerosis (ALS). Expression of FUS in yeast recapitulates several pathological features of the disease- 14 causing mutant proteins, including nuclear to cytoplasmic translocation, formation of cytoplasmic 15 inclusions, and cytotoxicity. Genetic screens using the yeast model of FUS have identified yeast genes 16 and their corresponding human homologs suppressing FUS induced toxicity in yeast, neurons and 17 animal models. To expand the search for human suppressor genes of FUS induced toxicity, we carried 18 out a genome-scale genetic screen using a newly constructed library containing 13570 human genes 19 cloned in an inducible yeast-expression vector. Through multiple rounds of verification, we found 37 20 human genes that, when overexpressed, suppress FUS induced toxicity in yeast. Human genes with 21 DNA or RNA binding functions are overrepresented among the identified suppressor genes, supporting 22 that perturbations of RNA metabolism is a key underlying mechanism of FUS toxicity.
    [Show full text]
  • Elucidating the Constitutional Genetic Basis of Multiple Primary Tumours
    Elucidating the constitutional genetic basis of multiple primary tumours James William Whitworth Gonville and Caius College February 2019 This dissertation is submitted for the degree of Doctor of Philosophy Preface This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration except as declared in the Preface and specified in the text. It is not substantially the same as any that I have submitted, or, is being concurrently submitted for a degree or diploma or other qualification at the University of Cambridge or any other University or similar institution except as declared in the Preface and specified in the text. I further state that no substantial part of my dissertation has already been submitted, or, is being concurrently submitted for any such degree, diploma or other qualification at the University of Cambridge or any other University or similar institution except as declared in the Preface and specified in the text It does not exceed the prescribed word limit for the relevant Degree Committee. Acknowledgments I would like to extend my sincere thanks to the following people. To my supervisor, Eamonn Maher, for his mentorship over a time period longer than the programme of study outlined in this thesis. I am also grateful to Marc Tischkowitz, my second supervisor, for his encouragement and input. To my colleagues past and present in the Academic Department of Medical Genetics for their insights, collaboration, advice, and camaraderie over the past few years. In particular Ruth Casey, Graeme Clark, France Docquier, Ellie Fewings, Benoit Lang-Leung, Ezequiel Martin, Eguzkine Ochoa, Faye Rodger, Phil Smith and Hannah West.
    [Show full text]
  • Evolutionary Epigenomics – Identifying Functional Genome Elements by Epigenetic Footprints in the Dna
    EVOLUTIONARY EPIGENOMICS – IDENTIFYING FUNCTIONAL GENOME ELEMENTS BY EPIGENETIC FOOTPRINTS IN THE DNA DISSERTATION ZUR ERLANGUNG DES GRADES DES DOKTORS DER NATURWISSENSCHAFTEN DER NATURWISSENSCHAFTLICHEN -TECHNISCHEN FAKULTÄTEN DER UNIVERSITÄT DES SAARLANDES EINGEREICHT VON LARS FEUERBACH SAARBRÜCKEN , 2014 Tag des Kolloquiums: 16.1.2014 Dekan der Fakultät: Prof. Dr. Mark Groves Vorsitzender des Prüfungsausschusses: Prof. Dr. Gerhard Weikum Gutachter: Prof. Dr. Dr. Thomas Lengauer Prof. Dr. Jotun Hein Beisitzer: Dr. Glenn Lawyer ii Abstract Over the last decade, advances in genome sequencing have substantially increased the amount of genomic DNA sequences available. While these rich resources have improved our understanding of genome function, research of the epigenome as a transient but heritable memory system of the cell has only profited from this development indirectly. Although epigenetic information in the form of DNA methylation is not directly encoded in the genomic nucleotide sequence, it increases the mutation rate of cytosine-guanine dinucleotides by the CpG decay effect, and thus leaves epigenetic footprints in the DNA. This thesis proposes four approaches to facilitate this information for research. For largely uncharacterized genomes, CgiHunter presents an exhaustive algorithm for an unbiased DNA sequence-based annotation of CpG islands as regions that are protected from CpG decay . For species with well characterized point mutation frequencies, EqiScore identifies regions that evolve under distinct DNA methylation levels. Furthermore, the derived equilibrium distributions for methylated and unmethylated genome regions predict the evolutionary robustness of transcription factor binding site motifs against the CpG decay effect. The AluJudge annotation and underlying L-score provide a method to identify putative active copies of CpG-rich transposable elements within genomes.
    [Show full text]
  • Integrated High-Resolution Physical and Comparative Gene
    INTEGRATED HIGH-RESOLUTION PHYSICAL AND COMPARATIVE GENE MAPS IN HORSES A Dissertation by CANDICE LEA BRINKMEYER LANGFORD Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY December 2006 Major Subject: Genetics INTEGRATED HIGH-RESOLUTION PHYSICAL AND COMPARATIVE GENE MAPS IN HORSES A Dissertation by CANDICE LEA BRINKMEYER LANGFORD Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Approved by: Chair of Committee, Bhanu P. Chowdhary Committee Members, James E. Womack Loren C. Skow Terje Raudsepp William J. Murphy Chair of Genetics Faculty, James Wild December 2006 Major Subject: Genetics iii ABSTRACT Integrated High-Resolution Physical and Comparative Gene Maps in Horses. (December 2006) Candice Lea Brinkmeyer Langford, B.S., Texas A&M University Chair of Advisory Committee: Dr. Bhanu Chowdhary High-resolution physically ordered gene maps for the horse (Equus caballus, ECA) are essential to the identification of genes associated with hereditary diseases and traits of interest like fertility, coat color, and disease resistance or susceptibility. Such maps also serve as foundations for genome comparisons across species and form the basis to study chromosome evolution. In this study seven equine chromosomes (ECA6, 7, 10, 15, 18, 21 and X) corresponding to human chromosomes (HSA) 2, 19 and X were selected for high-resolution mapping on the basis of their potential involvement in diseases and conditions of importance to horses. To accomplish this, gene- and sequence-specific markers were generated and genotyped on the TAMU 5000rad horse x hamster RH panel.
    [Show full text]
  • List of the Differentially Expressed Genes Supplemantary Table 5
    List of the Differentially Expressed Genes Supplemantary Table 5: Manually curated description of all the differentially expressed genes (DEGs) for all the group comarisons (DEX vs. CTR and DEX-CLEN vs. CTR) DEGs (DEX vs. CTR), FDR 0-5% Glucocorticoid/DEX related genes ENTREZ Gene Gene Full Name FC * NCBI Gene Card ID cytochrome P450, subfamily I (aromatic http://www.genecards.org/cgi- CYP1A1 282870 -10.41 http://www.ncbi.nlm.nih.gov/gene/?term=282870 compound-inducible), polypeptide 1 bin/carddisp.pl?gene=CYP1A1&search=282870 he protein encoded by this gene is a member of the immunophilin protein family, which play a role in immunoregulation and basic cellular processes involving FKBP5 (FK506 binding protein 5) is a protein-coding gene. Diseases protein folding and trafficking. This encoded protein is a cis- associated with FKBP5 include major depressive disorder, and trans prolyl isomerase that binds to the glucocorticoid resistance, and among its related super-pathways FKBP5 535704 FK506 binding protein 5 -8.65 immunosuppressants FK506 and rapamycin. It is thought to are Development Endothelin-1/EDNRA signaling and Integrated mediate calcineurin inhibition. It also interacts functionally Cancer pathway. GO annotations related to this gene include FK506 with mature hetero-oligomeric progesterone receptor binding and heat shock protein binding complexes along with the 90 kDa heat shock protein and P23 protein. LYVE1 (lymphatic vessel endothelial hyaluronan receptor 1) is a protein-coding gene. Diseases associated with LYVE1 include androgen insensitivity syndrome, partial, and complete androgen lymphatic vessel endothelial hyaluronan LYVE1 404179 -5.74 http://www.ncbi.nlm.nih.gov/gene/?term=404179 insensitivity syndrome, and among its related super-pathways are receptor 1 Metabolism of carbohydrates and Hyaluronan metabolism.
    [Show full text]