Machine Learning Approaches for Breast Cancer Survivability Prediction

by

Pham Quang Huy

A Dissertation Submitted to the Faculty of Graduate Studies through the School of Computer Science in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy at the University of Windsor

Windsor, Ontario, Canada

2020

© Pham Quang Huy 2020

Machine Learning Approaches for Breast Cancer Survivability Prediction

by

Pham Quang Huy

APPROVED BY:

S. Houghten, External Examiner
Brock University

L. Porter
Department of Biological Sciences

A. Mukhopadhyay
School of Computer Science

P. Zadeh
School of Computer Science

A. Ngom, Advisor
School of Computer Science

L. Rueda, Co-Advisor
School of Computer Science

April 17, 2020

Declaration of Co-authorship / Previous Publication

I. Co-Authorship

I hereby declare that this thesis incorporates the outcome of joint research, as follows:

Chapter 2 is based on a journal paper and a conference poster, co-authored with Eliseos Mucaki, Katherina Baranova, Iman Rezaeian, Dimo Angelov, and Dr. Peter Rogan, under the supervision of Dr. Alioune Ngom and Dr. Luis Rueda. Dr. Peter Rogan, Dr. Alioune Ngom, and Dr. Luis Rueda designed the methodology and oversaw the project. SVM feature selection with MATLAB was automated by Dimo Angelov. Eliseos Mucaki and Katherina Baranova selected the initial signatures and performed processing of the METABRIC data using SVM methods. Eliseos Mucaki and Iman Rezaeian performed the preprocessing of the METABRIC dataset using Random Forest. Iman Rezaeian and I designed feature selection and classification modules using WEKA, in which I took on the major tasks of conducting the extensive experiments to acquire the results shown in Tables 2, 3, 4, and 5 of the chapter. These results are used in about one-third of the discussion in the paper. Dr. Peter Rogan and Eliseos Mucaki wrote the manuscript. All authors contributed to reviewing the manuscript.

Chapter 3 is based on three conference papers and a conference poster.

For the conference papers, I developed the key ideas, primary contributions, experimental designs, data analysis, and writing. Dr. Alioune Ngom and Dr. Luis Rueda suggested the experimental datasets, participated in discussing the ideas, and provided feedback on the refinement of ideas and results, as well as on reviewing the manuscript.

For the poster, I developed the key ideas, primary contributions, experimental designs, data analysis, and writing. Mucaki and Rezaeian performed data preprocessing for some of the METABRIC datasets. Dr. Peter Rogan, Dr. Alioune Ngom, and Dr. Luis Rueda participated in discussing the ideas and provided feedback on the refinement of ideas and editing of the manuscript.

Chapter 4 was co-authored with Ashraf Abou Tabl and Abedalrhman Alkhateeb, under the supervision of Dr. Alioune Ngom and Dr. Luis Rueda. Ashraf Abou Tabl and Abedalrhman Alkhateeb contributed equally to applying the method and verifying the results. Ashraf Abou Tabl, Abedalrhman Alkhateeb, Dr. Alioune Ngom, and Dr. Luis Rueda all participated in discussing the ideas. I participated in the experimental setup, prepared pathway data for analysis, and searched for gene biomarker evidence based on pathway data, the literature, and online tools. All authors contributed to writing and reviewing the manuscript.

Chapter 5 is based on a conference paper, a conference poster, a conference abstract, and a paper in preparation.

The conference paper was co-authored with Jurko Guba, Mousa Gawanmeh, and Dr. Lisa Porter, under the supervision of Dr. Alioune Ngom. Dr. Alioune Ngom and I initiated the ideas and discussed the results. The primary contributions, experimental designs, data analysis, and writing were performed by the author. Jurko Guba contributed to the data analysis and graphing of results. Dr. Lisa Porter and Mousa Gawanmeh contributed to the biological analysis. Dr. Alioune Ngom, Dr. Lisa Porter, and Mousa Gawanmeh contributed to editing the manuscript as well.

The poster was co-authored with Jurko Guba and Dr. Lisa Porter, under the supervision of Dr. Alioune Ngom and Dr. Luis Rueda. Dr. Alioune Ngom and I initiated the study. Dr. Alioune Ngom and Dr. Luis Rueda participated in discussing the ideas and provided feedback on the refinement of ideas and results. I performed data processing, experimental designs, and writing. Jurko Guba, Dr. Lisa Porter, and I participated in data analysis. Dr. Lisa Porter, Dr. Alioune Ngom, and Dr. Luis Rueda participated in reviewing the manuscript.

The conference abstract was co-authored with William Klassen, under the supervision of Dr. Alioune Ngom and Dr. Luis Rueda. Dr. Alioune Ngom, Dr. Luis Rueda, and I initiated the study. I performed data processing, data analysis, and writing. William Klassen conducted the experiments. All authors contributed to discussing the ideas and reviewing the manuscript.

The paper in preparation is co-authored with Mousa Gawanmeh, under the supervision of Dr. Alioune Ngom and Dr. Luis Rueda. Dr. Alioune Ngom, Dr. Luis Rueda, and I participated in discussing the ideas and results. The primary contributions and experimental designs were performed by the author. Mousa Gawanmeh and I analyzed the results and prepared the manuscript. All authors participated in reviewing the manuscript.

Chapter 6 is based on a conference paper and a paper in preparation under the supervision of Dr. Alioune Ngom and Dr. Luis Rueda. I performed the primary contributions, experimental designs, data analysis, and writing. Dr. Alioune Ngom and Dr. Luis Rueda participated in discussing the ideas and results and provided feedback on the refinement of ideas and editing of the manuscript.

Chapter 7 was co-authored with Tran, D., Duong, N. B., and Dr. Fournier-Viger, P., under the supervision of Dr. Alioune Ngom. I developed the key ideas, primary contributions, experimental designs, and most of the coding. Tran, D. and I conducted a literature survey, data analysis, and writing. Duong, N. B. performed data processing and graphing. Dr. Alioune Ngom and Dr. Fournier-Viger participated in discussing the ideas and provided feedback on the refinement of ideas and editing of the manuscript.

Appendix A was co-authored with Mangalakumar Naveen and Abed Alkhateeb, under the supervision of Dr. Alioune Ngom and Dr. Luis Rueda. Dr. Alioune Ngom, Dr. Luis Rueda, and Abed Alkhateeb conceived the key ideas. Abed Alkhateeb and Mangalakumar Naveen performed experimental designs and data analysis. Mangalakumar Naveen performed graphing and writing. I contributed to discussing the ideas, verifying the coding, and searching for biological insight. All authors participated in discussing the results and reviewing the manuscript.

The papers in the corresponding chapters were edited with some extensions and grammatical corrections, while the original structures, ideas, main results, and discussions have been preserved.

I am aware of the University of Windsor Senate Policy on Authorship and I certify that I have properly acknowledged the contribution of other researchers to my thesis, and have obtained written permission from each of the co-author(s) to include the above material(s) in my thesis.

I certify that, with the above qualification, this thesis, and the research to which it refers, is the product of my own work.

II. Previous Publications

This thesis includes 4 original papers that have been previously published/submitted for publication in peer-reviewed journals, as follows:

Table 1: Publications by dissertation chapter and publication status.

Chapter 2
- Published (journal): Mucaki, E. J., Baranova, K., Pham, H. Q., Rezaeian, I., Angelov, D., Ngom, A., Rueda, L., and Rogan, P. K. (2016). Predicting outcomes of hormone and chemotherapy in the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) study by biochemically-inspired machine learning. F1000Research, 5.
- Published (poster): Iman Rezaeian, Eliseos Mucaki, Katherina Baranova, Huy Pham Quang, Dimo Angelov, Lucian Ilie, Alioune Ngom, Luis Rueda, and Peter Rogan. Predicting patient outcomes of hormone therapy in the METABRIC breast cancer study. The GLBIO/CCBC Great Lakes Bioinformatics and the Canadian Computational Biology Conference, 2016.

Chapter 3
- Published (conference): Huy, P. Q., Ngom, A., and Rueda, L. (2016, December). PAFS - An efficient method for classifier-specific feature selection. In 2016 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 1-8). IEEE.
- Published (conference): Huy, P. Q., Ngom, A., & Rueda, L. (2016, December). A new feature selection approach for optimizing prediction models, applied to breast cancer subtype classification. In 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 1535-1541). IEEE.
- Published (poster): Huy Q. Pham, Iman Rezaeian, Eliseos J. Mucaki, Alioune Ngom, Luis Rueda, and Peter K. Rogan. Predicting Breast Cancer Drug Response via a Level-wise Gene Selection Approach. The Fourth International Society for Computational Biology Latin America Bioinformatics Conference (ISCB-LA), 2016.
- Published (conference): Pham, H. Q., Rueda, L., & Ngom, A. (2017). Predicting Breast Cancer Outcome under Different Treatments by Feature Selection Approaches. In Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics (pp. 617-617).

Chapter 4
- Published (journal): Tabl, A. A., Alkhateeb, A., Pham, H. Q., Rueda, L., El-Maraghy, W., & Ngom, A. (2018). A novel approach for identifying relevant genes for breast cancer survivability on specific therapies. Evolutionary Bioinformatics, 14, 1176934318790266.

Chapter 5
- Published (conference): Pham, H. Q., Guba, J., Gawanmeh, M., Porter, L. A., and Ngom, A. (2019, September). A Network-based Machine Learning Approach for Identifying Biomarkers of Breast Cancer Survivability. In Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (pp. 639-644).
- Published (poster): Huy Q. Pham, Jurko Guba, Lisa A. Porter, Luis Rueda, and Alioune Ngom. A Subnetwork Selection Approach for Identifying Biomarkers to Predict Breast Cancer Treatment Outcome. The Annual International Conference on Intelligent Systems for Molecular Biology (ISMB), 2018.
- Published (abstract): Huy Q. Pham, William Klassen, Luis Rueda, and Alioune Ngom. An integrative machine learning approach for biomarker detection to predict breast cancer survivability. Personalized and Precision Medicine International Conference 2018 (PEMED 2018).
- In preparation (journal): Huy Q. Pham, Mousa Gawanmeh, Luis Rueda, and Alioune Ngom. A Sub-network Selection Method Based on Weighted mRMR for Identifying Biomarkers of Breast Cancer Survivability. Evolutionary Bioinformatics.

Chapter 6
- Accepted (conference): Huy Q. Pham, Luis Rueda, and Alioune Ngom. A Data Integration Approach for Detecting Biomarkers of Breast Cancer Survivability. The 8th International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO 2020).
- Submitted (conference): Huy Q. Pham, Luis Rueda, and Alioune Ngom. Identifying Top K Sub-networks for Breast Cancer Survivability Prediction. The International Symposium on Bioinformatics Research and Applications (ISBRA), 2020.

Chapter 7
- Published (journal): Pham, H. Q., Tran, D., Duong, N. B., Fournier-Viger, P., & Ngom, A. (2019). NUCLEAR: An Efficient Method for Mining Frequent Itemsets and Generators from Closed Frequent Itemsets. Information Technology in Industry (ITII), 7(2).
- Published (conference): Pham, H. Q., Tran, D., Duong, N. B., Fournier-Viger, P., & Ngom, A. NUCLEAR: An Efficient Method for Mining Frequent Itemsets Based on Kernels and Extendable Sets. The 6th International Conference on Data Mining and Database (DMDB 2019).

Appendix A
- Published (conference): Mangalakumar, N., Alkhateeb, A., Pham, H. Q., Rueda, L., & Ngom, A. (2017, August). Outlier genes as biomarkers of breast cancer survivability in time-series data. In Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics (pp. 594-594).

I certify that I have obtained a written permission from the copyright owner(s) to include the above published material(s) in my thesis. I certify that the above material describes work completed during my registration as a graduate student at the University of Windsor.

III. General

I declare that, to the best of my knowledge, my thesis does not infringe upon anyone’s copyright nor violate any proprietary rights and that any ideas, techniques, quotations, or any other material from the work of other people included in my thesis, published or otherwise, are fully acknowledged in accordance with the standard referencing practices. Furthermore, to the extent that I have included copyrighted material that surpasses the bounds of fair dealing within the meaning of the Canada Copyright Act, I certify that I have obtained a written permission from the copyright owner(s) to include such material(s) in my thesis.

I declare that this is a true copy of my thesis, including any final revisions, as approved by my thesis committee and the Graduate Studies office, and that this thesis has not been submitted for a higher degree to any other University or Institution.

Abstract

Breast cancer is one of the leading causes of cancer death in women. If not diagnosed early, the 5-year survival rate of patients is only about 26%. Furthermore, patients with similar phenotypes can respond differently to the same therapies, which means the therapies might not work well for some of them. Identifying biomarkers that can help predict a cancer class with high accuracy is at the heart of breast cancer studies, because such biomarkers are the targets of treatments and drug development. Genomics data have been shown to carry useful information for breast cancer diagnosis and prognosis, as well as for uncovering the disease’s mechanisms. Machine learning methods are powerful tools for finding such information. Feature selection methods are often utilized in supervised and unsupervised learning tasks to deal with data containing a large number of features, of which only a small portion are useful to the classification task. On the other hand, analyzing only one type of data, without reference to existing knowledge about the disease and the therapies, might mislead the findings. Effective data integration approaches are necessary to uncover this complex disease.

In this thesis, we apply and develop machine learning methods to identify meaningful biomarkers for predicting breast cancer survivability after a certain treatment. These include applying feature selection methods to gene-expression data to derive gene signatures, where the initial genes are collected with reference to the mechanisms of some drugs used in breast cancer therapies. We also propose a new feature selection method, named PAFS, and apply it to discover accurate biomarkers.

In addition, it has been increasingly supported that sub-network biomarkers are more robust and accurate than single-gene biomarkers. We propose two network-based approaches to identify sub-network biomarkers for breast cancer survivability prediction after a treatment. They integrate gene-expression data with protein-protein interactions during the search for optimal sub-networks, and use cancer-related genes and pathways to prioritize the extracted sub-networks. The sub-network search space is usually huge, and many proteins interact with thousands of other proteins. Thus, we apply some heuristics to avoid generating and evaluating redundant sub-networks.

Experimental results show that our approaches are effective. We have found potential gene signatures and sub-network biomarkers that are biologically meaningful and can yield significantly high accuracy in predicting breast cancer outcomes after treatment. Among the potential biomarkers extracted are well-known breast-cancer-related genes such as TP53, MYC, PIK3CA, SPY1, ERBB4/HER4, STAT3, STAT5A, and FGFR2.

Dedication

To my wife, my parents, and my parents-in-law: for their endless love and expectations.

To my relatives: for their support.

To my sons: for the years they have lived far away from me.

Acknowledgements

First of all, I would like to express my profound gratitude to my supervisors, Dr. Alioune Ngom and Dr. Luis Rueda, not only for their great supervision and the lessons that led me to interesting research areas in machine learning and bioinformatics and to the completion of my doctoral dissertation, but also for their support, encouragement, and friendliness, which helped me overcome the pressures and difficulties during my studies.

I would like to give many thanks to my doctoral committee members, Dr. Sheridan K. Houghten, Dr. Lisa Porter, Dr. Asish Mukhopadhyay, Dr. Pooya Moradian Zadeh, and Dr. Mehdi Kargar, for attending my comprehensive exam, my proposal, and my defense, for reviewing my thesis, and for providing constructive feedback on my research.

I would like to thank everyone who has ever helped me or collaborated with me, especially Dr. Lisa Porter, Iman Rezaeian, Abedalrhman Alkhateeb, Ashraf Abou Tabl, Osama Hamzed, Mousa Gawanmeh, Jurko Guba, Naveen Mangalakumar, William Klassen, and Tran Duc. I have learned a lot from you.

I really appreciate all the financial support from the Ministry of Education and Training of Vietnam (Project 911), from Dr. Alioune Ngom, and from the University of Windsor, as well as the non-financial support from the University of Windsor, the Department of Vietnam International Education Development, and the University of Dalat, Vietnam.

Contents

Declaration of Co-authorship / Previous Publication

Abstract

Dedication

Acknowledgements

List of Tables

List of Figures

1 Introduction
  1.1 Breast Cancer
    1.1.1 Treatment Options
    1.1.2 Breast Cancer Stages
    1.1.3 Common Types of Breast Cancer
    1.1.4 Molecular Subtypes of Breast Cancer
  1.2 Machine Learning Methods and Their Applications in Breast Cancer Studies
  1.3 Integrative Machine Learning Approaches
  1.4 Motivations and Contributions
    1.4.1 Motivations
    1.4.2 Contributions
  1.5 Thesis Organization

2 Predicting Outcomes of Hormone and Chemotherapy in the Molecular Taxonomy of Breast Cancer International Consortium Study by Biochemically-inspired Machine Learning
  2.1 Introduction
  2.2 Methods
    2.2.1 Support Vector Machine (SVM) Learning
    2.2.2 Random Forest (RF) Learning
    2.2.3 Augmented Gene Selection
  2.3 Results and Discussion
  2.4 Conclusion
  2.5 Data Availability

3 PAFS – An Efficient Method for Classifier-Specific Feature Selection
  3.1 Introduction
  3.2 Related Works
  3.3 Materials and Methods
    3.3.1 Tree-based Classification Model for Multi-class Problem
    3.3.2 Parallel Apriori-like Feature Selection Algorithm
  3.4 Experimental Results and Discussions
    3.4.1 Experimental Settings
    3.4.2 Experimental Results on Predicting Five PAM50 Subtypes
    3.4.3 Experimental Results on Predicting Erythemato-Squamous Diseases
    3.4.4 Experimental Results on Breast Cancer Treatment Outcome Prediction using METABRIC Datasets
  3.5 Conclusion

4 A Novel Approach for Identifying Relevant Genes for Breast Cancer Survivability on Specific Therapies
  4.1 Introduction
  4.2 Materials and Methods
    4.2.1 The Bottom-Up Multiclass Classification Approach
    4.2.2 Feature Selection
    4.2.3 Class Balancing
    4.2.4 Classification
  4.3 Results and Discussion
    4.3.1 Biological Insight
  4.4 Conclusion

5 A Sub-network Selection Method for Identifying Biomarkers of Breast Cancer Survivability
  5.1 Introduction
  5.2 Data
  5.3 Method
    5.3.1 Seed Gene Selection
    5.3.2 Optimal Sub-network Selection
    5.3.3 Sub-network Prioritization by Pathway-Relevant and Cancer-Relevant Analysis
  5.4 Results and Discussion
    5.4.1 Classification Performance Comparison
    5.4.2 Biological Insights of the Representative Sub-networks
    5.4.3 Biological Insight of the Genes Selected by RFE+SVM
  5.5 Conclusion and Future Work

6 Identifying Top-k Sub-networks for Breast Cancer Survivability Prediction
  6.1 Introduction
  6.2 Data
  6.3 Method
    6.3.1 Generating Gene-Gene Interaction Network Based on Joint Mutual Information
    6.3.2 Finding an Optimally-Dense Sub-network for a Seed Gene
    6.3.3 Evaluating the Classification Performance of a Sub-network
    6.3.4 Finding the Top-k Subnets for Weighted Voting
    6.3.5 Finding an Optimal Set of Subnets for Weighted Voting
  6.4 Results and Discussion
    6.4.1 Classification Performances on CT
    6.4.2 Classification Performances on HT
    6.4.3 Classification Performances on CT HT
    6.4.4 Biological Insights
  6.5 Conclusion and Future Works

7 NUCLEAR: An Efficient Method for Mining Frequent Itemsets and Generators from Closed Frequent Itemsets
  7.1 Introduction
  7.2 Preliminaries
  7.3 Related Work
    7.3.1 Mining Closed (Frequent) Itemsets
    7.3.2 Lattice-based Approaches for Mining Association Rules
  7.4 Theoretical Results
  7.5 Our Proposed Algorithms
  7.6 Experimental Results and Discussion
  7.7 Conclusions and Future Work

8 Conclusions and Future Work
  8.1 Conclusions
  8.2 Limitations and Future Work

References

Appendices

A Identifying the Gene Biomarkers of Breast Cancer Survivability from Time-Series Data
  A.1 Introduction
  A.2 Resources and Methods
  A.3 Results

Vita Auctoris

List of Tables

2.1 SVM gene expression signature performance on 84 patients treated with both chemotherapy and hormone therapy (CT and HT).
2.2 SVM gene expression signature performance on patients treated with other therapies.
2.3 Results of applying RF to predict outcome of paclitaxel therapy.
2.4 Results of mRMR feature selection for an SVM for predicting outcome of paclitaxel therapy.
2.5 Selected genes learned by mRMR and SVM for predicting outcome of paclitaxel therapy, with survival years as threshold.
2.6 Results of applying RF to predict outcome of the paclitaxel signature for the METABRIC Discovery patient set.
2.7 Results of mRMR feature selection for an SVM for predicting outcome of the paclitaxel signature for the METABRIC Discovery patient set.
2.8 Selected genes by mRMR and SVM for predicting outcome of the paclitaxel signature for the METABRIC Discovery patient set, with survival years as threshold.
2.9 Comparison on accuracy (%) between our mRMR+SVM method and the K-TSP method on the Discovery patient set of the METABRIC data.
3.1 Tree-based model for breast cancer subtype prediction learned by TCM using running options, 99.49% accuracy.
3.2 Tree-based model for breast cancer subtype prediction learned by TCM using running options, 100% accuracy.
3.3 Tree-based model for breast cancer subtype prediction learned by TCM using running options, 99.70% accuracy.
3.4 Tree-based model for breast cancer subtype prediction learned by TCM using running options, 98.32% accuracy.
3.5 Tree-based model learned from the erythemato-squamous disease dataset by TCM using running options, 99.26% accuracy.
3.6 Results of applying PAFS for predicting therapy outcome for patients treated with chemotherapy and/or hormone therapy, based on 26 genes that correlate to paclitaxel.
3.7 Results of applying PAFS for predicting hormone therapy (HT) outcome for 420 patients, based on 24 genes that correlate to tamoxifen.
3.8 Disease-free distribution by treatment.
3.9 Prediction performance by PAFS+SVM and other feature selection approaches.
3.10 Selected genes for predicting treatment outcome by PAFS+SVM and other feature selection approaches.
4.1 Number of samples per class.
4.2 Selected genes in Ward’s linkage model.
5.1 Data distribution versus treatments.
5.2 Classification performance comparison between SSPA and other methods in terms of accuracy (ACC), MCC, and area under the ROC curve (AUC).
6.1 Data distribution versus treatments.
6.2 The abbreviations for running settings.
6.3 Classification results on patients treated with chemotherapy.
6.4 Classification performance on chemotherapy-treated patients obtained by the Greedy Search (Algorithm 3).
6.5 Classification results on patients treated with hormone therapy.
6.6 Classification performance on hormone-therapy-treated patients obtained by the Greedy Search (Algorithm 3).
6.7 Classification results on patients treated with chemotherapy and/or hormone therapy.
6.8 Classification performance on chemotherapy and/or hormone-therapy-treated patients obtained by the Greedy Search (Algorithm 3).
6.9 Breast-cancer-related genes that are highly frequent in 393 subnets of the sets of top-k subnets for the three treatments, for all settings, and for both SVM and RF.
7.1 The relation R = T × I, where I = {1, 2, 3, 4, 5, 6} and T = {t1, t2, t3, t4, t5, t6}.
7.2 The partition of 23 FIs in [{1, 2, 3, 4, 5, 6}] based on Q3 as in Figure 7.2.
7.3 Characteristics of the datasets.
7.4 Number of patterns extracted from the datasets. #FS: the number of frequent itemsets; #CS: the number of frequent closed itemsets; #G: the number of generators; #K: the number of kernels, which is also the number of extendable sets; #FS/CS: the number of frequent itemsets over the number of frequent closed itemsets; #CS/K: the number of frequent closed itemsets over the number of kernels; #K/G: the number of kernels over the number of generators.
7.5 Runtimes of the frameworks and of the breakdown processes. tCS: time to find CFIs using CharmL; tG: time to find the generators using MinimalGenerator; tN: time to generate kernels and extendable sets by NUCLEAR; tNI: time to enumerate FIs from kernels and extendable sets by NUCLEAR; tNNI: time to find FIs from the lattice of CFIs (tNNI = tN + tNI); tGI: time to generate FIs based on CFIs and generators using GEN_ITEMSETS; NUC: time to find FIs from data using CharmL and NUCLEAR (NUC = tCS + tNNI); GenIT: time to find FIs using CharmL, MinimalGenerator, and GEN_ITEMSETS (GenIT = tCS + tG + tGI); dEclat: time to find FIs from data using dEclat; OM: “out of memory”.

List of Figures

1.1 Some types of biomarkers.
1.2 Typical framework for wrapper feature selection.
2.1 Biochemically-inspired SVM gene signature derivation workflow.
2.2 RF decision tree diagram depicting the therapy outcome prediction process for a given patient, using an RF consisting of k decision trees.
2.3 Schematic elements of gene expression changes associated with response to paclitaxel. Red boxes indicate genes with a positive correlation between gene expression or copy number and resistance using multiple factor analysis. Blue indicates a negative correlation. Genes outlined in dark grey are those in a previously published paclitaxel SVM model (reproduced from [1] with permission).
3.1 Tree-based classification model resulting from [1] for five breast cancer subtype prediction.
4.1 The distribution of the number of samples by breast cancer subtype in each class.
4.2 The distribution of the number of samples by breast cancer subtype in each treatment.
4.3 The percentage of samples by class.
4.4 Schematic representation of the proposed models based on linkage type.
4.5 Ward’s linkage model: classification model with performance measures.
4.6 The relation between the 5 genes used in node DR vs (DS, LS) of Ward’s linkage model.
4.7 Boxplot for the biomarker genes in Ward’s linkage model showing the minimum, first quartile, median, third quartile, and maximum gene expression values for each group of samples (DH vs LH) and (DR vs [DS, LS]).
4.8 Circos plot for the biomarker genes in Ward’s linkage model for the DS class samples based on the correlation coefficient among gene expressions (p < 0.05).
4.9 Circos plot for the biomarker genes in Ward’s linkage model for the LS class samples based on the correlation coefficient among gene expressions (p < 0.05).
4.10 Circos plot for the biomarker genes in Ward’s linkage model for the LS class samples based on the correlation coefficient among gene expressions (p < 0.05).
5.1 Flow chart of SSPA.
5.2 The representative sub-network for CT. Genes overlapping with some pathways are shown as triangles; breast-cancer-related genes are in red, and cancer-related genes are in yellow.
5.3 The representative sub-network for HT. Genes overlapping with some pathways are shown as triangles; breast-cancer-related genes are in red, and cancer-related genes are in yellow.
5.4 The representative sub-network for RT. Genes overlapping with some pathways are shown as triangles; breast-cancer-related genes are in red, and cancer-related genes are in yellow.
5.5 The representative sub-network for SO. Genes overlapping with some pathways are shown as triangles; breast-cancer-related genes are in red, and cancer-related genes are in yellow.
6.1 The workflow of KDS.
6.2 Classification performance on chemotherapy-treated patients. The results based on the single best sub-network are shown in the first four groups (NNN, YNN, NYN, YYN). The results based on the weighted voting scheme by the top-k sub-networks are shown in the last four groups (NNY, YNY, NYY, YYY).
6.3 Classification performance on hormone-therapy-treated patients. The results based on the single best sub-network are shown in the first four groups (NNN, YNN, NYN, YYN). The results based on the weighted voting scheme by the top-k sub-networks are shown in the last four groups (NNY, YNY, NYY, YYY).
6.4 Classification performance on patients treated with chemotherapy and/or hormone therapy. The results based on the single best sub-network are shown in the first four groups (NNN, YNN, NYN, YYN). The results based on the weighted voting scheme by the top-k sub-networks are shown in the last four groups (NNY, YNY, NYY, YYY).
7.1 The lattice of CFIs mined from R with minsup = 0.25.
7.2 Search tree for generating the kernels and extendable sets of Q3, where Q3 = {(G2, E2), (G5, E5), (G6, E6), (G6, E7)}.
7.3 Run-time comparison on Mushroom (MR).
7.4 Run-time comparison on Connect (CO).
7.5 Run-time comparison on C20d10k (C2).
7.6 Run-time comparison on Retail (RT).
7.7 Run-time comparison on C73d10k (C7).
7.8 Run-time comparison on T40I10D100K (T4).
A.1 The framework of our approach.
A.2 Background cluster.
A.3 Cubic-spline-interpolated expression profile of the gene CD81.
A.4 Cubic-spline-interpolated expression profile of the gene EEF1A1.
A.5 Cubic-spline-interpolated expression profile of the gene PSAP.

Chapter 1

Introduction

1.1 Breast Cancer

Breast cancer is among the most common cancers in Canadian women and worldwide. One in eight women is diagnosed with breast cancer in her lifetime. According to the Public Health Agency of Canada, in 2019, nearly 27,000 Canadian women were estimated to be diagnosed with breast cancer, and about 70% of them would be at stage 1 or 2, for which the 5-year survival rate is 99%. The exact causes of breast cancer are not yet clear, but there are always changes in the activity of the cell’s DNA, which might lead to abnormal cell growth. The malignant cells start developing in the breast, form tumors, and can spread to nearby tissues or metastasize to other areas of the body. Breast cancer treatment options depend on many factors, including tumor stage, tumor type, tumor subtype, age of the patient, the risk of recurrence, and other factors such as hormone receptor status [1].

1.1.1 Treatment Options

Usually, the standard treatments for breast cancer are surgery, radiation therapy, hormone therapy, chemotherapy, targeted therapy, or a combination of them.

Surgery includes lumpectomy (removing a tumor/lump and a small amount of nearby normal tissue) and mastectomy (partially or totally removing the breast and the tissue surrounding it). Some patients have to undergo additional radiation therapy, chemotherapy, or hormone therapy after surgery to kill any remaining tumor cells.

Radiation therapy uses x-rays or other types of radiation to kill cancer cells or stop their growth, depending on the type and stage of the cancer being treated.

Chemotherapy refers to treatment that uses drugs to prevent cancer cells from dividing or to kill them. In systemic chemotherapy, drugs are introduced into the bloodstream so that they can reach the whole body. In regional chemotherapy, drugs are injected directly into certain places in the body to target cancer cells in those areas. Chemotherapy is often used for patients at stages 2–4, and particularly for estrogen-receptor-negative (ER-) patients. The type of chemotherapy given is mainly based on the type and stage of the cancer being treated.

Hormone therapy refers to treatment that stops cancer cells from growing by removing hormones or blocking their action. If estrogen receptors (ER) and progesterone receptors (PR) are present on the surface of cancer cells, then the cancer can be treated with drugs that block these receptors (e.g., tamoxifen), or with surgery or radiation therapy that can reduce or block the production of hormones. Adjuvant hormone therapy for at least 5 years can reduce the risk of recurrence for hormone-receptor-positive patients.

Targeted therapy is a newer and more effective type of treatment that uses drugs or other substances to identify and attack specific breast cancer cells without harming normal cells. It often has less severe side effects than standard chemotherapy. For example, trastuzumab, a monoclonal antibody, blocks the effects of the growth factor protein HER2, thus stopping these proteins from sending growth signals to breast cancer cells. Such therapy can be used in combination with adjuvant therapy for HER2-positive patients.

Immunotherapy, also called biotherapy or biologic therapy, refers to treatment that uses substances (made by the body or in a laboratory) to help the patient’s immune system fight cancer.

“Breast cancer prognosis and treatment options are generally based on tumor-node-metastasis staging. Lymphovascular spread, histologic grade, hormone receptor status, ERBB2 (formerly HER2 or HER2/neu) overexpression, comorbidities, and patient menopausal status and age are also important factors” [2]. Table 2 in [2] presents the standard treatment options mainly based on the stage and the type of breast cancer.

1.1.2 Breast Cancer Stages

Breast cancer is classified into five stages [3], which indicate the severity of the disease.

Stage 0 indicates there might be a risk of cancer. In Stage 1, breast cancer cells begin to develop, and the tumor size is about 2 centimeters or smaller. In Stage 2, a typical tumor size is between 2 and 5 centimeters. The tumors are growing within the breast or extending to the nearby lymph nodes. Stage 2 is divided into Stage 2A and Stage 2B depending on the size of the tumor and whether the breast cancer has spread to lymph nodes. These three stages are considered early stages [4]. Breast cancers at Stages 1 and 2 are usually treated with breast-conserving surgery and radiation therapy. Depending on the type of cancer cells and additional risk factors, hormone therapy might be recommended [2].

In Stage 3, the breast cancer is considered locally advanced, since tumors have extended to regional lymph nodes and muscles but have not spread to distant organs. Stage 3 is further divided into sub-stages 3A, 3B, and 3C depending on the size of the tumor and whether the breast cancer has spread to lymph nodes or nearby tissues. The treatments for these stages may consist of mastectomy and radiation for local treatment, and hormone therapy or chemotherapy for systemic treatment. Most patients diagnosed with Stage 3 will respond well to a combination of two or more treatments [5, 6]. The authors of [2] suggested that “stage 3 breast cancer typically requires induction chemotherapy to down-size the tumor to facilitate breast-conserving surgery. Inflammatory breast cancer, although considered stage 3, is aggressive and requires induction chemotherapy followed by a mastectomy, rather than breast-conserving surgery, as well as axillary lymph node dissection and chest wall radiation.”

Stage 4 breast cancer means that cancer has spread to other areas of the body, such as the brain, bones, lungs, and liver. “Prognosis is poor in women with recurrent or metastatic (stage IV) breast cancer, and treatment options must balance benefits in length of life and reduced pain against harms from treatment” [2].

1.1.3 Common Types of Breast Cancer

Ductal Carcinoma In Situ (DCIS). Stage 0 disease is either not considered cancer or is DCIS, a non-invasive cancer. The cancer is still contained in the milk duct and has not invaded any other area. However, patients may be prescribed hormone therapy to prevent cancer cells from growing into the surrounding tissue [7].

Invasive Ductal Carcinoma (IDC). IDC is cancer that begins growing in the duct and invades the surrounding tissue. IDC accounts for 80% of all breast cancer cases. The treatment options for IDC vary, depending on cancer staging, medical history, a physical exam, and other tests [8].

Inflammatory Breast Cancer (IBC). IBC is rare but aggressive and fast-growing. In this case, cancer spreads to the skin of the breast and blocks the lymph vessels in the skin, making the breast look red and swollen and feel warm. IBC can be in stages 3B and 3C. With aggressive treatment, the survival rate for IBC patients has improved significantly in recent years.

Recurrent Breast Cancer. Recurrent breast cancer is cancer that has come back after it has been treated.

Metastatic Breast Cancer. Metastatic breast cancer is also classified as Stage 4 breast cancer. The cancer has spread to other parts of the body, such as the lungs, liver, bones, or brain.

1.1.4 Molecular Subtypes of Breast Cancer

The types of breast cancer are mainly classified based on a combination of procedures, including physical examination and mammography, ultrasound, breast MRI, biopsy, staging workup, and other risk factors. Research has been conducted on how molecular subtypes can be useful in planning treatment and developing new therapies. From a list of 496 genes, breast cancers can be divided into the following five subtypes (also called intrinsic subtypes), which reflect the status of the estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2), the expression levels of certain genes, and other characteristics [9, 10]. These subtypes were later refined by the PAM50 classification using the expression of 50 genes [11, 12].

Luminal A breast cancer is estrogen-receptor-positive (ER+) and/or progesterone-receptor-positive (PR+), HER2-negative (HER2-), and has low levels of the protein Ki-67. Luminal A cancers tend to grow slowly and have a better prognosis and survival rate than other subtypes. This subtype accounts for about 30%-40% of all invasive breast cancers [10, 13, 14].

Luminal B breast cancer is ER+ and/or PR+, and either HER2+ or HER2- with high levels of Ki-67. This subtype accounts for about 20%-30% of all breast cancers. In comparison to Luminal A, Luminal B cancers have larger tumor sizes, poorer tumor grades, generally slightly faster growth, and a slightly worse prognosis. However, the survival rate is still fairly high [10, 13, 15].

Triple-negative/Basal-like breast cancer is ER-, PR-, and HER2-. This type of cancer accounts for about 15%-20% of all invasive breast cancers and is more common in women with BRCA1 gene mutations. It is often aggressive, difficult to treat, and likely to spread and recur. Common hormone therapies and drugs targeting the three receptors will not work. However, early-stage patients of this type respond to chemotherapy better than those with other forms of breast cancer [10, 13, 16].

HER2-enriched breast cancer tends to be ER-, PR-, and HER2+. HER2-enriched cancers tend to grow faster than luminal cancers and can have a worse prognosis. This subtype accounts for 12% to 20% of all invasive breast cancers. These cancers are often successfully treated with HER2-targeted therapies (e.g., trastuzumab) [10, 13].

Normal-like breast cancer is similar to Luminal A disease but its prognosis is slightly worse. It “remains enigmatic as to whether it reflects a real subtype or a technical artifact of contaminating stromal tissue” [17].

Although not designed for prognostic purposes, the intrinsic subtypes have strong prognostic implications. For example, Luminal A has a significantly better prognosis than other subtypes [17]. The Oncotype DX test, based on 21 genes, provides meaningful information in hormone-receptor-positive luminal breast cancer, and it is more useful for Luminal A than for Luminal B. It can also tell how likely a patient is to benefit from radiation therapy after lumpectomy [8]. The PAM50 (Prosigna) test, which has recently been included in breast cancer treatment recommendations, helps identify the risk of recurrence and metastasis. If the risk of metastasis is fairly high, doctors can suggest both hormone therapy and chemotherapy; otherwise, hormone therapy alone may be considered to avoid the side effects of chemotherapy [11].

Understanding the breast cancer stages, types, subtypes, treatments, and other factors provides researchers with reasonable strategies for conducting their studies.

1.2 Machine Learning Methods and Their Applications in Breast Cancer Studies

Most machine learning (ML) methods are designed to build a model that performs a specific task without being explicitly or directly programmed, but by learning the underlying patterns from training data. It is expected that the training data conveys the true patterns, which might be previously unknown, and that the ML methods can capture such hidden patterns. Supervised learning methods are used to develop predictive models from training data having class labels. They aim at learning a decision surface that separates samples of different classes as much as possible. Such methods include classification methods used to predict categorical outcomes (e.g., Naïve Bayes, Decision Tree [18], Random Forest [19]) and regression methods used to predict numeric outcomes (e.g., Logistic Regression [20], Linear Regression [21]). Unsupervised learning methods mainly deal with grouping/clustering uncategorized data (e.g., K-means [22], hierarchical clustering methods [23]) or finding a structure/pattern (e.g., Apriori [24], FP-Growth [25]) in such data for interpretation purposes. ML has been shown to be effective in solving many problems in breast cancer studies. Such problems include breast cancer diagnosis (classifying whether breast cancer is present), subtype diagnosis (classifying patients into subtypes), recurrence/metastasis risk prediction, and survivability prediction (classifying whether a patient will be disease-free or survive longer than a certain amount of time, or estimating the survival time of a patient). Figure 1.1 shows some types of biomarkers that can be identified by ML methods.

Figure 1.1: Some types of biomarkers.

Some studies have used only clinical information to construct ML models and still achieved high accuracy [26–28]. For example, the authors of [27] learned prediction models from a Wisconsin Diagnostic Breast Cancer dataset obtained from the UCI machine learning repository (available at http://archive.ics.uci.edu/ml) to classify benign vs. malignant breast cancer. The dataset contains 32 clinical features for 458 benign patients and 241 malignant patients. They achieved more than 90% accuracy using the algorithms K-Nearest Neighbor (KNN), Naïve Bayes, Decision Tree, and Support Vector Machine (SVM).
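To make this kind of experiment concrete, the sketch below evaluates the same four classifiers with 10-fold cross-validation. It is an illustration only, not the exact setup of [27]: it uses the WDBC variant of the Wisconsin data bundled with scikit-learn (569 samples, 30 features), which differs slightly from the version described above.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.svm import SVC

    # WDBC data: 569 samples, 30 features, benign vs. malignant labels.
    X, y = load_breast_cancer(return_X_y=True)

    classifiers = {
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "Naive Bayes": GaussianNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "SVM": SVC(kernel="rbf"),
    }
    for name, clf in classifiers.items():
        # Standardize features so distance- and margin-based methods behave well.
        model = make_pipeline(StandardScaler(), clf)
        scores = cross_val_score(model, X, y, cv=10)
        print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")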

Others have relied on genomics data such as DNA methylation, gene expression (GE), mutations, and copy number variations (CNVs) or copy number alterations (CNAs) to build ML models. These types of data can provide information in more detail than traditional clinical and histological factors and might be more useful for discovering the underlying mechanisms of the disease. For example, in [29], a cluster analysis on integrated CNV and GE data showed that breast cancers can be regrouped into a new ten-class categorization, with each group having distinct copy number profiles and clinical outcomes. Patricio et al. developed models using SVM to predict the presence of breast cancer in women. The model based on personal information (age, body mass index) and clinical features computed from blood analysis (glucose, resistin) performed better than other combinations of personal and clinical features [30]. By applying an unsupervised method to microarray data of about 5,000 differentially expressed genes of 98 patients, Laura J. van ’t Veer et al. [31] identified a set of 70 genes that can accurately separate patients into two groups of good prognosis (patients who continued to be disease-free at least 5 years from the initial diagnosis) and poor prognosis (those who developed distant metastases within 5 years).

One common challenge in using genomics data is that the number of features is usually far greater than the number of samples, and many features are redundant. The curse of dimensionality can lead to over-fitting issues or prevent even state-of-the-art algorithms from learning good models. In this regard, feature selection methods in ML can be employed to extract subsets of informative features for model construction while maintaining the generalization ability of the model. The selected features can then, in turn, be considered potential biomarkers. Feature selection methods include filter methods, wrapper methods, and hybrid methods.

Figure 1.2: Typical framework for wrapper feature selection.

Filter methods evaluate one feature at a time using a particular statistical criterion, such as Information Gain [18], Chi-Squared [32], or Relief [33], by which we can rank the features and select the top ones. Some criteria can evaluate a subset of features, such as mRMR [34] and MRMD [35]. As they are not tailored to any specific type of classifier, filter selection methods are often used as a pre-processing step in ML applications. Meanwhile, wrapper approaches search for an optimal subset of features using a search algorithm along with a classification method [37–39]. The typical framework for wrapper approaches is shown in Figure 1.2. For example, in a linear forward selection (or greedy forward selection), we initialize the current optimal subset of features as an empty set and try to add new features one by one; a sketch is given below. In each forward step, the classification method (e.g., Decision Tree) evaluates the combination of the current optimal subset and each candidate feature, and the feature that gives the best improvement in performance (e.g., accuracy) is chosen for further extension. Wrapper methods usually produce better performance than filter ones, as they take combinations of features into account. However, they still require a lot of computation when the number of available features is large, especially when using expensive classifiers such as SVMs or neural networks. In hybrid approaches, candidate features are first selected by one or more filter criteria to shrink the search space before a wrapper method is applied to find the final subset of features, especially when there are too many features compared to the number of samples.
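A minimal sketch of that greedy forward loop follows. It is a generic illustration of the wrapper scheme in Figure 1.2 (not the PAFS method proposed later in this thesis), assuming a scikit-learn-style classifier and a NumPy feature matrix:

    import numpy as np
    from sklearn.model_selection import cross_val_score

    def greedy_forward_selection(clf, X, y, max_features=10, cv=5):
        # Wrapper feature selection: grow the feature subset one feature
        # at a time, keeping the candidate whose addition most improves
        # the cross-validated accuracy of clf; stop when nothing helps.
        selected, best_score = [], 0.0
        remaining = list(range(X.shape[1]))
        while remaining and len(selected) < max_features:
            # Evaluate the current subset extended by each remaining feature.
            score, best_f = max(
                (cross_val_score(clf, X[:, selected + [f]], y, cv=cv).mean(), f)
                for f in remaining
            )
            if score <= best_score:  # no candidate improves performance: stop
                break
            best_score = score
            selected.append(best_f)
            remaining.remove(best_f)
        return selected, best_score

Calling it with, e.g., DecisionTreeClassifier() reproduces the Decision-Tree-driven forward search used as the running example above; the large number of model evaluations per step is exactly what makes wrapper methods expensive on genomic data.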

1.3 Integrative Machine Learning Approaches

Analyzing only one type of data might mislead the findings in studying breast cancer for many reasons, including lack of data, noise, heterogeneity in the data, and others. The effective integration of many types of data and knowledge is necessary to compensate for such drawbacks and to better uncover the mechanisms of diseases. Integrative approaches have recently gained a lot of interest in the research community. In these approaches, multiple types of data are examined to provide different points of view on a problem, and they may be integrated into one unique view before an algorithm is applied to solve the problem. By integrating useful knowledge from different sources into the machine learning process, we can obtain more accurate models, more robust/stable findings, or biomarkers with stronger support [29, 40–45]. (A robust classifier or biomarker is one whose testing error is close to its training error when the training and testing data differ only by sufficiently small perturbations of the data samples [46, 47].) Drier and Domany argued that gene signatures derived without reference to the underlying mechanisms of chemotherapy response do not capture meaningful biological results [48].

Thus, in [49], Dorman et al. analyzed the correlation of gene copy number, mutation, and expression with the growth inhibitory concentrations of paclitaxel and gemcitabine (GI50), first in breast cancer cell lines and then in patients, using multiple factor analysis (MFA). An SVM built on the expression of 15 genes could then predict paclitaxel resistance with 82% accuracy, and an SVM built on the copy number profiles of 3 genes and the expression of 7 genes achieved 85% accuracy for gemcitabine resistance prediction. Curtis et al. also suggested that integrating multiple types of genomics information may help derive more robust patient classifiers [29].

Network-based prediction approaches are also considered integrative approaches, where the existing relations among genes or molecules are taken into account for identifying relevant biomarkers. These methods often integrate primary data (e.g., GE, DNA methylation, CNV) with one or more secondary network data sources expressing the functional relationships among genes, such as gene co-expression networks, gene regulatory networks, protein-protein interaction networks (PINs), or other “omics” data. Protein-protein interaction networks capture physical interactions determined by experiments as well as computationally derived interactions. There are well-maintained public databases for PINs, such as HPRD [50], StringDB [51], and PathwaysCommons [52]. A transcriptional regulatory network is a directed graph where nodes are transcription factors/microRNAs or genes, and edges connect a regulator to its targets. Available databases for transcriptional regulatory networks include RegNetwork [53] and TRRUST [54]. Meanwhile, a gene co-expression network is often a weighted undirected graph, and its construction depends on the problem of interest [43]. Edge weights are usually computed from GE data using a co-expression measure such as Pearson’s correlation coefficient or mutual information. In [55], the authors construct a gene co-expression network based on expression quantitative trait loci (eQTLs), where the gene-gene co-regulation coefficient is calculated from the eQTLs that regulate the two genes.
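As a small illustration of the most common construction (thresholding the absolute Pearson correlation, not the eQTL-based variant of [55]), the sketch below builds a weighted undirected co-expression network from a genes-by-samples expression matrix; the threshold value is an arbitrary assumption:

    import numpy as np

    def coexpression_network(expr, threshold=0.7):
        # Build a weighted co-expression network from a genes x samples
        # expression matrix. Returns a dict mapping gene-index pairs (i, j)
        # to |Pearson correlation|, keeping only edges above the threshold.
        corr = np.corrcoef(expr)          # gene-by-gene correlation matrix
        n = expr.shape[0]
        edges = {}
        for i in range(n):
            for j in range(i + 1, n):
                w = abs(corr[i, j])       # unsigned co-expression strength
                if w >= threshold:
                    edges[(i, j)] = w
        return edges

    # Example with random data: 50 "genes" measured over 30 "samples".
    rng = np.random.default_rng(0)
    net = coexpression_network(rng.normal(size=(50, 30)), threshold=0.5)
    print(len(net), "edges above threshold")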

Most network-based approaches aim at identifying sub-networks of interacting molecules that can be potential signatures for certain conditions of interest. Such signatures are called sub-network biomarkers. The rationale behind this is that molecules tend to interact with each other to perform their functions; thus, functionally related genes tend to be located near each other in molecular networks [56]. In an extensive study on millions of gene-prognostic biomarkers and sub-network-prognostic biomarkers across 20 different training-testing partitions of 4,960 breast cancer patients, Grzadkowski et al. [44] found that sub-network biomarkers produced higher overall performance and concordance across partitions. They also mentioned that integrating functional information, such as pathway data, improves biomarker performance and replicability, and that smaller biomarkers are more robust across patient cohorts. Researchers have tried different ways to assign scores/weights to the nodes, edges, and even subnets of the networks in order to apply existing algorithms to find sub-network biomarkers that can accurately separate the classes. The search methods include greedy search, simulated annealing, genetic algorithms, random walk with restart, and integer linear programming [41–43, 55, 57]. For example, in [55], after constructing a weighted undirected graph whose edge weights express the gene-gene co-regulation coefficients, the authors adopted random walk with restart to find the disease genes. A node in an interaction network can have thousands of edges; thus, searching for such subnet biomarkers has been a computational challenge.
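For instance, the greedy variant of such a search can be written as a simple seed-expansion loop. This is a generic sketch, not any of the cited algorithms; neighbors is assumed to map each gene to its interaction partners in the PIN, and score is a user-supplied function, e.g., the cross-validated accuracy of a classifier trained on the expression of the sub-network's genes:

    def greedy_subnetwork(seed, neighbors, score, max_size=10):
        # Grow a sub-network from a seed gene: at each step, add the
        # neighboring gene whose inclusion most improves the sub-network's
        # score, stopping when no neighbor helps or the size limit is hit.
        subnet = {seed}
        best = score(subnet)
        while len(subnet) < max_size:
            # Candidates: interaction partners of the current sub-network.
            frontier = set().union(*(neighbors[g] for g in subnet)) - subnet
            if not frontier:
                break
            gain, best_gene = max(
                ((score(subnet | {g}) - best, g) for g in frontier),
                key=lambda t: t[0],
            )
            if gain <= 0:  # no neighbor improves the score: stop
                break
            subnet.add(best_gene)
            best += gain
        return subnet, best

Restricting the candidates to the current frontier is one simple heuristic that keeps the search from enumerating the enormous space of all possible sub-networks.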

14 1.4 Motivations and Contributions

1.4.1 Motivations

In brief, our studies have mainly been motivated by several observations and assumptions from previous studies. First, many studies, and the commercial use of the Oncotype DX, PAM50 (Prosigna), and MammaPrint tests for decision making in breast cancer treatment, support the fact that genomics data, especially gene expression data, contain the blueprints for cancer. Second, the effectiveness of adjuvant chemotherapy agents for breast cancer has been related to changes in the genomics profiles of tumors [49]. Chemotherapy or hormone therapy can reduce the risk of distant metastases by one third [31]. Still, in a 2014 study by Lee et al. [58], breast cancer patient response rates to paclitaxel and gemcitabine after 6 cycles of chemotherapy were found to be only 50.0% and 78.6%, respectively. There is evidence that patients have not received the proper treatments [31]. These facts motivate us to develop ML models to identify accurate biomarkers for predicting the survivability outcome of breast cancer patients after a therapy such as chemotherapy or hormone therapy. We focus mainly on using gene expression data for gene signature extraction, but with reference to the underlying mechanisms of therapy responses to capture meaningful biomarkers, as Drier and Domany suggested in [48]. As shown in [44, 45], biomarkers of small size are preferred and more robust, so biomarkers should be searched for in a bottom-up manner when there are many available features. We have mainly used the METABRIC data (available at https://www.cbioportal.org/study/summary), which contains clinical data and GE profiles of about 2,500 patients, for learning the classification models (there is no information about the gender of the patients in this data). After data cleaning steps, patients are extracted based on the therapy of interest and used for model construction.

In addition, as reviewed in [59], the hypothesis that cancer is a pathway-based or network-based disease has been widely discussed and is increasingly well supported. The important question now is how to apply this principle effectively and interpretably. Thus, we also develop network-based approaches for sub-network biomarker detection, in which protein-protein interaction networks are integrated with gene expression data and the candidate sub-networks are generated in a bottom-up manner for classification robustness. Pathways are not directly integrated into the sub-network generation process; however, their functional information is partly captured by the protein-protein interactions. For example, the interactions in PathwayCommons data are derived from the pathway data of many reliable sources, including Reactome [60], KEGG [61], and WikiPathways [62]. Pathway data and cancer-related genes from the literature are used in a post-processing step for sub-network prioritization and biological insight analysis.

Besides, mining frequent itemsets (FIs), closed frequent itemsets (CFIs), and generators are important tasks in data mining [63] and have been widely applied in bioinformatics [64]. These patterns can be used to mine association rules for classification [64]. Owing to the simple and intuitive interpretability of the rules and patterns, frequent pattern mining techniques can be powerful tools for capturing complex mechanisms in biological data. However, researchers have focused more on enhancing techniques that mine FIs or CFIs directly from data than on how to efficiently enumerate the FIs from the CFIs or extract association rules from the FIs. The latter tasks are not as trivial as they have been considered. Apriori, the first algorithm to extract association rules for each FI, is simple but exponentially expensive in computation, even for an FI with a moderate number of features (say, 20 features) [24], because we have to deal with overlapping rules induced from different FIs.

Instead of directly mining the FIs from data, it is preferable to mine only the closed frequent itemsets (CFIs) first and then extract the FIs for each CFI. This approach is especially convenient when users have to repeatedly try different parameters to extract a set of FIs that must satisfy some property. The CFIs allow the FIs to be separated into equivalence classes, which inspires parallel approaches, or allows a whole subclass to be processed via some of its representatives [65, 66].

The generators of each CFI are the key itemsets for enumerating all FIs in the equivalence subclass induced by that CFI. Some recent state-of-the-art algorithms can use the generators of each CFI to efficiently extract the FIs, association rules, and condensed/minimal association rules, without needing to check for overlapping results [65, 67]. Yet, mining the generators from data or from CFIs incurs a significant extra cost in terms of time and memory. These facts motivate us to develop efficient algorithms to extract the FIs and generators for a given lattice of CFIs, and to apply them to breast cancer prediction problems.
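To make the enumeration problem concrete, the sketch below follows the generator-based route: every FI in the equivalence class of a CFI lies between one of the CFI's generators and the closure itself, and results obtained from different generators must be deduplicated (the overlap issue noted above). The toy itemsets are hypothetical.

```python
from itertools import combinations

def fis_of_equivalence_class(closed_itemset, generators):
    """List all FIs in the equivalence class of one CFI via its generators.

    Each FI contains at least one generator and is contained in the closed
    itemset; all members of the class share the closure's support. FIs
    reachable from several generators overlap, hence the deduplicating set.
    """
    closed = frozenset(closed_itemset)
    fis = set()
    for g in generators:
        g = frozenset(g)
        free = sorted(closed - g)   # items addable without leaving the class
        for k in range(len(free) + 1):
            for extra in combinations(free, k):
                fis.add(g | frozenset(extra))
    return fis

# Toy closed itemset {a, b, c} with generators {a} and {b, c}.
fis = fis_of_equivalence_class({'a', 'b', 'c'}, [{'a'}, {'b', 'c'}])
print(sorted(tuple(sorted(x)) for x in fis))
# [('a',), ('a','b'), ('a','b','c'), ('a','c'), ('b','c')]
```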

1.4.2 Contributions

The contributions of the thesis can be summarized as follows.

In Chapter 2, we apply feature selection approaches to derive gene expression signatures for survival outcome prediction at different survival times in METABRIC patients receiving hormone (HT) and, in some cases, chemotherapy (CT) agents. The contributions are:

• The initial genes used for feature selection are collected from [49], based either on their known involvement in paclitaxel metabolism or on evidence that their expression levels and/or copy numbers correlate with paclitaxel GI50 values. Initial genes that correlate with other drugs, such as tamoxifen (used in hormone therapy), methotrexate, 5-fluorouracil, epirubicin, and doxorubicin (used in chemotherapy), are studied as well.

• Multiple factor analysis with SVM as an evaluator (MFA+SVM), mRMR with SVM (mRMR+SVM), or mRMR with Random Forest (mRMR+RF) is applied to extract the final gene signatures. These combinations for wrapper feature selection are among the best that we have tried.

• An SVM model based on the paclitaxel signature extracted by MFA+SVM achieves the best accuracy, 78.6%, in predicting the survivability of 84 patients treated with both HT and CT. Paclitaxel-based SVM classifiers derived by mRMR+SVM achieve at least 81.1% accuracy on 53 CT-treated patients, for different survival thresholds.

• For other combinations of treatments and ML approaches, the accuracies are fairly high in the majority of cases.

In Chapter 3, we develop a feature selection algorithm, named Parallel Apriori-like Feature Selection (PAFS), inspired by the Apriori algorithm (see the sketch after this list). The contributions are the following:

• Candidate subsets of features are generated and evaluated in batches, in a bottom-up manner, because signatures of small size are preferred. At each round, the candidates of size k are examined and the top-N best ones are used to generate the candidates of size (k + 1) for the next round. This saves computation time by removing low-quality parents.

• As a result, many subsets of features with similar classification performance can be used as the optimal outputs.

• The search behavior can be adjusted through run-time options to avoid being either too greedy or too exhaustive.

• PAFS can be implemented in parallel to speed up the feature selection process.

• Applied to the METABRIC datasets used in Chapter 2, for patients receiving hormone therapy (HT), chemotherapy (CT), or a combination of them, PAFS extracted better gene signatures for predicting survival outcome than those presented in Chapter 2. The results showed that the classifiers are less affected by the class imbalance in the data.

• Applied to other processed METABRIC data, for predicting whether patients receiving HT, CT, radiation therapy, or surgery only would become disease-free, PAFS gives performances as high as other state-of-the-art approaches.

• We also devised a Tree-based Classifier Model (TCM) algorithm that wraps PAFS and allows us to use multiple types of classifiers in solving a multi-label classification problem. TCM outperformed a previous study in classifying the five PAM50 subtypes, using fewer genes.
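The following is a minimal sketch of the bottom-up, top-N candidate generation that PAFS is built around; the classifier, cross-validation settings, and synthetic data are stand-ins, not the thesis's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def pafs_like_search(X, y, max_size=3, top_n=10):
    """Bottom-up beam search in the spirit of PAFS: score candidate subsets in
    batches, keep the top-N of size k, and extend only those to size k + 1."""
    n_features = X.shape[1]

    def score(subset):
        return cross_val_score(SVC(kernel='rbf'), X[:, list(subset)], y, cv=5).mean()

    scored = [(score((f,)), (f,)) for f in range(n_features)]   # round 1: singletons
    best = max(scored)
    for k in range(2, max_size + 1):
        parents = [s for _, s in sorted(scored, reverse=True)[:top_n]]
        candidates = {tuple(sorted(set(p) | {f}))               # prune weak parents
                      for p in parents for f in range(n_features) if f not in p}
        scored = [(score(c), c) for c in candidates]
        best = max(best, max(scored))
    return best

X, y = make_classification(n_samples=120, n_features=15, n_informative=4,
                           random_state=0)
acc, subset = pafs_like_search(X, y)
print(f"best subset {subset} with CV accuracy {acc:.3f}")
```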

In Chapter 4, we propose an approach to construct a hierarchical classification model to predict the 5-year survivability of breast cancer patients treated with hormone therapy, radiotherapy, or surgery. The aim of the model is to suggest an optimal treatment for new patients based on their gene expression profiles. The contributions are the following:

• We integrate an unsupervised method and a supervised method to deal with a multiple-label classification problem.

• A bottom-up agglomerative clustering with Ward's linkage is applied to organize the hierarchical clusters in such a way that the overall within-cluster variance is as small as possible (see the sketch after this list). Thus, it may be easier for a classification method to separate the different groups of samples at each node.

• As a result, the selected set of discriminative genes at each node can separate the samples with high accuracy. Some of the genes selected at the same node are functionally related, or pathway- or breast-cancer-related.
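As an illustration of the clustering step, the snippet below builds a Ward-linkage hierarchy over a hypothetical expression matrix and cuts it into groups; the data and the number of clusters are assumptions for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical expression matrix: 100 patients x 50 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))

# Bottom-up agglomerative clustering with Ward's linkage: each merge minimizes
# the increase in total within-cluster variance.
Z = linkage(X, method='ward')

# Cut the tree into (here) four groups; in the thesis a classifier is then
# trained at each internal node to separate the child clusters.
labels = fcluster(Z, t=4, criterion='maxclust')
print(np.bincount(labels)[1:])      # cluster sizes
```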

In Chapter 5, we present a sub-network-based ML method, named Sub-network Selection Based on Predictive Ability (SSPA), to identify sub-network biomarkers for predicting breast cancer treatment outcomes, including disease-free survival, as well as overall survival at five years and long-term. The treatments investigated are hormone therapy (HT), chemotherapy (CT), radiation therapy (RT), or surgery-only (SO). The contributions are the following:

• Candidate sub-networks are generated in a bottom-up manner, based on a protein-protein interaction network, and evaluated (using a support vector machine and gene expression data), since sub-network biomarkers of small size are preferred.

• To reduce the search space and save computation cost, sub-network candidates are generated starting from seed genes that are selected either for their predictability or for their involvement in the therapy mechanism or known breast cancer mechanisms. This aims at finding subnet biomarkers with high classification performance that contain biologically meaningful genes.

• The seed genes include 84 genes that correlate with the GI50 of the drugs used in chemotherapy and hormone therapy, as described in Chapter 2, and 40 driver genes collected from the literature, as cancer is known to start from mutations in some driver genes.

• Predictive genes, evaluated by mutual information, are also considered as seed genes since they can induce highly discriminatory and robust subnets [45].

• A score, named Predictive Ability (PA), is proposed to estimate the predictability of a sub-network. We can thus rank the candidate sub-networks by their PA and evaluate only the top ones for their true classification performance using SVM; candidate sub-networks with low PA are skipped to save time. Since the PA of a candidate can be computed incrementally during the search for an optimal subnet, at a much lower cost than evaluating the candidate with a classification method, this heuristic saves a great deal of time (see the sketch after this list).

• After finding the optimal sub-network for every seed gene, the top subnets are ranked by their relevance to a set of pathways and previously known cancer-related genes, so that biologists can choose potentially meaningful subnets for further analysis among the ones having high accuracy.

• Our method significantly outperformed the approaches in previous studies. The extracted sub-networks resulted in high accuracy, 73% to 93%, and include breast-cancer-related genes and genes associated with many cancer pathways.

• When applying the Recursive Feature Elimination (RFE) method with an SVM linear kernel as the evaluator for feature selection, we obtained highly accurate subsets of genes, with more than 90% accuracy. This suggests that there might be better subnets than the ones we obtained by SSPA.
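A minimal sketch of the seed-based, bottom-up expansion is given below. The true PA score is defined in Chapter 5; here the mean mutual information between the member genes and the class is used as a cheap stand-in, and the toy interaction map, data, and stopping rule are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def expand_subnet(seed, neighbours, X, y, max_size=6):
    """Grow a sub-network from a seed gene, at each step adding the interacting
    neighbour that most improves a cheap predictability proxy (the mean mutual
    information between member genes and the class)."""
    subnet = {seed}

    def proxy(genes):
        return mutual_info_classif(X[:, sorted(genes)], y, random_state=0).mean()

    current = proxy(subnet)
    while len(subnet) < max_size:
        frontier = {n for g in subnet for n in neighbours.get(g, ())} - subnet
        if not frontier:
            break
        best_score, best_gene = max((proxy(subnet | {n}), n) for n in frontier)
        if best_score <= current:      # no neighbour improves the estimate
            break
        subnet.add(best_gene)
        current = best_score
    return sorted(subnet), current

# Toy PPI neighbourhood over 6 genes (indices into the expression matrix X).
neighbours = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2, 5], 5: [4]}
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 6))
y = (X[:, 0] + 2 * X[:, 1] > 0).astype(int)   # outcome driven by genes 0 and 1
print(expand_subnet(seed=0, neighbours=neighbours, X=X, y=y))
```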

In Chapter 6, we propose a network-based machine learning approach, named Top-k Dense Sub-networks Based on Joint Mutual Information (KDS), to identify the top-k sub-networks of functionally related genes for predicting the 5-year survivability of breast cancer patients treated with chemotherapy, hormone therapy, or a combination of these. The contributions are the following:

• A weighted graph is constructed from gene expressions and protein-protein interactions, where the edge weight is computed via joint mutual information, which expresses the predictability (i.e., predictive ability) of the combination of the two genes involved (see the sketch after this list). This makes our weighted graph different from common gene-gene co-expression networks, whose edge weights usually express the correlation between the pair of genes, not their predictability.

• A density function for a weighted sub-graph is proposed, which also serves as an estimate of its predictability, and by which we can extract a (locally) optimally dense sub-network for each seed gene.

• The extracted dense sub-networks are evaluated by either SVM or Random Forest for their real performance. Before training a classification model, techniques to deal with class-imbalanced data are implemented as well.

• Based on a weighted voting scheme that uses multiple subnets for classification, we introduce two simple methods to select a (locally) optimal set of subnets, among the sub-networks whose classification performance exceeds a given threshold, that can improve on the classification performance of the single best subnet.

• Our approach has significantly improved on the existing results in Chapter 2. An SVM based on the best subnet can predict outcomes for chemotherapy-treated patients with 94.3% accuracy (MCC = 0.883). For the other treatments, the top-k subnets can yield more than 80% accuracy. Interestingly, we found the well-known genes SP1, TP53, MYC, and PIK3CA among 14 breast-cancer-related genes that frequently appear in the sets of top-k selected subnets.
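The following sketch shows how such a predictability-based edge weight can be computed as the joint mutual information between a gene pair and the outcome; the equal-frequency discretization and toy data are assumptions, not the exact estimator used in Chapter 6.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def jmi_edge_weight(xi, xj, y, bins=3):
    """Weight an edge by the joint mutual information I((g_i, g_j); Y), i.e.,
    the predictability of the gene pair rather than their co-expression."""
    def discretize(x):
        edges = np.quantile(x, [k / bins for k in range(1, bins)])
        return np.searchsorted(edges, x)          # equal-frequency bins 0..bins-1
    di, dj = discretize(xi), discretize(xj)
    joint = di * bins + dj                        # encode the pair as one symbol
    return mutual_info_score(joint, y)

rng = np.random.default_rng(2)
x1, x2 = rng.normal(size=200), rng.normal(size=200)
y = (x1 + x2 > 0).astype(int)                     # outcome depends on the pair
# Typically larger than the mutual information of either gene taken alone.
print(jmi_edge_weight(x1, x2, y))
```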

In Chapter 7, we introduce an algorithm, called NUCLEAR, which can efficiently extract the FIs and generators from a lattice of closed frequent itemsets (CFIs) in a straightforward fashion, without the need for other information. The contributions are the following:

• We derive some theoretical results based on the notions of "kernel" and "extendable set", which are the basis of the NUCLEAR algorithm.

• The set of pairs ("kernel", "extendable set") can be used to partition the subclass induced by a CFI into finer subclasses, where each new subclass is represented by one pair. This inspires divide-and-conquer approaches.

• The FIs in each subclass can be quickly extracted, since they are exactly the supersets of the kernel that are subsets of the union of the kernel and the extendable set (see the sketch after this list).

• We derive a necessary and sufficient condition for an itemset to be a generator: the generators are the minimal kernels. As the kernels are automatically grouped by their CFI, the generators can be quickly identified.

• Experimental results showed that NUCLEAR is effective compared to previous studies.

• The time to generate the kernels and extendable sets is smaller than the time to mine the generators.

• The time for extracting the FIs from the kernels and extendable sets is much smaller than that for mining the CFIs from data and that for extracting the FIs from the generators and the CFIs.
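A minimal sketch of the extraction step described above: given one (kernel, extendable set) pair, the FIs of its subclass are exactly the unions of the kernel with subsets of the extendable set, so they can be listed directly with no overlap checks. The pair used here is hypothetical.

```python
from itertools import chain, combinations

def fis_from_pair(kernel, extendable):
    """FIs of the subclass represented by one (kernel, extendable set) pair:
    exactly the unions of the kernel with each subset of the extendable set,
    so the subclass is enumerated directly and without overlap checks."""
    K, E = frozenset(kernel), sorted(extendable)
    subsets = chain.from_iterable(combinations(E, k) for k in range(len(E) + 1))
    return [K | frozenset(s) for s in subsets]

# One hypothetical pair partitioning part of a CFI's equivalence class.
for fi in fis_from_pair(kernel={'a'}, extendable={'b', 'c'}):
    print(sorted(fi))    # ['a'], ['a', 'b'], ['a', 'c'], ['a', 'b', 'c']
```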

In Appendix A, we conduct preliminary research that models time-series data using gene expressions of breast cancer patients and their survival time. The contributions are the following:

• The time-series profile of each gene is modelled by partitioning patients into different time-interval bins according to their survival time and then averaging their gene expressions.

• The time-series profiles are interpolated with cubic splines, and the distance between a pair of genes is then computed as the vertical area between their curves after a universal alignment (see the sketch after this list).

• We applied a clustering method to identify isolated and small clusters whose genes can be used for survival time prediction. Some of the extracted genes are breast-cancer-related.
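The sketch below illustrates the profile-distance computation on two hypothetical binned profiles: cubic-spline interpolation followed by the area between the curves. The mean-centering used here is only a crude stand-in for the universal alignment described in Appendix A.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.interpolate import CubicSpline

# Hypothetical binned profiles: mean expression of two genes per survival bin.
t = np.array([0.5, 1.5, 2.5, 3.5, 4.5])          # bin centres (years)
gene_a = np.array([2.1, 2.4, 1.9, 1.2, 0.8])
gene_b = np.array([1.0, 1.3, 1.1, 0.9, 0.7])

# Cubic-spline interpolate each profile on a fine grid, then take the area
# between the two curves as their distance after a simple mean alignment.
grid = np.linspace(t[0], t[-1], 200)
fa = CubicSpline(t, gene_a)(grid) - gene_a.mean()
fb = CubicSpline(t, gene_b)(grid) - gene_b.mean()
print(trapezoid(np.abs(fa - fb), grid))
```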

1.5 Thesis Organization

This thesis is organized into eight chapters. Chapter 1 provides an introduction to the area of interest and our contributions. Chapter 2 to Chapter 7 are paper-based; their contents are slightly edited (with respect to the main ideas, results, and discussions) or extended with results that we have not published (Chapter 3, Chapter 5, Chapter 6). Chapter 2 presents a biochemically-inspired ML approach for identifying gene signatures of breast cancer outcomes of hormone and chemotherapy, based on the METABRIC data. In Chapter 3, we introduce the feature selection method PAFS and apply it to the METABRIC datasets to improve the previous results. In Chapter 4, we apply unsupervised and supervised methods to construct a hierarchical tree model for multiple-class classification problems, applied to breast cancer survivability prediction. In Chapter 5, we propose the network-based approach SSPA for detecting sub-network biomarkers of breast cancer survivability; it aims at extracting several equally accurate and potentially meaningful sub-networks to consider as biomarkers. In Chapter 6, we propose the network-based approach KDS, used to identify an optimal set of sub-networks that together produce better classification performance than a single sub-network. In Chapter 7, we present the NUCLEAR algorithm for mining the FIs and generators from a lattice of CFIs. Finally, Chapter 8 concludes on the contributions of our research and points out some limitations and future work that could improve or extend our approaches.

References

[1] National Cancer Institute: PDQ® Breast Cancer Treatment (Patient Version). Bethesda, MD: National Cancer Institute. Date last modified 05/23/2014.

[2] Maughan, K. L., Lutterbie, M. A., and Ham, P. (2010). Treatment of breast cancer. American family physician, 81(11), 1339-1346.

[3] Koh, J., and Kim, M. J. (2019). Introduction of a new staging system of breast cancer for radiologists: an emphasis on the prognostic stage. Korean journal of radiology, 20(1), 69-82.

[4] Scarth, H., Cantin, J., Levine, M., and Steering Committee on Clinical Practice Guidelines for the Care and Treatment of Breast Cancer. (2002). Clinical practice guidelines for the care and treatment of breast cancer: mastectomy or lumpectomy? The choice of operation for clinical stages I and II breast cancer (summary of the 2002 update). Cmaj, 167(2), 154-155.

[5] Shenkier, T., Weir, L., Levine, M., Olivotto, I., Whelan, T., and Reyno, L. (2004). Clinical practice guidelines for the care and treatment of breast cancer: 15. Treatment for women with stage III or locally advanced breast cancer. Cmaj, 170(6), 983-994.

[6] Dawood, S., Ueno, N. T., Valero, V., Woodward, W. A., Buchholz, T. A., Hortobagyi, G. N., and Cristofanilli, M. (2011). Differences in survival among women with stage III inflammatory and noninflammatory locally advanced breast cancer appear early: a large population-based study. Cancer, 117(9), 1819-1826.

[7] Craighead, P. S., Olivotto, I. A., Bowman, D. M., and Nolan, M. C. (1998). The management of ductal carcinoma in situ (DCIS). Canadian Medical Association. Journal, 158(3), S27.

[8] Moran, M. S., Schnitt, S. J., Giuliano, A. E., Harris, J. R., Khan, S. A., Horton, J., and Johnson, P. L. (2014). Society of Surgical Oncology–American Society for Radiation Oncology consensus guideline on margins for breast-conserving surgery with whole-breast irradiation in stages I and II invasive breast cancer. Annals of surgical oncology, 21(3), 704-716.

[9] Perou, C. M., Sørlie, T., Eisen, M. B., Van De Rijn, M., Jeffrey, S. S., Rees, C. A., and Fluge, Ø. (2000). Molecular portraits of human breast tumours. nature, 406(6797), 747-752.

[10] Fragomeni, S. M., Sciallis, A., and Jeruss, J. S. (2018). Molecular subtypes and local-regional control of breast cancer. Surgical Oncology Clinics, 27(1), 95-120.

[11] Krop, I., Ismaila, N., Andre, F., Bast, R. C., Barlow, W., Collyar, D. E., and Van Poznak, C. (2017). Use of biomarkers to guide decisions on adjuvant systemic therapy for women with early-stage invasive breast cancer: American Society of Clinical Oncology clinical practice guideline focused update. Journal of clinical oncology: official journal of the American Society of Clinical Oncology, 35(24), 2838.

[12] Wallden, B., Storhoff, J., Nielsen, T., Dowidar, N., Schaper, C., Ferree, S., and Vickery, T. (2015). Development and verification of the PAM50-based Prosigna breast cancer gene signature assay. BMC medical genomics, 8(1), 54.

[13] Voduc, K. D., Cheang, M. C., Tyldesley, S., Gelmon, K., Nielsen, T. O., and Kennecke, H. (2010). Breast cancer subtypes and the risk of local and regional relapse. Journal of clinical oncology, 28(10), 1684-1691.

[14] Foukakis, T., and Bergh, J. (2016). Prognostic and predictive factors in early, non- metastatic breast cancer. Dizon Ed UpToDate.

[15] Metzger-Filho, O., Sun, Z., Viale, G., Price, K. N., Crivellari, D., Snyder, R. D., and Cardoso, F. (2013). Patterns of recurrence and outcome according to breast cancer subtypes in lymph node–negative disease: results from International Breast Cancer Study Group Trials VIII and IX. Journal of clinical oncology, 31(25), 3083.

[16] Anders, C. K., Carey, L. A., and Burstein, H. J. ER/PR negative, HER2-negative (triple-negative) breast cancer.

[17] Carey, L. A. (2010). Through a glass darkly: advances in understanding breast cancer biology, 2000–2010. Clinical breast cancer, 10(3), 188-195.

[18] Mitchell, T. M. (1997). Machine Learning, McGraw-Hill Higher Education. New York.

[19] Ho, T. K. (1995, August). Random decision forests. In Proceedings of 3rd international conference on document analysis and recognition (Vol. 1, 278-282). IEEE.

[20] Kleinbaum, D. G., Dietz, K., Gail, M., Klein, M., and Klein, M. (2002). Logistic regression. New York: Springer-Verlag.

[21] Seber, G. A., and Lee, A. J. (2012). Linear regression analysis (Vol. 329). John Wiley and Sons.

[22] Krishna, K., and Murty, M. N. (1999). Genetic K-means algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 29(3), 433-439.

[23] Johnson, S. C. (1967). Hierarchical clustering schemes. Psychometrika, 32(3), 241-254.

[24] Agrawal, R., Imielinski, T., and Swami, N. (1993). Mining association rules between sets of items in large databases. In: ACM SIGMOD, 207-216.

[25] Han, J., Pei, J., Yin, Y., and Mao, R. (2004). Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data mining and knowledge discovery, 8(1), 53-87.

[26] Ferroni, P., Zanzotto, F.M., Riondino, S., Scarpato, N., Guadagni, F. and Roselli, M., 2019. Breast Cancer Prognosis Using a Machine Learning Approach. Cancers, 11(3), p.328. DOI:10.3390/cancers11030328

[27] Asri, H., Mousannif, H., Al Moatassime, H., and Noel, T. (2016). Using machine learning algorithms for breast cancer risk prediction and diagnosis. Procedia Computer Science, 83, 1064-1069.

[28] Ahmad, L. G., Eshlaghy, A. T., Poorebrahimi, A., Ebrahimi, M., and Razavi, A. R. (2013). Using three machine learning techniques for predicting breast cancer recurrence. J Health Med Inform, 4(124), 3.

[29] Curtis, C., Shah, S. P., Chin, S. F., Turashvili, G., Rueda, O. M., Dunning, M. J., and Gräf, S. (2012). The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature, 486(7403), 346-352.

[30] Patrício, M., Pereira, J., Crisóstomo, J., Matafome, P., Gomes, M., Seiça, R., and Caramelo, F. (2018). Using Resistin, glucose, age and BMI to predict the presence of breast cancer. BMC cancer, 18(1), 29.

[31] Van’t Veer, L. J., Dai, H., Van De Vijver, M. J., He, Y. D., Hart, A. A., Mao, M., and Schreiber, G. J. (2002). Gene expression profiling predicts clinical outcome of breast cancer. nature, 415(6871), 530-536.

[32] Greenwood, P. E., and Nikulin, M. S. (1996). A guide to chi-squared testing (Vol. 280). John Wiley and Sons.

[33] Kononenko, I. (1994, April). Estimating attributes: analysis and extensions of RELIEF. In European conference on machine learning (171-182). Springer, Berlin, Heidelberg.

[34] Peng, H., Long, F., and Ding, C. (2005). Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on pattern analysis and machine intelligence, 27(8), 1226-1238.

[35] Zou, Q., Zeng, J., Cao, L., and Ji, R. (2016). A novel features ranking metric with application to scalable visual and bioinformatics data classification. Neurocomputing, 173, 346-354.

[36] Ding, C., and Peng, H. (2005). Minimum redundancy feature selection from microarray gene expression data. Journal of bioinformatics and computational biology, 3(02), 185-205.

[37] Hsu, H. H., Hsieh, C. W., and Lu, M. D. (2011). Hybrid feature selection by combining filters and wrappers. Expert Systems with Applications, 38(7), 8144-8150.

[38] Xie, J., Xie, W., Wang, C., and Gao, X. (2010, September). A novel hybrid feature selection method based on IFSFFS and SVM for the diagnosis of erythemato-squamous diseases. In Proceedings of the first workshop on applications of pattern analysis, 142-151.

[39] Huy, P. Q., Ngom, A., and Rueda, L. (2016, December). PAFS - An efficient method for classifier-specific feature selection. In 2016 IEEE Symposium Series on Computational Intelligence (SSCI), 1-8.

[40] Hwang, T., Tian, Z., Kuang, R., and Kocher, J. P. (2008, December). Learning on weighted hypergraphs to integrate protein interactions and gene expressions for cancer outcome prediction. In 2008 Eighth IEEE International Conference on Data Mining, 293-302.

[41] Allahyar, A., and De Ridder, J. (2015). FERAL: network-based classifier with application to breast cancer outcome prediction. Bioinformatics, 31(12), i311-i319. DOI: 10.1093/bioinformatics/btv255.

[42] Chuang, H. Y., Lee, E., Liu, Y. T., Lee, D., and Ideker, T. (2007). Network-based classification of breast cancer metastasis. Molecular Systems Biology, 3(1). DOI: 10.1038/msb4100180.

[43] Amgalan, B., and Lee, H. (2014). WMAXC: a weighted maximum clique method for identifying condition-specific sub-network. PloS one, 9(8), e104993.

[44] Grzadkowski, M. R., Haider, S., Sendorek, D., and Boutros, P. C. (2018). Subnetwork-based prognostic biomarkers exhibit performance and robustness superior to gene-based biomarkers in breast cancer. BioRxiv, 290924.

[45] Zhang, L., Li, S., Hao, C., Hong, G., Zou, J., Zhang, Y., and Guo, Z. (2013). Extracting a few functionally reproducible biomarkers to build robust subnetwork-based classifiers for the diagnosis of cancer. Gene, 526(2), 232-238.

[46] Xu, H., Caramanis, C., and Mannor, S. (2011). Sparse algorithms are not stable: A no-free-lunch theorem. IEEE transactions on pattern analysis and machine intelligence, 34(1), 187-193.

[47] Fawzi, A., Moosavi-Dezfooli, S. M., and Frossard, P. (2016). Robustness of classifiers: from adversarial to random noise. In Advances in Neural Information Processing Systems, 1632-1640.

[48] Drier, Y., and Domany, E. (2011). Do two machine-learning based prognostic signatures for breast cancer capture the same biological processes? PloS one, 6(3).

[49] Dorman, S. N., Baranova, K., Knoll, J. H., Urquhart, B. L., Mariani, G., Carcangiu, M. L., and Rogan, P. K. (2016). Genomic signatures for paclitaxel and gemcitabine resistance in breast cancer derived by machine learning. Molecular oncology, 10(1), 85-100.

[50] Keshava Prasad, T. S., Goel, R., Kandasamy, K., Keerthikumar, S., Kumar, S., Mathivanan, S., and Balakrishnan, L. (2009). Human protein reference database—2009 update. Nucleic acids research, 37(suppl 1), D767-D772.

[51] Szklarczyk, D., Morris, J. H., Cook, H., Kuhn, M., Wyder, S., Simonovic, M., and Jensen, L. J. (2016). The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucleic acids research.

[52] Rodchenkov, I., Babur, O., Luna, A., Aksoy, B. A., Wong, J. V., Fong, D., and Mistry, H. (2020). Pathway Commons 2019 Update: integration, analysis and exploration of pathway data. Nucleic acids research, 48(D1), D489-D497.

[53] Liu, Z.-P., Wu, C., Miao, H., and Wu, H. (2015). RegNetwork: an integrated database of transcriptional and posttranscriptional regulatory networks in human and mouse. Database, 2015. DOI: 10.1093/database/bav095.

[54] Han, H., Shim, H., Shin, D., Shim, J. E., Ko, Y., Shin, J., and Kim, H. (2015). TRRUST: a reference database of human transcriptional regulatory interactions. Scientific reports, 5, 11432.

[55] Li, J., Wang, L., Guo, M., Zhang, R., Dai, Q., Liu, X., Wang, C., Teng, Z., Xuan, P., and Zhang, M. (2015). Mining disease genes using integrated protein–protein interaction and gene–gene co-regulation information. FEBS open bio, 5(1), 251-256.

[56] Aragues, R., Sander, C., and Oliva, B. (2008). Predicting cancer involvement of genes from heterogeneous data. BMC bioinformatics, 9(1), 172.

[57] Wang, X., Gulbahce, N., and Yu, H. (2011). Network-based methods for human disease gene prediction. Briefings in functional genomics, 10(5), 280-293.

[58] Lee, S. Y., Im, S. A., Park, Y. H., Woo, S. Y., Kim, S., Choi, M. K., and Im, Y. H. (2014). Genetic polymorphisms of SLC28A3, SLC29A1 and RRM1 predict clinical outcome in patients with metastatic breast cancer receiving gemcitabine plus paclitaxel chemotherapy. European Journal of Cancer, 50(4), 698-705.

[59] Krogan, N. J., Lippman, S., Agard, D. A., Ashworth, A., and Ideker, T. (2015). The cancer cell map initiative: defining the hallmark networks of cancer. Molecular cell, 58(4), 690-698.

[60] Croft, D., Mundo, A. F., Haw, R., Milacic, M., Weiser, J., Wu, G., and Jassal, B. (2014). The Reactome pathway knowledgebase. Nucleic acids research, 42(D1), D472-D477.

[61] Kanehisa, M., and Goto, S. (2000). KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research, 28(1), 27-30.

[62] Slenter, D. N., Kutmon, M., Hanspers, K., Riutta, A., Windsor, J., Nunes, N., and Ehrhart, F. (2018). WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic acids research, 46(D1), D661-D667.

[63] Fournier-Viger P., Lin J.C.W., Vo B., Truong T.C., Zhang J., Le H.B. (2017). A survey of itemset mining. WIREs Data Mining and Knowledge Discovery, 7(4), e1207

[64] Naulaerts, S., Meysman, P., Bittremieux, W., Vu, T. N., Vanden Berghe, W., Goethals, B., and Laukens, K. (2015). A primer to frequent itemset mining for bioinformatics. Briefings in bioinformatics, 16(2), 216-231.

[65] Tran N.A., Tran C.T., Le H.B. (2012). Structures of association rule set. In: ACIIDS'12, 361-370.

[66] Tran N.A., Duong V.H., Tran C.T., Le H.B (2011). Efficient algorithms for mining frequent itemsets with constraint. In: KSE’11, 19-25.

[67] Truong C.T., Tran N.A. (2010). Structure of set of association rules based on concept lattice. In: ACIIDS’10, 217-227.

Chapter 2

Predicting Outcomes of Hormone and Chemotherapy in the Molecular Taxonomy of Breast Cancer International Consortium Study by Biochemically-inspired Machine Learning

2.1 Introduction

Current pharmacogenetic analysis of chemotherapy makes qualitative decisions about drug efficacy in patients (determination of good, intermediate, or poor metabolizer phenotypes) based on variants present in genes involved in the transport, biotransformation, or disposition of a drug. We have applied a supervised machine learning (ML) approach to derive accurate gene signatures, based on the biochemically-guided response to chemotherapies of breast cancer cell lines [1], which show variable responses to growth inhibition by paclitaxel and gemcitabine therapies [2, 3]. We analyzed stable [4] and linked unstable genes in the pathways that determine their disposition. This involved investigating the correspondence between the 50% growth inhibitory concentrations (GI50) of paclitaxel and gemcitabine and gene copy number, mutation, and expression, first in breast cancer cell lines and then in patients [1]. Genes encoding direct targets of these drugs, metabolizing enzymes, transporters, and those previously associated with chemoresistance to paclitaxel (n = 31 genes) were then pruned by multiple factor analysis (MFA), which indicated that the expression levels of genes ABCC10, BCL2, BCL2L1, BIRC5, BMF, FGF2, FN1, MAP4, MAPT, NFKB2, SLCO1B3, TLR6, TMEM243, TWIST1, and CSAG2 could predict sensitivity in breast cancer cell lines with 84% accuracy. The cell line-based paclitaxel-gene signature predicted sensitivity in 84% of patients with no or minimal residual disease (n = 56; data from [5]). The present study derives related gene signatures with ML approaches that predict the outcome of hormone and chemotherapies in the large METABRIC breast cancer cohort [6].

2.2 Methods

2.2.1 Support Vector Machine (SVM) Learning

Previously, paclitaxel-related response genes were identified from peer-reviewed literature, and their expression and copy number in breast cancer cell lines were analyzed by multiple factor analysis of GI50 values of these lines [2] (Figure 2.1). Given the expression levels

of each gene, an SVM is evaluated on patients by classifying those with shorter survival times as resistant and those with longer survival as sensitive to hormone and/or chemotherapy using paclitaxel, tamoxifen, methotrexate, 5-fluorouracil (5-FU), epirubicin, and doxorubicin. The SVM was trained using the function fitcsvm in MATLAB R2014a [7] and tested with either leave-one-out or 9-fold cross-validation (indicated in Table 2.1 and Table 2.2). The Gaussian kernel was used for this study, unlike [1], which used the linear kernel. The SVM requires selection of two different parameters, C (misclassification cost) and sigma (which controls the flexibility and smoothness of the Gaussians) [8]; these parameters determine how strictly the SVM learns the training set and, if not selected properly, can lead to overfitting. A grid search evaluates a wide range of combinations of these values in parallel, and the combination of C and sigma leading to the lowest cross-validation misclassification rate is selected. A backwards (greedy) feature selection algorithm was designed and implemented in MATLAB, in which one gene of the set is left out to form a reduced gene set and the classification is then reassessed; reduced sets that maintain or lower the misclassification rate are kept. The procedure is repeated until the subset with the lowest misclassification rate is selected as the optimal subset of genes. These SVMs were then assessed for their ability to predict patient outcomes based on available metadata (see Figure 2.1 and [1]). Interactive prediction using normalized expression values as input is available at http://chemotherapy.cytognomix.com.
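The study itself used fitcsvm in MATLAB; purely as an illustration of the two steps just described (a C/sigma grid search, then greedy backward elimination), here is a hedged sketch in Python with scikit-learn on synthetic data. Note that sklearn parameterizes the Gaussian kernel by gamma = 1/(2*sigma^2), and the grids and data below are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, LeaveOneOut, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a patient expression matrix (84 patients, 15 genes).
X, y = make_classification(n_samples=84, n_features=15, n_informative=5,
                           random_state=0)

# Step 1: grid search over C and sigma for the Gaussian kernel.
sigmas = [1, 10, 100]
grid = GridSearchCV(SVC(kernel='rbf'),
                    {'C': [1e1, 1e2, 1e3, 1e4, 1e5],
                     'gamma': [1 / (2 * s ** 2) for s in sigmas]},
                    cv=9)
grid.fit(X, y)
best = grid.best_params_

# Step 2: greedy backward elimination under leave-one-out cross-validation:
# drop a gene whenever its removal maintains or improves the LOO accuracy.
def loo_acc(cols):
    return cross_val_score(SVC(kernel='rbf', **best),
                           X[:, cols], y, cv=LeaveOneOut()).mean()

kept = list(range(X.shape[1]))
improved = True
while improved and len(kept) > 1:
    improved = False
    base = loo_acc(kept)
    for g in list(kept):
        reduced = [c for c in kept if c != g]
        if loo_acc(reduced) >= base:
            kept, improved = reduced, True
            break
print(best, kept)
```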

Figure 2.1: Biochemically-inspired SVM gene signature derivation workflow.

Initial gene sets preceding feature selection: Paclitaxel - ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCAP29, BCL2, BCL2L1, BIRC5, BMF, CNGA3, CYP2C8, CYP3A4, FGF2, FN1, GBP1, MAP2, MAP4, MAPT, NFKB2, NR1I2, OPRK1, SLCO1B3, TLR6, TUBB1, TWIST1. Tamoxifen - ABCB1, ABCC2, ALB, C10ORF11, CCNA2, CYP3A4, E2F7, F5, FLAD1, FMO1, IGF1, IGFBP3, IRS2, NCOA2, NR1H4, NR1I2, PIAS4, PPARA, PROC, RXRA, SMARCD3, SULT1B1, SULT1E1, SULT2A1. Methotrexate (MTX) - ABCB1, ABCC2, ABCG2, CDK18, CDK2, CDK6, CDK8, CENPA, DHFRL1. Epirubicin - ABCB1, CDA, CYP1B1, ERBB3, ERCC1, GSTP1, MTHFR, NOS3, ODC1, PON1, RAD50, SEMA4D, TFDP2. Doxorubicin - ABCB1, ABCC2, ABCD3, AKR1B1, AKR1C1, CBR1, CYBA, FTH1, FTL, GPX1, MT2A, NCF4, RAC2, SLC22A16, TXNRD1. 5-Fluorouracil - ABCB1, ABCC3, CFLAR, IL6, MTHFR, TP53, UCK2.

Table 2.1: SVM gene expression signature performance on 84 patients treated with both chemotherapy and hormone therapy (CT and HT). 1 MCC: Matthews Correlation Coefficient. 2 AUC: Area under the receiver operating curve. 3 Surviving patients; 4 Analysis included patients in the METABRIC 'discovery' dataset only; 5 SVMs tested with 9-fold cross-validation, all others tested with leave-one-out cross-validation; 6 Includes all patients treated with HT, CT, or combination CT/HT, either with or without combination radiotherapy; 7 Median time after treatment until death (> 4.4 years) was used to distinguish favorable outcome, i.e. sensitivity to therapy.

Final gene signatures (with SVM cost C and sigma):
- Paclitaxel: ABCC1, ABCC10, BAD, BIRC5, FN1, GBP1, MAPT, SLCO1B3, TMEM243, TUBB3, TUBB4B (C = 10000, sigma = 10)
- Tamoxifen: ABCC2, ALB, CCNA2, E2F7, FLAD1, FMO1, NCOA2, NR1I2, PIAS4, SULT1E1 (C = 100000, sigma = 100)
- MTX: ABCC2, ABCG2, CDK2, DHFRL1 (C = 10, sigma = 1)
- Epirubicin: ABCB1, CDA, CYP1B1, ERBB3, ERCC1, MTHFR, PON1, SEMA4D, TFDP2 (C = 1000, sigma = 10)
- Doxorubicin: ABCC2, ABCD3, CBR1, FTH1, GPX1, NCF4, RAC2, TXNRD1 (C = 100000, sigma = 100)
- 5-FU: ABCB1, ABCC3, MTHFR, TP53 (C = 10000, sigma = 100)

Metric      Paclitaxel  Tamoxifen  MTX     Epirubicin  Doxorubicin  5-FU
Accuracy    78.60%      76.20%     71.40%  72.60%      75.00%       71.40%
Precision   0.79        0.76       0.71    0.73        0.75         0.71
F-Measure   0.78        0.76       0.71    0.72        0.75         0.71
MCC1        0.56        0.51       0.41    0.43        0.49         0.42
AUC2        0.81        0.70       0.77    0.69        0.70         0.72

Table 2.2: SVM gene expression signature performance on patients treated with other therapies. 1 MCC: Matthews Correlation Coefficient. 2 AUC: Area under the receiver operating curve. 3 Surviving patients; 4 Analysis included patients in the METABRIC 'discovery' dataset only; 5 SVMs tested with 9-fold cross-validation, all others tested with leave-one-out cross-validation; 6 Includes all patients treated with HT, CT, or combination CT/HT, either with or without combination radiotherapy; 7 Median time after treatment until death (≥ 4.4 years) was used to distinguish favorable outcome, i.e. sensitivity to therapy.

Treatment groups: CT and/or HT (3,4,5), 735 patients; Deceased only, CT and/or HT (4,6), 327 patients; No treatment (3), 304 patients.

Final paclitaxel gene signatures (with C and sigma):
- CT and/or HT: BAD, BCAP29, BCL2, BMF, CNGA3, CYP2C8, CYP3A4, FGF2, FN1, NFKB2, NR1I2, OPRK1, SLCO1B3, TLR6, TUBB1, TUBB3, TUBB4A, TUBB4B, TWIST1 (C = 10000, sigma = 10)
- Deceased only: ABCB11, BAD, BBC3, BCL2, BCL2L1, BIRC5, CYP2C8, FGF2, FN1, GBP1, MAPT, NFKB2, OPRK1, SLCO1B3, TMEM243 (C = 100, sigma = 100)
- No treatment: ABCB1, ABCB11, BBC3, BCL2L1, BMF, CYP3A4, FGF2, GBP1, MAP4, MAPT, NR1I2, OPRK1, SLCO1B3, TUBB4A, TUBB4B, TWIST2 (C = 100, sigma = 10)

Metric      CT and/or HT  Deceased only  No treatment
Accuracy    65.30%        75.30%         73.40%
Precision   0.65          0.75           0.73
F-Measure   0.64          0.75           0.73
MCC1        0.29          0.51           0.47
AUC2        0.66          0.76           0.77

2.2.2 Random Forest (RF) Learning

RF was trained using the WEKA 3.7 [9] data mining tool. This classifier uses multiple random trees for classification, which are combined via a voting scheme to make a decision on the given input gene set. We used grid search to optimize the maximum number of randomly selected genes for each tree in the random forest. Figure 2.2 depicts the therapy outcome prediction process of a given patient using an RF consisting of a series of decision trees derived from different subsets of paclitaxel-related genes.

Figure 2.2: RF decision tree diagram depicts the therapy outcome prediction process of a given patient, using a RF consisting of k decision trees.

Several decision trees are built using different subsets of paclitaxel-related genes. The process starts from the root of each tree; if the expression of the gene corresponding to that node is greater than a specific value, the process continues through the right branch, otherwise through the left branch, until a leaf node is reached. That leaf represents the tree's prediction for the given input. The decisions of all trees are tallied, and the class with the largest number of votes is selected as the patient outcome.
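The WEKA tool was used in the study itself; the sketch below merely illustrates the tuning step described above (a grid search over the number of genes randomly considered per tree) with scikit-learn on synthetic data, so the panel size and grid values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the 19-gene paclitaxel panel.
X, y = make_classification(n_samples=100, n_features=19, n_informative=6,
                           random_state=0)

# Grid search over the number of genes randomly considered at each split;
# the forest classifies by majority vote of its trees.
search = GridSearchCV(RandomForestClassifier(n_estimators=300, random_state=0),
                      {'max_features': [3, 5, 7, 9, 19]}, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```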

2.2.3 Augmented Gene Selection

The most relevant genes (features) for therapy outcome prediction were found using the Minimum Redundancy and Maximum Relevance (mRMR) criterion [10] in a wrapper feature selection approach that incrementally selects genes by maximizing the average mutual information between gene expression features and classes (i.e., the relevance, $V_S$), while minimizing their redundancy, $W_S$:

$$W_S = \frac{1}{|S|^2} \sum_{f_i, f_j \in S} I(f_i, f_j), \qquad (2.1)$$

$$V_S = \frac{1}{|S|} \sum_{f \in S} I(C, f), \qquad (2.2)$$

and

$$\mathrm{mRMR}(S) = V_S - W_S, \qquad (2.3)$$

where $f_i$ corresponds to a feature in gene set $S$, $C$ is the class, $I(f_i, C)$ is the mutual information between $f_i$ and class $C$, and $I(f_i, f_j)$ is the mutual information between features $f_i$ and $f_j$. The mutual information between two discrete random variables $A$ and $X$ is defined as

$$I(A, X) = H(A) - H(A|X), \qquad (2.4)$$

where

$$H(A) = -\sum_{a \in A} p(a) \log_2 p(a), \qquad (2.5)$$

$$H(A|X) = -\sum_{x \in X} p(x) \sum_{a \in A} p(a|x) \log_2 p(a|x). \qquad (2.6)$$

For this experiment, we used a 26-gene signature (genes ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A, TUBB4B, FGF2, FN1, GBP1, NFKB2, OPRK1, TLR6, and TWIST1) as the base feature set. These genes were selected (in [1]) based either on their known involvement in paclitaxel metabolism, or evidence that their expression levels and/or copy numbers correlate with paclitaxel GI50 values. mRMR and SVM were combined to obtain a subset of genes that can accurately predict patient survival outcomes; here, we considered 3, 4 and 5 years as survival thresholds for breast cancer patients.

Performance was evaluated with several metrics. WEKA determined accuracy (ACC), the weighted average of precision and F-measure, the Matthews Correlation Coefficient (MCC), and the area under the ROC curve (AUC).
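For illustration, a hedged sketch of the incremental mRMR selection implied by Eqs. (2.1)-(2.3) is given below: at each step, the unselected feature maximizing relevance minus mean redundancy against the already selected set is added. The discretization, MI estimators, and synthetic data are assumptions rather than the exact WEKA/MATLAB pipeline used in the study.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def mrmr_select(X, y, n_select):
    """Incremental mRMR: start from the most relevant feature, then repeatedly
    add the feature with the largest relevance-minus-redundancy score."""
    n = X.shape[1]
    # Tercile-discretize each feature for the feature-feature MI term.
    Xd = np.stack([np.digitize(X[:, j], np.quantile(X[:, j], [1/3, 2/3]))
                   for j in range(n)], axis=1)
    relevance = mutual_info_classif(X, y, random_state=0)      # I(C, f)
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_select:
        best, best_gain = None, -np.inf
        for j in range(n):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info_score(Xd[:, j], Xd[:, k])
                                  for k in selected])
            gain = relevance[j] - redundancy
            if gain > best_gain:
                best, best_gain = j, gain
        selected.append(best)
    return selected

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 12))
y = (X[:, 0] - X[:, 5] > 0).astype(int)
print(mrmr_select(X, y, n_select=4))
```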

2.3 Results and Discussion

The performances of several ML techniques were compared in distinguishing paclitaxel sensitivity from resistance in METABRIC patients, using the tumour gene expression datasets. We used mRMR to generate gene signatures and determine which genes are important for treatment response in METABRIC patients. The paclitaxel models are more accurate for predicting outcomes in patients receiving HT and/or CT compared to other patient groups.

SVMs and RF were trained using the expression of genes associated with paclitaxel response and mechanism of action, and stable genes in the biological pathways of these targets (Figure 2.3). Pair-wise comparisons of these genes with those from MammaPrint and Oncotype Dx (other genomic classifiers for breast cancer) find that these signatures are nearly independent of each other, with only a single gene overlapping. The distinct differences between these signatures are due to their methodologies of derivation, based on different principles and for different purposes (i.e., drug response for a specific reagent). SVM models for drugs used to treat these patients were derived by backwards feature selection on patient subsets stratified by treatment or outcome (Table 2.1 and Table 2.2). The highest SVM accuracy was found for the paclitaxel signature in patients treated with HT and/or adjuvant chemotherapy (78.6%). Since some CT patients were also treated with tamoxifen, methotrexate, epirubicin, doxorubicin, and 5-fluorouracil, we also evaluated the performance of models developed for these drugs using the same algorithm. These gene signatures also had acceptable performance (accuracies between 71-76%; AUCs between 0.686 and 0.766). Leave-one-out validation (CT and HT, no treatment, and deceased patients) exhibited higher model performance than 9-fold cross-validation (CT and/or HT, including patients treated with radiation).

Figure 2.3: Schematic elements of gene expression changes associated with response to paclitaxel. Red boxes indicate genes with a positive correlation between gene expression or copy number, and resistance using multiple factor analysis. Blue demonstrates a negative correlation. Genes outlined in dark grey are those in a previously published paclitaxel SVM model (reproduced from [1] with permission).

The RF classifier was used to predict paclitaxel therapy outcome for patients who underwent CT and/or HT (Table 2.3). The best performance achieved with RF was an 85.5% overall accuracy, using a 3-year survival threshold for distinguishing therapeutic resistance vs. sensitivity, for those patients who underwent HT.

Table 2.3: Results of applying RF to predict outcome of paclitaxel therapy. 1MCC: Matthews Correlation Coefficient. 2AUC: Area under receiver operating curve; both Discovery and Validation patient datasets analyzed. RF predictions done using a gene panel consisting of 19 genes (ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A, TUBB4B).

Treatment                   Chemotherapy (CT)       Hormone therapy (HT)    CT and/or HT
Survival threshold (years)  3     4     5           3     4     5           3     4     5
# Patients                        53                      420                     504
# genes in random selection 7     7     19          19    9     9           9     19    7
Accuracy                    56.6% 69.8% 66.0%       85.5% 78.6% 71.0%       82.7% 73.6% 65.3%
Precision                   0.51  0.698 0.645       0.731 0.715 0.634       0.685 0.647 0.602
F-Measure                   0.524 0.698 0.636       0.788 0.706 0.627       0.749 0.648 0.593
MCC1                        -0.06 0.396 0.23        0     0.069 0.059       0     0.039 0.086
AUC2                        0.441 0.7   0.653       0.606 0.559 0.632       0.506 0.527 0.588

The best overall accuracy and AUC (sensitivity and specificity) for CT/HT patients using mRMR feature selection for an SVM predicting outcome of paclitaxel therapy was obtained for CT patients with 4-year survival (Table 2.4). Genes selected for these classification models are presented in Table 2.5. Outcomes for HT patients with 3-year survival were predicted with 85.7% accuracy; however, the specificity was lower in this group. SVM combined with mRMR further improved the accuracy of feature selection and prediction of response to hormone and/or chemotherapy based on survival time, compared with either SVM or RF alone. Predicted treatment responses for individual METABRIC patients using the described ML techniques are indicated in Dataset 1 of [11].

Table 2.4: Results of mRMR feature selection for an SVM for predicting outcome of paclitaxel therapy. 1For patients treated with CT with ≥ 4 Yr survival and CT+ HT for ≥ 5 Yr, the cost for the mRMR model was set to 64. Of those treated with CT for ≥ 4 Yr, genes were selected using a greedy, stepwise forward search, while in other cases, greedy stepwise backward search was used. Also, gamma = 0 in all cases. 2Predicted responses for individual METABRIC patients are provided in Dataset 1.

Data                        CT1                     HT                      CT+HT
Survival threshold (years)  3     4     5           3     4     5           3     4     5
# Patients2                       53                      420                     504
Accuracy (%)                81.1  81.1  84.9        85.7  79.5  72.9        83.1  74.8  67.9
Precision                   0.809 0.813 0.852       0.878 0.765 0.692       0.795 0.703 0.662
F-Measure                   0.809 0.811 0.845       0.794 0.726 0.663       0.772 0.672 0.666
MCC                         0.582 0.625 0.675       0.119 0.17  0.173       0.161 0.137 0.238
AUC                         0.783 0.812 0.82        0.508 0.533 0.548       0.53  0.531 0.61

49 Table 2.5: Selected genes learned by mRMR and SVM for predicting outcome of paclitaxel therapy, with survival years as threshold.

Data   Survival years  SVM par. (gamma, cost)  Selected genes
CT     3               (0, 64)                 MAP4, GBP1, FN1, MAPT, BBC3, FGF2, NFKB2, TUBB4B
       4               (0.5, 128)              TWIST1, FN1, BBC3, FGF2, BCL2L1
       5               (1, 8)                  ABCB11, BCL2, GBP1, SLCO1B3, ABCB1, BAD, TUBB4A, MAPT, NFKB2, TUBB4B
HT     3               (1, 2)                  ABCB11, BCL2, MAP4, TUBB1, GBP1, SLCO1B3, ABCB1, BAD, TWIST1, FN1, TUBB4A, MAPT, OPRK1, BBC3, FGF2, NFKB2, ABCC1, NR1I2
       4               (0.75, 64)              BAD, GBP1, MAPT, BBC3
       5               (1.5, 2)                ABCB11, MAP4, SLCO1B3, BAD, FN1, OPRK1, BBC3, NFKB2, NR1I2, TUBB4B
CT+HT  3               (0.75, 16)              ABCB11, SLCO1B3, BAD, TUBB4A, MAPT, BBC3, FGF2, NFKB2, ABCC1, NR1I2
       4               (0.5, 2)                ABCB11, BMF, BCL2, MAP4, TUBB1, GBP1, SLCO1B3, ABCB1, BAD, TWIST1, FN1, MAPT, OPRK1, BBC3, FGF2, NFKB2, ABCC1, NR1I2, TUBB4B
       5               (1, 2)                  MAP4, GBP1, SLCO1B3, BAD, MAPT, OPRK1, BBC3, NFKB2, ABCC1, NR1I2, TUBB4B

Tumor co-variate information was provided by METABRIC, including Estrogen Receptor (ER), Progesterone Receptor (PR), HER2, Lymph Node (LN), and PAM50 subtypes. To assess model co-variate accuracy, the predictions described in Table 2.1-Table 2.5 were broken down by subtype (due to the large number of columns and rows, we refer the reader to Supplementary file 1 of [11] for more details). Subtypes with fewer than 20 individuals for a particular treatment combination were not analyzed. The deviation in classification accuracy between subtypes was mostly consistent with the average. One exception involved the RF and mRMR analyses, which were 8.3 to 23.0% below the average for (ER)-negative, (HER2)-positive, and basal subtypes in patients treated with HT. However, this deviation was not observed for CT-treated patients with the (ER)-negative subtype, which is consistent with the fact that CT response was derived from the paclitaxel gene set; (ER)-negative patients primarily received CT [6]. Further, the accuracy of the SVM models tested with CT- and HT-treated patients was significantly higher for (HER2)-positive patients (26 correct, 3 misclassified; 90% accurate) compared to (HER2)-negative patients (40 correct, 15 misclassified; 73% accurate). MAPT expression (present in the reduced 'CT and HT' paclitaxel model; Table 2.1) has been shown to segregate well with PAM50 luminal and basal subtypes. When analyzing METABRIC patients, however, the accuracy for these two subtypes is nearly identical to the average (78.6%, where basal and luminal classification accuracies are 76.7% [n=30] and 76.2% [n=21], respectively).

We assessed the separate Discovery and Validation datasets, respectively, as training and test sets and repeated the previous experiments. In this scenario, the performance of the model was poor (slightly better than random). This occurred because the gene expression distributions of many of the paclitaxel-related genes in our signature were not reproducible between these two sets (based on Wilcoxon rank-sum tests, Kruskal-Wallis tests, and t-tests; as shown in Supplementary file 2 of [11]). Cross-study validation allows for the comparison of classification accuracy between the generated gene signatures. The observed heterogeneity in gene expression highlights one of the many challenges of cross-validation of gene signatures: even data from the same study can exhibit drastic differences (for example, BCL2L1; Supplementary file 2 of [11]). Furthermore, these gene expression differences also affect the performance of these methods when the datasets are combined (compare Table 2.3 and Table 2.6 for RF; Table 2.4 and Table 2.7 for mRMR). We considered the possibility that the Discovery model might be subject to over-fitting. We therefore performed cross-study validation of the Discovery-set signature with an independently derived dataset (319 invasive breast cancer patients treated with paclitaxel and anthracycline chemotherapy [5]). The mRMR+SVM CT models performed well (the 4-year threshold model had an overall accuracy of 68.7%; the 3-year threshold model exhibited lower overall accuracy [52%], but was significantly better at predicting patients in remission [74.2%]).

To evaluate the paclitaxel models without relying on the Validation dataset, the Discovery set was split into two distinct parts, consisting of 70% of the patient samples randomly selected for training, and a different set of 30% of samples for testing. This procedure was repeated 100 times using different combinations of training and test samples, and the median performance of these runs is reported (Table 2.6 and Table 2.7). We also compared the performance of our mRMR+SVM model with the K-TSP model [12] (Table 2.9). In most cases, our method outperformed K-TSP, based on its accuracy in classifying new patients. Starting with the same set of Discovery genes, we also trained a separate model using the Validation data, and tested this data by 70/30% cross-validation (accuracy for RF: 56-67% [CT], 67-83% [HT], 56-81% [CT-HT]; accuracy for mRMR: 33-56% [CT], 70-84% [HT], 64-82% [CT-HT]). In addition, we evaluated the performance of the model derived from the Discovery set on a different set of patients treated with paclitaxel. These results suggest that the aforementioned issue with Discovery training and Validation testing was primarily due to a batch effect, rather than to over-fitting.
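For clarity, the repeated 70/30 evaluation described above can be expressed as follows; the classifier and synthetic data are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# 100 random stratified 70/30 train/test partitions; report the median score.
splitter = StratifiedShuffleSplit(n_splits=100, test_size=0.3, random_state=0)
accs = [SVC(kernel='rbf').fit(X[tr], y[tr]).score(X[te], y[te])
        for tr, te in splitter.split(X, y)]
print(np.median(accs))
```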

Table 2.6: Results of applying RF to predict outcome of the paclitaxel signature for the METABRIC Discovery patient set. Paclitaxel gene panel consisted of 19 genes (ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A, TUBB4B). 1For patients treated with CT with ≥ 4 year survival and CT+ HT for ≥ 5 year, the cost for the mRMR model was set to 64. Of those treated with CT for ≥ 4 year, genes were selected using a greedy, stepwise forward search, while in other cases, greedy stepwise backward search was used. Also, gamma = 0 in all cases.

Treatment                   CT                      HT                      CT and/or HT
Survival threshold (years)  3     4     5           3     4     5           3     4     5
# patients                        22                      185                     221
# genes in random selection 7     7     19          19    9     9           9     19    7
Accuracy (%)                61.1  66.7  66.7        77    79.1  68.9        80.2  54.8  60.5
Precision                   0.62  0.64  0.72        0.78  0.73  0.533       0.68  0.554 0.57
F-Measure                   0.61  0.65  0.69        0.78  0.71  0.601       0.73  0.551 0.58
MCC                         0.22  0.19  0.19        0.02  0.08  -0.13       -0.07 -0.14 0.02
AUC                         0.44  0.72  0.57        0.52  0.53  0.594       0.39  0.395 0.48

53 Table 2.7: Results of mRMR feature selection for an SVM for predicting outcome of the paclitaxel signature for the METABRIC Discovery patient set.

Treatment                   CT                      HT                      CT+HT
Survival threshold (years)  3     4     5           3     4     5           3     4     5
# patients                        22                      185                     221
Accuracy (%)                57.1  57.14 85.7        81.8  70.9  63.6        71.2  69.7  71.2
Precision                   0.6   0.686 0.74        0.73  0.67  0.532       0.65  0.63  0.69
MCC                         0.17  -0.26 0           -0.08 0.03  -0.08       0.04  0.07  0.25
AUC                         0.58  0.333 0.5         0.48  0.51  0.477       0.51  0.52  0.59

54 Table 2.8: Selected genes by mRMR and SVM for predicting outcome of the paclitaxel signature for the METABRIC Discovery patient set, with survival years as threshold.

Treatment  Survival years  SVM par. (gamma, cost)  Selected genes
CT1        3               (0, 64)                 TWIST1, BMF, CYP2C8, CYP3A4, BCL2L1, BBC3, BAD, MAP2, MAPT, NFKB2, FN1
           4               (0.5, 128)              BCL2, BMF, CYP2C8, CYP3A4, BAD, ABCC10, NFKB2
           5               (1, 8)                  MAP2, BCL2, BCL2L1, BBC3, MAPT, GBP1, NFKB2
HT         3               (1, 2)                  TWIST1, BCL2, BMF, CYP2C8, CYP3A4, BCL2L1, BBC3, TLR6, BAD, ABCB11, ABCC1, ABCC10, MAP4, MAPT, NR1I2, GBP1, NFKB2, OPRK1, FN1
           4               (0.75, 64)              TWIST1, CYP2C8, CYP3A4, BCL2L1, BBC3, TLR6, ABCB11, ABCC1, ABCC10, MAP2, MAPT, NR1I2, GBP1, NFKB2, FN1
           5               (1.5, 2)                TWIST1, BMF, CYP2C8, CYP3A4, BCL2L1, BBC3, ABCB11, ABCC1, ABCC10, MAP2, MAP4, MAPT, NR1I2, GBP1, NFKB2, OPRK1
CT+HT      3               (0.75, 16)              BMF, CYP2C8, BCL2L1, BBC3, BAD, ABCC1, ABCC10, MAP4, NR1I2, OPRK1, FN1
           4               (0.5, 2)                TWIST1, BMF, CYP2C8, CYP3A4, BCL2L1, BBC3, TLR6, ABCB11, ABCC1, ABCC10, MAP2, MAP4, MAPT, NR1I2, GBP1, NFKB2, OPRK1, FN1
           5               (1, 2)                  TWIST1, BMF, CYP3A4, BCL2L1, BBC3, TLR6, BAD, ABCB11, ABCC1, MAP2, MAP4, MAPT, NR1I2, GBP1, NFKB2, OPRK1, FN1

55 Table 2.9: Comparison on accuracy (%) between our mRMR+SVM method and K-TSP method on Discovery patient set of the METABRIC data.

Treatment        CT               HT               CT+HT
Survival years   3    4    5      3    4    5      3    4    5
# patients            22               185              221
mRMR+SVM         57.1 57.1 85.7   81.8 70.9 63.6   71.2 69.7 71.2
K-TSP [12]       57.1 28.6 28.6   80.9 68.2 69.2   71.2 54.6 53

While not a replication study sensu stricto, the initial paclitaxel gene set used for feature selection was the same as in our previous study [1]. Predictions for the METABRIC patient cohort, which was independent of the previous validation set [5] used in [1], by either the same (SVM) or different ML methods (RF, and SVM with mRMR) exhibited comparable or better accuracies than our previous gene signature [1].

These techniques are powerful tools that can be used to identify genes that may be involved in drug resistance, as well as to predict patient survival after treatment. Future efforts to expand these models to other drugs may assist in suggesting preferred treatments for specific patients, with the potential impact of improving efficacy and reducing the duration of therapy.

2.4 Conclusion

In this study we used the METABRIC dataset to predict outcome for different survival times in patients receiving hormone and, in some cases, chemotherapy agents. We used published literature and various machine learning methods in order to identify optimal subsets of genes from a biologically relevant initial gene set that can accurately predict the therapeutic response of patients who have received chemotherapy, hormone therapy, or a combination of both treatments. The SVM methodology has been previously shown to outperform randomized gene sets [1]. The predictions made by our method are based on the level of an individual drug. Genomic information has been shown to correlate with tumor therapy response in previous studies [6, 13-17]. From these studies, analytical methods have been used to develop gene signatures for chemotherapy resistance prediction [5], subtypes (PAM50), and metastatic risk stratification (Oncotype DX™, MammaPrint®). We also examined the method exhibiting the best performance in the Sage Bionetworks / DREAM Breast Cancer Prognosis Challenge [18], which was also phenotype-based; however, it produces outcome signatures based on molecular processes rather than on the cancer drugs themselves. While interesting and informative, the results cannot be directly compared. Our approach may be useful for selecting specific therapies for patients who would be expected to produce a favorable response.

2.5 Data availability

The data referenced by this article are under copyright with the following copyright statement: Copyright: © 2017 Mucaki EJ et al.

Data associated with the article are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication). http://creativecommons.org/publicdomain/zero/1.0/

Patient data: The METABRIC datasets are accessible from the European Genome-Phenome Archive (EGA [19]) using the accession number EGAS00000000083. Normalized patient expression data for the Discovery (EGAD00010000210) and Validation (EGAD00010000211) sets were retrieved with permission from EGA. The corresponding clinical data were obtained from the literature [6]. While not individually curated, HT patients were treated with tamoxifen and/or aromatase inhibitors, while CT patients were most commonly treated with cyclophosphamide-methotrexate-fluorouracil (CMF), epirubicin-CMF, or doxorubicin-cyclophosphamide.

F1000Research: Dataset 1. Predicted treatment response for each individual METABRIC patient, 10.5256/f1000research.9417.d149864 [11]

References

[1] Dorman, S. N., Baranova, K., Knoll, J. H., Urquhart, B. L., Mariani, G., Carcangiu, M. L., Rogan, P. K. (2016). Genomic signatures for paclitaxel and gemcitabine resistance in breast cancer derived by machine learning. Molecular Oncology, 10(1), 85-100.

[2] Daemen, A., Griffith, O. L., Heiser, L. M., Wang, N. J., Enache, O. M., Sanborn, Z. (2013). Modeling precision treatment of breast cancer. Genome biology, 14(10), R110.

[3] Shoemaker, R. H. (2006). The NCI60 human tumour cell line anticancer drug screen. Nature Reviews Cancer, 6(10), 813-823.

[4] Park, N. I., Rogan, P. K., Tarnowski, H. E., Knoll, J. H. (2012). Structural and genic characterization of stable genomic regions in breast cancer: relevance to chemotherapy. Molecular Oncology, 6(3), 347-359.

[5] Hatzis, C., Pusztai, L., Valero, V., Booser, D. J., Esserman, L., Lluch, A., Martin, M. (2011). A genomic predictor of response and survival following taxane-anthracycline chemotherapy for invasive breast cancer. JAMA, 305(18), 1873-1881.

[6] Curtis, C., Shah, S. P., Chin, S. F., Turashvili, G., Rueda, O. M., Dunning, M. J., Gräf, S. (2012). The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature, 486(7403), 346-352.

[7] MATLAB and Statistics Toolbox Release 2014a. The MathWorks Inc., Natick, Massachusetts, United States.

[8] Ben-Hur, A., Weston, J. (2010). A user’s guide to support vector machines. In Data mining techniques for the life sciences (pp. 223-239). Humana Press.

[9] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I. H. (2009). The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1), 10-18.

[10] Ding, C., Peng, H. (2005). Minimum redundancy feature selection from microarray gene expression data. Journal of bioinformatics and computational biology, 3(02), 185-205.

[11] Rezaeian, I., Mucaki, E. J., Baranova, K., et al. (2016). Dataset 1 in: Predicting Outcomes of Hormone and Chemotherapy in the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) Study by Machine Learning. F1000Research.

[12] Marchionni, L., Afsari, B., Geman, D., Leek, J. T. (2013). A simple and reproducible breast cancer prognostic test. BMC genomics, 14(1), 336.

[13] Van't Veer, L. J., Dai, H., Van De Vijver, M. J., He, Y. D., Hart, A. A., Mao, M., Schreiber, G. J. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415(6871), 530-536.

[14] Duan, Z., Duan, Y., Lamendola, D. E., Yusuf, R. Z., Naeem, R., Penson, R. T., Seiden, M. V. (2003). Overexpression of MAGE/GAGE genes in paclitaxel/doxorubicin-resistant human cancer cell lines. Clinical Cancer Research, 9(7), 2778-2785.

[15] Ma, X. J., Wang, Z., Ryan, P. D., Isakoff, S. J., Barmettler, A., Fuller, A., Tran, Y. (2004). A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen. Cancer cell, 5(6), 607-616.

[16] Glinsky, G. V., Berezovska, O., Glinskii, A. B. (2005). Microarray analysis identifies a death-from-cancer signature predicting therapy failure in patients with multiple types of cancer. The Journal of clinical investigation, 115(6), 1503-1521.

[17] Rajput, S., Volk-Draper, L. D., Ran, S. (2013). TLR4 is a novel determinant of the response to paclitaxel in breast cancer. Molecular cancer therapeutics, 12(8), 1676-1687.

[18] Cheng, W. Y., Yang, T. H. O., Anastassiou, D. (2013). Biomolecular events in cancer revealed by attractor metagenes. PLoS computational biology, 9(2).

[19] https://www.ebi.ac.uk/ega/studies/EGAS00000000083

Chapter 3

PAFS – An Efficient Method for Classifier-Specific Feature Selection

3.1 Introduction

In reality, many machine learning tasks involve high-dimensional data that contain thousands of features and millions of samples. For example, a breast cancer dataset in [1] has up to 13582 features, where each feature is a gene expression; some text classification problems involve thousands of n-grams. The curse of dimensionality might keep many powerful learning techniques, such as Support Vector Machine (SVM) [2-4], Bayesian Network [5], K-Nearest Neighbors (K-NN) [2], and Decision Tree [6], from being applicable in those cases. In such data, there are usually many redundant and irrelevant features that can be removed. Thus, feature selection is one of the best remedies. This pre-processing technique finds the most informative subset of features which can still generalize the original data.

As a consequence, the training time on the projected data can be significantly reduced, the result can be explained more easily, and the prediction accuracy can be improved. Feature selection is especially useful in domains where there are many more features than samples. Such domains include written text analysis and bioinformatics classification/prediction, where there are many thousands of features but only a few tens to hundreds of samples. It is shown in [1,7-9] that, in many situations, only a small number among thousands of features strongly correlate with the target feature. However, extracting useful features from a universal set of hundreds or thousands of features is really challenging, as the search space is exponential in the number of features. Exhaustive search is almost impossible; a level-wise search must rely on reasonable stopping conditions to guarantee good quality; a greedy search like Greedy Forward Selection might find only a local optimum. Therefore, in some cases, a more suitable strategy is required to guide the search.

Besides, to obtain a further improvement for some learners, such as SVM and K-NN, an optimal parameter set must be found. Since the parameter space can be continuous, trying to find the global optimum is almost impossible. Grid search is a common and straightforward way to find a good parameter setting by sparsely sweeping through the parameter space. Moreover, the classification performance differs from one combination of parameters and feature subset to another. Thus, searching for optimal parameters after the feature selection step will probably miss the solution that should have been chosen (i.e., combining a feature subset that was skipped in the feature selection stage with a suitable parameter setting may yield better classification performance). Hence, it seems a good idea to optimize both the subset of features and the parameters by searching in their combined space.
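As an illustration, the following minimal Python sketch (ours, not the thesis's Java/Weka implementation) scores every pair of a small feature subset and an SVM parameter setting jointly; the toy data, the subset size of 2, and the grid values are assumptions of the example only.

```python
# Joint search over the combined space of feature subsets and SVM parameters,
# instead of selecting features first and tuning parameters afterwards.
from itertools import combinations, product

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((60, 8))                  # toy data: 60 samples, 8 features
y = rng.integers(0, 2, 60)               # toy binary labels

grid = list(product([0.1, 0.5, 1.0], [1, 8, 64]))   # (gamma, cost) pairs

best = (-np.inf, None, None)             # (score, subset, params)
for subset in combinations(range(X.shape[1]), 2):   # all 2-feature subsets
    for gamma, cost in grid:
        clf = SVC(kernel="rbf", gamma=gamma, C=cost)
        score = cross_val_score(clf, X[:, subset], y, cv=5).mean()
        best = max(best, (score, subset, (gamma, cost)))
print(best)
```

A sequential approach would pick one subset first and only then sweep the grid; the joint loop above can find (subset, parameter) pairs that the sequential order would never evaluate.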

Finally, given a dataset, some classification methods might be able to learn a good classification model while others might not. Thus, the optimal classification model should comprise a classification method, the most informative feature subset, and the optimal parameters for that classifier.

Our contributions. The observations above motivate us to propose a new method, called Parallel Apriori-like Feature Selection (PAFS), that finds the optimal classification model for a given machine learning method by exploring the feature space and the parameter space at the same time. The idea of combining the spaces may not be new, but the way PAFS searches for the feature subset is. Its search behavior at each internal step is oriented to the optimal results and is controllable by running options to avoid being too exhaustive or too greedy. PAFS is mainly designed to cope with binary classification problems. To handle multi-class problems, we extend the tree-based model in [1] to another algorithm called Tree-based Classification Model for Multi-class Problem (TCM), which wraps PAFS and integrates the space of classifiers as well. It outputs a single-path tree where each node is a binary classification model learned by PAFS. Both PAFS and TCM can be implemented in parallel. When classifying, each binary model is applied, according to its order, to identify whether a new instance belongs to the class of that node or not.

When applied to classifying five subtypes of breast cancer patients, our model returned an accuracy of about 99%, about 5% more than that reported in [1], and required fewer genes. On an erythemato-squamous disease dataset, we obtained similarly high accuracy. Especially, on the METABRIC datasets, PAFS extracted better gene signatures for predicting the survival outcome of breast cancer patients treated with chemotherapy, hormone therapy, and a combination of them than those identified in our previous study [19] (Chapter 2). The results of PAFS on the METABRIC datasets might also be a clue supporting the assumption that gene signature extraction should take the underlying mechanism of the therapies into account for meaningful biological results. However, it is necessary to test on many more drugs used in breast cancer treatment for a strong conclusion.

3.2 Related Works

In this section, we summarize the feature selection approaches and highlight some recent works on feature selection and on breast cancer subtype classification that are most related to this study. Generally, feature selection methods are categorized into the following approaches.

Filter approaches. Filter approaches often deal with one feature at a time based on a particular statistical criterion. Some of the popular criteria are Information Gain, Gain Ratio [6], Chi-Square [1], and mRMR [7,8]. These approaches are usually applied to select the most informative features to predict the classes. They are effective in shrinking the feature space, especially when the number of features in the dataset is large. Since the features are often evaluated independently of each other, these approaches are clearly fast but might not perfectly eliminate redundancy: the presence of one feature may reduce the impact of some others on the class feature. And, as they are not tailored to any specific type of classifier, in many cases the selected features are used as input for further processing steps rather than as the final feature subset for classification [9,10].

Wrapper approaches. Wrapper methods, on the other hand, consider a subset of features at a time and search for the optimal feature subset with regard to a specific classifier. They thereby take the between-feature dependencies into account to some degree. For each subset, the target classifier is trained and tested on the projected data to score the fitness. Various strategies can be applied to traverse the space of feature subsets, including exhaustive search; greedy search methods such as Greedy Forward Selection, Greedy Backward Elimination, Floating Forward Selection, and Floating Backward Selection are usually preferred. In greedy methods, starting from a subset of features, new candidates are gradually generated by adding/deleting a feature and then evaluated. Only one or two best ones are maintained to generate candidates for the next step, and so on. The process stops when some conditions are met, e.g., the subset size reaches a certain threshold or the decline in quality exceeds a threshold. Basically, none of them guarantees to find the global optimum. Wrapper methods usually produce better performance than filter ones, but they require a lot of computation, especially when using expensive classifiers such as SVM or Bayesian Network.

Hybrid approaches. They are combinations of filter methods and wrapper methods. Candidate features are first selected by a filter criterion to prune the feature search space before applying a wrapper method to find the final subset. One can see that they can take advantage of the two previous approaches while reducing their drawbacks.

Embedded approaches. In these approaches, the search for an optimal feature subset is integrated with the process of constructing the classifier. Decision tree learning can be considered an instance of this method.

Based on the one-against-all strategy, the authors of [11] applied an ensemble learning approach to deal with a multi-class classification problem. The classification problem of C classes is first transformed into C − 1 binary classification problems. In each subproblem, the ith class is considered as the positive class while the others are combined as the negative one. Then, they apply traditional feature selection for each subproblem. In classification, all classifiers of the subproblems vote for the class label of the new instance. To cope with imbalanced data, the minority class can be oversampled. They tested their approach with Naïve Bayes, K-NN, and C4.5 on 15 datasets, whose numbers of features range from 4 to 64. Three of the datasets have more than 3000 samples. On average, the accuracy of their approach was 2% to 3% higher than that of some traditional approaches.

In [1], also based on the one-against-all strategy, the authors introduced a tree-based classification model for classifying the five PAM50 subtypes of breast cancer. Its main idea is to transform the multi-class problem into binary problems, of which the easiest one is chosen to build the current node. Then, after removing the samples of the chosen class, the remaining data is recursively used to build the sub-trees. It results in a single-path tree model (tree model for short) where each node is a binary classifier. When classifying, each binary classifier verifies whether the new instance belongs to the class of that node. If not, the instance is passed down to the child node, and the process keeps going until the class label of the instance is decided. They applied Chi-Square as a filter criterion and a support vector machine with radial basis kernel (SVM-RBF) as the target classifier. They obtained a classification model with about five features for each node, and 18 features in total for the whole tree, which can yield about 95% accuracy.

The authors in [9] propose a hybrid approach. Two subsets of features are first filtered by F-Score and Information Gain, respectively. Then, their intersection set, denoted as S1, and their exclusive-OR set, denoted as S2 (i.e., the result of the XOR operator), are computed. At the wrapper stage, a Greedy Backward Elimination is conducted on S1 while a Greedy Forward Selection is executed on S2 to find the most promising candidates.

This heuristic might come from the assumption that the features in S1 are more predictive than those in S2, and that the larger subsets of S1 can be more promising than the smaller ones. In an experiment on a disordered protein dataset, they obtained a subset of 355 out of 420 features, but no improvement in accuracy. On a lung cancer dataset, they obtained a subset of 70 out of 7129 features which returned 100% accuracy.

In [10], the authors proposed a method named Improved F-score and Sequential Forward Floating Search (IFSFFS), which applies a hybrid feature selection approach for classifying six types of erythemato-squamous diseases. A modification of the F-score was first used to filter the features. Then, Sequential Forward Floating Search and SVM were combined during the wrapper stage. A grid search was conducted as well to find the optimal parameters for SVM. On an erythemato-squamous disease dataset from the UCI machine learning repository with 358 samples and 34 features, the accuracy of their method ranges from 93% to 100%, depending on the size of the test dataset.

Recently, new feature selection algorithms based on the particle swarm optimization approach have appeared, such as [12] and [13]. However, a comparison between these methods and PAFS is out of the scope of this study.

3.3 Materials and Methods

In this section, we introduce the TCM algorithm to deal with multi-class problems. It extends the tree-based scheme introduced in [1] to allow using different binary classifiers for different nodes in the same tree model. Each binary model consists of a class label, a learner (i.e., classification method), an optimal subset of features, and an optimal parameter set with which the given classifier can produce the optimal performance. The proposed algorithm, PAFS, tries to find a highly accurate binary classification model for a given dataset and classification method. The optimal models found by exploring the combined space of features and parameters tend to be better than those found by searching only in the feature space and then optimizing the parameters; PAFS follows this heuristic to optimize the classifier. Especially, depending on the data dimensionality, users can adjust the running parameters so that PAFS works more or less exhaustively, and thus also change the intermediate data that guides the search. The idea of PAFS is adapted from the Apriori algorithm [14], which is used for frequent itemset mining.

Let us define some notations before describing our proposed algorithms.

Definition 1. Given a feature set F, a set of class labels L, a relation or dataset D on F × L, a class label i in L, and a subset A of F, let

• D^{*i} denote the new dataset obtained from D by replacing the class labels of all samples, except those of class i, with a new label;

• D^{-i} denote the sub-dataset obtained by removing all samples of class i from D;

• D_A denote the dataset obtained from D by projecting all samples on the subset of features A.

Thus, D^{*i} is the binary dataset/problem derived from D, where the samples of class i are the positive samples and the others are the negative ones.

Definition 2. Given a dataset D^{*i}, a classification method C, a subset of features A, a set of parameters P, and a real number q, let M = <i, C, A, P, q> denote the binary classifier obtained by using classification method C to learn with parameters P on D_A^{*i}, where q is its average classification performance. Here, q refers to a statistical measure such as accuracy, recall, or area under the ROC curve.

Definition 3. An s-candidate is a set of s features.
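To make the notation concrete, here is a small NumPy sketch of the three dataset operations in Definition 1; the helper names are ours, not the thesis's.

```python
# Sketch of the dataset operations in Definition 1 for a dataset (X, y).
import numpy as np

def d_star(X, y, i):
    """D^{*i}: samples of class i become positive (1), all others negative (0)."""
    return X, np.where(y == i, 1, 0)

def d_minus(X, y, i):
    """D^{-i}: drop every sample of class i."""
    keep = y != i
    return X[keep], y[keep]

def project(X, A):
    """D_A: keep only the feature columns in subset A."""
    return X[:, sorted(A)]
```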

3.3.1 Tree-based Classification Model for Multi-class Problem

Actually, the algorithm PAFS, described in the next section, can deal with multi-class problems directly, provided that the classification method used can handle multi-class problems, such as K-NN, Decision Tree, or Naïve Bayes. However, for generality of application, PAFS is better applied to two-class problems, because some machine learning methods, like SVM, were originally designed to solve binary problems only. Moreover, transforming a multi-class problem into many binary problems, and then finding the tree model to solve them, seems to produce a better result than solving the original problem directly, because we can apply different feature subsets and parameters to different binary problems instead of using the same setting to classify all classes. We extend the scheme for finding a tree-based model for multi-class problems in [1] to an algorithm, named TCM, given in Algorithm 1. One can see that the main step of TCM can be solved in parallel.

Algorithm 1 (TCM). Constructing a tree model for multi-class classification.

Input. F, L, D: as in Definition 1; CS: a set of classifier types (e.g., SVM, K-NN).
Output. TM: the optimal tree model for classification learned from D.

Method.
1. Initialize TM as an empty tree.
2. For each class value i in L, and for each classification method C in CS, use PAFS to find the binary model <i, C, A_i, P_i, q_i> with respect to F, L, and D^{*i}.
3. Choose the model with the maximal value of q_i to construct the root node of TM.
4. Update the dataset: D = D^{-i}.
5. Remove i from L.
6. While |L| > 1, go back to step 2 to build the sub-tree of the current node.

At step 2, given a classification method (e.g., SVM), for each class label i in L, we use the PAFS algorithm to find an optimal model to discriminate instances of class i against the rest. Then, the optimal model yielding the highest quality is chosen as the binary model of the current node (step 3). After that, the chosen class is removed from L and all samples of this class are removed from dataset D as well (steps 4 and 5). Then, the process continues to build the model for the sub-tree until only one class is left. If the original data consists of |L| classes, then the tree model will have |L| nodes. The last node contains only the remaining class label, since no more classification is needed. This is a best-first search heuristic to avoid considering too many models; otherwise, trying all permutations of the binary models would be very time-consuming. Step 2 can be implemented in a parallel/distributed manner by assigning each binary problem to a thread/processor/computer to solve.

3.3.2 Parallel Apriori-like Feature Selection Algorithm

In the one-versus-all scheme in TCM, the main task is to find the optimal binary model <i, C, A_i, P_i, Q_i> to predict whether a new instance belongs to class i or not. Indeed, this amounts to finding the optimal subset of features A_i and the optimal parameters P_i for classifying on the binary classification problem D^{*i} using classification method C. It can be done by the algorithm PAFS described in Algorithm 2.

The idea of PAFS is to gradually generate the feature subsets in a level-wise manner (i.e., every subset of the current step has one feature more than those of the previous step) and select only the high-quality subsets of the current step to generate candidates for the next step. Each candidate subset is evaluated by the given classification method, with a grid search for optimal parameters. The process continues until the size of the candidates exceeds a given threshold or there are no more candidates to try. Additional stopping conditions can be integrated as well.

At the initial step, the original features are filtered by a criterion (e.g., mutual information) to reduce the search space significantly. Each selected feature forms a 1-candidate.
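A sketch of this filtering step in Python, with scikit-learn's mutual information estimator standing in for the thesis's Weka-based filters:

```python
# Rank features by mutual information with the class and keep the top X
# as the initial 1-candidates.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def one_candidates(X, y, top_x):
    mi = mutual_info_classif(X, y, random_state=0)
    top = np.argsort(mi)[::-1][:top_x]         # indices of the top-X features
    return [frozenset([int(f)]) for f in top]  # each feature is a 1-candidate
```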

Algorithm 2 (PAFS). Searching for an optimal binary classification model.

Input. D: a dataset of two classes; C: a classification method (e.g., SVM); N: the maximum number of candidates to maintain after each step; S: the maximum size for subsets; G: the grid of parameters to try; X: the number of features to filter.
Output. M: an optimal model for classification on D.

Method.
1. Step 1. Apply a filter criterion to select the top X features from the set of all features. They are the 1-candidates.
2. Step s. Execute the following steps until s > S or there are no more candidates:
   a. For each subset A among the current s-candidates, and for each parameter setting in G:
      i. Use a thread/processor/computer to learn a classification model on the projected data D_A in 10-fold cross-validation using method C.
      ii. Obtain the optimal parameters P and the best quality Q for A.
      iii. If Q > Q_M, where Q_M is the quality of M, then M = <i, C, A, P, Q>.
   b. Set MinQ = 0.8 × Q_M.
   c. Remove every s-candidate having quality smaller than MinQ.
   d. Keep only the top N s-candidates according to their quality.
   e. Generate new (s+1)-candidates from each pair of s-candidates that share s−1 common features.

All of the candidate subsets at each iteration have the same size, and they are tested so that the low-quality ones can be removed. Let us call them the s-candidates. For each subset A, and for each parameter setting, we train and test a classifier on the dataset projected on A using the classification method C, under a 10-fold cross-validation scheme. To accelerate the process, PAFS can be implemented in parallel, where each candidate can be processed by a thread (step 2.a.i). The optimal parameters P and their corresponding quality Q are obtained (step 2.a.ii). The optimal binary model for A is updated accordingly (step 2.a.iii). After trying all s-candidates, we compute a new minimum quality threshold, MinQ (step 2.b), which is used to remove the low-quality candidates. Here, we want to keep only the subsets having quality greater than or equal to 80% of that of the current best model (step 2.c). In this way, the new candidates generated for the next iteration tend, hopefully, to be more promising than the current ones. If the number of subsets left after the removal is still too large, we keep only the top N (step 2.d). Of course, these factors can be parameterized so that users can adjust the behavior of the search.
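One PAFS iteration (steps 2.a-2.d) can be condensed as follows; generate_next(), sketched after the join heuristic below, would produce the (s+1)-candidates. This mirrors the algorithm, not the exact Java implementation.

```python
# Sketch of one PAFS step: grid-search each s-candidate, track the best
# model seen so far, then prune by quality threshold and by count.
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def pafs_step(X, y, candidates, grid, best, n_keep):
    scored = []
    for A in candidates:                       # parallelizable: one task per A
        q_a, p_a = -1.0, None
        for gamma, cost in grid:               # step 2.a: grid search on D_A
            clf = SVC(kernel="rbf", gamma=gamma, C=cost)
            q = cross_val_score(clf, X[:, sorted(A)], y, cv=10).mean()
            if q > q_a:
                q_a, p_a = q, (gamma, cost)
        scored.append((q_a, A, p_a))
        if best is None or q_a > best[0]:
            best = (q_a, A, p_a)               # step 2.a.iii: update model M
    min_q = 0.8 * best[0]                      # step 2.b
    kept = [s for s in scored if s[0] >= min_q]                     # step 2.c
    kept = sorted(kept, key=lambda s: s[0], reverse=True)[:n_keep]  # step 2.d
    return best, [A for _, A, _ in kept]
```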

A new (s+1)-candidate is generated for the next step by joining a pair of s-candidates if they share s−1 common features (see the sketch after this paragraph). This heuristic not only prunes the search space but also helps to find the optimal subset more quickly (as we do not generate all subsets of size s+1). It is noteworthy that the first participant (i.e., the first subset of s features) contains only one feature that is new to the second participant, and vice versa. Intuitively, for an s-candidate which is evaluated and kept after step 2.d, we consider its features to "cooperate well" with each other for classification. Then, the features of the new (s+1)-candidate will also be likely to "cooperate well", because the s−1 common features already "cooperate well" and all of them "cooperate well" with at least one of the other features. As a consequence, the quality of the new candidate will tend to increase rather than decrease. (If we adjust MinQ to just under the value of Q_M, e.g., 95% of Q_M, then the quality of the candidates of the next generation tends to be better than, or not much worse than, that of the current best model.) Thus, each currently remaining candidate acts as a direction to the top of its local hill. The more candidates are kept, the more local hills can be reached, and so the higher the chance of finding the global optimum; but, of course, there will be more computational cost.
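The join itself reduces to a check on set sizes: two s-candidates yield an (s+1)-candidate exactly when their union has s+1 elements, i.e., they share s−1 features. A small sketch of step 2.e:

```python
# Apriori-style candidate generation: join pairs of s-candidates that
# overlap in all but one feature.
from itertools import combinations

def generate_next(candidates):
    s = len(next(iter(candidates)))       # all current candidates have size s
    nxt = set()
    for a, b in combinations(candidates, 2):
        union = a | b
        if len(union) == s + 1:           # a and b share s-1 common features
            nxt.add(frozenset(union))
    return nxt
```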

Note that both N and MinQ adjust the greedy/exhaustive behavior of PAFS, but with different effects. For example, when the running time is not as important as the classification performance, users can increase N and decrease MinQ to force PAFS to generate and test more candidates. If we set MinQ to a very small value (i.e., MinQ = 0), the top N s-candidates will be selected regardless of quality. In such a case, the selected candidates might be of too low quality; then they might not generate promising candidates for the next step, and we might waste time considering them. When the dataset is large, and we would like to use a complex classification method like SVM, we should decrease N and increase MinQ so that PAFS generates and tests fewer candidates to save time.

Since PAFS generates and tests the candidate subsets in batch, it can return many relatively-similar optimal subsets as outputs, providing more options for downstream analysis.

The complexity of PAFS. Let S be the maximum allowed size of a subset, N be the maximum number of subsets retained after each step, and |G| be the size of the parameter grid G. One can see that step 1 is executed only once, and it can be very quick compared to the whole process of PAFS. For each other step (step s), at most N × (N − 1)/2 candidates are generated for the next step; each candidate is tested with every parameter setting in the grid G. As step s is repeated S times, the number of models to learn (in 10-fold cross-validation) can be S × |G| × N × (N − 1)/2 in the worst case. Thus, N, S, and G are the primary factors controlling the complexity of PAFS. Increasing their sizes, PAFS becomes close to an exhaustive search; decreasing them, PAFS behaves like a greedy one.
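As a concrete illustration, with the grids used later in Section 3.4.1 (|G| = 9 × 8 = 72 gamma/cost pairs) and, say, S = 5 and N = 40, the bound is S × |G| × N × (N − 1)/2 = 5 × 72 × 780 = 280,800 cross-validated models in the worst case; large, but embarrassingly parallel across candidates.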

3.4 Experimental Results and Discussions

3.4.1 Experimental Settings

We used Weka's libraries and the LibSVM package [15] to develop our own Java program, with multi-thread programming. We used Gain Ratio and Information Gain as filter criterion options, and K-NN and SVM-RBF (SVM for short) as classifiers. For K-NN, it is recommended to find the optimal value of the parameter K (the number of neighbors for voting) in the range from 1 to √N, where N is the number of samples. In the grid search for SVM parameters, the parameters gamma and cost are taken from the predefined arrays {0, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.5, 2} and {1, 2, 4, 8, 16, 64, 128, 256}, respectively. Since trying all parameter settings can be time-consuming, we also provide an option to run PAFS without grid-searching for parameters. This corresponds to running those classifiers with default parameters in Weka, i.e., K = 1 for K-NN; gamma = 0 and cost = 1 for SVM.
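For illustration, the two arrays and their Cartesian product can be written as follows (a Python rendering for the example only; the thesis program itself is Java on Weka/LibSVM). As we understand the Weka LibSVM wrapper, a gamma of 0 is mapped to the library default of 1/(number of features).

```python
# The gamma and cost grids quoted above, combined into 9 x 8 = 72 settings.
from itertools import product

GAMMAS = [0, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.5, 2]
COSTS = [1, 2, 4, 8, 16, 64, 128, 256]

GRID = list(product(GAMMAS, COSTS))
assert len(GRID) == 72
```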

Each running option setting for TCM is provided in the form <C, F, X, S, N, G>, where C is a classification method (e.g., SVM), F is a criterion for feature selection at the filtering stage (e.g., Information Gain), X is the number of features to keep at the filtering stage, S is the maximum size of a feature subset, N is the maximum number of subsets to keep for generating candidates for the next step, and G indicates whether grid search for parameter optimization is applied or not. The running option settings are shown in the captions of Tables 3.1-3.5 for their corresponding results. We evaluate the model quality in terms of the Matthews correlation coefficient (MCC), one of the best scores for dealing with imbalanced data [16]. MCC is in the range [−1, 1]; the higher the value of MCC, the better the classification quality. We also recorded the model accuracies in order to compare with the results in [1].
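For reference, for a binary confusion matrix with counts TP, TN, FP, and FN, the MCC is the standard quantity

\[
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}},
\]

which equals 1 for perfect prediction, 0 for prediction no better than chance, and −1 for total disagreement.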

PAFS is implemented in a multithreaded fashion. For step 2.a, if we use T threads to process the Z subsets of features generated at each step, then each thread processes about Z/T candidates. For each feature subset A and parameter setting P, the classification performance is evaluated in 10-fold cross-validation. We tested our approach on several datasets and compared with the results of previous studies. A tree model for a multi-class classification problem is presented as a table, where each row is a binary model. The order of the rows corresponds to the order of the nodes in the tree, and the last node is omitted. For simplicity, we consider the average accuracy of its nodes, except the omitted node, as the overall accuracy of the tree model. For example, for the model in Table 3.1, the overall accuracy is the average accuracy of the 4 nodes (since the dataset has five subtypes in total), which is the average of the column Accuracy, 99.49%.
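Spelled out for Table 3.1, that average is (100.00 + 99.16 + 100.00 + 98.81) / 4 = 99.4925, i.e., 99.49% to two decimals.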

3.4.2 Experimental Results on Predicting Five PAM50 Subtypes

In this experiment, we used a microarray dataset used in [1], [17]. This dataset contains gene expression profiles of 158 breast cancer patients with 13582 features/genes. The class label of a patient indicates the PAM50 subtype of that patient. In total, there are 39 Basal-like samples, 22 Her2 samples, 53 LumA samples, 31 LumB samples, and 13 Normal-like samples. Figure 3.1 presents the tree-based model learned from this dataset by the approach in [1]. According to the model, Basal is the easiest to identify, with 99.36% accuracy, followed by Normal and Her2. The classification between LumA and LumB is said to be the hardest one, with 88.1% accuracy. As mentioned, the overall accuracy of this model is 95.11%.

For many running settings, TCM can obtain very good models with an overall accuracy greater than 95%. Most of them share the following binary classification order, from the easiest to the hardest to classify: Basal against all → Her2 against {Normal, LumA, LumB} → Normal against {LumA, LumB} → LumA against LumB. This is shown in Tables 3.1, 3.2, 3.3, and 3.4, in which class LumB is omitted because it is the last one to identify.

The running settings for TCM and PAFS are shown in the table captions, along with the overall accuracies. Table 3.1 gives information about the result (tree model) when using SVM as the classification method and Information Gain to filter the top 100 features at the filter stage, with grid search applied for SVM parameter tuning. The maximum size of a feature subset to try is 5. At most the top 40 subsets are retained after each iteration to generate the candidates for the next iteration. Each row in the table is a binary model.

Figure 3.1: Tree-based classification model resulting from [1] for five breast cancer subtype prediction. Source: [1]

For example, the first row in Table 3.1 means that, when classifying a new patient, the first binary model examines the gene set {TFF3, AGR2} and uses SVM with gamma = 0 and cost = 1 to verify whether the patient has Basal cancer or not. This test can yield 100% accuracy. If the patient is predicted not to have Basal cancer, then the binary model of the second row is used next, and so on. In the last model, if the patient is not predicted as LumA then he/she is predicted as LumB. The overall accuracy is 99.49%.
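The traversal just described can be sketched in a few lines; here each node is assumed to hold its gene subset and a fitted binary classifier, an assumption of this sketch rather than the thesis's data structures.

```python
# Classify with the single-path tree: the first node whose binary model
# answers "yes" decides the label; a sample rejected by every node gets
# the remaining class (LumB for the model of Table 3.1).
def classify(tree, last_class, x_row):
    # tree: list of (class_label, (feature_subset, fitted_classifier));
    # x_row: array of shape (1, n_features) holding one sample.
    for class_label, (features, clf) in tree:
        if clf.predict(x_row[:, sorted(features)])[0] == 1:
            return class_label
    return last_class
```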

Table 3.1: Tree-based model for breast cancer subtype prediction learned by TCM using running options <SVM, Information Gain, 100, 5, 40, grid search>, 99.49% accuracy.

Class    Selected features              Parameters (gamma, cost)   Accuracy   MCC
Basal    TFF3, AGR2                     0, 1                       100.00%    1
Her2     HMGCS1, SLC39A6, TARS, YBX1    0.05, 64                   99.16%     0.973
Normal   CX3CL1, ARAP3                  0, 1                       100.00%    1
LumA     MAD2L1, SRSF5, CBX8, KATNB1    0.75, 8                    98.81%     0.975

Table 3.2: Tree-based model for breast cancer subtype prediction learned by TCM, 100% accuracy.

Class    Selected features                                             Parameters (gamma, cost)   Accuracy   MCC
Basal    TFF3, AGR2                                                    0, 1                       100%       1
Her2     TARS, YBX1, MDP1, ATP1A1OS                                    1, 1                       100%       1
Normal   CX3CL1, ARAP3                                                 0, 1                       100%       1
LumA     SPAG5, NDC80, LRP8, MRPS23, NEK2, CACYBP, RCL1, LRIG1, HMMR   0, 16                      100%       1

Table 3.2 presents a perfect model with 100% accuracy overall, using SVM. Table 3.3 presents a result when using the same running setting as in the case of Table 3.1, except that {K-NN, SVM} is the classifier space. Here, SVM is chosen if it produces the same quality as K-NN. Thus, the model in this table is almost identical to that of Table 3.1, except that the second row is replaced by a K-NN model which has better quality than the SVM model in the second row of Table 3.1.

Table 3.4 presents a result when running with almost the same options as in Table 3.1 but without grid search, i.e., gamma is set to 0 and cost is set to 1. As shown there, even without grid search, we can still obtain higher accuracy than the result in [1]. This supports the effectiveness of the way that PAFS explores the feature space. However, compared to the results in Table 3.1, its accuracy and MCC are lower. This implies that, by integrating the parameter search, we can improve the classification further, and it means that searching for optimal models in the combined space of features and parameters, as we do, is a reasonable approach.

Table 3.3: Tree-based model for breast cancer subtype prediction learned by TCM using running options <{K-NN, SVM}, Information Gain, 100, 5, 40, grid search>, 99.70% accuracy.

Class    Classifier   Selected features              Parameters   Accuracy   MCC
Basal    SVM          TFF3, AGR2                     0, 1         100%       1
Her2     K-NN         C2orf54, FAM134B, DROSHA       2            100%       1
Normal   SVM          CX3CL1, ARAP3                  0, 1         100%       1
LumA     SVM          MAD2L1, SRSF5, CBX8, KATNB1    0.75, 8      98.81%     0.975

Compared to the result in [1] (see Figure 3.1), all four of our tree models have higher overall accuracy, and higher accuracy for each corresponding binary model. Especially, the accuracy of the binary model for classifying LumA and LumB (the hardest case) in Table 3.2 reaches 100%, about 12% higher than that of [1]. The order of the second and third nodes in our model is the reverse of that in [1], since PAFS detected that Her2 cases are easier to identify than Normal cases. Additionally, the model in [1] needs 18 genes in total; meanwhile, our four models need fewer genes: 12, 17, 13, and 12 genes, respectively. Table 3.4 shows that even without a grid search, our model still obtains higher accuracy, which supports the effectiveness of the way that PAFS explores the feature space. Comparing the binary models for classifying LumA against LumB (the last row) in Tables 3.1, 3.2, and 3.3 to that in Table 3.4, one can see that the integration of grid search with feature subset search can significantly improve the classification performance. Once again, this means that searching for the optimal model in the combined space of features and parameters is a reasonable approach.

Table 3.4: Tree-based model for breast cancer subtype prediction learned by TCM using running options <SVM, Information Gain, 100, 5, 40, no grid search>, 98.32% accuracy.

Class    Selected features                 Parameters   Accuracy   MCC
Basal    TFF3, AGR2                        0, 1         100.00%    1
Her2     S100A9, BZRAP1, THSD4, CEP55      0, 1         99.16%     0.972
Normal   CX3CL1, ARAP3                     0, 1         100.00%    1
LumA     NUSAP1, KIF4A, CACYBP, UTP18      0, 1         94.05%     0.875

3.4.3 Experimental Results on Predicting Erythemato-Squamous Diseases

In the second experiment, we tried TCM on an erythemato-squamous disease dataset used in [10], which can be downloaded from the UCI machine learning repository. It has 366 samples with 34 features, where 12 of the features are clinical and the rest histopathological. 32 of them take the value 0, 1, 2, or 3, indicating a degree; one feature is binary, and the remaining one is linear. The six diseases are psoriasis (code = 1, 112 samples), seboreic dermatitis (code = 2, 61 samples), lichen planus (code = 3, 72 samples), pityriasis rosea (code = 4, 49 samples), cronic dermatitis (code = 5, 52 samples), and pityriasis rubra pilaris (code = 6, 20 samples).

For this dataset, the IFSFFS method of [10] obtained 97% accuracy on average using 12 attributes. In the best case, the accuracy is 100% using 14 attributes. In Table 3.5, we present a tree model learned from the dataset by TCM. The running option (shown in the table caption) means that we did not filter any features at the filter stage (keeping 34 out of 34 features). Class number 4 is omitted because it is the last one to identify. The overall accuracy of the model is 99.26%, i.e., 2.26% higher than the average case in [10] and just slightly smaller than the best case in [10]. Moreover, we need only 10 features.

Table 3.5: Tree-based model learned from the erythemato-squamous disease dataset by TCM with no filtering (X = 34), 99.26% accuracy.

Class   Selected features   Parameters   Accuracy   MCC
1       20, 22              1, 8         100.00%    1
3       6, 8                0.75, 1      100.00%    1
5       1, 15               0, 1         100.00%    1
6       1, 7                0.05, 1      100.00%    1
2       5, 26               0, 1         96.3%      0.925

3.4.4 Experimental Results on Breast Cancer Treatment Outcome Prediction using METABRIC Datasets

We applied PAFS using SVM as the classifier on the METABRIC datasets used in [19] for predicting the survivability of breast cancer patients treated with chemotherapy (CT) and/or hormone therapy (HT). The problems are binary classification in these cases. In [19] (i.e., Chapter 2), mRMR and SVM were combined in a wrapper feature selection, using the Weka software, to find the sets of the most informative genes among an initial set of 26 genes for prediction. These 26 genes were carefully selected with respect to the mechanism of chemotherapy drugs, including ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCAP29, BCL2, BCL2L1, BIRC5, BMF, CNGA3, CYP2C8, CYP3A4, FGF2, FN1, GBP1, MAP2, MAP4, MAPT, NFKB2, NR1I2, OPRK1, SLCO1B3, TLR6, TUBB1, and TWIST1. We tried several types of searches, including Linear Floating Forward, Linear Floating Backward, and Greedy Stepwise. As shown in Table 3 of [19] (or Table 2.4 in Chapter 2), the accuracies obtained by our previous study range from 66% to 84%; however, the area under the ROC curve (AUC) and the MCC are not high enough in some cases, suggesting that the classifiers were rather affected by class imbalance. Table 3.6 shows the results when applying PAFS for predicting therapy outcome on the same problems. The selected gene signatures for paclitaxel response suggest that PAFS may achieve better overall performance than the previous approach. For 53 CT patients with a ≥ 5-year survival threshold, an SVM based on the expression of FN1, TUBB4A, BAD, MAP2, ABCB1, and ABCB11 was 90.6% accurate (AUC = 0.885 and MCC = 0.799), showing significant increases compared to that of [19] (accuracy = 84.9%, AUC = 0.82, and MCC = 0.675, respectively). Accuracy, AUC, and MCC also increased for the other survival thresholds. For the 420 HT patients, the accuracies acquired by PAFS are not significantly higher than the previous results in [19]; however, the MCCs are about doubled and the AUCs are about 0.05 higher, showing significant improvements. For patients treated with CT and/or HT, and for all three survival thresholds, either accuracy or MCC or AUC is considerably higher than in the previous results, while none of the three statistical measures is smaller. These new classification results appear to be less sensitive to bias from the imbalanced composition of the different patient categories. Thus, the new models may be more useful than the prior ones.

Similarly, we also applied PAFS (with SVM) to extract gene signatures for survivability prediction based on the 420 HT-only-treated patients, with 24 genes that correlated with the GI50 of Tamoxifen. The initial genes are ABCB1, ABCC2, ALB, C10ORF11, CCNA2, CYP3A4, E2F7, F5, FLAD1, FMO1, IGF1, IGFBP3, IRS2, NCOA2, NR1H4, NR1I2, PIAS4, PPARA, PROC, RXRA, SMARCD3, SULT1B1, SULT1E1, and SULT2A1. Table 3.7 presents the classification results for the extracted signatures.

Table 3.6: Results of applying PAFS for predicting therapy outcome for patients treated with chemotherapy and/or hormone therapy, based on 26 genes that correlate to Paclitaxel.

Treatment        Year threshold   Selected genes                                       Accuracy   AUC     MCC
CT               3                FN1, BAD, SLCO1B3, BCL2, TUBB1, MAPT, MAP2, ABCC1    86.8%      0.862   0.717
(53 patients)    4                BCL2L1, FN1, BAD, SLCO1B3, BBC3, MAP2, NFKB2         88.7%      0.887   0.776
                 5                FN1, TUBB4A, BAD, MAP2, ABCB1, ABCB11                90.6%      0.885   0.799
HT               3                FN1, BMF, TUBB1, MAPT, BBC3, CYP3A4, ABCC10, NR1I2   86.0%      0.618   0.315
(420 patients)   4                FN1, CYP2C8, CYP3A4, TUBB4B, ABCB11                  81.0%      0.600   0.303
                 5                TUBB4A, BAD, SLCO1B3, MAP4, OPRK1, MAP2, CYP3A4,     73.1%      0.655   0.319
                                  ABCC10, NR1I2, ABCB11
CT and/or HT     3                TUBB4A, MAP4, TUBB1, MAPT, MAP2                      84.5%      0.556   0.291
(504 patients)   4                BCL2L1, FN1, TUBB1, MAP2, CYP2C8, NFKB2, ABCB11      77.6%      0.567   0.300
                 5                BAD, SLCO1B3, OPRK1, BBC3, FGF2, TLR6, CYP3A4,       71.0%      0.616   0.288
                                  ABCC10, ABCC1

Table 3.7: Results of applying PAFS for predicting hormone therapy (HT) outcome for 420 patients based on 24 genes that correlate to Tamoxifen.

Years   Selected genes                                                       Accuracy   AUC     MCC
3       ABCC2, F5, NR1H4, PROC, SMARCD3, SULT1E1                             0.883      0.626   0.416
4       CYP3A4, E2F7, NR1H4, PPARA, PROC, SMARCD3, SULT1E1                   0.831      0.626   0.399
5       ABCC2, CYP3A4, E2F7, FLAD1, NR1H4, PROC, SMARCD3, SULT1E1, SULT2A1   0.776      0.625   0.374

Comparing with the results for HT in Table 3.6, most of the classification measures in Table 3.7 are better (except the AUC for the 5-year threshold). This seems reasonable, since Tamoxifen is used for hormone therapy while Paclitaxel is used for chemotherapy.

We applied PAFS using SVM as the classifying evaluator (PAFS+SVM) on the METABRIC data (downloaded in December 2016) used in [20] to predict whether patients will be disease-free or not when treated with one of the following treatments: chemotherapy (CT), hormone therapy (HT), radiation therapy (RT), or surgery only (SO) [21,22]. This METABRIC data contains 1904 patient gene expression profiles over 24369 genes. Patients who died of causes other than breast cancer were excluded. The disease-free distribution by treatment is given in Table 3.8.

The initial gene sets for feature selection and classification were collected from the literature, including (1) the genes significantly correlated with the growth inhibitory effects of paclitaxel, tamoxifen, methotrexate, epirubicin, doxorubicin, and 5-fluorouracil used in [19] (i.e., 84 genes); (2) 173 genes from [20], which include 40 breast cancer driver genes; and (3) the genes whose mutual information with the classes is in the top 100.

Table 3.8: Disease-free distribution by treatment.

Dataset   Disease-free   Recurred/Progressed   # Samples
CT        26             19                    45
HT        284            121                   405
RT        174            54                    228
SO        82             207                   289

We compared our approach with wrapper feature selection using the combination of mRMR and SVM (mRMR+SVM) or of mRMR and Random Forest (mRMR+RF). Several search algorithms, including Linear Forward Search (LFS), Linear Backward Search (LBS), and Particle Swarm Optimization (PSO) [12, 13], were applied for these combinations, and the results with the best MCC were collected. The performances of these approaches are shown in terms of accuracy and MCC in Table 3.9, where the best values are shown in bold. The corresponding selected genes are shown in Table 3.10.

We can see that none of the approaches is superior to the others in all cases. For chemotherapy, PAFS+SVM and mRMR+SVM (with PSO) both perform very well (MCC of about 0.95 or more), while mRMR+RF (with LFS), the best mRMR+RF combination that we tried, is much poorer (MCC = 0.424). PAFS+SVM and mRMR+RF (with PSO) are the best classifiers for hormone therapy (MCC of about 0.45) and for surgery-only patients (MCC of about 0.54), but only slightly better than mRMR+SVM (with LFS). For radiation therapy, mRMR+SVM (with LFS) is the best (MCC = 0.669), followed by PAFS+SVM with MCC = 0.594. These results show that, for these datasets, PAFS is as effective as PSO, a powerful search method for high-dimensional data which has gained a lot of interest recently.

The accuracy and MCC returned by these feature selection methods are significantly high; however, we have not conducted any further analysis/validation on the selected genes in Table 3.10 for biological insights, except that about half of the initial genes were carefully collected [19]. Some selected genes/transcripts are unofficial, such as BM728771 and BC021185, which have not been assigned to any gene ID. If they are not useful, users can choose other subsets of genes which provide relatively similar performance for analysis, because PAFS can return multiple optimal subsets of features as output.

Table 3.9: Prediction performance by PAFS+SVM and other feature selection approaches.

Data   Method                Accuracy   MCC     # Genes
CT     mRMR+SVM (with PSO)   1.000      1.000   6
       mRMR+RF (with LFS)    0.778      0.424   5
       PAFS+SVM              0.978      0.955   7
HT     mRMR+SVM (with LFS)   0.775      0.417   10
       mRMR+RF (with PSO)    0.785      0.445   21
       PAFS+SVM              0.778      0.449   10
RT     mRMR+SVM (with LFS)   0.886      0.669   9
       mRMR+RF (with LBS)    0.851      0.552   15
       PAFS+SVM              0.864      0.594   6
SO     mRMR+SVM (with LBS)   0.813      0.515   16
       mRMR+RF (with PSO)    0.827      0.544   7
       PAFS+SVM              0.820      0.535   6

Table 3.10: Selected genes for predicting treatment outcome by PAFS+SVM and other feature selection approaches.

Data   Method     Selected Genes
CT     mRMR+SVM   MDH1B, RPL36A, SDHAF1, C15orf33, BC038529, KRTAP12-1
       mRMR+RF    SLC25A20, TMEM120A, RPL26, FBXW7, TRAPPC3
       PAFS+SVM   SPATA18, RRAGD, PABPC5, AK025321, ADAMTSL5, PRRT3, TUBGCP3
HT     mRMR+SVM   SLC25A20, TMEM120A, AI221824, SAP30BP, OR52M1, HIST1H4C, AI377306, C17orf106-CDK3, AI912012, IL13RA2
       mRMR+RF    SLC25A20, TMEM120A, TXNDC17, FBXW7, TRAPPC3, NUDT18, MSRA, AI377306, KRAS, CR740230, AI912012, TMUB1, SAP30, OVCH1, C3orf54, INSL5, CDCA3, DB336657, LOC100132686, PDGFRB, AI694360
       PAFS+SVM   AA628343, AI221824, BG150403, BX105718, BX113158, CDK15, DB338108, IL13RA2, INSL5, OR52M1, OVCH1, PPARA, SSBP1, TBC1D21, ZBTB34
RT     mRMR+SVM   SPATA4, SLC13A4, TCEAL8, BM705207, TMEM31, METTL7A, MFN2, GPSM1, OR52I2
       mRMR+RF    AARS, LSS, PDHA1, APOOL, C4B, SLC13A4, ZIC2, FOXD1, DNAJA2, CD1C, GKN1, TMEM31, AW296478, ORC6, NR3C1
       PAFS+SVM   CD1C, METTL7A, SAG, ZIC4, CBWD5, PMPCB, DHFR
SO     mRMR+SVM   HSP90AB1, KIAA0090, ADAMTS7, YWHAZ, SGOL2, MTBP, MELK, TPX2, KIF20A, NOSTRIN, AI828382, KIF4A, ZNF250, COL22A1, DLGAP5, C10orf32-AS3MT
       mRMR+RF    HSP90AB1, ADAMTS7, MTBP, CENPE, CDCA5, COL22A1, DLGAP5
       PAFS+SVM   BM728771, BC021185, ADAMTS7, ARFGEF1, HSP90AB1, HOXA6

3.5 Conclusion

The following three observations inspired the idea of our proposed algorithm, PAFS, for feature selection. (1) Feature selection is a key factor in pruning the search space, reducing data noise, and increasing the prediction quality. (2) Parameter search is another technique to improve the power of a classifier, and grid search is one of the suitable ways to examine the parameter space. And (3) classifier performance changes from one dataset to another and from one parameter setting to another. Thus, by combining the feature space and the parameter space into one, PAFS can find a classification model that is more optimized than those of approaches exploring the spaces separately. The search behavior of PAFS is adapted from the Apriori algorithm, but it is controllable by parameters to avoid being too exhaustive or too greedy.

Based on PAFS and the one-versus-the-rest strategy, we presented the TCM algorithm, which makes use of the power of different classifiers to deal with multi-class problems. When applying the resulting models for classification on the experimental datasets, our acquired models often yield better performance than that of the previous studies. Especially, on the METABRIC datasets, from an initial gene set that had previously been carefully selected with reference to the mechanisms of the therapies, PAFS extracted better gene signatures for predicting the survival outcome of breast cancer patients treated with chemotherapy, hormone therapy, and a combination of them than those identified in our previous study [19]. The results in Table 3.6 and Table 3.7 might be a clue indicating that gene signature analysis should be conducted with reference to the underlying mechanisms of the therapies. However, it is necessary to test on many more drugs used in breast cancer treatment for a strong conclusion.

When applying PAFS to predict the chance of being disease-free after treatment for breast cancer patients using the processed METABRIC datasets, the genes selected by PAFS+SVM can produce significantly high accuracy.

The experimental results support that PAFS is effective for feature selection and competitive with other efficient approaches.

References

[1] Rezaeian, I., Li, Y., Crozier, M., Andrechek, E., Ngom, A., Rueda, L., and Porter, L. (2013, June). Identifying informative genes for prediction of breast cancer subtypes. In IAPR International Conference on Pattern Recognition in Bioinformatics (pp. 138-148). Springer, Berlin, Heidelberg.

[2] Theodoridis, S., and Koutroumbas, K. (2003). Pattern recognition. Elsevier.

[3] Kecman, V. (2005). Support vector machines–an introduction. In Support vector machines: theory and applications (pp. 1-47). Springer, Berlin, Heidelberg.

[4] Hsu, C. W., Chang, C. C., and Lin, C. J. (2003). A practical guide to support vector classification.

[5] Pearl, J. (2009). Causality. Cambridge University Press.

[6] Mitchell, T. (1997). Machine Learning. McGraw Hill.

[7] Ding, C., and Peng, H. (2005). Minimum redundancy feature selection from microarray gene expression data. Journal of Bioinformatics and Computational Biology, 3(02), 185-205.

[8] Peng, H., Long, F., and Ding, C. (2005). Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226-1238.

[9] Hsu, H. H., Hsieh, C. W., and Lu, M. D. (2011). Hybrid feature selection by combining filters and wrappers. Expert Systems with Applications, 38(7), 8144-8150.

[10] Xie, J., Xie, W., Wang, C., and Gao, X. (2010). A novel hybrid feature selection method based on IFSFFS and SVM for the diagnosis of erythemato-squamous diseases. In Proceedings of the first workshop on applications of pattern analysis, 142-151.

[11] Pineda-Bautista, B. B., Carrasco-Ochoa, J. A., and Martínez-Trinidad, J. F. (2011). General framework for class-specific feature selection. Expert Systems with Applications, 38(8), 10018-10024.

[12] Nguyen, H. B., Xue, B., and Andreae, P. (2016, March). Mutual information estimation for filter based feature selection using particle swarm optimization. In European Conference on the Applications of Evolutionary Computation (pp. 719-736). Springer, Cham.

[13] Hu, Z., Bao, Y., and Xiong, T. (2014). Comprehensive learning particle swarm opti- mization based memetic algorithm for model selection in short-term load forecasting using support vector regression. Applied Soft Computing, 25, 15-25.

[14] Agrawal, R., and Srikant, R. (1994, September). Fast algorithms for mining association rules. In Proc. 20th int. conf. very large data bases, VLDB (Vol. 1215, pp. 487-499).

[15] Chang, C. C., and Lin, C. J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), 1-27.

[16] Jurman, G., Riccadonna, S., and Furlanello, C. (2012). A comparison of MCC and CEN error measures in multi-class prediction. PloS one, 7(8).

[17] Hu, Z., Fan, C., Oh, D. S., Marron, J. S., He, X., Qaqish, B. F. and Nobel, A. (2006). The molecular portraits of breast tumors are conserved across microarray platforms. BMC genomics, 7(1), 96.

[18] Pham, H. Q., Rezaeian, I., Mucaki, E. J., Ngom, A., Rueda, L., and Rogan, P. K. (2016). Predicting Breast Cancer Drug Response via a Level-wise Gene Selection Approach. ISCB-Latin America 2016.

[19] Mucaki, E. J., Baranova, K., Pham, H. Q., Rezaeian, I., Angelov, D., Ngom, A., Rueda, L., and Rogan, P. K. (2016). Predicting outcomes of hormone and chemotherapy in the molecular taxonomy of breast Cancer international consortium (METABRIC) study by biochemically-inspired machine learning. F1000Research, 5.

[20] Pereira, B., Chin, S. F., Rueda, O. M., Vollan, H. K. M., Provenzano, E., Bardwell, H. A., and Tsui, D. W. (2016). The somatic mutation profiles of 2,433 breast cancers refine their genomic and transcriptomic landscapes. Nature communications, 7(1), 1-16.

[21] Huy, P. Q., Ngom, A., and Rueda, L. (2016). A new feature selection approach for optimizing prediction models, applied to breast cancer subtype classification. In 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 1535-1541.

[22] Pham, H. Q., Rueda, L., and Ngom, A. (2017). Predicting Breast Cancer Outcome under Different Treatments by Feature Selection Approaches. In Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 617-617.

Chapter 4

A Novel Approach for Identifying Relevant Genes for Breast Cancer Survivability on Specific Therapies

4.1 Introduction

Breast cancer has a very high 5-year relative survival rate (90%) compared with other cancers, including pancreas (8%), lung (18%), and liver (18%). However, breast cancer still accounted for 30% of all new cancer cases in women in 2015; furthermore, it is the leading cause of cancer death for women from ages 20 to 59 years in the United States [1]. Gene signatures in cancer as predictors of treatment response and survival were investigated in earlier works [2,3], in which Chiaretti et al. proposed an unsupervised model in 33 adult patients with T-cell acute lymphocytic leukemia (T-ALL). They found that a single gene, interleukin 8 (IL-8), is strongly associated with resistance to first-line treatment and that 3 genes (CD2, TTK, and AHNAK) are highly predictive of outcome in uniformly treated adults with T-ALL [2,3]. De Vijver et al. used a multivariate Cox regression analysis model on a database of 295 patients with breast cancer who have a gene expression signature associated with poor vs. good prognosis. They found that their prognosis profile was more informative in identifying distant metastases in young breast cancer patients than the standard systems using clinical and histological information [3].

Chang et al. obtained a wound response signature from 295 patients with early breast cancer. They assumed that the molecular features responsible for normal wound healing might play a key role in cancer metastasis. The proposed method investigates those signature genes' expression in patients with cancer. They found that both overall survival and distant metastasis-free survival are markedly diminished in patients whose tumors expressed the wound response signature compared with patients whose tumors did not express this signature [4].

Pederson et al. employed a genetics specialist embedded within a multidisciplinary breast clinic and studied hereditary cancer risk to assist decision making in cancer treatment. The study focuses on accelerating surgery based on genetic information. That model was used to compare cancer care between 471 patients in 2012, before a genetic counselor was embedded, and 440 patients in 2014, after the intervention. The results show that genetic counseling influenced time to treatment in the 2014 cohort of patients. Surgery such as bilateral mastectomy was recommended for women with mutations in TP53 and PTEN [5].

In this work, we use supervised and unsupervised machine learning methods to deal with a multiclass classification problem in which we label the samples based on the combination of 5-year survivability and treatment; we focus on hormone therapy, radiotherapy, and surgery. The proposed unsupervised hierarchical models are created to find the highest separability between combinations of the classes. The supervised model consists of a combination of feature selection techniques and efficient classifiers used to find a potential set of biomarker genes specific to response to therapy. We used a hierarchical clustering approach based on Ward's linkage to find better borders among the groups of different classes, and then applied standard classifiers on these clusters. The results show that different models achieve different performance scores, with accuracies ranging from 80.9% to 100%. We have investigated the roles of many biomarkers through the literature and found that some of the discriminative genes in the computational model, such as ZC3H11A, VAX2, MAF1, and ZFP91, are related to breast cancer and other types of cancer.

4.2 Materials and Methods

Samples from a publicly accessible data set of 2433 patients with breast cancer, including survival information, are used in this approach [7]. After analyzing the given data, 6 classes were identified as the baseline of this work. These classes are the combination of each treatment (surgery, hormone therapy, radiotherapy) with a patient status (living or deceased). The number of samples (patients) for each class is shown in Table 4.1.

Figure 4.1 depicts the distribution of the number of samples by breast cancer subtypes in each class. The subtypes are well-distributed in each class; at least 3 subtypes are represented in each class, which means that the possibility of correlation between subtypes and classes is not significant.

Table 4.1: Number of samples per class

Class                               # Samples
Living and Radiotherapy (LR)        132
Deceased and Radiotherapy (DR)      19
Living and Hormone therapy (LH)     20
Deceased and Hormone therapy (DH)   6
Living and Surgery (LS)             130
Deceased and Surgery (DS)           40

Based on the available data, only 3 treatment therapies are covered: surgery, hormone therapy, and radiotherapy (Figure 4.2). Our proposed model is a bottom-up hierarchical multiclass tree obtained using the agglomerative clustering technique. The pipeline of the proposed model starts with filtered feature selection methods, including Chi-square [8] and Information-Gain, which are applied to limit the number of significant features (genes). Then, a wrapper method is used to obtain the best subset of genes that represent the model, using the mRMR (minimum redundancy maximum relevance) feature selection method [9]. After that, class balancing techniques such as the Synthetic Minority Over-Sampling Technique (SMOTE) [10] and cost-sensitive learning [11] are applied to balance the class distribution before applying a classification method. Finally, a small number of biomarker genes are identified for prediction. To the best of our knowledge, this work is the first prediction model to combine the treatment and the survival status of a patient as the class label.

Figure 4.1: The distribution of the number of samples by breast cancer subtypes in each class.

The distribution of the number of patients per class is shown in Figure 4.3. It is clear that there are significant differences between the numbers of samples of the different classes, which requires class balancing to achieve a fair classification.

Figure 4.2: The distribution of the number of samples by breast cancer subtypes in each treatment.

4.2.1 The Bottom-Up Multiclass Classification Approach

In our proposed bottom-up approach, we build 5 models based on the linkage type between classes. We start with 6 distinct data sets of samples corresponding to the 6 classes and then build a hierarchical clustering tree in a bottom-up fashion. The flow chart is illustrated in Figure 4.4, which shows the steps (in integers) for obtaining the 5 models based on the distance between the classes.

Figure 4.3: The percentage of samples by class.

Generally, a hierarchical clustering method relies on a function specifying the distance between two samples and a linkage (or criterion) to decide which pair of clusters to merge. Initially, each sample is a cluster; in our case, each class is a cluster. The method keeps choosing a pair of clusters to merge until there is only a single cluster. Common linkages include single linkage, complete linkage, average linkage, centroid linkage, and Ward's linkage. We used the Euclidean distance between two samples and Ward's linkage [12, 13]. For details of the other linkages, we refer the reader to [12, 14]. The Euclidean distance between the sample x = {x1, x2, ..., xn} and the sample y = {y1, y2, ..., yn} is defined as Equation (4.1):

d(x, y) = \|x - y\| = \sqrt{\sum_{i} (x_i - y_i)^2}    (4.1)

Figure 4.4: Schematic representation of the proposed models based on linkage type.

The objective of Ward's linkage is to minimize the within-cluster variance, rather than reducing the distance between each pair of clusters. Ward's linkage aims at merging the pair of clusters which results in the minimum increase in the within-cluster sum of squares.

The amount of increase in the within-cluster sum of squares when we merge two clusters i and j can be estimated as Equation (4.2):

\Delta(i, j) = \frac{N_i \times N_j}{N_i + N_j} \, \|c_i - c_j\|^2    (4.2)

where N_i and N_j are the numbers of samples in clusters i and j, respectively, and c_i and c_j denote the centers of the clusters; \|\cdot\| is the Euclidean norm (or Euclidean distance, in our case).

The mean and cardinality of the newly merged cluster, k, are computed as follows:

c_k = \frac{1}{N_i + N_j} (N_i c_i + N_j c_j)    (4.3)

N_k = N_i + N_j    (4.4)
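For illustration, the following is a minimal sketch (not the thesis code) of how such a bottom-up tree can be built with off-the-shelf tools; the toy matrix X, the labels y, and the use of class centroids as cluster representatives are assumptions made only for this example.

import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
classes = ["LR", "DR", "LH", "DH", "LS", "DS"]
X = rng.normal(size=(60, 100))     # toy expression matrix (samples x genes)
y = np.repeat(classes, 10)         # toy class labels, 10 samples per class

# Represent each class by its centroid so that each leaf of the tree is a class.
centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])

# Ward's linkage on Euclidean distances; each row of Z records one merge
# (the pair joined, the merge cost as in Equation (4.2), and the new cluster size).
Z = linkage(centroids, method="ward", metric="euclidean")
print(Z)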

4.2.2 Feature Selection

The gene expression dataset contains 24,368 genes for each of the 347 samples. The curse of dimensionality makes it difficult to classify the dataset in its current form. Hence, feature selection is essential to narrow down the number of genes. Chi-square and Info-Gain (i.e., mutual information) filters are applied first to retain the most informative genes; then, the mRMR (minimum redundancy maximum relevance) feature selection method is applied to find the best subset of significant genes.
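As a rough illustration of this two-stage selection, the sketch below chains a Chi-square filter with a small greedy mRMR-style step using scikit-learn; the cutoffs n_filter and n_final, and the use of Pearson correlation as the redundancy term, are simplifying assumptions (the chapter's pipeline uses Weka and the original mRMR criterion).

import numpy as np
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

def filter_then_mrmr(X, y, n_filter=500, n_final=20):
    # Filter step: keep the n_filter genes with the highest Chi-square score
    # (chi2 assumes non-negative expression values).
    keep = SelectKBest(chi2, k=n_filter).fit(X, y).get_support(indices=True)
    Xf = X[:, keep]
    relevance = mutual_info_classif(Xf, y, random_state=0)  # Info-Gain proxy

    selected = [int(np.argmax(relevance))]
    candidates = set(range(Xf.shape[1])) - set(selected)
    while len(selected) < n_final:
        # mRMR-style step: maximize relevance minus mean absolute correlation
        # with the genes already selected (the redundancy term).
        def score(j):
            red = np.mean([abs(np.corrcoef(Xf[:, j], Xf[:, s])[0, 1])
                           for s in selected])
            return relevance[j] - red
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return keep[selected]  # indices into the original gene list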

4.2.3 Class Balancing

The five models utilize a one-versus-rest strategy to handle the multiclass problem, which can lead to an unbalanced dataset at each node of the hierarchical classification model. We tried the following techniques to handle this; a short sketch of both options follows the list:

• Resampling. Resampling involves removing samples of the majority class (under-sampling) or generating more samples for the minority class (over-sampling). In this work, we used one of the most popular algorithms, the Synthetic Minority Over-Sampling Technique (SMOTE).

• Cost-sensitive classifier. Cost-sensitive learning takes the costs of prediction errors into account when training a machine learning model, usually by putting more penalty on errors in predicting the minority class. This, in turn, biases the model to pay more attention to the minority class. In this work, we use the Cost-Sensitive Classifier in Weka, which uses a classification method along with a penalty matrix to overcome the imbalance.
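A minimal sketch of the two options, assuming the imbalanced-learn package for SMOTE and approximating Weka's Cost-Sensitive Classifier with scikit-learn's class_weight penalty (the toy X and y are invented for the example):

import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 30))
y = np.array([0] * 130 + [1] * 20)   # imbalanced toy labels (e.g., LR vs DR)

# Option 1: oversample the minority class with SMOTE before training.
X_bal, y_bal = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)

# Option 2: keep the data as-is but penalize minority-class errors more,
# a rough stand-in for a cost (penalty) matrix.
clf = SVC(kernel="rbf", class_weight="balanced").fit(X, y)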

4.2.4 Classification

After using a linkage for agglomerative clustering, which identifies the closest classes for merging, we obtained a hierarchical tree, as in Figure 4.5. Then, standard classifiers can be applied to determine which biomarker genes are the most discriminative for separating the samples of the classes in each branch of the tree.

We used the libSVM library [15] to train a support vector machine (SVM) classifier with a radial basis function kernel; a grid search algorithm was used to optimize the classifier's parameters.
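The following sketch illustrates such a grid search with scikit-learn's SVC (which wraps libSVM internally); the exponential grid ranges are typical libSVM defaults, not the exact values tuned in this chapter, and X_train, y_train are assumed arrays.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "C": [2 ** e for e in range(-5, 16, 2)],       # coarse exponential grid
    "gamma": [2 ** e for e in range(-15, 4, 2)],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10, scoring="accuracy")
# search.fit(X_train, y_train) would select the best (C, gamma) pair by
# 10-fold cross-validation before the final evaluation.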

4.3 Results and Discussion

After running the algorithm on our dataset, Ward's linkage method achieved the best accuracy and most meaningful hierarchy among the linkages, based on the 6 classes. The reason might be that the objective function of Ward's linkage is to minimize the within-cluster variance rather than reducing the distance between each pair of clusters; thus, it can improve the classification performance. Ward's linkage yielded a balanced tree of the treatment-and-survival clusters, as shown in Figure 4.5. Table 4.2 shows the discriminative genes between each group of clusters in the tree. The separation between the clusters in the lower part of the tree is significantly high. The accuracies of classifying nodes are 100% for DH vs LH and 99.2% for DS vs LS. The accuracies remain high in the middle part of the tree: 99.6% for the left side, which is (DH, LH) vs LR, and 99.5% for the right side, which is DR vs (DS, LS). The score drops to 81.8% at the root of the tree, where the left side is classified against the right side. The results for the other linkages are presented in the supplementary materials of [14].

Figure 4.5: Ward's linkage model: classification model with performance measures.

Table 4.2: Selected genes in Ward's linkage model.

Model nodes: DH vs. LH; DS vs. LS; LR vs. (DH, LH); DR vs. (DS, LS); (LR, DH, LH) vs. (DR, DS, LS).

Selected genes (over all nodes): CA334854, TNFRSF6B, FAM108B1, AX746743, AA399560, AI699581, BU189136, AA884297, ZC3H11A, AK130741, PLEKHB2, CR626459, DSCAM, PDCD7, INO80, IL1RAPL2, VAX2, ZNF618, TBX21, PAX7, DUSP21, ANO8, MARK2, ATL1, NUFIP, ZBTB43, RPS7, P2RX3, LIPJ, RFT1, EIF2C2, ARSK, TSKU, ROBO1, MAF1, IMAA, PRKD2.

Figure 4.6 shows a multidimensional representation of the plot matrix for the 5 discriminative genes found in Ward's linkage model for the node of the DR class vs the (DS, LS) class, as an example. The figure also shows the pairwise relations among the 5 genes.

Figure 4.7 shows the boxplots for some selected genes, indicating the minimum, first quartile, median, third quartile, and maximum gene expression values for each group of samples (DH vs LH, and DR vs [DS, LS]). The gene expression of INO80 is slightly upregulated in the DH samples compared with the LH samples, and TBX21 is also upregulated in the DR samples compared with the [DS, LS] samples. In contrast, the gene expression of PAX7 is downregulated in the DH samples compared with the LH samples, and that of AK130741 is also downregulated in the DR samples compared with the [DS, LS] samples.

Figure 4.6: The relation between the 5 genes used in node DR vs (DS, LS) of Ward's linkage model.

Figure 4.7: Boxplot for the biomarker genes in Ward's linkage model, showing the minimum, first quartile, median, third quartile, and maximum gene expression values for each group of samples (DH vs LH) and (DR vs [DS, LS]).

As shown in Figure 4.8, the gene CA334854 has a strong correlation coefficient with the two genes AX746743 and IL1RAPL2 in the DS samples, whereas there is no significant correlation between them in the LS samples, as shown in Figure 4.9.

Figure 4.8: Circos plot for the biomarker genes in Ward's linkage model for the DS class samples based on the correlation coefficient among gene expressions (p < 0.05).

Figure 4.9: Circos plot for the biomarker genes in Ward's linkage model for the LS class samples based on the correlation coefficient among gene expressions (p < 0.05).

4.3.1 Biological Insight

For the node DH vs LH, the INO80 and PAX7 genes are both involved in the regulation of epigenetic histone marks and chromatin remodeling [16]. As part of the analysis of epigenetic modifications around the INO80 interaction site, Mendiratta et al. studied the INO80-binding region of the HOXC11 and PAX7 genes by ChIP with anti-H3K9ac and anti-H3K27me3 antibodies followed by quantitative polymerase chain reaction. In both cases studied, INO80 enrichment was correlated with H3K27me3 [17]. Both genes were also reported in a protein-protein interaction network for cancer [18].

Some of the genes found in the computational model are related to breast cancer. Cai et al. identified the breast cancer susceptibility locus rs4951011 at 1q32.1, in intron 2 of the ZC3H11A gene; the genome-wide association study was conducted on patients from East Asian populations, mainly Chinese and Korean. They also found that expression levels of the ZC3H11A gene were significantly higher in tumor tissue than in adjacent normal tissue (P = .0049) in TCGA (The Cancer Genome Atlas) data. The function of ZC3H11A is not clear [19].

VAX2 is a protein-coding gene that encodes a homeodomain-containing protein from a class of homeobox transcription factors that are conserved in vertebrates [20]. Gu et al. [21] identified the top 40 most correlated genes with similar methylation patterns calculated by Pearson correlation; VAX2 is one of them. VAX2 was found to be a transcription factor that regulates 3 genes (PLCB4, ADCY6, and CNR1) at the RNA level in response to chemotherapy in patients with operable breast cancer [22].

MAF1 displays tumor suppressor activity. Surprisingly, blocking the synthesis of ribosomal RNA and transfer RNAs is insufficient to account for MAF1's tumor suppressor function. MAF1 binds to the PTEN promoter to enhance PTEN promoter acetylation and activity. MAF1 downregulation unexpectedly leads to activation of AKT-mTOR signaling, which is mediated by decreased PTEN expression [23]. ZFP91 serves as a positive regulator of the MAP3K14 gene, causing its stabilization and activation. Overexpression of MAP3K14 has been associated with neoplastic growth in melanoma, pancreatic carcinoma, lung cancer, breast cancer, multiple myeloma, and adult T-cell leukemia. ZFP91-mediated stabilization may represent one of the mechanisms of MAP3K14 oncogenic activation [24].

Labhart et al. [25] identified DSCAM as one of the target genes in breast cancer cells that are directly regulated by the SRC-3/AIB1 coactivator. Stuhlmiller et al. [26] defined a signature of kinases that regulate MARK2, i.e., the kinases involved in significant changes in MIB binding after 48-hr lapatinib treatment of breast cancer cells. ROBO1 is a cell adhesion receptor that is a survival and growth factor for breast cancer [27]. Using cBioPortal [28], we investigated the pathway of these genes on another breast cancer data set [29]. The genes DSCAM, MARK2, and ROBO1 were found connected in the pathway shown in Figure 4.10. DSCAM and MARK2 were also reported to be in two pathways together with RPS7 in the Reactome pathway knowledgebase [30]; the two pathways are axon guidance (R-HSA-422475) and developmental biology (R-HSA-1266738).

Figure 4.10: The pathway connecting the genes DSCAM, MARK2, and ROBO1, obtained using cBioPortal on a breast cancer data set [29].

4.4 Conclusion

In conclusion, a hierarchical clustering model based on Ward's linkage was found to be discriminative in drawing borders between treatment-survival classes in breast cancer. Based on the gene expression data, standard classifiers perform very well at the nodes of the clusters in the constructed hierarchical tree. The results suggest subsets of genes in which some of the genes in the same nodes are reported to be related in function or pathways, and some of them are strongly related to breast cancer. ZC3H11A is expressed at significantly higher levels in tumor tissue, MAF1 is a tumor suppressor, and ZFP91 is a positive regulator of MAP3K14, which is related to breast cancer. MARK2 and ROBO1 co-occur in some pathways, and the same holds for ARSK and PRKD2.

References

[1] Siegel, R. L., Miller, K. D., and Jemal, A. (2018). Cancer statistics, 2018. CA: A Cancer Journal for Clinicians, 68(1), 7-30.

[2] Chiaretti, S., Li, X., Gentleman, R., Vitale, A., Vignetti, M., Mandelli, F., and Foa, R. (2004). Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival. Blood, 103(7), 2771-2778.

[3] Van De Vijver, M. J., He, Y. D., Van’t Veer, L. J., Dai, H., Hart, A. A., Voskuil, D. W., and Parrish, M. (2002). A gene-expression signature as a predictor of survival in breast cancer. New England Journal of Medicine, 347(25), 1999-2009.

[4] Chang, H. Y., Nuyten, D. S., Sneddon, J. B., Hastie, T., Tibshirani, R., Sørlie, T., and van de Rijn, M. (2005). Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival. Proceedings of the National Academy of Sciences of the United States of America, 102(10), 3738-3743.

[5] Pederson, H. J., Hussain, N., Noss, R., Yanda, C., O'Rourke, C., Eng, C., and Grobmyer, S. R. (2018). Impact of an embedded genetic counselor on breast cancer treatment. Breast Cancer Research and Treatment, 1-4.

[6] Abou Tabl, A., Alkhateeb, A., ElMaraghy, W., and Ngom, A. (2017). Machine Learning Model for Identifying Gene Biomarkers for Breast Cancer Treatment Survival. In Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 607-607.

[7] Pereira, B., Chin, S. F., Rueda, O. M., Vollan, H. K. M., Provenzano, E., Bardwell, H. A., and Tsui, D. W. (2016). The somatic mutation profiles of 2,433 breast cancers refine their genomic and transcriptomic landscapes. Nature Communications, 7, 11479.

[8] Mantel, N. (1963). Chi-square tests with one degree of freedom; extensions of the Mantel-Haenszel procedure. Journal of the American Statistical Association, 58(303), 690-700.

[9] Peng, H., Long, F., and Ding, C. (2005). Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226-1238.

[10] Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357.

[11] Núñez, M. (1988). Economic Induction: A Case Study. In EWSL-88, 139-145.

[12] Johnson, S. C. (1967). Hierarchical clustering schemes. Psychometrika, 32(3), 241-254.

[13] Ward Jr, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236-244.

[14] Tabl, A. A., Alkhateeb, A., Pham, H. Q., Rueda, L., ElMaraghy, W., and Ngom, A. (2018). A novel approach for identifying relevant genes for breast cancer survivability on specific therapies. Evolutionary Bioinformatics, 14, 1176934318790266.

[15] Chang, C. C., and Lin, C. J. (2011). LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), 27.

[16] Gosev, I., Zeljko, M., Duric, Z., Nikolic, I., Gosev, M., Ivcevic, S., and Paic, F. (2017). Epigenome alterations in aortic valve stenosis and its related left ventricular hypertrophy. Clinical epigenetics, 9(1), 106.

[17] Mendiratta, S., Bhatia, S., Jain, S., Kaur, T., and Brahmachari, V. (2016). Interaction of the chromatin remodeling protein hINO80 with DNA. PloS one, 11(7), e0159370.

[18] Frenkel-Morgenstern, M., Gorohovski, A., Tagore, S., Sekar, V., Vazquez, M., and Valencia, A. (2017). ChiPPI: a novel method for mapping chimeric protein–protein interactions uncovers selection principles of protein fusion events in cancer. Nucleic acids research, 45(12), 7094-7105.

[19] Cai, Q., Zhang, B., Sung, H., Low, S. K., Kweon, S. S., Lu, W., and Noh, D. Y. (2014). Genome-wide association analysis in East Asians identifies breast cancer susceptibility loci at 1q32.1, 5q14.3 and 15q26.1. Nature Genetics, 46(8), 886.

[20] Hallonet, M., Hollemann, T., Wehr, R., Jenkins, N. A., Copeland, N. G., Pieler, T., and Gruss, P. (1998). Vax1 is a novel homeobox-containing gene expressed in the developing anterior ventral forebrain. Development, 125(14), 2599-2610.

[21] Gu, F., Doderer, M. S., Huang, Y. W., Roa, J. C., Goodfellow, P. J., Kizer, E. L., and Chen, Y. (2013). CMS: a web-based system for visualization and analysis of genome-wide methylation data of human cancers. PloS one, 8(4), e60980.

[22] Li, Y., Liu, X., Tang, H., Yang, H., and Meng, X. (2017). RNA Sequencing Uncovers Molecular Mechanisms Underlying Pathological Complete Response to Chemother- apy in Patients with Operable Breast Cancer. Medical science monitor: International Medical Journal of Experimental and Clinical Research, 23, 4321.

[23] Li, Y., Tsang, C. K., Wang, S., Li, X. X., Yang, Y., Fu, L., and Zheng, X. F. (2016). MAF1 suppresses AKT-mTOR signaling and liver cancer through activation of PTEN transcription. Hepatology, 63(6), 1928-1942.

[24] Paschke, L., Rucinski, M., Ziolkowska, A., Zemleduch, T., Malendowicz, W., Kwias, Z., Malendowicz, L. K. (2014). ZFP91—a newly described gene potentially involved in prostate pathology. Pathology and Oncology Research, 20(2), 453-459.

[25] Labhart, P., Karmakar, S., Salicru, E. M., Egan, B. S., Alexiadis, V., O’Malley, B. W., and Smith, C. L. (2005). Identification of target genes in breast cancer cells directly regulated by the SRC-3/AIB1 coactivator. Proceedings of the National Academy of Sciences of the United States of America, 102(5), 1339-1344.

[26] Stuhlmiller, T. J., Miller, S. M., Zawistowski, J. S., Nakamura, K., Beltran, A. S., Duncan, J. S., and Graves, L. M. (2015). Inhibition of lapatinib-induced kinome reprogramming in ERBB2-positive breast cancer by targeting BET family bromodomains. Cell Reports, 11(3), 390-404.

[27] Minn, A. J., Gupta, G. P., Siegel, P. M., Bos, P. D., Shu, W., Giri, D. D., and Massagué, J. (2005). Genes that mediate breast cancer metastasis to lung. Nature, 436(7050), 518.

[28] Cerami, E., Gao, J., Dogrusoz, U., Gross, B. E., Sumer, S. O., Aksoy, B. A., and Antipin, Y. (2012). The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discovery, 2(5), 401-404.

[29] Cancer Genome Atlas Network. (2012). Comprehensive molecular portraits of human breast tumours. Nature, 490(7418), 61.

[30] Croft, D., Mundo, A. F., Haw, R., Milacic, M., Weiser, J., Wu, G., and Jassal, B. (2013). The Reactome pathway knowledgebase. Nucleic acids research, 42(D1), D472- D477.

Chapter 5

A Sub-network Selection Method for Identifying Biomarkers of Breast Cancer Survivability

5.1 Introduction

Breast cancer is characterized by uncontrolled cell growth or a decreased capacity to initiate programmed cell death. Breast cancer is caused by mutations that damage cellular DNA, directly or indirectly, which subsequently confer a selective growth advantage. Because mutations can be carried over to successive cell divisions, malignant cells may share some common patterns in gene expression. Thus, gene expression (GE) has been used in many machine learning approaches to develop classification models for the diagnosis or prognosis of breast cancer. One main challenge among these approaches is that the number of genes is usually very large and exceeds the number of samples. This prevents methods from finding good patterns for prediction purposes. Feature selection approaches are usually applied to extract the most informative subsets of genes for prediction, which can be considered as potential biomarkers.

A common issue with traditional feature selection approaches based mainly on a single type of data is that the classification performance tends to decrease significantly when applying the model learned from one dataset to another dataset. Moreover, there is often only a minor overlap between the biomarkers found by different studies on the same data [1,2], because many subsets of features can yield equally high performance and different approaches explore the space of feature subsets differently. Additionally, without taking other useful knowledge about the genes into account, the selected genes might not provide evidence for revealing the disease mechanism.

Recently, a new collection of methods referred to as network-based machine learning has gained interest due to its ability to integrate existing relations among genes into the biomarker detection process [1-5]. These methods often combine primary data, e.g., GE, with one or more secondary network data expressing the functional relationships among genes, such as protein-protein interaction networks (PINs), cellular pathway maps, gene co-regulatory networks, or other "omics" data. They aim at identifying the most discriminative subsets of interacting genes for prediction, referred to as sub-network (or subnet, for short) biomarkers. The knowledge encoded in such networks can provide additional evidence that the output sub-networks are true biomarkers.

As a PIN can involve thousands of nodes and millions of edges, the search space of the subnets is clearly vast. For example, in the ConsensusPathDB data [6], many proteins have thousands of interactions. This consequently results in a very large number of sub-networks to analyze in the local areas of these nodes. Many methods have proposed scoring functions to weight the nodes, edges, or sub-networks and then applied search algorithms to identify sub-network biomarkers. The search strategies include greedy search, simulated annealing, genetic algorithms, random walk with restart, integer linear programming, and many more [3,7,8]. We will briefly summarize some network-based machine learning approaches that utilize protein-protein interactions. Also, we will discuss a previous study on breast cancer survivability that applied a feature selection method.

In [1,2], a sub-network can be considered as a meta-gene. For each patient, the expression of the meta-gene is calculated as the average expression of its genes. The proposed methods try to find informative meta-genes, then transform the GE data into meta-gene expression data, and finally apply a classification method on the new data. They start a candidate sub-network with a seed gene and gradually extend it with neighboring genes until certain conditions are met. The neighboring gene chosen to extend the current sub-network must increase the mutual information between the corresponding new meta-gene and the class the most.
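As a small illustration of the meta-gene idea, the sketch below averages the expression of a sub-network's member genes per patient; note that [1] averages normalized (z-scored) expression, which is omitted here for brevity, and expr and subnet are assumed inputs.

import numpy as np

def meta_gene_expression(expr: np.ndarray, subnet: list[int]) -> np.ndarray:
    # One meta-gene value per patient: the mean over the sub-network's genes,
    # where expr is a samples-by-genes matrix and subnet lists column indices.
    return expr[:, subnet].mean(axis=1)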

In [4], a gene-gene co-regulation network (GGCRN), induced from brain tissue eQTL data, and a PIN from the Human Protein Reference Database (HPRD) [9] were combined. Then, a random walk with restart algorithm was used to find Alzheimer-related candidate genes. Two genes were considered "co-regulated" if they shared a significantly common proportion of single-nucleotide polymorphisms. 14 out of 29 known Alzheimer-related genes were used as source nodes and initialized with equal probabilities.

In the WMAXC algorithm [5], two networks are merged into one to solve a constrained optimization problem and find an optimal subnet. For the first network, the weight of a node is calculated based on the differentially-expressed ratio of gene expression, and the weight of an edge is the conditional expectation of differential gene-gene co-expression of its pair of genes. For the second network, induced from GE and the PIN, node and edge weights are computed based on the node degree, the within-group distance, and the within-group variances of the pairs of genes.

ML models for breast cancer survivability/metastasis prediction can be based on non-genomic information, such as age, hormone receptors, grade, tumor size, HER2, treatment, lymph nodes, and locoregional recurrence [10-12], or genomic data such as gene expression [13-15], or both. In a previous study, we applied unsupervised and supervised methods to construct a hierarchical classification model to predict appropriate treatment therapy for breast cancer patients based on their gene expression profiles [17]. The class labels were combined from the treatment and survivability status of the patients, thus making this a multiclass classification problem. The hierarchical clusters were constructed by average linkage, whereby each leaf node is the set of patients of a class. A wrapped feature selection using the Minimum Redundancy Maximum Relevance score (mRMR) [18] and a support vector machine (SVM) was conducted to select the genes for prediction at each internal node.

In this work, we present a network-based method, named Sub-network Selection Based on Predictive Ability (SSPA), to select sub-network biomarkers for identifying breast cancer treatment outcomes, including disease-free survival ("Disease Free"), and overall survival at five years ("Died Before 5-years") and long-term ("Died After 5-years"). The treatments investigated are hormone therapy (HT), chemotherapy (CT), radiation therapy (RT), and surgery-only (SO). Gene expressions of the patients are used to estimate the predictive power of the sub-network candidates, and to evaluate the classification performance of the sub-networks by SVM. Meanwhile, a PIN guides the generation of sub-network candidates. Unlike the methods in [1,2], we generally avoid generating redundant sub-networks by selecting only a subset of seed genes to create the sub-network candidates. We also define a scoring function to estimate the predictability of a sub-network, called PA, by which we can rank sub-networks and further prune the search space. Therefore, only a portion of the generated subnets is evaluated at each step. After obtaining the sub-networks that can yield high classification performance, we correlate them with cancer-relevant gene pathways and the literature to extract the most biologically meaningful biomarkers. Our method found better sub-networks than the effective methods in [1] and [16]. The sub-networks identified in this study contain well-known breast cancer genes and genes involved in important cancer-related pathways.

5.2 Data

We collected GE datasets and clinical information from breast cancer patients that received one of four treatments (CT, HT, RT, SO) from the Molecular Taxonomy of Breast Cancer International Consortium study [19]. There are about 25,000 genes in total. Patients who died from a non-cancer cause prior to 5 years after their diagnosis were excluded (because none of the three classes can be assigned to them). The data distribution for the datasets of the four treatments is shown in Table 5.1.

For the interaction network data, we combined HPRD version 9 with ConsensusPathDB (downloaded in December 2016). After processing, we had a final network of 180,614 interactions.

We collected 2382 pathways from the Reactome [20] and WikiPathways [21] databases. About 8000 cancer-related genes were gathered from several resources, including [22] and [23]. The intermediate extracted sub-networks are analyzed with these data for biological meaning.

Table 5.1: Data distribution versus treatments.

Dataset               CT    HT    RT    SO
Disease Free          7     135   119   35
Died Before 5-years   19    53    21    47
Died After 5-years    17    68    33    110
#Samples              43    256   173   192

5.3 Method

The pipeline of SSPA is shown in Figure 5.1. First, we select a set of seed genes so that the optimal sub-network induced from each seed can be both accurate in classification and biologically meaningful. Second, an optimal sub-network is derived from each seed gene by repeatedly generating candidate sub-networks, filtering out the redundant sub-networks, and evaluating the remaining ones by SVM. Finally, among the optimal sub-networks, we select the ones that have a Matthews Correlation Coefficient (MCC) larger than a threshold, and then compute their relevance to pathway data and known cancer-related genes. The top 10 sub-networks, along with their relevant information, are sent to biologists to investigate their biological significance. Details of each step are described and discussed in the following subsections.

Figure 5.1: Flow chart of SSPA.

5.3.1 Seed Gene Selection

A set of breast cancer-related genes and predictive genes were considered as the seeds. Predictive genes are the ones having mutual information with the class greater than 0, computed using the Weka software. For discrete random variables, the mutual information is computed as Equation (5.1). The breast cancer-related genes include 40 driver genes from [19] and 84 genes that correlate with the cellular growth inhibitory effects (i.e., GI50) of some drugs used in breast cancer [16]. Since no more than 200 genes are selected by the mutual information criterion, the number of seed genes for each treatment (dataset) is relatively small compared to the total number of genes in the gene expression dataset.

I(X, Y) = \sum_{x,y} p_{x,y} \log \frac{p_{x,y}}{p_x \times p_y}    (5.1)
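For concreteness, a small worked implementation of Equation (5.1) for two discrete variables, estimating the joint and marginal probabilities from frequencies (an illustrative sketch, not the Weka routine used in practice):

import numpy as np

def mutual_information(x, y):
    # x and y are arrays of discrete symbols of equal length.
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))   # joint probability
            px, py = np.mean(x == xv), np.mean(y == yv)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi  # in nats; divide by log(2) for bits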

The seed gene selection step is a simple heuristic to eliminate redundancy in sub-networks, since it reduces the incidence of generating sub-networks containing no predictive gene at all. Some useful sub-networks might be eliminated as well, but this is an unavoidable trade-off when dealing with high-dimensional data. By the seed gene selection, we expect to obtain sub-networks containing predictive genes and breast-cancer-related genes that could serve as biological biomarkers. Each generated sub-network will have at least a single gene belonging to one of the two types.

5.3.2 Optimal Sub-network Selection

Algorithm 1 illustrates the search for an optimal sub-network S for a given seed gene s, in a bottom-up fashion, because sub-network biomarkers of small size tend to be more robust than large ones [24, 25]. The current subnet S is initialized with the seed gene. New candidate sub-networks are generated by adding to S a neighboring gene of any gene in S, with respect to the PIN structure. Only the candidates with PA ranked in the top k are kept to be evaluated by SVM for their quality (in 10-fold cross-validation, with parameter tuning). The sub-network with the highest MCC is chosen for further extension in the next iteration. This greedy search continues until there is no more neighboring gene within the given maximum distance allowed from the seed s, or until S reaches the given maximum allowed size.

We chose MCC as the evaluation metric because it is known to be one of the most suitable metrics for imbalanced data.

Because many genes have thousands of neighboring nodes, the number of possible sub-networks generated can grow exponentially. Evaluating all of them would require a long period of time, especially with expensive methods such as SVM or Random Forest. Thus, we limit k = 20, maxDepth = 4, and maxSize = 40 for our approach to complete in an acceptable amount of time.

We define the PA score, as in Equation (5.2), to estimate the predictive ability of a sub-network S via its genes. To improve time efficiency, only sub-networks with PA ranked in the top k (step 5) are evaluated by SVM (step 6); the other subnets are considered redundant and skipped.

PA(S) = \frac{1}{|S|} \sum_{g_i \in S} I(g_i, C) - \frac{2 \times b}{|S| \times (|S| - 1)} \sum_{g_i, g_j \in S} I(g_i, g_j)    (5.2)

where C is the class and g_i is a gene in S.

The PA score is a modified version of mRMR [18]. The first term (sum) of PA is the average mutual information between a gene g_i in S and the class C, which expresses the average non-linear correlation of a gene with C. The second term of PA expresses the redundancy of S via the average linear correlation of its pairs of genes (g_i, g_j). The regularization factor b adjusts the importance of the redundancy. PA is almost equal to mRMR when b equals 1.

Note that the mutual information between two genes might not be in the same range as the mutual information between a gene and the class (the maximum of I(X, Y) is the minimum of the entropies of the individual variables). If we discretize the gene expressions into more (or fewer) intervals than the number of classes, then the former mutual information tends to be larger (or smaller) than the latter. While the first term is more important in estimating the predictive power of S, setting b larger than 1 might mislead the purpose of the estimation. In our experiments, we tried b = 1, 0.5, and 0.25.
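A minimal sketch of Equation (5.2), assuming the mutual-information values have been precomputed (e.g., with a function like mutual_information() above); mi_class maps each gene to I(g, C) and mi_gene maps each unordered gene pair to I(g_i, g_j):

from itertools import combinations

def pa_score(S, mi_class, mi_gene, b=1.0):
    S = list(S)
    relevance = sum(mi_class[g] for g in S) / len(S)
    if len(S) < 2:
        return relevance
    redundancy = sum(mi_gene[frozenset(p)] for p in combinations(S, 2))
    # Summing over unordered pairs and scaling by 2b / (|S| (|S| - 1))
    # matches the paired sum in Equation (5.2).
    return relevance - (2.0 * b / (len(S) * (len(S) - 1))) * redundancy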

Algorithm 1. Finding an optimal sub-network for a seed gene.

Input. s: a seed gene; PIN: a set of protein-protein interactions; k: the number of top candidates to evaluate at each iteration; maxDepth: the maximum depth allowed for neighboring genes; maxSize: the maximum size of the sub-network.
Output. S: the optimal sub-network for s.

Method.
1. S = {s}
2. N = {n : (n, s) in PIN}
3. Repeat steps 4 to 8 while (N is not empty and |S| < maxSize); otherwise, return S
4. C_i = S ∪ {n_i}, for each n_i in N
5. K = {C_i : PA(C_i) ranked in the top k}
6. S = C in K, where the MCC of C is the highest, evaluated by SVM
7. Mark g as visited, where g is the gene newly added to S in step 6
8. N = N ∪ {n}, for every (n, g) in PIN such that the shortest path length between n and s is ≤ maxDepth
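A condensed Python sketch of Algorithm 1 under stated assumptions: pin maps each gene to its PIN neighbors, depth maps genes to their (precomputed) shortest-path distance from the seed, pa is a callable implementing the PA score on a gene set, and evaluate_mcc stands in for the 10-fold SVM evaluation; the bookkeeping of steps 7-8 is simplified.

def find_optimal_subnet(seed, pin, depth, pa, evaluate_mcc,
                        k=20, max_depth=4, max_size=40):
    S = {seed}
    visited = {seed}
    # Step 2: the seed's neighbors form the initial candidate pool.
    frontier = {n for n in pin.get(seed, ())
                if depth.get(n, max_depth + 1) <= max_depth}
    while frontier and len(S) < max_size:            # step 3
        candidates = [S | {n} for n in frontier]     # step 4
        top_k = sorted(candidates, key=pa, reverse=True)[:k]   # step 5: prune by PA
        best = max(top_k, key=evaluate_mcc)          # step 6: expensive evaluation
        (new_gene,) = best - S                       # step 7: the gene just added
        S = best
        visited.add(new_gene)
        frontier.discard(new_gene)
        frontier |= {n for n in pin.get(new_gene, ())           # step 8
                     if depth.get(n, max_depth + 1) <= max_depth
                     and n not in visited}
    return S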

5.3.3 Sub-network Prioritization by Pathway-Relevant and Cancer-Relevant Analysis

After step 2 of SSPA, the optimal sub-networks with MCC larger than a given threshold are selected. We compute their relevance to pathway data and cancer-related genes before choosing the ones that can be considered as potential biomarkers. Given BC and OC, the lists of known breast-cancer-relevant genes and cancer-relevant genes, respectively, we score the importance of a gene g as in Equation (5.3). The equation is based on the intuition that breast cancer-related genes should be more important than other cancer-related genes and normal genes.

W_g = \begin{cases} 12, & \text{if } (g \in BC) \wedge (g \in OC) \\ 10, & \text{if } (g \in BC) \wedge (g \notin OC) \\ 4, & \text{if } (g \in OC) \wedge (g \notin BC) \\ 1, & \text{otherwise} \end{cases}    (5.3)

Besides, if some genes in a sub-network participate in some pathways, they might reveal certain mechanisms of the disease. Once a sub-network includes cancer-related genes or pathway-relevant genes, it can provide some biological insight and support the decision of whether the sub-network can be a valid biomarker or not.

Then, the relevance between a sub-network S and a pathway P is defined based on their overlapping genes as Equation (5.4), where |.| denotes the set cardinality:

R(S, P) = \frac{\sum_{g \in S \cap P} W_g}{\sqrt{|S| \times |P|}}    (5.4)

Then, the importance of a sub-network S is defined as Equation (5.5), where PL is the list of all available pathways:

Im(S) = \sum_{P \in PL} R(S, P)    (5.5)
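A minimal sketch of Equations (5.3)-(5.5), assuming bc and oc are sets of breast-cancer-related and cancer-related gene symbols, S and each pathway P are sets of gene symbols, and pathways is a list of such pathway sets:

import math

def gene_weight(g, bc, oc):
    # Equation (5.3): breast cancer genes outrank other cancer genes.
    if g in bc and g in oc:
        return 12
    if g in bc:
        return 10
    if g in oc:
        return 4
    return 1

def relevance(S, P, bc, oc):
    # Equation (5.4): weighted overlap, normalized by the geometric mean
    # of the two set sizes.
    return sum(gene_weight(g, bc, oc) for g in S & P) / math.sqrt(len(S) * len(P))

def importance(S, pathways, bc, oc):
    # Equation (5.5): total relevance across all pathways.
    return sum(relevance(S, P, bc, oc) for P in pathways)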

5.4 Results and Discussion

From the top 10 sub-networks extracted at the end of step 3 by Equation (5.5), we manually select a representative sub-network for each of the mentioned treatments (i.e., CT, HT, RT, and SO). They are visualized in Figures 5.2, 5.3, 5.4, and 5.5. Generally, these sub-networks have more overlapping pathways than other sub-networks of the same treatment, and their corresponding MCCs are close to the highest MCC.

5.4.1 Classification Performance Comparison

Table 5.2 presents the classification performance of SSPA and four other methods: mRMR+SVM, mRMR+RF, RFE+SVM, and PinnacleZ*. The results for SSPA and SSPA* were obtained based on the representative sub-networks and the sub-networks with the highest MCC, respectively. mRMR+SVM (and mRMR+RF) denotes the wrapped feature selection approach combining mRMR and SVM with a linear kernel (mRMR and Random Forest, respectively) in a greedy forward selection. These wrapped feature selection approaches have been shown to be effective in [16, 17], and in our experience, they have often yielded better results than many other traditional feature selection methods. Due to the high dimensionality problem, we applied mRMR+SVM and mRMR+RF on a reduced initial gene set, which is the union of the seed genes and their neighbors within distance 1 (according to the PIN). RFE+SVM denotes the Recursive Feature Elimination (RFE) method using SVM with a linear kernel as the evaluator for feature selection. The method fits the SVM on the training datasets, ranks the features by the average coefficient, and removes the weakest features one by one until the specified number of features is reached. The reported measures (accuracy, MCC, etc.) are averaged over the testing datasets. PinnacleZ* denotes an approach where we transformed the gene expression data to meta-gene expression data as in the method described in [1], but then applied mRMR+SVM on the transformed data (PinnacleZ does not perform a wrapper feature selection).
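For reference, the RFE+SVM baseline can be expressed in a few lines with scikit-learn; the choice of 189 features mirrors Section 5.4.3, and the fit call is shown on assumed X_train, y_train arrays.

from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=189, step=1)
# rfe.fit(X_train, y_train) repeatedly drops the gene with the smallest
# coefficient magnitude until 189 genes remain (cf. Section 5.4.3).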

Generally, we observed the following trends: (1) RFE+SVM usually gives the best classification performances, (2) SSPA* is usually ranked as the second best in MCC and accuracy, (3) SSPA often returns higher MCC and accuracy than PinnacleZ*, except for CT, and (4) the network-based methods (i.e., SSPA, SSPA*, and PinnacleZ*) produce better results than mRMR+SVM and mRMR+RF. Although mRMR+RF is often ranked the fourth in term of MCC, it often yields higher AUC than other approaches for other treatments, except RFE+SVM and SSPA*.

All methods give high classification accuracy for the CT patients. The representative sub-network for CT yields 93% accuracy (MCC = 0.893), almost identical to the results returned by PinnacleZ*. mRMR+SVM and mRMR+RF give the lowest accuracy, 88.4%.

Table 5.2: Classification performance comparison between SSPA and other methods in terms of accuracy (ACC), MCC, and area under the ROC curve (AUC).

Dataset  Method      Accuracy  MCC    AUC
CT       SSPA*       97.67%    0.967  1.000
         SSPA        93.00%    0.893  0.939
         mRMR+SVM    88.40%    0.816  0.911
         mRMR+RF     88.40%    0.818  0.989
         RFE+SVM     90.30%    0.858  0.917
         PinnacleZ*  93.02%    0.900  0.974
HT       SSPA*       79.30%    0.663  0.871
         SSPA        73.44%    0.565  0.689
         mRMR+SVM    65.60%    0.437  0.713
         mRMR+RF     61.70%    0.338  0.789
         RFE+SVM     91.70%    0.859  0.922
         PinnacleZ*  70.70%    0.512  0.786
RT       SSPA*       87.86%    0.766  0.773
         SSPA        84.39%    0.702  0.728
         mRMR+SVM    75.70%    0.461  0.710
         mRMR+RF     78.60%    0.499  0.886
         RFE+SVM     97.10%    0.948  0.963
         PinnacleZ*  79.19%    0.548  0.698
SO       SSPA*       80.21%    0.667  0.778
         SSPA        82.69%    0.661  0.807
         mRMR+SVM    70.80%    0.488  0.728
         mRMR+RF     72.90%    0.520  0.853
         RFE+SVM     96.90%    0.947  0.956
         PinnacleZ*  78.13%    0.615  0.820

For RT, the accuracy of SSPA* and SSPA is 87.86% (MCC = 0.766) and 84.39% (MCC = 0.702), respectively, which is 5% higher than the accuracy of PinnacleZ*. mRMR+RF is second last in terms of accuracy (78.6%), but second best in terms of AUC (0.886). RFE+SVM returns the highest accuracy, about 97%.

For SO, RFE+SVM returns the highest accuracy, almost 97%, followed by SSPA with an accuracy of 82.69% and a corresponding MCC of 0.661. SSPA* ranks second in terms of MCC (0.667), and mRMR+RF ranks second in terms of AUC (0.853). mRMR+SVM is ranked last in all measurements.

HT patients seem to be the hardest cases to predict. RFE+SVM is still the best in all terms, with an accuracy of nearly 92%. SSPA* is the second best in all terms, but the accuracy drops to 79.3%, followed by SSPA (accuracy = 73.44%) and PinnacleZ* (accuracy = 70.7%). mRMR+RF returns the worst accuracy (61.7%) and MCC (0.338).

5.4.2 Biological Insights of the Representative Sub-networks

It is noteworthy that a generated sub-network need not contain a breast cancer-related gene, since many seed genes are not breast-cancer-related. Surprisingly, the figures indicate that the representative sub-networks contain many breast cancer-related genes (red nodes). Moreover, they are also relevant to some pathways (triangular nodes). Among the top 10 extracted sub-networks, there are more than 60 genes related to breast cancer, including NEDD4, FGFR2, PAPOLA, SMAD2, STAT5A, FGF8, MAPK8, CASP8, CSTF3, EIF2S2, AKT1, GSK3B, UBA52, TERT, RARB, APC, UBE2D1, ACVR1C, RIPK1, TP53, and PIK3CA. These genes play a critical role in pathways involving cell growth control, DNA repair mechanisms, etc. In the literature, there are many publications indicating the importance of STAT5, PIK3CA, and TP53 in breast cancer. PIK3CA encodes a subunit of phosphatidylinositol 3-kinase that triggers AKT1 signaling and can interact with ERBB4, FGFR, and MAPK signaling in cancer [20].

TP53 – An Important Gene in the Representative Sub-network for CT

The representative sub-network of CT is shown in Figure 5.2, where 12 out of 14 genes are pathway-relevant genes (triangular nodes) and 3 of them are breast cancer-related genes (red nodes). There are several facts in the literature supporting its structure and pathway/cancer-relevant genes. In particular, the well-known breast cancer gene TP53 (or p53) is a hub in this sub-network. TP53 contributes to multiple cancer-related pathways, including the "MAPK signaling pathway", "DNA repair", "programmed cell death", etc. p53 is a primary tumor suppressor gene that is situated in the midst of a critical signaling hub and is targeted in many cancer therapies. The logic for developing therapeutics targeting p53 is that p53 is well connected as a central signaling hub; thus, knocking it out cripples the normal functioning of the cell and promotes malignant transformation [26].

There is evidence that links TP53 and chemotherapy. In [27], TP53 mutations occurred in approximately 25% of breast cancers and were associated with poor prognosis and resistance to doxorubicin treatment. In [28], chemotherapy-treated patients with TP53 wild-type tumors had poor survival, which was consistent with models demonstrating that p53 induces cell cycle arrest and senescence as opposed to cell death. According to [29], all cancers have been shown to have a defective p53 pathway, either by a direct TP53 mutation or by deregulation of another determinant of the p53 pathway. The mutant p53 molecular profile may provide extremely precise predictors of disease outcomes and of responses to therapy.

Figure 5.2: The representative sub-network for CT. Genes overlapping with some pathways are shown as triangles. Breast-cancer-related genes are shown in red, and cancer-related genes are shown in yellow.

ERBB4/HER4, STAT5A, and NEDD4 – Important Genes of the Representative Sub-network for HT

Many pathways involve the human epidermal growth factor receptor ERBB4 (or HER4), such as "Down-regulation of ERBB4 Signaling", "Signaling by ERBB4", "PI3K Events in ERBB4 Signaling", and "Nuclear Signaling by ERBB4" [20]. ERBB4 is known as a receptor tyrosine kinase that interacts with NEDD4 and STAT5A [30]. Mutations in ERBB4 are associated with several cancers [31]. It is shown in [32] that NEDD4, an E3 ubiquitin-protein ligase, minimizes ERBB4 signaling through the promotion of HER4 proteolytic degradation. Additionally, NEDD4 modulates important pathways involved in cancer development. Overexpression of NEDD4 is implicated in cancer through the negative regulation of tumor suppressors [33]. ERBB4 signaling leads to STAT5A phosphorylation. Co-expression of STAT5A and ERBB4 in breast cancer cells mediates tumor progression through stimulation of breast-development genes [34]. Reduced levels of STAT5A are associated with a poorer clinical prognosis and an elevated risk of hormone therapy failure. As a prognostic marker, STAT5A could precisely predict the response to hormone therapy [35].

Overexpression of ERBB4 in triple-negative breast cancer was linked to worse prognosis using a 5-year survival analysis. ERBB4 may be used as a potential survival biomarker for patients with triple-negative breast cancer [36]. Elevated expression of HER4 corresponded to a lengthier relapse-free survival period, independent of breast cancer subtype. Furthermore, HER4 was revealed to be a better predictive marker for survival in patients with non-triple-negative breast cancer. It is evident that HER4 may be used as a survivability biomarker in patients with breast cancer [37].

Figure 5.3: The representative sub-network for HT. Genes overlapping with some pathways are shown as triangles. Breast-cancer-related genes are shown in red, and cancer-related genes are shown in yellow.

STAT3 – An Important Gene of the Representative Sub-network for RT

STAT3 exhibits considerable overlap between a sub-network for RT and the pathways. PTK6, or protein tyrosine kinase 6, is involved in STAT3 (signal transducer and activator of transcription 3) activation [20]. PTK6 is a cytoplasmic protein kinase that is categorized as an oncogene [38]. In tumors, PTK6 drives cell proliferation, migration, and anchorage-independent survival. PTK6 activates STAT3 via phosphorylation, which leads to the downstream activation of SOCS3 (suppressor of cytokine signaling), which in turn decreases PTK6 signaling via feedback inhibition. Activation of STAT3 by PTK6 is highly associated with tumor initiation and metastasis in ERBB2-positive breast cancers [39]. Targeting PTK6-STAT3 signaling by therapy may have favorable clinical outcomes in patients with breast cancer. PTK6 protein levels may serve as a valuable prognostic indicator for predicting the long-term survival of breast cancer patients [40]. A retrospective study revealed that 20-year disease-free survival in patients with breast cancer is correlated with the level of PTK6 protein. Protein levels of PTK6 may serve as a significant clinical biomarker for patient survival because protein levels are not dependent on molecular and morphological markers, such as lymph node involvement, tumor size, and subtype classification [40].

Figure 5.4: The representative sub-network for RT. Genes overlapping with some pathways are shown as triangles; breast-cancer-related genes are shown in red, and cancer-related genes are shown in yellow.

FGFR2 – An Important Gene of the Representative Sub-network for SO

FGFR2 is the most common gene in the representative sub-network for SO, and it relates to many pathways, such as "Signaling by FGFR2 Fusions", "Signaling by FGFR2 Amplification Mutants", "Activated Point Mutations of FGFR2", "FGFR2c Ligand Binding and Activation", and "FGFR2 Ligand Binding and Activation" [20]. The Fibroblast Growth Factor Receptors, or FGFRs, regulate processes involved in cellular proliferation, migration, apoptosis, and differentiation during embryonic development and tissue repair. FGFRs are membrane-spanning receptor tyrosine kinases that include four members: FGFR1, FGFR2, FGFR3, and FGFR4. FGFRs dimerize upon ligand binding and then trigger the activation of the PI3K-AKT, RAS-MAPK, PLCγ-PKC, and STAT pathways. Gene amplification of FGFR2 has been reported in about 5-10% of breast cancer patients and is associated with poor prognosis [41]. FGFR2 overexpression, caused by FGFR2 gene amplification, contributes to the development of breast cancer. Specifically, overexpressed FGFR2 activates PI3K-AKT signaling, which prevents the initiation of programmed cell death in breast cancer. Furthermore, FGFR2-mediated activation of ERK, a member of the RAS-MAPK cascade, attenuates the DNA repair mechanism. In addition, FGFR2 amplification results in the overexpression of degradation-resistant forms of the receptor [42]. FGFR2 signaling facilitates breast cancer progression by altering the downstream activation of STAT3, which is a transcription factor known to increase breast tumor growth [43].

Figure 5.5: The representative sub-network for SO. Genes overlapping with some pathways are shown as triangles; breast-cancer-related genes are shown in red, and cancer-related genes are shown in yellow.

5.4.3 Biological Insight of the Genes Selected by RFE+SVM

For each treatment, we finally obtained a subset of 189 genes by RFE+SVM. In each subset, about 15 to 26 genes are breast-cancer-related and about 55 to 65 genes are cancer-related, according to the list of 8016 cancer-related genes collected from various public resources. Among them, many genes are associated with cancer-relevant pathways. They include FGFR4, EGFR, MUC16, GSTP1, PLA2G2A, GPC3, DUSP1, PLA2G16, RUNX2, CDH1, CYB5A, CTGF, NCOA4, C1QB, CGA, ESR1, KIT, TAT, PYCARD, AIM2, SAA1, CEACAM1, PPP1R1B, PRKAR1A, HPGD, TP63, TGFBR3, PGR, and H19.

A large number of selected genes might lead to over-fitting, especially for a small dataset like CT. However, we expect that some small subsets of these selected genes can give similar classification performance, or at least higher classification performance than that of SSPA shown in Table 5.2, and be meaningful enough for further analysis. The selected genes also suggest that, in the local areas of these genes in the interaction network, we might be able to find combinations of subnets that yield higher classification performance than that of SSPA shown in Table 5.2.

5.5 Conclusion and Future Work

We proposed SSPA, a network-based method to identify potential sub-network biomarkers for breast cancer survivability prediction after a treatment. Gene expression data and protein-protein interactions expressing existing gene-gene relationships were integrated in a greedy search. To avoid the combinatorial explosion problem, we not only chose a set of seed genes to generate the sub-networks but also defined the PA score to rank the generated sub-networks and further reduce the number of subnets being evaluated. After the evaluation phase, the top subnets were analyzed against cancer-related genes and pathway data for their biological significance. Instead of returning only the best subnet according to a specific measurement (e.g., accuracy), SSPA returns the top sub-networks, along with information about the cancer-related genes and/or pathways involved. Thus, it is flexible for cancer specialists to choose suitable ones for further laboratory analysis.

SSPA found sub-networks with significantly high accuracy for the four mentioned types of treatments. It outperforms the effective feature selection approaches using mRMR combined with SVM/Random Forest [16], and the network-based method in [1]. The sub-networks extracted by SSPA contain many breast-cancer-related genes and genes involved in critical pathways. This evidence suggests that our approach is useful in detecting sub-network biomarkers for breast cancer survivability prediction.

The fact that RFE+SVM extracted highly accurate subsets of genes suggests that there might be better subnets to be considered as biomarkers.

In the future, we plan to apply our method to find breast cancer survivability biomarkers for combinations of the therapies.

References

[1] Chuang, H. Y., Lee, E., Liu, Y. T., Lee, D., and Ideker, T. (2007). Network-based classification of breast cancer metastasis. Molecular systems biology, 3(1).

[2] Allahyar, A., and De Ridder, J. (2015). FERAL: network-based classifier with application to breast cancer outcome prediction. Bioinformatics, 31(12), i311-i319.

[3] Wang, X., Gulbahce, N., and Yu, H. (2011). Network-based methods for human disease gene prediction. Briefings in functional genomics, 10(5), 280-293.

[4] Li, J., Wang, L., Guo, M., Zhang, R., Dai, Q., Liu, X., and Zhang, M. (2015). Mining disease genes using integrated protein–protein interaction and gene–gene co-regulation information. FEBS open bio, 5, 251-256.

[5] Amgalan, B., and Lee, H. (2014). WMAXC: a weighted maximum clique method for identifying condition-specific sub-network. PloS one, 9(8).

[6] Kamburov, A., Stelzl, U., Lehrach, H., and Herwig, R. (2013). The ConsensusPathDB interaction database: 2013 update. Nucleic acids research, 41(D1), D793-D800.

[7] He, H., Lin, D., Zhang, J., Wang, Y. P., and Deng, H. W. (2017). Comparison of statistical methods for subnetwork detection in the integration of gene expression and protein interaction network. BMC Bioinformatics, 18(1), 149.

[8] van Dam, S., Vosa, U., van der Graaf, A., Franke, L., and de Magalhaes, J. P. (2018). Gene co-expression analysis for functional classification and gene–disease predictions. Briefings in bioinformatics, 19(4), 575-592.

[9] Keshava Prasad, T. S., Goel, R., Kandasamy, K., Keerthikumar, S., Kumar, S., Mathivanan, S., and Balakrishnan, L. (2009). Human protein reference database—2009 update. Nucleic Acids Research, 37(suppl 1), D767-D772.

[10] Kate, R. J., and Nadig, R. (2017). Stage-specific predictive models for breast cancer survivability. International journal of medical informatics, 97, 304-311.

[11] Gray, E., Marti, J., Brewster, D. H., Wyatt, J. C., and Hall, P. S. (2018). Independent validation of the PREDICT breast cancer prognosis prediction tool in 45,789 patients using Scottish Cancer Registry data. British journal of cancer, 119(7), 808-814.

[12] Tapak, L., Shirmohammadi-Khorram, N., Amini, P., Alafchi, B., Hamidi, O., and Poorolajal, J. (2019). Prediction of survival and metastasis in breast cancer patients using machine learning classifiers. Clinical Epidemiology and Global Health, 7(3), 293-299.

[13] Van’t Veer, L. J., Dai, H., Van De Vijver, M. J., He, Y. D., Hart, A. A., Mao, M., and Schreiber, G. J. (2002). Gene expression profiling predicts clinical outcome of breast cancer. nature, 415(6871), 530-536.

[14] Liu, L., Chen, Z., Shi, W., Liu, H., and Pang, W. (2019). Breast cancer survival prediction using seven prognostic biomarker genes. Oncology Letters, 18(3), 2907-2916.

[15] Kumar, L., and Greiner, R. (2019). Gene expression based survival prediction for cancer patients—A topic modeling approach. PloS one, 14(11).

[16] Mucaki, E. J., Baranova, K., Pham, H. Q., Rezaeian, I., Angelov, D., Ngom, A., and Rogan, P. K. (2016). Predicting outcomes of hormone and chemotherapy in the molecular taxonomy of breast Cancer international consortium (METABRIC) study by biochemically-inspired machine learning. F1000Research, 5.

[17] Tabl, A. A., Alkhateeb, A., Pham, H. Q., Rueda, L., ElMaraghy, W., and Ngom, A. (2018). A novel approach for identifying relevant genes for breast cancer survivability on specific therapies. Evolutionary Bioinformatics, 14, 1176934318790266.

[18] Peng, H., Long, F., and Ding, C. (2005). Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on pattern analysis and machine intelligence, 27(8), 1226-1238.

[19] Pereira, B., Chin, S. F., Rueda, O. M., Vollan, H. K. M., Provenzano, E., Bardwell, H. A., and Tsui, D. W. (2016). The somatic mutation profiles of 2,433 breast cancers refine their genomic and transcriptomic landscapes. Nature communications, 7(1), 1-16.

[20] Fabregat, A., Jupe, S., Matthews, L., Sidiropoulos, K., Gillespie, M., Garapati, P., and Milacic, M. (2018). The reactome pathway knowledgebase. Nucleic acids research, 46(D1), D649-D655.

152 [21] Slenter, D. N., Kutmon, M., Hanspers, K., Riutta, A., Windsor, J., Nunes, N., and Ehrhart, F. (2018). WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic acids research, 46(D1), D661-D667.

[22] Zhao, M., Kim, P., Mitra, R., Zhao, J., and Zhao, Z. (2016). TSGene 2.0: an updated literature-based knowledgebase for tumor suppressor genes. Nucleic acids research, 44(D1), D1023-D1031.

[23] Vogelstein, B., Papadopoulos, N., Velculescu, V. E., Zhou, S., Diaz, L. A., and Kinzler, K. W. (2013). Cancer genome landscapes. science, 339(6127), 1546-1558.

[24] Grzadkowski, M. R., Haider, S., Sendorek, D., and Boutros, P. C. (2018). Subnetwork- based prognostic biomarkers exhibit performance and robustness superior to gene- based biomarkers in breast cancer. BioRxiv, 290924.

[25] Zhang, L., Li, S., Hao, C., Hong, G., Zou, J., Zhang, Y., and Guo, Z. (2013). Ex- tracting a few functionally reproducible biomarkers to build robust subnetwork-based classifiers for the diagnosis of cancer. Gene, 526(2), 232-238.

[26] Vogelstein, B., Lane, D., and Levine, A. J. (2000). Surfing the p53 network. Nature, 408(6810), 307-310.

[27] Olivier, M., Petitjean, A., Marcel, V., Petre, A., Mounawar, M., Plymoth, A., and Hainaut, P. (2009). Recent advances in p53 research: an interdisciplinary perspective. Cancer gene therapy, 16(1), 1-12.

153 [28] Ungerleider, N. A., Rao, S. G., Shahbandi, A., Yee, D., Niu, T., Frey, W. D., and Jackson, J. G. (2018). Breast cancer survival predicted by TP53 mutation status differs markedly depending on treatment. Breast Cancer Research, 20(1), 115.

[29] Goldstein, I., Marcel, V., Olivier, M., Oren, M., Rotter, V., and Hainaut, P. (2011). Understanding wild-type and mutant p53 activities in human cancer: new landmarks on the way to targeted therapies. Cancer gene therapy, 18(1), 2-11.

[30] Sundvall, M., Iljin, K., Kilpinen, S., Sara, H., Kallioniemi, O. P., and Elenius, K. (2008). Role of ErbB4 in breast cancer. Journal of mammary gland biology and neo- plasia, 13(2), 259-268.

[31] Stelzer, G., Rosen, N., Plaschkes, I., Zimmerman, S., Twik, M., Fishilevich, S., and Kaplan, S. (2016). The GeneCards suite: from gene data mining to disease genome sequence analyses. Current protocols in bioinformatics, 54(1), 1-30.

[32] Zeng, F., Xu, J., and Harris, R. C. (2009). Nedd4 mediates ErbB4 JM-a/CYT-1 ICD ubiquitination and degradation in MDCK II cells. The FASEB Journal, 23(6), 1935-1945.

[33] Chen, C., and Matesic, L. E. (2007). The Nedd4-like family of E3 ubiquitin ligases and cancer. Cancer and Metastasis Reviews, 26(3-4), 587.

[34] Williams, C. C., Allison, J. G., Vidal, G. A., Burow, M. E., Beckman, B. S., Marrero, L., and Jones, F. E. (2004). The ERBB4/HER4 receptor tyrosine kinase regulates gene expression by functioning as a STAT5A nuclear chaperone. The Journal of cell biology, 167(3), 469-478.

154 [35] Peck, A. R., Witkiewicz, A. K., Liu, C., Klimowicz, A. C., Stringer, G. A., Pequignot, E., and Girondo, M. A. (2012). Low levels of Stat5a protein in breast cancer are associated with tumor progression and unfavorable clinical outcomes. Breast Cancer Research, 14(5), R130.

[36] Kim, J. Y., Jung, H. H., Do, I. G., Bae, S., Lee, S. K., Kim, S. W., and Im, Y. H. (2016). Prognostic value of ERBB4 expression in patients with triple negative breast cancer. BMC cancer, 16(1), 138.

[37] Wang, J., Yin, J., Yang, Q., Ding, F., Chen, X., Li, B., and Tian, X. (2016). Human epidermal growth factor receptor 4 (HER4) is a favorable prognostic marker of breast cancer: a systematic review and meta-analysis. Oncotarget, 7(47), 76693.

[38] Liu, L., Gao, Y., Qiu, H., Miller, W. T., Poli, V., and Reich, N. C. (2006). Identi- fication of STAT3 as a specific substrate of breast tumor kinase. Oncogene, 25(35), 4904.

[39] Peng, M., Ball-Kell, S. M., and Tyner, A. L. (2015). Protein tyrosine kinase 6 promotes ERBB2-induced mammary gland tumorigenesis in the mouse. Cell death and disease, 6(8), e1848-e1848.

[40] Aubele, M., Walch, A. K., Ludyga, N., Braselmann, H., Atkinson, M. J., Luber, B., and Bartlett, J. M. S. (2008). Prognostic value of protein tyrosine kinase 6 (PTK6) for long-term survival of breast cancer patients. British journal of cancer, 99(7), 1089- 1095.

155 [41] Turner, N., Lambros, M. B., Horlings, H. M., Pearson, A., Sharpe, R., Natrajan, R., and Ashworth, A. (2010). Integrative molecular profiling of triple negative breast can- cers identifies amplicon drivers and potential therapeutic targets. Oncogene, 29(14), 2013-2023.

[42] Lei, H. and Deng, C.X.. (2017). Fibroblast growth factor receptor 2 signaling in breast cancer. International journal of biological sciences, 13(9), 1163.

[43] Banerjee, K., and Resat, H. (2016). Constitutive activation of STAT 3 in breast cancer cells: A review. International journal of cancer, 138(11), 2570-2578.

Chapter 6

Identifying Top-k Sub-networks for Breast Cancer Survivability Prediction

6.1 Introduction

Breast cancer is a common disease in women: approximately one in eight women will develop breast cancer over their lifetime [1]. According to SEER data for US female patients from 2007-2013 [2], the 5-year survival rate was 99% for patients with tumors located only in the breast; the rate declined to 85% if the tumors spread to regional lymph nodes, and dropped to 26% if the breast cancer became metastatic. In Canada, about 30% of women diagnosed at earlier stages go on to develop metastatic breast cancer [3]. Chemotherapy or hormone therapy can reduce the risk of distant metastases by one third; but still, 70-80% of patients receiving this treatment would have survived without it [4]. According to the authors of [5], more than 75% of breast cancer patients are ER+, but their responses to therapy vary significantly. This evidence suggests that diagnosis and prognosis factors still need to be improved in order to provide better treatments [6].

Machine learning (ML) and statistical methods have been applied to predict the survivability of breast cancer patients or to define groups of patients who share some special characteristics; both tasks aim at identifying biomarkers for the problems of interest. Some studies have used only clinical information of the patients to learn classification models [7]. Others have relied on genomics data, such as RNA or DNA sequencing, gene expression (GE), mutations, copy number variations (CNV), or copy number alterations (CNA), to discover the useful underlying patterns in such data. Many ML or statistical methods are suitable for such data. One of the challenges in this scenario is that the number of features in genomics data is usually far larger than the number of samples, and many of the features are redundant. This problem, known as the curse of dimensionality, prevents even state-of-the-art algorithms from achieving good results. In this regard, different feature selection approaches have been proposed to avoid over-fitting and extract the most informative subsets of features [8-11]; the selected features are, in turn, considered as potential biomarkers. One non-trivial phenomenon of these approaches is that the classification performance tends to decrease considerably when a model learned on one dataset is applied to another, independent dataset. In addition, there can be significant differences among the biomarkers found in different studies on the same data [12,13]. These issues concern the robustness and consistency of the signatures.

In the last two decades, network-based machine learning approaches have gained a lot of interest in the research community, since they take existing relationships among genes into account when detecting relevant biomarkers [12-16]. These methods often integrate primary data, e.g., GE, with one or more secondary network-based data sources expressing functional relationships among genes, such as co-expression networks, cellular pathway maps, gene regulatory networks, protein-protein interaction networks, or other "omics" data. They aim at identifying the most discriminative subset of interacting genes for prediction, called sub-network (or subnet, for short) biomarkers. By integrating such useful knowledge into the machine learning process, the results from different studies on the same problem may become more consistent/robust, or they may contain more evidence of true biomarkers [17-20]. As the interaction network can involve thousands of nodes and millions of edges, searching for sub-network biomarkers has been a computational challenge. Different scoring functions have been proposed to weight the nodes, edges, and sub-networks; then, many search algorithms can be used to find relevant sub-network biomarkers, including greedy search, simulated annealing, genetic algorithms, random walk with restart, and integer linear programming [14,19-21]. In the following paragraphs, we briefly summarize some network-based machine learning approaches that make use of protein-protein interactions, as well as a previous study on breast cancer survivability with which we compare our approach.

In [12], Chuang et al. consider a sub-network as a meta-gene, where its expression value for each patient is calculated as the average expression of the genes in the sub-network for that patient. Their idea is to transform GE data into meta-gene data and conduct the classification on the meta-gene data; the key is to identify the informative sub-networks. A sub-network is initialized with a seed gene and then gradually extended to its neighbors until some conditions based on mutual information are met. As the genes of a sub-network might correlate with the response variable in opposite directions, the authors of [13] proposed the Direction Aware Average operator to compute meta-gene expressions. Furthermore, they used Sparse Group Lasso as the classifier, which embeds the meta-gene selection in the model learning step to improve classification performance.

In [22], Galbrun et al. proposed a method to discover the top-k overlapping dense subgraphs in an unweighted graph. First, they generate the subgraphs in a top-down manner and identify the densest subgraph among the subgraphs of the same size. Then, they select a set of k subgraphs with respect to an objective function that maximizes the sum of the densities of the individual subgraphs in the set while minimizing the overlap between them. This method motivates us to find the top-k sub-networks of genes (meta-genes) for prediction. However, our sub-networks are generated in a bottom-up manner, since in classification we prefer models based on a small number of features/genes, and small sub-network biomarkers tend to be more robust than large ones.

In a previous study [24] (Chapter 2), we developed classification models to predict the 5-year survivability of breast cancer patients treated with paclitaxel and other drugs under different circumstances, including combinations of hormone therapy (HT) and chemotherapy (CT). The initial genes were selected from the literature. Then, the Minimum Redundancy Maximum Relevance score (mRMR) [8] was used to reduce the set of available genes. Finally, random forest (RF) or a support vector machine (SVM) with a radial basis function kernel was used in a greedy search procedure to select a subset of indicative genes. These approaches yielded fairly good prediction accuracy in many cases. However, due to the imbalance of the classes in the datasets, the limited number of available genes, and other reasons, for some cases the corresponding Matthews correlation coefficient (MCC) showed that the classifiers were just slightly better than random, especially when using RF. We believe that using more available genes for feature selection will lead to higher classification performance; as long as the available genes are chosen with respect to the existing knowledge of the mechanisms of the disease and/or the treatments, we can obtain accurate and meaningful biomarkers [25,26].

In this work, we introduce a network-based machine learning approach, named Top-k Dense Sub-networks Based on Joint Mutual Information (KDS), to identify subnet biomarkers for predicting the five-year survivability of breast cancer patients after a treatment. GE and protein-protein interaction network data are integrated into a weighted gene-gene interaction network (WGGN). One contribution is that the weight of an edge is computed via joint mutual information, which expresses the power of the combination of the two genes involved in predicting the class. Thus, this WGGN differs from common gene-gene co-expression networks, in which an edge weight usually expresses the correlation between the two genes rather than the predictive power of their combination. The second contribution is the way we generate the candidate subnets: we define a density score for a sub-graph in a weighted graph, which can also be interpreted as an estimation of the predictive ability of the subnet. We then search for the locally densest subnets and evaluate their classification performance by SVM or RF. We also apply two techniques to reduce the effects of imbalanced data. The prediction results show that our approach outperforms the methods of [24]. The third contribution is that we propose a greedy search to find the top-k subnets, which together produce better prediction results than the single best subnet. The MCCs for the best settings are about 0.618 or higher, and the corresponding accuracies are about 81% or higher. This implies that our model is effective in predicting breast cancer survivability while reducing the effect of imbalanced data.

6.2 Data

We used the breast cancer datasets of the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) study used in [24]. That data contains the GE of about 18,000 genes of almost 2,000 primary breast tumors with long-term clinical follow-up, where each patient underwent a combination of therapies. In our work, we consider only three sub-datasets, for the patients treated with chemotherapy (CT), hormone therapy (HT), or at least one of them (chemotherapy or hormone therapy, CT HT). The distribution of samples over the 5-year survival classes for the sub-datasets is given in Table 6.1.

Table 6.1: Data distribution versus treatments.

Dataset   Died Before 5 Years   Died After 5 Years   # Samples   # Genes
CT        33                    20                   53          15293
HT        118                   302                  420         15293
CT HT     169                   335                  504         15293

We collected the protein-protein interactions from the STRING database (version 10 [27]) and from the Pathway Commons database (version 9 [28]). The protein-protein interaction network was converted to a gene-gene network (GGN) by mapping the proteins to their corresponding genes based on Entrez ID. The genes or proteins that could not be mapped were removed from the GE data and the network data, respectively. We also removed duplicate edges to yield an undirected graph of 15,293 vertices (genes) and 290,328 edges (interactions).

6.3 Method

Figure 6.1 summarizes the pipeline of KDS used to find a set of k subnets that yields an optimal classification result for each of the three treatments mentioned in Table 6.1. The pipeline comprises four main steps. First, we compute a weighted gene-gene network (WGGN) from the GGN and the corresponding GE data. Second, for each gene, we search for a locally densest sub-network based on the WGGN. Third, we evaluate those sub-networks by SVM and Random Forest, and keep only the sub-networks whose MCC is greater than a given threshold. Finally, we search for a set of k best sub-networks that together produce an improved classification result when voting for the survivability of the patients.

The MCC can be calculated from the confusion matrix by Equation (6.1). It is one of the best measures for evaluating classifiers learned from imbalanced data.

MCC = (TP × TN − FP × FN) / √((TP + FP) × (TP + FN) × (TN + FP) × (TN + FN)), (6.1)

where TP, TN, FP, and FN stand for the numbers of true positives, true negatives, false positives, and false negatives, respectively.
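To make Equation (6.1) concrete, here is a minimal Python sketch; the function name and the convention of returning 0 for a zero denominator are ours, not from the chapter.

```python
from math import sqrt

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts (Eq. 6.1)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when a marginal count is empty and the denominator is 0.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

# Example: a classifier that is better than random but far from perfect.
print(mcc(tp=10, tn=25, fp=10, fn=8))  # ~0.26
```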

Figure 6.1: The workflow of KDS.

6.3.1 Generating Gene-Gene Interaction Network Based on Joint Mutual Information

Given the gene expressions of a dataset in Table 6.1 and a GGN, we can obtain a WGGN by assigning to each edge (X, Y) of the GGN a weight which is the joint mutual information of (X, Y) and the class C. The joint mutual information between a pair of genes (X, Y) and the class C, denoted as I((X, Y); C) or I(X, Y; C), is calculated as in Equation (6.2), based on the gene expression data; details of joint mutual information can be found in [29]. The joint mutual information expresses the ability of the combination of gene X and gene Y to predict the class. We used the ChiMerge method [30] to discretize the expression values of each gene into at most five intervals. All edge weights are scaled to the range [0..1].

I(X, Y; C) = I(Y; C) + I(X; C|Y), (6.2)

where I(Y; C) is the mutual information between Y and C, and I(X; C|Y) is the conditional mutual information between X and C given Y. They are calculated as in Equations (6.3)-(6.6):

I(Y; C) = H(Y) − H(Y|C), (6.3)

I(X; C|Y) = H(X|Y) + H(C|Y) − H(X, C|Y), (6.4)

H(Y) = −∑_{y∈Y} p(y) log p(y), (6.5)

H(Y|C) = −∑_{y∈Y, c∈C} p(y, c) log(p(y, c)/p(c)). (6.6)
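As an illustration, the sketch below computes I(X, Y; C) from already-discretized expression vectors, using entropy identities equivalent to Equations (6.2)-(6.6); the helper names and the toy data are ours.

```python
import numpy as np
from collections import Counter

def entropy(*columns):
    """Empirical joint entropy (in bits) of one or more parallel label sequences."""
    counts = Counter(zip(*columns))
    n = sum(counts.values())
    p = np.array([c / n for c in counts.values()])
    return float(-(p * np.log2(p)).sum())

def joint_mi(x, y, c):
    """I(X, Y; C) = I(Y; C) + I(X; C|Y), expanded into joint entropies."""
    i_yc = entropy(y) + entropy(c) - entropy(y, c)            # I(Y; C)
    i_xc_y = (entropy(x, y) + entropy(c, y)
              - entropy(x, c, y) - entropy(y))                # I(X; C | Y)
    return i_yc + i_xc_y

# Toy example: discretized expression of two genes and a binary class label.
x = [0, 1, 1, 2, 0, 2, 1, 0]
y = [1, 1, 0, 0, 1, 0, 1, 1]
c = [0, 1, 1, 1, 0, 1, 1, 0]
print(round(joint_mi(x, y, c), 3))  # edge weight for (X, Y) before scaling to [0..1]
```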

6.3.2 Finding an Optimally-Dense Sub-network for a Seed Gene

Given a sub-network S having n nodes/genes (n > 2), which is a sub-graph of the WGGN, and E the set of edges in S, the density of S is calculated as in Equation (6.7). This density function is an estimation of the predictive power of the sub-network S based on the weights of its edges: the denser the sub-network, the more predictive it tends to be. We search for an optimally-dense sub-network in the WGGN for each seed gene, as described in Algorithm 1.

density(S) = (∑_{(X,Y)∈E} I(X, Y; C)) / (0.5 × n × √(n − 1)) (6.7)

Algorithm 1. Finding an optimally-dense sub-network for a seed gene.

Input. s: a seed gene; W: a set of weighted edges representing a weighted graph.
Output. S: an optimally-dense sub-network for s.

Method.

1. S = {s}.
2. Repeat steps 3 and 4 to gradually extend S until its density cannot be improved; then return S.
3. N = the set of neighboring genes of any gene in S, according to the WGGN structure.
4. Add to S the gene in N which improves the density of S the most.

In our approach, we assume that each gene in a sub-network of n genes has, on average, √(n − 1) edges connecting it to other genes in the sub-network. The numerator of Equation (6.7) is the sum of the weights of the edges, while the denominator is the average number of edges in a sub-network of n vertices. Practically, it is difficult to find a polynomial function f(n) (where f(n) ∈ O(n^m)) that fairly expresses the average sum of the edge weights in a sub-network of n vertices in a weighted graph for different values of n, even if the edge weights are in the range [0..1]. When m = 1, the numerator of Equation (6.7) and f(n) increase linearly with the size of the sub-network; however, the numerator tends to increase faster than the denominator when n gets large enough. Thus, in this case, the densest sub-network for a seed gene tends to be a large component of the WGGN containing the seed gene. Meanwhile, when m ≥ 2, f(n) tends to increase much faster than the numerator as n gets larger, so sub-networks of small size tend to be denser than large ones. The function that we selected for the denominator of Equation (6.7) is 0.5 × n × √(n − 1) (i.e., m = 1.5), which is a balance between the cases m = 1 and m = 2.
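The following is a compact sketch of Equation (6.7) and Algorithm 1, assuming the WGGN is given as a dictionary mapping undirected edges to their joint-mutual-information weights; the data structure and function names are ours.

```python
from math import sqrt

def density(nodes, weights):
    """Equation (6.7): sum of edge weights inside the sub-network, divided by
    0.5 * n * sqrt(n - 1), the assumed average number of edges on n vertices."""
    n = len(nodes)
    if n < 2:
        return 0.0
    total = sum(w for (u, v), w in weights.items() if u in nodes and v in nodes)
    return total / (0.5 * n * sqrt(n - 1))

def grow_subnetwork(seed, weights):
    """Algorithm 1: greedily add the neighbor that most improves the density."""
    adj = {}
    for (u, v) in weights:            # adjacency from the undirected edge set
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    sub, best = {seed}, 0.0
    while True:
        frontier = set().union(*(adj.get(g, set()) for g in sub)) - sub
        scored = [(density(sub | {g}, weights), g) for g in frontier]
        if not scored or max(scored)[0] <= best:   # density cannot be improved
            return sub
        best, gene = max(scored)
        sub = sub | {gene}

# Toy WGGN: edge -> joint-mutual-information weight, scaled to [0..1].
w = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.7, ("C", "D"): 0.1}
print(grow_subnetwork("A", w))   # {'A', 'B', 'C'}: adding D would lower the density
```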

6.3.3 Evaluating the Classification Performance of a Sub-network

For each sub-network S, the GE data projected on the genes in S is used to train and test a classification model (classifier) using SVM or RF with 5-fold cross-validation. The average MCC of the classifier on the testing folds is reported as the measure of the predictive power of S, since MCC is one of the best measures of classification performance on class-imbalanced data. For each of the three treatments listed in Table 6.1, we keep only the top sub-networks whose MCC is greater than or equal to the MCC obtained by the method of [24].

To deal with imbalanced data, we also implemented two other techniques. First, we used Distribution Optimally Balanced Stratified Cross-Validation (DOB-SCV), proposed in [31], to stratify "close-by" samples in the projected GE data into different folds, where "close-by" is defined by the Euclidean distance between the projected GE profiles. The idea is to make the five folds as similar as possible, so that each fold contains representative samples from the different regions of the problem space. For this purpose, we take a sample and move it to the first fold; then, we find its four nearest neighbors of the same class and move them to the four other folds. Second, we applied the Synthetic Minority Oversampling Technique (SMOTE), introduced in [32], to over-sample the minority class in the training set. For each sample x in the minority class, SMOTE randomly picks one neighbor y among its k nearest neighbors of the same class (e.g., k = 5). A synthetic sample is created by adjusting each coordinate of x by a random fraction of the difference between the corresponding coordinates of x and y.
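Below is a minimal sketch of SMOTE's interpolation step as described above; a brute-force neighbor search is used for brevity, and all names are ours.

```python
import numpy as np

def smote_samples(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples a la SMOTE: pick a minority
    sample x, pick a random neighbor y among its k nearest same-class
    neighbors, and move x a random fraction of the way toward y."""
    rng = np.random.default_rng(rng)
    X = np.asarray(minority, dtype=float)
    # Brute-force pairwise distances within the minority class.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a sample is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]     # indices of the k nearest neighbors
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X))                 # the sample x
        j = rng.choice(neighbors[i])             # a random neighbor y among the k
        gap = rng.random()                       # random amount in [0, 1)
        synthetic.append(X[i] + gap * (X[j] - X[i]))
    return np.array(synthetic)

# Example: add 4 synthetic GE profiles to a 6-sample minority class (2 features).
minority = [[0.10, 1.00], [0.20, 0.90], [0.00, 1.10],
            [0.30, 1.20], [0.15, 0.95], [0.25, 1.05]]
print(smote_samples(minority, n_new=4, k=3, rng=0).round(3))
```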

In the current context, given a classification method (e.g., SVM), the terms "classifier" and "subnet" can be used interchangeably; the quality of a classifier implies the usefulness of the corresponding subnet.

6.3.4 Finding the Top-k Subnets for Weighted Voting

The next step is to search, among the subnets obtained after step 3 of KDS, for the top-k subnets that together produce better performance than any single subnet. The classification of a new patient by the top-k subnets/classifiers is done via a weighted voting scheme where the MCC of each subnet is used as its weight. That is, each classifier predicts the patient's survivability based on her GE profile, and the final decision is assigned to the class having the highest sum of weights.

Let W be the set of available models/subnets, V be a dataset of GE profiles, and MCC(W, k, V) be the MCC obtained on the dataset V by the weighted voting scheme using the first k subnets in W. We search for the top-k sub-networks as in Algorithm 2.

Algorithm 2. Searching for the top-k subnets for classification by weighted voting.

Input. W: an initial set of models/subnets sorted by their MCCs; T: GE profiles for training; V: GE profiles for testing.
Output. The first k subnets in W, along with the classification performance using the voting scheme on V.

Method.

1. k = 1.

2. While MCC(W, k, T) ≤ MCC(W, k+1, T), increase k by 1.

3. Return the first k subnets of the sorted W. The reported classification performance for the top-k subnets is MCC(W, k, V).
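A sketch of the MCC-weighted voting and the stopping rule of Algorithm 2, with binary classes and classifier predictions stored row-wise; all names are ours.

```python
import numpy as np

def mcc_score(y_true, y_pred):
    """MCC from binary label vectors (see Equation (6.1))."""
    t, p = np.asarray(y_true), np.asarray(y_pred)
    tp = int(((t == 1) & (p == 1)).sum()); tn = int(((t == 0) & (p == 0)).sum())
    fp = int(((t == 0) & (p == 1)).sum()); fn = int(((t == 1) & (p == 0)).sum())
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def weighted_vote(preds, weights):
    """Each sample gets the class with the highest sum of classifier weights."""
    preds, w = np.asarray(preds), np.asarray(weights)  # preds: (classifiers, samples)
    score1 = (w[:, None] * (preds == 1)).sum(axis=0)
    score0 = (w[:, None] * (preds == 0)).sum(axis=0)
    return (score1 > score0).astype(int)

def top_k(preds, weights, y_train):
    """Algorithm 2: grow k while the voted training MCC does not decrease."""
    k = 1
    while k < len(weights) and (
        mcc_score(y_train, weighted_vote(preds[:k], weights[:k]))
        <= mcc_score(y_train, weighted_vote(preds[:k + 1], weights[:k + 1]))
    ):
        k += 1
    return k

# Three subnets/classifiers (rows, sorted by MCC weight) voting on 6 samples.
preds = [[1, 0, 1, 1, 0, 0],
         [1, 0, 1, 0, 0, 0],
         [0, 0, 1, 1, 0, 1]]
y = [1, 0, 1, 1, 0, 0]
print(top_k(preds, [0.9, 0.6, 0.4], y))  # 3: each addition keeps the voted MCC
```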

6.3.5 Finding an Optimal Set of Subnets for Weighted Voting

Instead of ranking the subnets and then searching for the top-k subnets as in Algorithm 2, we can search for an optimal subset of subnets for weighted voting in a greedy fashion, as in Algorithm 3.

Algorithm 3. Searching for an optimal set of subnets for classification by weighted voting.

Input. W: an initial set of models/subnets; T: a set of training datasets; V: a set of testing datasets.
Output. OS: a subset of W which yields an optimal MCC by weighted voting.

Method.

1. Initialize OS as the empty set.
2. Among the non-selected subnets in W, keep adding to OS the subnet which improves the average training MCC the most, using the datasets in T and the subnets in OS, until OS cannot be improved.
3. Return OS along with the average performance measures (e.g., testing MCC), using the optimal models trained at step 2 and tested on the datasets in V.

In Algorithm 3, we randomly split the learning dataset into a training set (more than 50% of the samples) and a testing set (the remaining samples) 500 times. The training MCC in step 2 is the average MCC computed over the training sets, while the average MCC (and other measures) computed over the testing sets is reported for the optimal set of subnets. This cross-validation procedure results in an optimal set of subnets whose classification performance is more robust than that obtained by the simple cross-validation used in Algorithm 2.
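A single-split sketch of the greedy selection in Algorithm 3 follows; in the chapter the score is the training MCC averaged over 500 random splits, which is abstracted here behind a callback, and the names and toy MCC landscape are hypothetical.

```python
def greedy_subnet_selection(candidates, train_mcc):
    """Algorithm 3 (single-split sketch): repeatedly add the subnet that most
    improves the ensemble's training MCC; stop when no addition improves it.

    candidates: identifiers of the available subnets (the set W).
    train_mcc:  callable scoring a list of subnets; in the chapter this is the
                voted MCC averaged over 500 random train/test splits.
    """
    selected, best = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        score, pick = max((train_mcc(selected + [s]), s) for s in remaining)
        if score <= best:            # OS cannot be improved any further
            break
        selected.append(pick)
        remaining.remove(pick)
        best = score
    return selected

# Toy MCC landscape over three hypothetical subnets "a", "b", "c".
scores = {("a",): 0.50, ("b",): 0.40, ("c",): 0.30,
          ("a", "b"): 0.55, ("a", "c"): 0.45,
          ("b", "c"): 0.42, ("a", "b", "c"): 0.50}
mcc_of = lambda sel: scores.get(tuple(sorted(sel)), 0.0)
print(greedy_subnet_selection(["a", "b", "c"], mcc_of))  # ['a', 'b']
```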

6.4 Results and Discussion

We applied KDS to the three datasets CT, HT, and CT HT described in Table 6.1, using either SVM or Random Forest (RF) for sub-network evaluation. We also ran KDS with every combination of the two techniques DOB-SCV and SMOTE to examine their efficiency in reducing the effect of imbalanced data. Table 6.2 shows the abbreviations of the experimental settings, which represent the combinations of KDS with SMOTE and DOB-SCV. A setting is given in the form of three letters SDK, where S, D, and K can each be either "Y" (yes) or "N" (no), indicating whether SMOTE is applied, whether DOB-SCV is applied, and whether KDS extracts the top-k sub-networks rather than only the single best sub-network, respectively. For example, the setting NYN means that KDS finds only one best sub-network, using DOB-SCV to stratify the data into the five folds of cross-validation, but without using SMOTE to over-sample the minority class. In contrast, the setting YNY means that we find the top-k sub-networks (i.e., using Algorithm 2), where the five folds are stratified randomly but the training sets are over-sampled by SMOTE.

Tables 6.3-6.8 show the classification performance of our approaches on the three datasets CT, HT, and CT HT, respectively, for every setting, using SVM and RF. The measures of classification performance are accuracy, MCC, and the area under the receiver operating characteristic (ROC) curve (AUC). For comparison purposes, the three tables also contain the classification performance obtained by the method of [24], which applied a wrapper feature selection approach using mRMR and SVM or RF (see Tables 2 and 3 in [24], or Tables 2.4 and 2.6 in Chapter 2). For convenience, we denote the approach in [24] as "F1000". We also include the results obtained by the AFS algorithm (the combination of PAFS [9] and SVM) [33], copied from Table 3.6 (for CT and CT HT) and Table 3.7 (for HT) in Chapter 3.

Table 6.2: The abbreviations for running settings.

Abbreviation   With SMOTE   With DOB-SCV   Voting scheme
NNN            NO           NO             NO
NNY            NO           NO             YES
NYN            NO           YES            NO
NYY            NO           YES            YES
YNN            YES          NO             NO
YNY            YES          NO             YES
YYN            YES          YES            NO
YYY            YES          YES            YES

The comparison of the MCCs obtained by SVM and RF for the different settings is shown in Figures 6.2, 6.3, and 6.4. In these figures, the results based on the single best sub-network are shown in the first four groups (NNN, YNN, NYN, YYN); the results based on the weighted voting scheme with the top-k sub-networks are shown in the last four groups (NNY, YNY, NYY, YYY). The best result by sub-network voting is shown in bold, while the best result by a single best sub-network is shown in bold and italic.

The classification performances for the optimal sets of subnets obtained by Algorithm 3 (Section 6.3.5) are presented in Tables 6.4, 6.6, and 6.8.

All in all, the voting scheme significantly improves the classification performance over that based on the single best subnet. SVM often works better than RF, and DOB-SCV is more efficient than SMOTE in dealing with class-imbalanced data. The results based on the voting scheme using SVM and DOB-SCV, with or without SMOTE, are among the best.

6.4.1 Classification Performances on CT

On CT, our approach produces much better classification results than F1000, and SVM yields better results than RF. Using the single best subnet, for the best setting YYN, the MCCs obtained by RF and SVM are 0.722 and 0.883, respectively, about 3.1 times and 1.3 times larger than the corresponding MCCs of F1000; accordingly, the accuracies increased by about 21 and 9 percentage points, respectively. When applied alone, DOB-SCV contributes a smaller improvement for SVM than for RF; in contrast, SMOTE does not improve the results.

When using the voting scheme based on the top-k subnets, SVM improves the classification performance only for the setting NYY (i.e., when only DOB-SCV is applied). In this case, MCC = 1, an increase of about 0.16 over the setting NYN. The highest MCC obtained by Random Forest is 0.81, for the settings NNY and NYY. For the other cases, the voting scheme could not improve the performance.

The performance of PAFS is similar to that of KDS using the single best subnet and is about 10% lower than the best results of SVM using the top-k subnets.

As shown in Table 6.4, the optimal subnets obtained by the greedy search described in Algorithm 3 produce slightly better accuracy than the top-k subnets for most of the settings.

Table 6.3: Classification results on patients treated with chemotherapy.

Algorithm   Setting   Accuracy   MCC     AUC     #Subnets or #Genes
RF          F1000     66.00%     0.23    0.653   19
RF          NNN       81.10%     0.59    0.78    14
RF          NYN       86.80%     0.717   0.855   3
RF          YNN       79.30%     0.563   0.784   14
RF          YYN       86.80%     0.722   0.864   21
RF          NNY       90.90%     0.81    0.938   4
RF          NYY       90.90%     0.81    0.938   6
RF          YNY       81.80%     0.542   0.771   15
RF          YYY       81.80%     0.543   0.771   1
SVM         F1000     84.90%     0.675   0.82    10
SVM         PAFS      90.60%     0.799   0.885   6
SVM         NNN       90.60%     0.811   0.905   15
SVM         NYN       92.50%     0.839   0.92    15
SVM         YNN       90.60%     0.811   0.914   25
SVM         YYN       94.30%     0.883   0.925   14
SVM         NNY       90.90%     0.81    0.938   3
SVM         NYY       100%       1       1       9
SVM         YNY       90.90%     0.81    0.938   5
SVM         YYY       90.90%     0.81    0.938   3

Figure 6.2: Classification performance on chemotherapy-treated patients. The results based on the single best sub-network are shown in the first four groups (NNN, YNN, NYN, YYN). The results based on the weighted voting scheme by the top-k sub-networks are shown in the last four groups (NNY, YNY, NYY, YYY).

Table 6.4: Classification performance on chemotherapy-treated patients obtained by the Greedy Search (Algorithm 3).

Algorithm   Setting   ACC       MCC            AUC     #Subnets
RF          NNY       82.90%    0.660 (0.21)   0.814   4
RF          NYY       91.30%    0.835 (0.11)   0.898   4
RF          YNY       88.40%    0.765 (0.19)   0.867   6
RF          YYY       93.50%    0.872 (0.13)   0.926   3
SVM         NNY       93.30%    0.861 (0.12)   0.917   9
SVM         NYY       100.00%   1.000 (0.00)   0.991   9
SVM         YNY       93.30%    0.861 (0.12)   0.917   9
SVM         YYY       100.00%   1.000 (0.00)   0.991   3

6.4.2 Classification Performances on HT

Although the classification results on HT are not as high as those on CT, Figure 6.3 still shows that our approach is much better than F1000. Comparing the settings NYN and YNN against NNN (or NYY and YNY against NNY), DOB-SCV seems to deal with imbalanced data better than SMOTE and has contributed more to the improvement in MCC. SVM often yields better results than RF; RF only beats SVM when DOB-SCV is applied alone (i.e., the settings NYN and NYY). However, it is the voting scheme that brings about the most positive impact.

Figure 6.3: Classification performance on hormone-therapy-treated patients. The results based on the single best sub-network are shown in the first four groups (NNN, YNN, NYN, YYN). The results based on the weighted voting scheme by the top-k sub-networks are shown in the last four groups (NNY, YNY, NYY, YYY).

With the best settings for the single best subnet (i.e., NYN and YYN), the MCCs obtained by RF and SVM are 0.366 and 0.323, respectively. They are six times and two times higher than the corresponding MCCs of F1000. The MCCs become the highest, 0.520 and 0.618, respectively, when we use the voting scheme based on the top-k subnets (for the settings NYY and YYY); these are about eight and four times better than the MCCs of F1000.

Table 6.5: Classification results on patients treated with hormone therapy.

Algorithm   Setting   Accuracy   MCC     AUC     #Subnets or #Genes
RF          F1000     71.00%     0.059   0.632   9
RF          NNN       71.90%     0.21    0.585   4
RF          NYN       76.90%     0.366   0.651   2
RF          YNN       69.50%     0.261   0.633   8
RF          YYN       69.30%     0.265   0.637   2
RF          NNY       66.70%     0.285   0.563   3
RF          NYY       76.20%     0.52    0.688   6
RF          YNY       69.50%     0.31    0.636   4
RF          YYY       70.20%     0.337   0.639   5
SVM         F1000     72.90%     0.173   0.548   10
SVM         PAFS      77.60%     0.374   0.625   9
SVM         NNN       73.10%     0.275   0.622   4
SVM         NYN       74.50%     0.309   0.634   4
SVM         YNN       74.80%     0.258   0.572   15
SVM         YYN       70.20%     0.323   0.672   4
SVM         NNY       69.10%     0.354   0.594   17
SVM         NYY       73.80%     0.439   0.668   4
SVM         YNY       69.10%     0.354   0.594   27
SVM         YYY       81.00%     0.618   0.75    37

Table 6.6: Classification performance on hormone-therapy-treated patients obtained by the Greedy Search (Algorithm 3).

Algorithm   Setting   ACC      MCC            AUC     #Subnets
RF          NNY       75.60%   0.257 (0.16)   0.568   4
RF          NYY       81.70%   0.497 (0.14)   0.68    4
RF          YNY       79.20%   0.433 (0.15)   0.686   6
RF          YYY       78.30%   0.409 (0.15)   0.679   4
SVM         NNY       82.10%   0.516 (0.14)   0.678   6
SVM         NYY       80.80%   0.505 (0.14)   0.68    4
SVM         YNY       80.20%   0.530 (0.14)   0.688   6
SVM         YYY       87.80%   0.678 (0.11)   0.794   7

The performance of PAFS is better than that of KDS using the single best subnet, but it is still considerably lower than the best results of RF and SVM using the top-k subnets.

Comparing Table 6.5 and Table 6.6, we can see that the optimal subnets obtained by Algorithm 3 produce higher accuracy than the top-k subnets for most of the settings, by about 5% to 10%, except for the settings NNY and NYY with RF. The corresponding increases in MCC are about 0.06 to 0.12.

6.4.3 Classification Performances on CT HT

Generally, on CT HT, SVM yields better MCC than RF, the voting scheme significantly boosts the classification performance, and DOB-SCV is still more efficient than SMOTE. Using the single best subnet, the best MCC for RF is 0.328, for the setting NYN, and the best MCC for SVM is 0.330, for the setting YYN; these MCCs are about 4 times and 1.38 times higher than the corresponding ones of F1000. Using the top-k subnets, SMOTE generally produces better MCC than DOB-SCV for both RF and SVM. The combination of DOB-SCV and SMOTE yields the highest MCCs, 0.731 for RF and 0.858 for SVM, which are respectively about 2 times and 2.5 times higher than the best MCC based on the single best subnet.

The performance of PAFS is slightly worse than the best results of RF and SVM using the single best subnet, and much lower than their best results using the top-k subnets.

Comparing Table 6.7 and Table 6.8, we can see that the optimal subnets obtained by Algorithm 3 usually produce slightly lower accuracy than the top-k subnets for most of the settings. However, the decline reaches about 10% for the setting YYY, for both RF and SVM; the corresponding decreases in MCC are about 0.2 and 0.17, respectively.

Table 6.7: Classification results on patients treated with chemotherapy and/or hormone therapy.

Algorithm   Setting   Accuracy   MCC     AUC     #Subnets or #Genes
RF          F1000     65.30%     0.086   0.588   7
RF          NNN       68.50%     0.214   0.585   8
RF          NYN       71.60%     0.328   0.65    2
RF          YNN       64.70%     0.231   0.619   5
RF          YYN       67.10%     0.288   0.648   2
RF          NNY       74.30%     0.292   0.596   3
RF          NYY       76.20%     0.377   0.658   4
RF          YNY       85.20%     0.628   0.779   10
RF          YYY       89.10%     0.731   0.846   77
SVM         F1000     67.90%     0.238   0.61    11
SVM         PAFS      71.10%     0.288   0.616   9
SVM         NNN       69.10%     0.267   0.624   4
SVM         NYN       71.60%     0.314   0.636   10
SVM         YNN       64.90%     0.251   0.63    23
SVM         YYN       69.30%     0.33    0.669   3
SVM         NNY       76.30%     0.364   0.629   6
SVM         NYY       82.20%     0.554   0.71    4
SVM         YNY       83.20%     0.61    0.813   51
SVM         YYY       94.10%     0.858   0.929   89

Table 6.8: Classification performance on chemotherapy and/or hormone-therapy-treated patients obtained by the Greedy Search (Algorithm 3).

Algorithm   Setting   ACC      MCC            AUC     #Subnets
RF          NNY       73.70%   0.387 (0.13)   0.601   7
RF          NYY       77.00%   0.454 (0.14)   0.663   6
RF          YNY       74.14%   0.382 (0.13)   0.673   6
RF          YYY       79.00%   0.515 (0.16)   0.753   6
SVM         NNY       77.40%   0.457 (0.13)   0.672   6
SVM         NYY       78.60%   0.492 (0.13)   0.708   4
SVM         YNY       82.00%   0.580 (0.12)   0.743   8
SVM         YYY       85.90%   0.682 (0.10)   0.834   8

Figure 6.4: Classification performance on patients treated with chemotherapy and/or hormone therapy. The results based on the single best sub-network are shown in the first four groups (NNN, YNN, NYN, YYN). The results based on the weighted voting scheme by the top-k sub-networks are shown in the last four groups (NNY, YNY, NYY, YYY).

The accuracy of a classifier can be misleading when the training data are imbalanced. Reporting accuracy, AUC, and MCC together helps us interpret the performance better than using only one of them. Ideally, accuracy (and AUC) should be about (MCC + 1)/2. The results obtained by F1000 for RF and SVM in Tables 6.5 and 6.7 show a large disagreement between accuracies, AUCs, and MCCs. According to the MCCs in these cases, the accuracies for RF on HT and CT HT should be about 53% and 54%, and the accuracies for SVM on HT and CT HT should be about 59% and 62%, respectively; they should not be as high as shown in the tables. This means that the models in F1000 are just slightly better than random and are influenced by the imbalance in the training data. Meanwhile, the measures for our approach are more consistent with each other: the MCCs match at least the accuracy or the AUC. The increases in MCC compared to F1000 imply that our approach took the class imbalance into account and produced better results. For every treatment, we found good classification models (with MCC ≥ 0.52) for both SVM and RF.

6.4.4 Biological Insights

There are 1,246 unique genes in the 393 selected subnets for both SVM and RF, over all settings and the three datasets: 377 unique genes in 46 subnets for CT, 498 unique genes in 103 subnets for HT, and 781 unique genes in 244 subnets for CT HT. Among them, SP1 and NOG are the most frequent genes for the three types of treatment, appearing in (32/46, 50/103, 90/244) and (32/46, 43/103, 58/244) subnets, respectively. For many years, the transcription factor SP1 has been viewed as a basal transcription factor, and it has also been implicated in inflammation. SP1 is over-expressed in many cancers, including breast cancer (according to the Human Protein Atlas), and is associated with a negative prognosis. Targeting SP1 in cancer treatment has been suggested; however, its functions are extremely complex: SP1 both activates and suppresses the expression of a number of essential oncogenes and tumor suppressors, as well as genes responsible for cancer cell growth, survival, migration/invasion, and drug resistance [34, 35]. In addition, NOG, a BMP inhibitor, provides metastatic breast cancer cells with the ability to colonize the bone [36].

Table 6.9 lists the breast-cancer-related genes, according to [37, 38], that appear in at least 25 of the subnets across all sets of top-k subnets for the three treatments. Most of them are oncogenes, fusion genes, or tumor suppressor genes; these genes play critical roles in pathways involving cell growth control, the DNA repair mechanism, etc. [39]. We also encounter three other highly frequent genes, MYC, PIK3CA, and TP53, that are well known to be important in breast cancer. For example, PIK3CA encodes a subunit of phosphatidylinositol 3-kinase that triggers AKT1 signaling and can interact with ERBB4, FGFR, and MAPK signaling in cancer [38, 41-43]. According to [43], "all cancers have been shown to have a defective p53 pathway, either by a direct TP53 mutation or by deregulation of another determinant of the p53 pathway. The mutant p53 molecular profile may provide extremely precise predictors of disease outcomes and of responses to therapy". As summarized in [44], MYC is a transcription factor that serves as an essential signaling hub in multiple cellular processes that sustain the growth of many types of cancer; it regulates the expression of RNA that controls central metabolic pathways, cell death, proliferation, differentiation, stress pathways, and mechanisms of drug resistance, and MYC-dependent pathways are often elevated in acquired resistance to anti-cancer therapies. Activation of MYC has been widely reported in breast cancer progression, and MYC amplification has been reported as a biomarker of complete pathological response (i.e., the disappearance of all of the cancer or tumor) to neoadjuvant chemotherapy.

In addition, the following 26 highly frequent genes are cancer-related: CDC42, RHOQ, JUN, IL6, UBA52, GCLC, HIST1H4I, HIST1H4C, ATP7B, DVL2, SERPINE1, NECAP1, CD4, CLTCL1, HIST1H4B, ACTR3, AGFG1, CREB1, VDR, HIST1H2BJ, LDLR, PACSIN2, INPP5D, CCL5, TRIM24, and BAZ1A, listed in descending order of frequency, from 176/393 to 25/393. Some of them might be breast-cancer-related as well. We also encounter several frequent tuples of genes, such as SP1, NOG, CDC42, RHOQ, JUN, IL6, GCLC, and TP53. The frequent genes and tuples may imply that there are frequent interactions or chains of interactions; identifying them may help us filter the pathways involved in the treatments/disease.

Table 6.9: Breast-cancer-related genes that are highly frequent in the 393 subnets of the sets of top-k subnets for the three treatments, for all settings, and for both SVM and RF.

Gene     In CT   In HT   In CT HT   Sum   Gene Type
SP1      32      50      94         176   tumor suppressor
NOG      32      43      58         133
TP53     19      19      34         72    oncogene, tumor suppressor, fusion
MYC      24      12      16         52    oncogene, fusion
PIK3R1   0       27      13         40    tumor suppressor
BRDT     0       0       40         40
MUC16    0       6       32         38    oncogene
DAB2     0       26      11         37    tumor suppressor
PML      16      8       13         37    tumor suppressor, fusion
BIN1     1       25      5          31    tumor suppressor
PIK3CA   2       13      12         27    oncogene
PBRM1    0       4       22         26    tumor suppressor
CLU      18      3       4          25    tumor suppressor
SIRT1    3       9       13         25    tumor suppressor

6.5 Conclusion and Future Works

Network-based approaches have gained a lot of interest in the last two decades, since they make use of knowledge from various data sources to detect relevant biomarkers. In this work, we have developed a network-based method, KDS, to identify subnets of functionally related genes capable of predicting the survivability of breast cancer patients. KDS integrates GE data with protein-protein interaction data into a weighted gene-gene interaction network based on joint mutual information, where the weight of an edge expresses the potential of the combination of the two involved genes in predicting the class. We propose a scoring criterion to estimate the predictive power of candidate subnets, by which we can reduce the number of subnets to evaluate and thus save computation cost. Either SVM or Random Forest is used to evaluate the subnets, and DOB-SCV and SMOTE are applied to deal with class-imbalanced data.

Furthermore, based on a weighted voting scheme that uses multiple subnets for classification, we present a simple method to select the top-k subnets and a greedy method to select an optimal set of subnets. Both of them significantly improve the classification performance for patients treated with hormone therapy and for patients treated with either chemotherapy or hormone therapy, as compared to using the single best subnet.

In general, KDS using SVM yields better performance than using Random Forest, and DOB-SCV often brings higher performance than SMOTE. Using the single best subnet, for the best settings, KDS significantly outperformed the approach in [24]. The prediction performance can be improved further by using the voting scheme based on the optimal/top-k subnets. The MCCs show that our model can deal with imbalanced data while being effective at predicting the survivability of breast cancer patients. There are 14 breast-cancer-related genes frequently encountered in the selected top-k subnets, including the well-known genes SP1, TP53, MYC, and PIK3CA.

In the future, we plan to analyze in more detail the genes in the selected subnets against pathway data and cancer-related gene lists, to reveal meaningful biomarkers and perform the corresponding biological validation using the relevant literature. The frequency of the interactions in the selected subnets should also be investigated, as they might provide further biological insights.

References

[1] DeSantis, C. E., Ma, J., Goding Sauer, A., Newman, L. A., and Jemal, A. (2017). Breast cancer statistics, 2017, racial disparity in mortality by state. CA: A Cancer Journal for Clinicians, 67(6), 439-448. DOI: 10.3322/caac.21412.

[2] American Cancer Society. (2017). Breast cancer facts and figures 2017–2018.

[3] O’Shaughnessy, J. (2005). Extending survival with chemotherapy in metastatic breast cancer. The Oncologist, 10(Supplement 3), 20-29.

[4] Van’t Veer, L. J., Dai, H., Van De Vijver, M. J., He, Y. D., Hart, A. A., Mao, M., and Schreiber, G. J. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415(6871), 530-536. DOI: 10.1038/415530a.

[5] Pereira, B., Chin, S. F., Rueda, O. M., Vollan, H. K. M., Provenzano, E., Bardwell, H. A., and Tsui, D. W. (2016). The somatic mutation profiles of 2,433 breast cancers refine their genomic and transcriptomic landscapes. Nature communications, 7(1), 1-16. DOI: 10.1038/ncomms11479.

[6] Wilhelm, O. G., and Kiechle, M. (2018). Prognosis and Prediction in Breast Cancer: Is There a Need for Further Tests?

[7] Ferroni, P., Zanzotto, F. M., Riondino, S., Scarpato, N., Guadagni, F., and Roselli, M. (2019). Breast cancer prognosis using a machine learning approach. Cancers, 11(3), 328. DOI: 10.3390/cancers11030328.

[8] Peng, H., Long, F., and Ding, C. (2005). Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226-1238. DOI: 10.1109/TPAMI.2005.159.

[9] Huy, P. Q., Ngom, A., and Rueda, L. (2016, December). PAFS - An efficient method for classifier-specific feature selection. In 2016 IEEE Symposium Series on Computational Intelligence (SSCI), 1-8. DOI: 10.1109/SSCI.2016.7850131.

[10] Huy, P. Q., Ngom, A., and Rueda, L. (2016, December). A new feature selection approach for optimizing prediction models, applied to breast cancer subtype classification. In 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 1535-1541). IEEE.

[11] Pham, H. Q., Rueda, L., and Ngom, A. (2017). Predicting Breast Cancer Outcome under Different Treatments by Feature Selection Approaches. In Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics (pp. 617-617).

[12] Chuang, H. Y., Lee, E., Liu, Y. T., Lee, D., and Ideker, T. (2007). Network-based classification of breast cancer metastasis. Molecular systems biology, 3(1). DOI: 10.1038/msb4100180.

[13] Allahyar, A., and De Ridder, J. (2015). FERAL: network-based classifier with application to breast cancer outcome prediction. Bioinformatics, 31(12), i311-i319. DOI: 10.1093/bioinformatics/btv255.

[14] Wang, X., Gulbahce, N., and Yu, H. (2011). Network-based methods for human disease gene prediction. Briefings in functional genomics, 10(5), 280-293. DOI: 10.1093/bfgp/elr024.

[15] Li, J., Wang, L., Guo, M., Zhang, R., Dai, Q., Liu, X., and Zhang, M. (2015). Mining disease genes using integrated protein–protein interaction and gene–gene co-regulation information. FEBS open bio, 5, 251-256. DOI: 10.1016/j.fob.2015.03.011.

[16] Amgalan, B., and Lee, H. (2014). WMAXC: a weighted maximum clique method for identifying condition-specific sub-network. PloS one, 9(8). DOI: 10.1371/journal.pone.0104993.

[17] Grzadkowski, M. R., Haider, S., Sendorek, D., and Boutros, P. C. (2018). Subnetwork-based prognostic biomarkers exhibit performance and robustness superior to gene-based biomarkers in breast cancer. BioRxiv, 290924.

[18] Zhang, L., Li, S., Hao, C., Hong, G., Zou, J., Zhang, Y., and Guo, Z. (2013). Extracting a few functionally reproducible biomarkers to build robust subnetwork-based classifiers for the diagnosis of cancer. Gene, 526(2), 232-238.

[19] He, H., Lin, D., Zhang, J., Wang, Y. P., and Deng, H. W. (2017). Comparison of statistical methods for subnetwork detection in the integration of gene expression and protein interaction network. BMC bioinformatics, 18(1), 149. DOI: 10.1186/s12859-017-1567-2.

[20] Pham, H. Q., Guba, J., Gawanmeh, M., Porter, L. A., and Ngom, A. (2019, September). A Network-based Machine Learning Approach for Identifying Biomarkers of Breast Cancer Survivability. In Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (pp. 639-644).

[21] van Dam, S., Vosa, U., van der Graaf, A., Franke, L., and de Magalhaes, J. P. (2018). Gene co-expression analysis for functional classification and gene–disease predictions. Briefings in bioinformatics, 19(4), 575-592. DOI:10.1093/bib/bbw139.

[22] Galbrun, E., Gionis, A., and Tatti, N. (2016). Top-k overlapping densest subgraphs. Data Mining and Knowledge Discovery, 30(5), 1134-1165.

[23] Keshava Prasad, T. S., Goel, R., Kandasamy, K., Keerthikumar, S., Kumar, S., Mathivanan, S., and Balakrishnan, L. (2009). Human protein reference database—2009 update. Nucleic acids research, 37(suppl 1), D767-D772. (HPRD).

[24] Mucaki, E. J., Baranova, K., Pham, H. Q., Rezaeian, I., Angelov, D., Ngom, A., Rueda, L., and Rogan, P. K. (2016). Predicting outcomes of hormone and chemotherapy in the molecular taxonomy of breast Cancer international consortium (METABRIC) study by biochemically-inspired machine learning. F1000Research, 5. DOI: 10.12688/f1000research.9417.3.

[25] Drier, Y., and Domany, E. (2011). Do two machine-learning based prognostic signatures for breast cancer capture the same biological processes? PloS one, 6(3).

[26] Dorman, S. N., Baranova, K., Knoll, J. H., Urquhart, B. L., Mariani, G., Carcangiu, M. L., and Rogan, P. K. (2016). Genomic signatures for paclitaxel and gemcitabine resistance in breast cancer derived by machine learning. Molecular oncology, 10(1), 85-100.

[27] Szklarczyk, D., Morris, J. H., Cook, H., Kuhn, M., Wyder, S., Simonovic, M., and Jensen, L. J. (2016). The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucleic acids research, gkw937.

[28] Rodchenkov, I., Babur, O., Luna, A., Aksoy, B. A., Wong, J. V., Fong, D., and Mistry, H. (2020). Pathway Commons 2019 Update: integration, analysis and exploration of pathway data. Nucleic acids research, 48(D1), D489-D497.

[29] Wyner, A. D. (1978). A definition of conditional mutual information for arbitrary ensembles. Information and Control, 38(1), 51-59. DOI: 10.1016/S0019-9958(78)90026-8.

[30] Kerber, R. (1992). Chimerge: Discretization of numeric attributes. In Proceedings of the tenth national conference on Artificial intelligence, 123-128.

[31] Moreno-Torres, J. G., Sáez, J. A., and Herrera, F. (2012). Study on the impact of partition-induced dataset shift on k-fold cross-validation. IEEE Transactions on Neural Networks and Learning Systems, 23(8), 1304-1312. DOI: 10.1109/TNNLS.2012.2199516.

[32] Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16, 321-357. DOI: 10.1613/jair.953.

[33] Pham, H. Q., Rueda, L., and Ngom, A. (2017). Predicting Breast Cancer Outcome under Different Treatments by Feature Selection Approaches. In Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 617-617.

[34] Beishline, K., and Azizkhan-Clifford, J. (2015). Sp1 and the ‘hallmarks of cancer’. The FEBS journal, 282(2), 224-258.

[35] Safe, S., Abbruzzese, J., Abdelrahim, M., and Hedrick, E. (2018). Specificity protein transcription factors and cancer: opportunities for drug development. Cancer Prevention Research, 11(7), 371-382.

[36] Tarragona, M., Pavlovic, M., Arnal-Estapé, A., Urosevic, J., Morales, M., Guiu, M., and Gomis, R. R. (2012). Identification of NOG as a specific breast cancer bone metastasis-supporting gene. Journal of Biological Chemistry, 287(25), 21346-21355.

[37] Zhao, M., Kim, P., Mitra, R., Zhao, J., and Zhao, Z. (2016). TSGene 2.0: an updated literature-based knowledgebase for tumor suppressor genes. Nucleic acids research, 44(D1), D1023-D1031.

[38] Vogelstein, B., Papadopoulos, N., Velculescu, V. E., Zhou, S., Diaz, L. A., and Kinzler, K. W. (2013). Cancer genome landscapes. science, 339(6127), 1546-1558.

[39] Fabregat, A., Sidiropoulos, K., Viteri, G., Forner, O., Marin-Garcia, P., Arnau, V., and Hermjakob, H. (2017). Reactome pathway analysis: a high-performance in-memory approach. BMC bioinformatics, 18(1), 142.

[40] Vogelstein, B., Lane, D., and Levine, A. J. (2000). Surfing the p53 network. Nature, 408(6810), 307-310.

[41] Olivier, M., Petitjean, A., Marcel, V., Petre, A., Mounawar, M., Plymoth, A., and Hainaut, P. (2009). Recent advances in p53 research: an interdisciplinary perspective. Cancer gene therapy, 16(1), 1-12.

[42] Ungerleider, N. A., Rao, S. G., Shahbandi, A., Yee, D., Niu, T., Frey, W. D., and Jackson, J. G. (2018). Breast cancer survival predicted by TP53 mutation status differs markedly depending on treatment. Breast Cancer Research, 20(1), 115.

[43] Goldstein, I., Marcel, V., Olivier, M., Oren, M., Rotter, V., and Hainaut, P. (2011). Understanding wild-type and mutant p53 activities in human cancer: new landmarks on the way to targeted therapies. Cancer gene therapy, 18(1), 2-11.

[44] Fallah, Y., Brundage, J., Allegakoen, P., and Shajahan-Haq, A. N. (2017). MYC- driven pathways in breast cancer subtypes. Biomolecules, 7(3), 53.

Chapter 7

NUCLEAR: An Efficient Method for Mining Frequent Itemsets and Generators from Closed Frequent Itemsets

7.1 Introduction

Association rule mining is one of the most interesting and popular problems in data mining. It is widely used for decision making in retail, e-commerce, medicine, bioinformatics, and many other domains. Mining frequent itemsets (FIs) is the first and main step in the discovery of association rules (ARs). Since its introduction in 1993 [1], this problem has attracted a lot of attention and has been extended and applied in various ways; for instance, some popular variations of the FI mining problem are to discover high utility patterns [2,3], uncertain frequent patterns [2], and high utility association rules [2,4]. Most algorithms for mining FIs partition the search space into subclasses in order to apply parallel approaches to improve their performance. However, the performance of many parallel FI mining algorithms is limited by the speed of disk accesses, as they repeatedly scan the input database, which can lead to long execution times [5,6]. To address this issue, some researchers proposed more efficient parallel algorithms which compress the database into a frequent pattern tree and perform tree projections [7,8]. Another approach which can speed up the FI mining process is to first mine all the closed frequent itemsets (CFIs) and then derive the FIs from them without re-scanning the data file. This approach is more efficient than mining FIs directly because the number of CFIs is usually much smaller than that of FIs (see Table 7.4). Charm [9], FPClose [7], DCI PLUS [10], and NAFCP [6] are among the best algorithms for mining CFIs. In 2010, a parallel algorithm (PLCMQS) for mining CFIs was proposed [11]. The authors of Charm proposed the CharmL algorithm [8], which builds the lattice of CFIs. Formal concept analysis (i.e., analysis of the lattice of CFIs) is another way of mining FIs as well as ARs [6,12,13].

Mining FIs from the lattice of CFIs has several advantages over mining FIs directly from data. First, the number of CFIs is often much smaller than the number of FIs, so this requires less memory. Second, each CFI can stand for an equivalence class of FIs having the same closure (i.e., these FIs share the same set of transactions containing them); thus, we can develop parallel algorithms or "divide-and-conquer" approaches to facilitate the process of deriving FIs from CFIs. Third, when users want to try different minimum support (minsup) thresholds to find an optimal set of FIs for a certain downstream procedure, the cost of updating the set of CFIs can be much lower than mining them from scratch many times. However, to extract FIs from CFIs, the current algorithms require the generators of each CFI, and mining the generators might take a significant amount of time.

Contribution. In this work, we present a method to mine the FIs from a lattice of CFIs without the need for the generators. We introduce the concepts of "kernels" and "extendable sets", which can be used to further partition the equivalence classes represented by the CFIs into smaller subclasses. Each pair of a kernel and an extendable set stands for a subclass of FIs which are supersets of the kernel and subsets of the union of the kernel and the extendable set; thus, once a pair of kernel and extendable set is identified, enumerating the FIs, and even the generators, in that subclass is straightforward. Our proposed algorithm, called NUCLEAR, generates the kernels and extendable sets for each CFI in a simple and efficient manner and then enumerates the FIs; its inputs are just a CFI and the largest CFIs that are subsets of that CFI. The time to induce all FIs from the lattice of CFIs is significantly shorter than the time to construct the lattice from data. As the generators play an important role in extracting the "minimal association rules", we also provide options for NUCLEAR to mine them in a similar fashion. We also present a simple method, called NUC, that wraps CharmL and NUCLEAR to mine a lattice of CFIs from data and then infer the FIs from the lattice. The fact that NUCLEAR does not require the generators makes it more efficient than the approaches that infer FIs from CFIs and generators. In a comparison of NUC against dEclat [8], a well-known algorithm which mines FIs directly from data, NUC is faster when the number of FIs is much larger than the number of CFIs; when NUC is slower, the reason is that CharmL, the algorithm used to construct the lattice of CFIs from data before applying NUCLEAR, is itself slower than dEclat.

The rest of this work is organized as follows. Section 7.2 introduces some related concepts. Section 7.3 reviews the related work. Section 7.4 presents novel theoretical results that are the basis of the proposed algorithm, including a recurrent formula for generating kernels and extendable sets. Section 7.5 presents the proposed NUCLEAR algorithm. Section 7.6 reports experimental results that show the efficiency of the proposed algorithm. Finally, conclusions are drawn and future work is discussed in Section 7.7.

7.2 Preliminaries

Let us consider a context (T, I, R), where I is a set of items (or attributes), T is a set of transactions (or objects), and R is a binary relation on T × I.

Definition 1 (Galois connection). For a non-empty subset A (an itemset) of I and a non-empty subset O of T, the two functions λ and ρ below define a Galois connection between 2^T and 2^I (the reader can refer to [9] for more details):

λ : 2^T → 2^I, λ(O) = {a ∈ I | (o, a) ∈ R, ∀o ∈ O}, λ(∅) = I    (7.1)

ρ : 2^I → 2^T, ρ(A) = {o ∈ T | (o, a) ∈ R, ∀a ∈ A}, ρ(∅) = T    (7.2)

Definition 2 (Frequent itemset). Given a user-specified minimum support threshold minsup, such that 0 < minsup ≤ 1, the support of an itemset A is denoted and defined as supp(A) = |ρ(A)|/|T |. A is said to be “frequent” if supp(A) ≥ minsup.

Definition 3 (Association rule mining). Given T, minsup, and minconf, which are a transactional dataset, a minimum support threshold, and a minimum confidence threshold, respectively, the task of association rule mining is to find all rules of the form X → Y such that supp(X ∪ Y) ≥ minsup and supp(X ∪ Y)/supp(X) ≥ minconf. supp(X ∪ Y) is called the support of the rule, and supp(X ∪ Y)/supp(X) is called its confidence.

Since (X ∪ Y) must be frequent in order for X → Y to be frequent, identifying the frequent itemsets is considered the main task.

Definition 4 (Closed frequent itemset, or frequent closed itemset). h = λ ◦ ρ on 2^I is called the closure operator, and h(A) = λ ◦ ρ(A) is said to be the closure of A. An itemset C is "closed" if and only if h(C) = C. C is a closed frequent itemset if it is both "closed" and "frequent". [C] = {A ⊆ I | h(A) = h(C)} is the equivalence class of C, i.e., the set of all itemsets having the same closure, which is h(C).

The itemsets in [C] share the same set of transactions, which is ρ(C). This means they have the same support.

Let CS and CFS denote the set of all closed itemsets and the set of all CFIs, respectively. Then, L ≡ (CFS, ⊆) is a lattice of CFIs, where ⊆ is the order relation of set inclusion between subsets of I. A lattice of CFIs can be represented by a Hasse diagram in which every arc links a CFI to one of its direct subsets (see Figure 7.1).

Definition 5 (Generator). An itemset G is called a "generator" of a closed itemset C if and only if h(G) = h(C) and (∀G′)(∅ ⊂ G′ ⊂ G =⇒ h(G′) ⊂ h(G)).

For any itemset A ⊆ I, the equivalence class [A] has only one closed itemset, and one or more generators.

Example 1. Given I = {1, 2, 3, 4, 5, 6}, T = {t1, t2, t3, t4}, and R as in Table 7.1. Let itemset X = {1, 4, 5}; then ρ(X) = {t1, t2}, supp(X) = 2/4, and λ({t1, t2}) = {1, 4, 5, 6}. X is not a CFI since h(X) = λ ◦ ρ(X) = {1, 4, 5, 6} ≠ X. Now, let C = {1, 4, 5, 6}; we have h(C) = C, and thus C is a CFI. We also have [X] = [C] = {{1, 4}, {1, 5}, {1, 6}, {1, 4, 5}, {1, 4, 6}, {1, 5, 6}, {1, 4, 5, 6}}, in which the sets {1, 4}, {1, 5}, {1, 6} are the generators of C. The lattice of CFIs mined from R for minsup = 0.25 (i.e., absolute minimum support = 1) is shown in Figure 7.1. In this lattice, each node (rectangle) represents a CFI and its absolute support, separated by a colon (":"), and each directed edge links a CFI to the CFIs that are its largest subsets.

Table 7.1: The relation R ⊆ T × I, where I = {1, 2, 3, 4, 5, 6} and T = {t1, t2, t3, t4}.

Transaction   Items
t1            1 2 3 4 5 6
t2            1 4 5 6
t3            1 3
t4            2 3 4 5 6
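Definitions 1-4 and Example 1 can be verified mechanically. The following minimal Python sketch (the function and variable names are ours, purely for illustration; the thesis provides no code) computes ρ, λ, the closure h, and the support on the dataset of Table 7.1:

    # Table 7.1 as a transaction database (an assumed encoding for this sketch).
    DB = {
        "t1": {1, 2, 3, 4, 5, 6},
        "t2": {1, 4, 5, 6},
        "t3": {1, 3},
        "t4": {2, 3, 4, 5, 6},
    }

    def rho(A):
        """rho(A): the transactions containing every item of itemset A."""
        return {t for t, items in DB.items() if A <= items}

    def lam(O):
        """lambda(O): the items shared by all transactions in O (= I for O = {})."""
        if not O:
            return set.union(*DB.values())
        return set.intersection(*(DB[t] for t in O))

    def closure(A):
        """h(A) = lambda(rho(A)), the closure operator of Definition 4."""
        return lam(rho(A))

    def supp(A):
        """supp(A) = |rho(A)| / |T| (Definition 2)."""
        return len(rho(A)) / len(DB)

    X = {1, 4, 5}
    print(rho(X))                                  # {'t1', 't2'}
    print(supp(X))                                 # 0.5, i.e., 2/4
    print(closure(X))                              # {1, 4, 5, 6}, so X is not closed
    print(closure({1, 4, 5, 6}) == {1, 4, 5, 6})   # True: {1, 4, 5, 6} is closed

This reproduces Example 1: h({1, 4, 5}) = {1, 4, 5, 6} ≠ {1, 4, 5}, while {1, 4, 5, 6} is its own closure.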

Definition 6 (Minimal association rule). A rule X → Y is called "minimal" if there is no other rule A → B such that (A ⊆ X), (Y ⊆ B), h(X ∪ Y) = h(A ∪ B), and h(X) = h(A).

Figure 7.1: The lattice of CFIs mined from R with minsup = 0.25.

For any minimal association rule X → Y, X ∪ Y is a closed (frequent) itemset and X is a generator (X is either in [X ∪ Y] or in [C], where C ⊂ X ∪ Y) [14,15].

7.3 Related Work

7.3.1 Mining Closed (Frequent) Itemsets

In recent years, several algorithms have been proposed for FI mining, such as dEclat and Node-list-based algorithms [5,16-18]. dEclat was one of the most effective algorithms according to [19]. It scans a database once to generate the transaction sets (tidsets) of all itemsets containing one item (1-itemsets). Then, it applies the "diffset" strategy to enumerate all FIs without repeatedly scanning the database. Deng et al. [16] proposed a novel structure named N-list for mining FIs. The proposed algorithm first compresses the dataset into a PPC-tree structure and then, using that tree, generates the N-list of each 1-itemset. Finally, the algorithm applies a divide-and-conquer approach to mine FIs using these lists. Experimental evaluation has shown that the N-list-based algorithm outperforms state-of-the-art FI mining algorithms on a variety of real and synthetic datasets. Deng and Lv [17] then proposed an improved N-list-based frequent itemset mining algorithm named PrePost+, which applies a novel pruning strategy called children-parent equivalence pruning to reduce the search space. Subsequently, Vo et al. [18] combined the N-list structure with the subsume concept to further increase the performance of FI mining. Recently, Deng [16] proposed an efficient algorithm relying on an improved Nodeset structure, named DiffNodesets.

As defined in Section 7.2, the closure of an itemset is the set of all items that appear in all transactions containing the itemset. CFIs have attracted many studies, as they can be used to partition FIs into equivalence classes. This inspires the development of parallel or "divide-and-conquer" approaches to mine FIs from CFIs without scanning the database for the supports. However, there have not been many approaches showing how to derive FIs from the CFIs efficiently. Several researchers have studied retrieving FIs using the generator itemsets and eliminable itemsets in the equivalence classes of their closures [10,14,19-21]. For this purpose, algorithms were proposed that efficiently discover FIs using the lattice of CFIs, without performing duplicate checks, and by processing only one CFI at a time, that is, without considering its relationship to other CFIs. Generator itemsets can be mined independently or at the same time as CFIs. Zaki et al. [9] proposed the MinimalGenerator algorithm to mine generators from the lattice of CFIs using a level-wise approach inspired by the Apriori algorithm. However, to identify the generators of a CFI, the algorithm has to scan all its subsets; thus, it can be very slow. Szathmary et al. [22] proposed the Talky-G algorithm to mine generators from data using an IT-tree structure. It uses the Charm algorithm [8] to separately mine the CFIs and then matches the generators with each CFI. Talky-G guarantees that when an itemset X is visited during the search, all its subsets have already been visited, and thus all generators that are subsets of X have already been found. Consequently, an itemset X is a generator if no already-found generator is a subset of X with the same support as X. To quickly select the generators for this check, Talky-G stores the visited generators in a hash table, using the number of transactions containing each itemset as the hash function. This hash function is also used to match each CFI to its generators. The algorithm is effective when minsup is high; however, the time required for finding the generators is similar to the time for mining the CFIs. GENCLOSE [23] is an algorithm that concurrently mines CFIs and generators. The authors introduced necessary and sufficient conditions to generate the generator (k+1)-itemsets, i.e., itemsets containing k+1 items, from the generator k-itemsets. Using these conditions, the closure of each generator can be extended gradually to find generators. In 2005, the CHARM [9] and dCHARM [8] algorithms were proposed for mining CFIs using the "diffset" structure introduced in the dEclat algorithm. In 2012, the DBV-Miner [24] algorithm improved this approach by compressing the tidsets of 1-itemsets using dynamic bit vectors. It was shown that this can greatly reduce the memory required for storing tidsets and compute the supports of itemsets efficiently. Then, Sahoo et al. [10] proposed the DCI_PLUS algorithm for mining CFIs and their generators. The algorithm compresses the database using a BitTable structure, which is built using a single database scan. In [20], Tran et al. proposed the GEN_ITEMSETS algorithm to generate all itemsets from a lattice of CFIs and generators without repetitions. More recently, Le and Vo [12] proposed an N-list-based algorithm for mining CFIs, named NAFCP. The experimental evaluation of this work has shown that NAFCP outperforms state-of-the-art CFI mining algorithms in terms of run-time and memory usage in most cases.

7.3.2 Lattice-based Approaches for Mining Association Rules

In general, two types of lattices are considered for association rule mining: the frequent itemset lattice (FIL) and the closed frequent itemset lattice (CFIL) [12]. Vo and Le [13] presented a lattice-based approach for building the FIL, here called FIL-2009. Each node of the structure used in FIL-2009 represents an itemset X and stores a tuple (X, Tidset, Children), where Tidset is the list of transactions containing X and Children are pointers to nodes representing supersets of X. To mine the minimal non-redundant association rules, Vo and Le [25] extended the structure used by FIL-2009 (here called FIL-2011) by adding two fields to each node indicating whether the node is a minimal generator or a closed frequent itemset, respectively. These values are determined during lattice construction. The structure is then used by FIL-2011 to effectively mine minimal non-redundant association rules. Thereafter, an efficient approach named PFIL was proposed, which supports incremental mining using the pre-large concept; it was shown to be especially efficient for huge databases containing a large number of FIs [21]. The PFIL algorithm uses the diffset structure to quickly build a FIL, and then uses the pre-large concept and the diffset structure to maintain the pre-large FIL. For a given dataset and minsup threshold, building the CFIL is generally much faster than building the FIL because the number of CFIs is usually much smaller than that of FIs. CharmL [8] is an effective algorithm to build the lattice of CFIs. To update the lattice, researchers have proposed an algorithm [6] that runs efficiently in the case of large databases with a small number of inserted transactions.

For parallel algorithms and for surveys on algorithms for mining FIs and ARs, we refer readers to [26-31].

7.4 Theoretical Results

From now on, for convenience, whenever we use the variables C or G without further condition, we implicitly mean that C ∈ CS or ∅ ⊂ G ⊆ C, respectively. Let ":" stand for "such that" (or "|"), and "," stand for the logical operator "∧" in logical propositions. We now derive some theoretical results that are the basis of our proposed algorithms.

Definition 7 (The immediately closed subsets of a closed itemset). Let S_C = {Y ∈ CS : (Y ⊂ C) ∧ (∄Z ∈ CS)(Y ⊂ Z ⊂ C)} be the set of all largest closed itemsets that are subsets of a given closed itemset C. The itemsets in S_C are called the "immediately closed subsets of C".

Proposition 1. Given C and G (G ⊆ C), we have: G ∈ [C] ⇐⇒ (∀Y ∈ S_C)(∃x ∈ G : x ∉ Y).

Proof. For all G ⊆ C:

"=⇒": Since G ∈ [C], we have h(G) = C.

Now, assume that (∃Y ∈ S_C)(∀x ∈ G)(x ∈ Y). Thus, G ⊆ Y ⊂ C. Then, h(G) ⊆ h(Y) = Y ⊂ C. This leads to a contradiction: h(G) ⊂ C ∧ h(G) = C.

Thus, (∀Y ∈ S_C)(∃x ∈ G)(x ∉ Y).

"⇐=": Assume that ((∀Y ∈ S_C)(∃x ∈ G)(x ∉ Y)) ∧ G ∉ [C].

By Definition 7, we have: (∀X ⊂ C)(X ∈ CS =⇒ ∃Y ∈ S_C : X ⊆ Y) (a).

By Definition 4, we have h(G) ∈ CS (b).

By the assumption (G ⊆ C and G ∉ [C]), we have h(G) ⊂ C (c).

Then, by (a), (b), and (c), we can replace X with h(G) in (a) to obtain (∃Y ∈ S_C)(h(G) ⊆ Y).

Since G ⊆ h(G), we have (∃Y ∈ S_C)(G ⊆ Y), i.e., (∃Y ∈ S_C)(∀x ∈ G)(x ∈ Y). This is a contradiction to the assumption. □



Proposition 1 points out a way to find [C]: search for the FIs that are subsets of C and satisfy the condition on the right-hand side of Proposition 1. However, this might not be efficient if we scan every subset of C. Below, we derive theoretical results that allow generating only the FIs in [C].

Corollary 1. For all Y_k ∈ S_C, let M_k = C\Y_k, n = |S_C|, 1 ≤ k ≤ n, and M = {M_1, ..., M_n}. We have: G ∈ [C] ⇐⇒ (∀M_k)(∃x ∈ G)(x ∈ M_k).

Proof. G ∈ [C] ⇐⇒ (∀Y ∈ S_C)(∃x ∈ G)(x ∉ Y) ⇐⇒ (∀M_k)(∃x ∈ G)(x ∈ M_k). □



Definition 8 (Kernel and extendable set). Given two disjoint itemsets G and E, let the notation [G, E] denote the class of all itemsets that are supersets of G and subsets of G + E, i.e., [G, E] = {X : G ⊆ X ⊆ G + E}. G is called the kernel and E the extendable set of the class.
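Since [G, E] is fully determined by the pair (G, E), its members can be enumerated by uniting G with every subset of E, giving exactly 2^|E| itemsets. The short Python sketch below (illustrative names only, not thesis code) does exactly that:

    from itertools import chain, combinations

    def enumerate_class(G, E):
        """Enumerate [G, E] = {X : G <= X <= G + E} (Definition 8); G, E disjoint."""
        subsets = chain.from_iterable(
            combinations(sorted(E), r) for r in range(len(E) + 1))
        return [frozenset(G) | frozenset(T) for T in subsets]

    # For kernel G = {1, 2} and extendable set E = {3, 4}, the class has 4 members:
    for X in enumerate_class({1, 2}, {3, 4}):
        print(sorted(X))   # [1, 2], [1, 2, 3], [1, 2, 4], [1, 2, 3, 4]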

The following results provide an easy way to find the pairs of kernel and extendable set that can help partition [C] into equivalence classes.

Definition 9. Let M_k = C\Y_k for all Y_k ∈ S_C, M = {M_1, ..., M_n}, and n = |S_C|. Let S_k = {M_1, ..., M_k} (for 1 ≤ k ≤ n) denote the set of the first k elements of M, and S_0 = ∅.

An itemset G is said to "satisfy S_k" if (∀M_i ∈ S_k)(∃x ∈ G)(x ∈ M_i).

Let [S_k] denote the set of all FIs that satisfy S_k, and [S_0] = 2^C.

The following lemmas follow from Definition 9 and Corollary 1.

Lemma 1. Given G ⊆ C, E ⊆ C, G ∩ E = ∅, and 1 ≤ k ≤ n, where n = |S_C|, we have:

a) [S_k] ⊆ [S_{k−1}]; b) (∀G ∈ [S_k])(∀X ∈ [G, E])(X ∈ [S_k]); c) [C] = [S_n].

Proof. By Definition 9, every G ∈ [S_k] satisfies S_k. Thus, (∀M_i ∈ S_k)(∃x ∈ G)(x ∈ M_i), and hence (∀M_i ∈ S_{k−1})(∃x ∈ G)(x ∈ M_i). This means G satisfies S_{k−1}, i.e., G ∈ [S_{k−1}]. Thus, a) is proved.

By Definition 9, for all G ∈ [S_k] we have (∀M_i ∈ S_k)(∃x ∈ G)(x ∈ M_i). In addition, for all X ∈ [G, E], we have G ⊆ X. Thus, (∀M_i ∈ S_k)(∃x ∈ X)(x ∈ M_i), i.e., X satisfies S_k, i.e., X ∈ [S_k]. Thus, b) is proved.

By Corollary 1, we have G ∈ [C] ⇐⇒ (∀M_k ∈ M)(∃x ∈ G)(x ∈ M_k) ⇐⇒ G ∈ [S_n]. Thus, c) is proved. □



Lemma 2. Given G* ∈ [S_{k−1}] and G = G* + {x} (where 1 ≤ k ≤ |S_C|, x ∈ C, and x ∉ G*), we have: (M_k ∩ G* ≠ ∅) ∨ (x ∈ M_k) =⇒ G ∈ [S_k].

Proof. Since G* ∈ [S_{k−1}], we have (∀M_i ∈ S_{k−1})(∃y ∈ G*)(y ∈ M_i).

Consider the case (M_k ∩ G* ≠ ∅). We have (∃a ∈ G*)(a ∈ M_k). Thus, (∀M_i ∈ S_{k−1})(∃y ∈ G*)(y ∈ M_i) ∧ (∃a ∈ G*)(a ∈ M_k), i.e., (∀M_i ∈ S_k)(∃y ∈ G*)(y ∈ M_i), i.e., G* ∈ [S_k]. Since G ⊇ G*, this implies G ∈ [S_k] (i).

Consider the case (x ∈ M_k). Since G = G* + {x}, we have (∀M_i ∈ S_{k−1})(∃y ∈ G)(y ∈ M_i) ∧ (∃x ∈ G)(x ∈ M_k). This implies G ∈ [S_k] (ii).

By (i) and (ii), Lemma 2 is proved. □

In the condition of the first case (i.e., M_k ∩ G* ≠ ∅), the item x is not necessary for G to satisfy S_k, since G* already satisfies S_k. In the second case, it is.

From now on, for convenience, we use the operator "+" (and "Σ") in place of the union operator for two (respectively, many) disjoint sets, and the sign "," in place of the logical operator "∧". Definition 10 below leads to the idea of generating the kernel sets.

Definition 10 (k-minimal set). Given M_k defined as in Definition 9, X ⊆ M_k, and G = G* ∪ X, G is "k-minimal" if one of the following conditions is satisfied:

a) G* is "(k−1)-minimal", M_k ∩ G* ≠ ∅, and X = ∅ (i.e., G = G*, and no more items are needed for G* to satisfy S_k);

b) G* is "(k−1)-minimal", M_k ∩ G* = ∅, and X = {x}, where x ∈ M_k (i.e., x is the new item needed for G* to satisfy S_k);

c) G = ∅ (G is "0-minimal").

We can see that if G is "k-minimal" then G satisfies S_k. Cases (a) and (b) are based on Lemma 2.

It is worth noting that G being "k-minimal" does not imply that no subset of G satisfies S_k; it just means that no proper prefix of G satisfies S_k when G is treated as a sequence of items.

Hereafter, we assume that there exists an order over the items in C (e.g., alphabetic order) and that every M_k is sorted in increasing order.

Definition 11 (A partition of [S_k] based on the kernels and extendable sets). Given C, S_C, and M_k defined as in Definition 9, let the set Q_k contain the pairs of kernel set and extendable set defined recurrently as follows:

a) Q_0 = {(∅, C)};

b) Q_1 = {(G_i, E_i) : G_i = {x_i}, x_i ∈ M_1, E_i = C\{y ∈ M_1 : y ≤ x_i}};

c) ∀k > 1, Q_k = B_k + C_k, where

• B_k = {(G, E) : (G, E) ∈ Q_{k−1}, G ∩ M_k ≠ ∅},

• C_k = {(G + {x_i}, E\E_i) : (G, E) ∈ Q_{k−1}, G ∩ M_k = ∅, N_k = M_k ∩ E, x_i ∈ N_k, E_i = {y ∈ N_k : y ≤ x_i}}.

In detail, B_k contains the pairs of the form (G, E) in Q_{k−1} where G is "(k−1)-minimal" and also "k-minimal", according to Definition 10.a; thus, (G, E) belongs to Q_k as well. Meanwhile, C_k contains the pairs (G, E) such that G = G′ + {x_i}, where G′ is "(k−1)-minimal" but not "k-minimal", and x_i is the new item necessary for G to become "k-minimal" (as in Definition 10.b).

Theorem 1. For 0 ≤ k ≤ |S_C|, {[G, E] : (G, E) ∈ Q_k} is a partition of [S_k], where each [G, E] is an equivalence class and G is "k-minimal".

Proof.

A. We first prove that Theorem 1 holds for k = 0.

For k = 0, Q_0 = {(∅, C)} by Definition 11.a, and [S_0] = 2^C by Definition 9. We have [∅, C] = {X : X ⊆ C} = 2^C. Since {[∅, C]} is a partition of 2^C and ∅ is "0-minimal", letting G = ∅ and E = C shows that Theorem 1 holds for k = 0.

B. Assume that Theorem 1 holds for k − 1, 0 < k ≤ |S_C| (i.e., the set {[G, E] : (G, E) ∈ Q_{k−1}} is a partition of [S_{k−1}], where each [G, E] is an equivalence class and G is "(k−1)-minimal"); we will prove that Theorem 1 holds for k.

By the assumption, [S_{k−1}] = Σ [G_i, E_i], where (G_i, E_i) ∈ Q_{k−1}.

By Lemma 1.a, the necessary condition for an itemset X to be in [S_k] is that it must be in [S_{k−1}] (a).

By the assumption, (∀(G_i, E_i), (G_j, E_j) ∈ Q_{k−1})([G_i, E_i] ∩ [G_j, E_j] = ∅), where i ≠ j; in other words, [G_i, E_i] and [G_j, E_j] are two disjoint sets of FIs that satisfy S_{k−1} (b).

From (a) and (b), we only need to prove that, given a pair (G, E) ∈ Q_{k−1}, we can partition [G, E] into disjoint subclasses, where each subclass either is of the form [G′, E′] with G′ "k-minimal" (i.e., all FIs in [G′, E′] satisfy S_k, or (G′, E′) ∈ Q_k), or contains only itemsets that do not satisfy S_k (∗).

If M_k ∩ G ≠ ∅, then G satisfies S_k. In this case, (G, E) belongs to B_k as in Definition 11.c, and thus it belongs to Q_k; then (∗) is proved (i).

Now, assume that M_k ∩ G = ∅. If M_k ∩ E = ∅, then (∀Y ∈ [G, E])(Y ∉ [S_k]); then (∗) is proved (ii).

Now, assume that M_k ∩ G = ∅ and M_k ∩ E ≠ ∅. Denote N_k = M_k ∩ E = {x_1, ..., x_n}, and for 0 < i ≤ n, denote G_i = G + {x_i} (where x_i ∈ N_k) and E_i = E\{x_j : x_j ∈ N_k, j ≤ i}. Let U_1 = {G_1 + T : T ⊆ E_1} = [G_1, E_1] and V_1 = {G + Y : Y ⊆ E\{x_1}} = [G, E_1].

Then, (∀X ∈ U_1)(x_1 ∈ X) and (∀Y ∈ V_1)(x_1 ∉ Y). This means U_1 and V_1 are disjoint.

We can further divide V_1 into two disjoint sets U_2 and V_2, where every itemset in U_2 contains x_2 and every itemset in V_2 does not contain x_2. One can see that E_i = E_{i−1}\{x_i}. This implies U_2 = {G_2 + T : T ⊆ E_2} = [G_2, E_1\{x_2}] = [G_2, E_2] and, similarly, V_2 = [G, E_2].

In the same way, the division process continues until V_{n−1} is divided into two disjoint sets, U_n = [G_n, E_n] and V_n = [G, E_n].

One can see that [G, E] = V_n + Σ [G_i, E_i], where 0 < i ≤ n and n = |N_k|.

Since M_k ∩ G = ∅ and M_k ∩ E_n = ∅, we have (∀Y ∈ V_n)(Y ∉ [S_k]). Meanwhile, (∀i ≤ n)(N_k ∩ G_i = {x_i} ≠ ∅), which implies (∀i ≤ n)(M_k ∩ G_i ≠ ∅). Hence, each G_i is "k-minimal" by Definition 10.b, and (∀X ∈ [G_i, E_i])(X ∈ [S_k]). This means (G_i, E_i) is generated as in Definition 11.c, and thus (G_i, E_i) ∈ Q_k. Then (∗) is proved (iii).

By (i), (ii), and (iii), (∗) is proved, and so is Theorem 1. □



Corollary 2. Let n = |S_C|; then {[G, E] : (G, E) ∈ Q_n} is a partition of [C], where each [G, E] is an equivalence class and G is "n-minimal".

Proof. This results from Theorem 1 with k = n = |S_C|, since [C] = [S_n] (Lemma 1.c). □



Figure 7.2: Search tree for generating the kernels and extendable sets of Q_3, where Q_3 = {(G_2, E_2), (G_5, E_5), (G_6, E_6), (G_7, E_7)}.

Example 2. Let us consider the relation R shown in Table 7.1 and the lattice of CFIs mined from R with minsup = 0.25 (i.e., absolute minimum support = 1) shown in Figure 7.1. The following paragraphs explain how to find all FIs in [C] for C = {1, 2, 3, 4, 5, 6}.

According to the lattice, the immediately closed subsets of C are Y_1 = {2, 3, 4, 5, 6}, Y_2 = {1, 4, 5, 6}, and Y_3 = {1, 3}; in other words, S_C = {Y_1, Y_2, Y_3}. Figure 7.2 presents the search tree that is built, implicitly, during the process of generating the kernels and extendable sets using a breadth-first search. Here, Q_1 = {(G_1, E_1)}, Q_2 = {(G_2, E_2), (G_3, E_3)}, and Q_3 = {(G_2, E_2), (G_5, E_5), (G_6, E_6), (G_7, E_7)}, where G_i is a kernel and E_i is its corresponding extendable set. (We do not have to compute the pair (G_4, E_4), the rectangle with the dashed border.) They are found by the following steps.

With M_1 = C\Y_1 = {1}, let G_1 = {1} and E_1 = C\{1} = {2, 3, 4, 5, 6}; then Q_1 = {(G_1, E_1)} by Definition 11.b. (Implicitly, [S_1] = [G_1, E_1], but we do not need to compute it.)

Next, we find Q_2 based on (G_1, E_1) and M_2 = C\Y_2 = {2, 3}. Let N_2 = M_2 ∩ E_1 = {2, 3}. Using item 2 in N_2, we have G_2 = G_1 + {2} = {1, 2} and E_2 = E_1\{2} = {3, 4, 5, 6}. Using item 3 in N_2, we have G_3 = G_1 + {3} = {1, 3} and E_3 = E_2\{3} = {4, 5, 6}. Then, Q_2 = {(G_2, E_2), (G_3, E_3)}, the nodes of level 2 in Figure 7.2.

Finally, we find Q_3 based on {(G_2, E_2), (G_3, E_3)} and M_3 = C\Y_3 = {2, 4, 5, 6}. With (G_2, E_2), one can see that G_2 satisfies S_3 since M_3 ∩ G_2 = {2} ≠ ∅; thus, (G_2, E_2) belongs to Q_3. With (G_3, E_3), we have N_3 = M_3 ∩ E_3 = {4, 5, 6}. With item 4 in N_3, we have G_5 = G_3 + {4} = {1, 3, 4} and E_5 = E_3\{4} = {5, 6}. With item 5 in N_3, we have G_6 = G_3 + {5} = {1, 3, 5} and E_6 = E_5\{5} = {6}. With item 6 in N_3, we have G_7 = G_3 + {6} = {1, 3, 6} and E_7 = E_6\{6} = ∅. Then, Q_3 = {(G_2, E_2), (G_5, E_5), (G_6, E_6), (G_7, E_7)}, the nodes of level 3 in Figure 7.2.

Table 7.2: The partition of the 23 FIs in [{1, 2, 3, 4, 5, 6}] based on Q_3 as in Figure 7.2.

[G_2, E_2] (16 FIs): {1, 2}, {1, 2, 3}, {1, 2, 3, 4}, {1, 2, 3, 4, 5}, {1, 2, 3, 4, 5, 6}, {1, 2, 3, 4, 6}, {1, 2, 3, 5}, {1, 2, 3, 5, 6}, {1, 2, 3, 6}, {1, 2, 4}, {1, 2, 4, 5}, {1, 2, 4, 5, 6}, {1, 2, 4, 6}, {1, 2, 5}, {1, 2, 5, 6}, {1, 2, 6}
[G_5, E_5] (4 FIs): {1, 3, 4}, {1, 3, 4, 5}, {1, 3, 4, 5, 6}, {1, 3, 4, 6}
[G_6, E_6] (2 FIs): {1, 3, 5}, {1, 3, 5, 6}
[G_7, E_7] (1 FI): {1, 3, 6}

Theorem 2 (A necessary and sufficient condition for an itemset to be a generator). Let GS be the set of all generators of C, and let KS = {G : (G, E) ∈ Q_n} be the set of all kernels, where n = |S_C|. We have: a) GS ⊆ KS; b) G ∈ GS ⇐⇒ (G ∈ KS) ∧ (∄K ∈ KS)(K ⊂ G).

In other words, the generators are the minimal kernels of the subclasses induced by Q_n.

Proof. By Corollary 2, Q_n (where n = |S_C|) induces a partition of [C], where each [K, E] (with (K, E) ∈ Q_n) is an equivalence class. Then, (∀G ∈ GS)(∃(K, E) ∈ Q_n)(G ∈ [K, E]). This means h(G) = h(K) = C. Then, by Definition 5 and Definition 8, G = K, since (∀X ∈ [K, E])(K ⊆ X) and a generator has no proper subset with the same closure. Thus, (∀G ∈ GS)(∃K ∈ KS)(G = K), i.e., GS ⊆ KS, and Theorem 2.a is proved. As a consequence, a kernel is a generator if and only if it is not a superset of any other kernel, i.e., Theorem 2.b is proved. □



7.5 Our Proposed Algorithms

In this section, based on Definition 11, Theorem 1, and Theorem 2, we present the algorithm NUCLEAR for enumerating the FIs/generators from a lattice of CFIs, for every closed frequent itemset. Options allow it to find only the generators, or both the FIs and the generators. We also introduce the NUC algorithm, which wraps NUCLEAR to mine FIs/generators from scratch (i.e., from transactional data). NUC uses an existing algorithm (e.g., CharmL) to construct the lattice of CFIs from data, which is then used as the input for NUCLEAR. NUCLEAR calls the algorithm FindKandE_BFS to enumerate the pairs of kernels and extendable sets for each closed frequent itemset in a breadth-first-search manner. Depending on the output type(s) required, the FIs and/or the generators are enumerated accordingly. In the NUCLEAR algorithm, for each closed frequent itemset C in the lattice L, the kernel G and the extendable set E are initialized as G = ∅ and E = C. If S_C = ∅, the class [C] contains all non-empty subsets of C; otherwise, the recursive FindKandE_BFS algorithm is called.

Algorithm 1 (NUC). Mining FIs/generators from data.

Input.

• D: the transactional dataset,

• minsup: the minimum support threshold,

• returnFIs: “true” to return all FIs for each closed frequent itemset,

• returnGenerators: “true” to return all generators for each closed frequent itemset.

Output. all FIs/generators satisfying minsup.

Method.

1. L = the lattice of CFIs mined from D for the given minsup (using an existing algorithm, e.g., CharmL).

2. return NUCLEAR(L, minsup, returnFIs, returnGenerators)

Algorithm 2 (NUCLEAR). Mining FIs/generators from a lattice of closed frequent itemsets.

Input.

• L = the lattice of CFIs,

• minsup: the minimum support threshold,

• returnFIs: “true” to return all FIs for each closed frequent itemset,

• returnGenerators: “true” to return all generators for each closed frequent itemset.

Output. all FIs/generators satisfying minsup.

Method. For each C ∈ L such that supp(C) ≥ minsup:

1. n = |S_C|

2. M = {M_i : M_i = C\Y_i, Y_i ∈ S_C}

3. Q_0 = {(∅, C)}

4. Q_n = FindKandE_BFS(1, Q_0, M)

5. ∀(G, E) ∈ Q_n:

a. if (returnFIs) then generate and save [G, E] for C

b. if (returnGenerators) and (∄(K, E′) ∈ Q_n)(K ⊂ G) then save G for C

Algorithm 3 (FindKandE_BFS). Mining all pairs of kernel and extendable set for a closed frequent itemset C.

Input.

• k: a positive integer,

• Q_{k−1}: the set of pairs of kernel and extendable set generated as in Definition 11,

• M = {M_1, ..., M_n}: the difference sets of C, as in Definition 9.

Output. Q_n: all pairs of kernel and extendable set of the closed frequent itemset C.

Method.

1. if (k > n) then return Q_{k−1}, where n = |S_C| = |M|

2. Q_k = ∅

3. ∀(G, E) ∈ Q_{k−1},

a. if (M_k ∩ G ≠ ∅) then Q_k = Q_k ∪ {(G, E)}

b. else

i. E_i = E

ii. ∀x ∈ M_k ∩ E, in increasing order,

• E_i = E_i\{x}

• Q_k = Q_k ∪ {(G + {x}, E_i)}

4. return FindKandE_BFS(k + 1, Q_k, M)
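To make Algorithms 2 and 3 concrete, the following is a minimal, runnable Python sketch of the kernel/extendable-set generation (written iteratively rather than recursively, which is equivalent here) together with the generator test of Theorem 2. All names are ours, not part of the thesis implementation, and the lattice is assumed to be already available: for a CFI C, only S_C, its immediately closed subsets, is needed.

    def find_kernels_and_extendables(C, S_C):
        """Return Q_n: the (kernel, extendable set) pairs partitioning [C] (Def. 11)."""
        M = [C - Y for Y in S_C]             # the difference sets M_1..M_n
        Q = [(frozenset(), frozenset(C))]    # Q_0 = {(emptyset, C)}
        for Mk in M:
            next_Q = []
            for G, E in Q:
                if G & Mk:                   # B_k: G already satisfies S_k
                    next_Q.append((G, E))
                else:                        # C_k: extend G by each x in M_k & E
                    Ei = set(E)
                    for x in sorted(Mk & E):
                        Ei.discard(x)
                        next_Q.append((G | {x}, frozenset(Ei)))
            Q = next_Q
        return Q

    def generators(Q):
        """The generators of C are the minimal kernels (Theorem 2)."""
        kernels = [G for G, _ in Q]
        return [G for G in kernels if not any(K < G for K in kernels)]

    # Example 2: C = {1,...,6} with S_C = {Y1, Y2, Y3}.
    C = frozenset({1, 2, 3, 4, 5, 6})
    S_C = [frozenset({2, 3, 4, 5, 6}), frozenset({1, 4, 5, 6}), frozenset({1, 3})]
    Q = find_kernels_and_extendables(C, S_C)
    print([(sorted(G), sorted(E)) for G, E in Q])
    print([sorted(G) for G in generators(Q)])

Running this reproduces Q_3 of Example 2, i.e., the kernels {1, 2}, {1, 3, 4}, {1, 3, 5}, {1, 3, 6} with extendable sets {3, 4, 5, 6}, {5, 6}, {6}, and ∅; all four kernels are minimal and hence generators of C, consistent with Theorem 2.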

7.6 Experimental Results and Discussion

In this section, we compare the running times (in seconds) for mining all FIs from data using three frameworks: dEclat, GenIT, and NUC. Details of these frameworks are as follows.

• dEclat mines FIs directly from data.

• NUC uses the CharmL algorithm to mine the lattice of CFIs from data before applying NUCLEAR to generate the kernels and extendable sets. The FIs are inferred during this process.

• GenIT is not a single algorithm but a combination of algorithms from previous studies, slightly different from NUC. In GenIT, we first use the CharmL algorithm to mine the lattice of CFIs from data. Then, the MinimalGenerator [9] algorithm is applied to mine the generators from the CFIs. Finally, we apply GEN_ITEMSETS to extract the FIs from the CFIs and the generators.

Thus, to be fair, the total time reported for NUC includes the time of CharmL and the time of NUCLEAR, whereas that for GenIT includes the time of CharmL, the time of MinimalGenerator, and the time of GEN_ITEMSETS. These algorithms have been executed on a Pentium(R) Dual-Core CPU E6500 @ 2.93GHz, equipped with 1.94GB of RAM, running the Microsoft Windows XP Version 2002 operating system.

Seven datasets (available at http://fimi.ua.ac.be/data) have been used to compare the frameworks under different values of minsup. Information about the datasets is given in Table 7.3.

Table 7.3: Characteristics of the datasets.

Database     Abbreviation   #Items   #Transactions   Avg. transaction length
Chess        CH                 75           3,196    37
Connect      CO                129          67,557    43
Mushroom     MU                119           8,124    23
Retail       RE             16,469          88,162    10.3
T40I10100K   T4              1,000         100,000    40
C20d10k      C2                192          10,000    20
C73d10k      C7              1,592          10,000    73

Table 7.4: Number of patterns extracted from the datasets. #FS: the number of frequent itemsets; #CS: the number of closed frequent itemsets; #G: the number of generators; #K: the number of kernels, which is also the number of extendable sets; #FS/CS: the ratio of #FS to #CS; #CS/K: the ratio of #CS to #K; #K/G: the ratio of #K to #G.

Data  minsup      #FS       #CS      #G      #K   #FS/CS  #CS/K  #K/G
C2    0.8      8165081     99785  122031  122359    81.83   0.82  1.00
C2    0.85     7525408     95533  116126  116416    78.77   0.82  1.00
C2    0.9      7017040     92087  111297  111564    76.20   0.83  1.00
C2    0.95     6525355     88695  106575  106813    73.57   0.83  1.00
C2    1        6092449     85608  102316  102519    71.17   0.84  1.00
C7    50      25696439    482902  765450  765449    53.21   0.63  1.00
C7    55       9698268    222253  346029  346028    43.64   0.64  1.00
C7    60       4188627    108428  166918  166917    38.63   0.65  1.00
C7    65       1472818     47491   71875   71874    31.01   0.66  1.00
C7    70        543081     19501   29008   29007    27.85   0.67  1.00
CH    50        900355    369450  372603  372603     2.44   0.99  1.00
CH    60        156551     98392   98418   98418     1.59   1.00  1.00
CH    70         24997     23991   23991   23991     1.04   1.00  1.00
CO    60      21184454     68349   68349   68349   309.95   1.00  1.00
CO    70       4093971     35875   35875   35875   114.12   1.00  1.00
CO    75       1561212     24346   24346   24346    64.13   1.00  1.00
CO    80        518875     15107   15107   15107    34.35   1.00  1.00
MU    2       23596649     31767   57728   82483   742.80   0.55  1.43
MU    3        9934877     22229   37972   52165   446.93   0.59  1.37
MU    4        4324745     16565   26984   35597   261.08   0.61  1.32
MU    5        3727905     12854   21160   27801   290.02   0.61  1.31
RE    0.006     975063    504142  532342  542565     1.93   0.95  1.02
RE    0.008     480620    286435  293235  294709     1.68   0.98  1.01
RE    0.01      240852    189077  191265  191650     1.27   0.99  1.00
RE    0.02       67186     65301   65329   65330     1.03   1.00  1.00
RE    0.03       40153     39552   39552   39552     1.02   1.00  1.00
RE    0.04       26925     26666   26666   26666     1.01   1.00  1.00
RE    0.05       19836     19698   19698   19698     1.01   1.00  1.00
T4    0.8       480531    480531  480531  480531     1.00   1.00  1.00
T4    0.85      432211    432211  432211  432211     1.00   1.00  1.00
T4    0.9       350323    350323  350323  350323     1.00   1.00  1.00
T4    0.95      210610    210610  210610  210610     1.00   1.00  1.00
T4    1          66278     66278   66278   66278     1.00   1.00  1.00

Table 7.5: Runtimes (in seconds) of the frameworks and of the breakdown processes. tCS: time to find CFIs using CharmL; tG: time to find the generators using MinimalGenerator; tN: time to generate kernels and extendable sets by NUCLEAR; tNI: time to enumerate FIs from the kernels and extendable sets by NUCLEAR; tNNI: time to find FIs from the lattice of CFIs (tNNI = tN + tNI); tGI: time to generate FIs from the CFIs and generators using GEN_ITEMSETS; NUC: time to find FIs from data using CharmL and NUCLEAR (NUC = tCS + tNNI); GenIT: time to find FIs using CharmL, MinimalGenerator, and GEN_ITEMSETS (GenIT = tCS + tG + tGI); dEclat: time to find FIs from data using dEclat; OM: "out of memory".

Data  minsup    tCS    tG    tN   tNI   tNNI   tGI  GenIT  dEclat    NUC
C2    0.8      14.8   3.6   2.4   6.1    8.5  14.1   32.5    24.6   23.3
C2    0.85     13.9   3.4   2.3   5.3    7.7  12.8   30.1    22.6   21.6
C2    0.9      13.1   3.3   2.2   5.0    7.2  11.9   28.3    21.2   20.3
C2    0.95     12.5   3.2   2.3   4.6    6.9  11.2   26.9    19.6   19.5
C2    1        12.0   3.1   1.9   3.9    5.8  10.3   25.4    18.4   17.8
C7    50      192.5  33.3  23.6  16.3   40.0    OM     OM    88.1  232.4
C7    55       58.9  14.0   9.2   5.4   14.6  19.4   92.2    33.2   73.5
C7    60       20.0   6.3   3.9   2.0    5.9   7.8   34.1    14.6   25.9
C7    65        6.5   2.4   1.5   0.7    2.2   2.7   11.6     5.3    8.7
C7    70        2.3   0.9   0.4   0.4    0.8   0.9    4.1     2.1    3.1
CH    50      109.3  17.4   5.5   0.6    6.1   3.0  129.6     5.0  115.4
CH    60       14.3   3.9   1.3   0.1    1.5   0.6   18.7     1.1   15.8
CH    70        1.8   0.8   0.3   0.0    0.3   0.1    2.7     0.3    2.1
CO    60       35.2   2.4   1.8  16.5   18.3    OM     OM    95.0   53.4
CO    70       13.3   1.2   0.9   2.9    3.8   7.3   21.8    20.5   17.1
CO    75        7.9   0.8   0.5   0.9    1.4   2.6   11.3     8.7    9.4
CO    80        4.7   0.6   0.3   0.3    0.6   0.8    6.1     3.5    5.3
MU    2         3.8   2.3   1.1  18.3   19.3    OM     OM    71.2   23.1
MU    3         2.7   1.3   0.4   8.0    8.4  28.2   32.1    30.1   11.1
MU    4         1.9   0.8   0.5   3.2    3.7  13.8   16.5    13.3    5.6
MU    5         1.6   0.9   0.3   2.9    3.2  11.1   13.7    11.3    4.8
RE    0.006   101.2   9.8   4.9   0.8    5.7   7.8  118.8    60.6  106.9
RE    0.008    57.2   5.2   2.4   0.3    2.7   1.2   63.6    27.2   59.8
RE    0.01     39.0   3.1   1.5   0.1    1.6   0.5   42.7    16.5   40.6
RE    0.02     17.1   1.1   0.4   0.0    0.5   0.1   18.3     5.7   17.6
RE    0.03     12.0   0.6   0.2   0.0    0.3   0.1   12.8     3.8   12.3
RE    0.04      9.0   0.4   0.2   0.0    0.2   0.1    9.4     2.8    9.2
RE    0.05      7.2   0.4   0.1   0.0    0.2   0.0    7.6     2.2    7.4
T4    0.8     427.7  28.9   7.4   0.3    7.7   1.4  458.0    41.2  435.4
T4    0.85    372.0  24.6   5.9   0.2    6.1   1.3  397.9    35.2  378.2
T4    0.9     289.2  19.3   5.3   0.2    5.5   1.3  309.8    29.9  294.7
T4    0.95    120.8   9.7   2.8   0.1    2.8   0.6  131.1    25.2  123.6
T4    1        75.4   1.9   0.6   0.0    0.6   0.1   77.4    21.1   76.0

The numbers of patterns that can be mined from each dataset are shown in Table 7.4. Table 7.5 shows the overall runtimes for NUC, GenIT, and dEclat in the columns of the corresponding names, together with other details. Visual comparisons of the three approaches are also given in Figures 7.3-7.8. In our experiments, NUC is faster than dEclat when tested on the Mushroom and Connect datasets (Figure 7.3, Figure 7.4). The reasons include: (1) as shown in columns tCS and dEclat, the time for CharmL is smaller than that of dEclat because the number of CFIs in a dataset is much less than that of FIs, and (2) as shown in columns tNNI and dEclat, the time for NUCLEAR is also smaller than that of dEclat. On the other datasets, NUC is slower than dEclat. The main reason is that CharmL, which is used to acquire the input for NUCLEAR, is already slower than dEclat, whereas the time for NUCLEAR itself is significantly small compared to those of CharmL and dEclat.

Table 7.5 shows that NUC is about 1.25 times faster than GenIT. To emphasize the advantages of NUCLEAR, we break down the run-times of GenIT and NUC by the different stages of their processes, including:

1. Constructing the lattice of CFIs by CharmL (column tCS),

2. Mining generators by MinimalGenerator (column tG),

3. Extracting FIs from the CFIs and the generators by GEN_ITEMSETS (column tGI),

4. Generating kernels and extendable sets from the lattice of CFIs by NUCLEAR (column tN),

5. Enumerating FIs from the kernels and extendable sets by NUCLEAR (column tNI).

Comparing tG against tN, one can see that the run-time of MinimalGenerator for mining the generators from the lattice of CFIs (tG) is about twice the time for NUCLEAR to generate the kernels and extendable sets (tN). Similarly, comparing tGI against tNI shows that the time for GEN_ITEMSETS (tGI) is about twice that for NUCLEAR to extract the FIs (tNI). In other words, generating FIs from the kernels and extendable sets is faster than from the generators and CFIs. These results make NUC more efficient than GenIT in extracting both the intermediate and the final results.

Besides, the time for mining FIs from the lattice of CFIs using NUCLEAR (tNNI) is mostly much smaller than that for constructing the lattice (tCS) using the CharmL algorithm. Thus, in applications where users have to try different minsup thresholds to find an optimal set of FIs for a certain downstream process, our approach might be more efficient than repeatedly mining FIs from scratch as dEclat does: the lattice can be constructed only once for a small enough minsup, the cost of updating/filtering the lattice is expected to be small, and, after that, NUCLEAR can be used to query FIs many times. For example, in bioinformatics, we can use NUC to conduct feature selection to reduce the number of features before applying another machine learning algorithm; the feature selection might have to be conducted many times to obtain an optimal set of features.

From Table 7.4, we can see that the number of pairs of kernel and extendable set (#K) is almost equal to the number of generators (#G) and just slightly bigger than the number of CFIs (#CS). The lowest value of #CS/K is 0.55, which means that, on average, there are no more than two kernels per CFI (1/0.55 ≈ 1.8). Thus, identifying the kernels and extendable sets, as demonstrated in Example 2, is fast. As there are only a few kernels per CFI and the number of generators is almost equal to the number of kernels (#K/G is almost equal to 1), extracting the generators from among the kernels is very quick.

Finally, we do not need to store the kernels and extendable sets, but just the lattice of CFIs, because, for each CFI C, we can easily identify S_C (the largest CFIs that are subsets of C) and quickly generate the kernels, extendable sets, generators, and FIs. Thus, using NUCLEAR can save a lot of memory as well.

Figure 7.3: Run-time comparison on Mushroom (MU).

Figure 7.4: Run-time comparison on Connect (CO).

Figure 7.5: Run-time comparison on C20d10k (C2).

Figure 7.6: Run-time comparison on Retail (RE).

Figure 7.7: Run-time comparison on C73d10k (C7).

Figure 7.8: Run-time comparison on T40I10100K (T4).

7.7 Conclusions and Future Work

Mining FIs from the lattice of CFIs is a reasonable approach, since the number of CFIs is often much smaller than the number of FIs. The lattice of CFIs can be mined once for a low minimum support threshold and used many times later to derive FIs for different higher minimum support thresholds. In general, it is easier to directly acquire the CFIs than the FIs, especially when there are parallel and distributed algorithms to mine CFIs. More importantly, the CFIs can be useful for other tasks, such as quickly extracting the non-redundant association rules. CFIs can also be used to partition FIs into equivalence classes, thus allowing us to efficiently process a subclass of FIs based on some of its representatives and inspiring parallel methods to process many subclasses at the same time.

Current approaches might need both the generators and the closed frequent itemsets to generate the FIs and association rules. In this work, we derive a recurrent formula for generating the kernels and extendable sets from a given closed frequent itemset, by which the FIs can be easily and quickly enumerated. We also induce a "necessary and sufficient" condition for an itemset to be a generator, based on which the generators can be quickly extracted as well. The proposed algorithm, NUCLEAR, implements these theoretical results.

We propose a framework, called NUC, that applies an existing method (e.g., CharmL) to mine a lattice of CFIs from data and then uses NUCLEAR to enumerate the FIs and/or generators from that lattice. NUC is more efficient than GenIT, a similar approach that requires the generators to enumerate the FIs from the lattice. NUC is slower than dEclat, which mines FIs directly from data, in most cases, but only because CharmL is already slower than dEclat; the time for obtaining the FIs from the lattice by NUCLEAR remains considerably small.

In the future, we will apply our algorithms to bioinformatics problems, such as survivability prediction or risk-of-recurrence prediction in breast cancer. Numerical data must be discretized before applying the algorithms. In such cases, the data have many more items/features than samples, and FIs cannot be mined directly within a reasonable amount of time. The lattice of CFIs can be mined with respect to a subset of selected features (e.g., the seed genes in Chapter 5), and then the association rules can be extracted for classification.

References

[1] Agrawal R., Imielinski T., Swami A. (1993). Mining association rules between sets of items in large databases. In: ACM SIGMOD, 207-216.

[2] Mai T., Vo B., Nguyen L.T.T. (2017). A lattice-based approach for mining high utility association rules. Information Sciences, 399, 81-97.

[3] Yun U., Lee G., Yoon E. (2017). Efficient high utility pattern mining for establishing manufacturing plans with sliding window control. IEEE Transactions on Industrial Electronics, 64(9), 7239-7249.

[4] Mai T., Nguyen L.T.T. (2017). An Efficient Approach for Mining Closed High Utility Itemsets and Generators. Journal of Information and Telecommunication, 1(3), 193- 207.

[5] Bundit, M., Nunnapus, B., Arnon, R., Athasit, S., Putchong, U. (2007). Parallel association rule mining based on FI-Growth algorithm. In: ICPDS’07, 1-8.

[6] Lakhal, L., and Stumme, G. (2005). Efficient mining of association rules based on formal concept analysis. In: FCA’05, 180-195.

[7] Grahne G., Zhu J. (2005). Fast algorithms for frequent itemset mining using FP-Trees. IEEE Transactions on Knowledge and Data Engineering, 17(10), 1347-1362.

[8] Zaki, M.J. and Hsiao, C.J. (2005). Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering, 17(4), 462-478.

[9] Zaki, M.J. (2004). Mining non-redundant association rules. Data Mining and Knowledge Discovery, 9(3), 223-248.

[10] Sahoo, J., Das, A. K., Goswami, A. (2015). An effective association rule mining scheme using a new generic basis. Knowledge and Information Systems, 43(1), 127–156.

[11] Negrevergne, B., Termier, A., Méhaut, J., Uno, T. (2010). Discovering Closed Frequent Itemsets on Multicore: Parallelizing Computations and Optimizing Memory Accesses. 2010 International Conference on High Performance Computing and Simulation (HPCS), 521-528.

[12] Le T., Vo B. (2016). The Lattice-based approaches for mining association rules: a review. WIREs Data Mining and Knowledge Discovery, 6(4), 140-151.

[13] Vo B., Le B. (2009). Mining traditional association rules using frequent itemsets lattice. In: CIE'09, 1401-1406.

[14] Tran N.A., Tran C.T., Le H.B. (2012). Structures of association rule set. In: ACIIDS'12, 361-370.

[15] Truong C.T., Tran N.A. (2010). Structure of set of association rules based on concept lattice. In: ACIIDS'10, 217-227.

[16] Deng Z.H. (2016). DiffNodesets: An efficient structure for fast mining frequent itemsets. Applied Soft Computing, 41, 214-223.

[17] Deng Z.H., Lv S.L. (2015). PrePost+: An efficient N-lists-based algorithm for mining frequent itemsets via Children-Parent Equivalence pruning. Expert Systems with Applications, 42(13), 5424-5432.

[18] Vo B., Le T., Coenen F., Hong T.P. (2016). Mining frequent itemsets using the N-list and subsume concepts. International Journal of Machine Learning and Cybernetics, 7(2), 253-265.

[19] Goethals, B., and Zaki, M. (2003). FIMI '03 Workshop on Frequent Itemset Mining Implementations. http://www.cs.rpi.edu/~zaki/PaperDir/FIMI03.pdf.

[20] Tran N.A., Duong V.H., Tran C.T., Le H.B (2011). Efficient algorithms for mining frequent itemsets with constraint. In: KSE’11, 19-25.

[21] Vo B., Le T., Hong T.P., Le, B. (2014). An effective approach for maintenance of pre-large-based frequent-itemset lattice in incremental mining. Applied Intelligence, 41(3), 759-775.

[22] Szathmary, L., Valtchev, P., Napoli, A., Godin, R. (2009). Efficient vertical mining of frequent closures and generators. In Advances in Intelligent Data Analysis VIII (pp. 393-404). Springer Berlin Heidelberg.

[23] Anh N. T., Tin C.T., and Bac L.H. (2013). An approach for mining concurrently closed itemsets and generators. In: ICCSAMA'13, pp. 355-366.

[24] Vo B., Hong T.P., Le B. (2012). DBV-Miner: A dynamic Bit-Vector approach for fast mining closed frequent itemsets. Expert Systems with Applications, 39(8), 7196-7206.

[25] Vo B., Le B. (2011). Interestingness measures for association rules: Combination between lattice and hash tables. Expert Systems with Applications, 38(9), 11630-11640.

[26] Agrawal R., Shafer J.C. (1996). Parallel mining of association rules. IEEE Transactions on Knowledge and Data Engineering, 8(6), 962-969.

[27] Han E., Karypis G., and Kumar V. (1997). Scalable parallel data mining for association rules. In: ACM SIGMOD'97, 277-288.

[28] Zaïane, O.R., El-Hajj, M., and Lu, P. (2001). Fast parallel association rule mining without candidacy generation. In: ICDM'01, 665-668.

[29] Pasquier N., Taouil R., Bastide Y., Stumme G., and Lakhal L. (2005). Generating a condensed representation for association rules. Journal of Intelligent Information Systems, 24(1), 29-60.

[30] Ai, D., Pan, H., Li, X., Gao, Y., and He, D. (2018). Association rule mining algorithms on high-dimensional datasets. Artificial Life and Robotics, 23(3), 420-427.

[31] Fournier-Viger P., Lin J.C.W., Vo B., Truong T.C., Zhang J., Le H.B. (2017). A survey of itemset mining. WIREs Data Mining and Knowledge Discovery, 7(4), e1207.

Chapter 8

Conclusions and Future Work

8.1 Conclusions

Genomic information has been shown to correlate with tumor therapy response in previous studies. In Chapter 2, we applied feature selection approaches (including multiple factor analysis with SVM as the evaluator, and mRMR with SVM or Random Forest as the evaluator) to extract gene expression signatures for classifying the survival outcomes, at different time points, of breast cancer patients receiving hormone therapy (HT) and, in some cases, chemotherapy (CT) agents. The initial genes for feature selection were collected from the literature with respect to their involvement in the mechanisms of the HT agent (tamoxifen) and the CT agents (methotrexate, epirubicin, doxorubicin, and 5-fluorouracil). Paclitaxel gene signatures exhibited the best performance, especially for patients treated with chemotherapy alone or with both chemotherapy and hormone therapy. Other agents also predicted survival with acceptable accuracies when using SVM.

In Chapter 3, we develop the feature selection algorithm PAFS, which can be implemented in parallel. In PAFS, candidate subsets are generated and evaluated in a breadth-first-search manner. A candidate of size k is generated from two subsets of size (k−1) that share (k−2) common features at the previous step. It is also required that the classification performance of the two parent sets (of size k−1) must not be considerably lower than the current best performance, to increase the chance that the candidates of the next generation yield better performance than the current ones. As a result, PAFS can return more than one optimal subset of features, thus providing multiple alternatives for downstream analysis. Besides, there are running options to keep PAFS from being too greedy or too exhaustive. When applied to the METABRIC datasets of patients receiving HT, CT, and a combination of them to predict their survivability, using the same initial gene set as described in Chapter 2, PAFS extracts better gene signatures, and the classification performance is significantly improved. When applying PAFS to predict the chance of becoming disease-free for patients receiving a single therapy among CT, HT, radiation therapy, and surgery, using an extended initial gene set, the classification performances are significantly high; the accuracies are about 78% or more (MCCs ≥ 0.449). We also introduce an algorithm, named TCM, that wraps PAFS to deal with multiple-class problems. It yields almost perfect accuracy when applied to a PAM50 subtype dataset and an erythemato-squamous disease dataset.
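To illustrate the candidate-generation step just described, the sketch below joins two size-(k−1) subsets that share (k−2) features into a size-k candidate, keeping only joins whose parents score close to the current best. The names and the tolerance rule are illustrative assumptions of this sketch, not the exact PAFS implementation:

    from itertools import combinations

    def join_candidates(subsets, scores, best, tolerance=0.05):
        """PAFS-style join: two (k-1)-subsets sharing (k-2) features give one
        k-subset, kept only if both parents score within `tolerance` of `best`."""
        candidates = set()
        for a, b in combinations(subsets, 2):
            if len(a & b) == len(a) - 1:       # parents share (k-2) common features
                if min(scores[a], scores[b]) >= best - tolerance:
                    candidates.add(a | b)      # size-k candidate
        return candidates

    # Tiny illustration with 2-feature parents and made-up scores:
    s1 = frozenset({"g1", "g2"})
    s2 = frozenset({"g1", "g3"})
    s3 = frozenset({"g4", "g5"})
    scores = {s1: 0.80, s2: 0.78, s3: 0.60}
    print(join_candidates([s1, s2, s3], scores, best=0.80))
    # {frozenset({'g1', 'g2', 'g3'})}; s3 shares no feature, so it produces no join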

In Chapter 4, we integrate an unsupervised method and a supervised method to develop a classification model to predict the 5-year survivability of breast cancer patients treated with either hormone therapy, radiotherapy, or surgery. The approach is to build a multiple-label classification model that can suggest an optimal treatment for new patients based on their gene expression profiles. We divide patients into initial clusters by the combination of their treatment and survival status, then apply bottom-up agglomerative clustering with Ward's linkage to obtain hierarchical clusters. Finally, feature selection using mRMR and SVM is applied at each node to select a set of discriminative genes for separating the corresponding two groups. The accuracy of the model at each node ranges from 81.8% to 100%, a significantly high overall accuracy. There is evidence in the literature that some of the discriminative genes in the computational model, such as ZC3H11A, VAX2, MAF1, and ZFP91, are related to breast cancer and other types of cancer.

The network-based method SSPA presented in Chapter 5 identifies several optimal sub-networks for predicting the breast cancer survivability of a treatment. Gene expression data and a protein-protein interaction network are integrated. The candidate sub-networks are gradually generated based on the protein-protein interaction network structure, starting from a seed gene. During this step, they are filtered based on their predictability, estimated by a proposed score named PA (predictive ability), and then evaluated by SVM for their actual classification performance on the gene expression data. The heuristics applied save the time of evaluating redundant sub-networks. The informative seed genes are collected with regard to either their breast cancer relevance or their predictive ability measured by mutual information, including the drug-correlated genes described in Chapter 2. This is also a strategy to avoid generating redundant sub-networks, while increasing the chance of obtaining accurate sub-network biomarkers with informative genes for biological analysis. SSPA provides more than one final sub-network for consideration as biomarkers or for further analysis, since, in the end, the optimal sub-networks are filtered by their classification performance and prioritized based on known cancer-relevant genes and pathway data. When applied to the METABRIC datasets for predicting the survival outcome of patients treated with chemotherapy, hormone therapy, radiation therapy, or surgery only, SSPA significantly outperformed some gene-expression-based approaches and another network-based approach. The extracted sub-networks resulted in high accuracy, 73% to 93%, and include genes related to breast cancer, to other cancers, and to many cancer pathways. Notably, in an extracted sub-network for chemotherapy, the gene TP53 is situated as the central hub, which is consistent with the key roles of TP53 that have been reported.

We also applied the Recursive Feature Elimination (RFE) method on gene expression data, with an SVM linear kernel as the evaluator, and obtained subsets of genes that can predict with more than 90% accuracy. However, the extracted subsets of genes are quite large to be considered good biomarkers; a further gene/sub-network selection on these subsets or their extensions might result in better biomarkers than the current ones identified by SSPA.

In Chapter 6, we propose another network-based machine learning approach, named KDS, to identify an optimal set of sub-networks of functionally-related genes that can be used as a biomarker to predict whether a breast cancer patient will survive 5 years after being treated with chemotherapy, hormone therapy, or a combination of these. A weighted graph is constructed from gene expressions and protein-protein interactions, where the edge weight is computed by joint mutual information expressing the predictability of the combination of the two genes involved. (In contrast, in common gene-gene co-expression networks, the edge weight usually expresses the correlation between the pair of genes, which is sometimes considered the redundancy between the two genes in predicting the class.) We define a density function for a weighted sub-graph, which is also an estimation of its predictability. Using this function, for each seed gene we extract only one optimally dense sub-network, at an acceptable cost. The sub-networks are then evaluated by a machine learning method (e.g., support vector machine or random forest) for their real performance, and two techniques to deal with class-imbalanced data are implemented before training the classification models. Furthermore, based on a weighted voting scheme using multiple subnets for classification, we present two simple methods to select an optimal set of subnets that can improve the classification performance of the single best subnet. KDS significantly improves the results presented in Chapter 2. Our best subnet can yield 94.3% accuracy (MCC = 0.883) in predicting chemotherapy-treated patients using one single best subnet, while for the other treatments, the optimal set of subnets can yield more than 80% accuracy. Interestingly, we found the well-known genes SP1, TP53, MYC, and PIK3CA among 14 breast-cancer-related genes that frequently appear in the sets of top-k selected subnets.
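As a rough illustration of this kind of edge weight, the sketch below estimates the joint mutual information I((X_i, X_j); L) between a discretized gene-expression pair and the class label. The two-bin median discretization and the plug-in estimator are simplifying assumptions of this sketch, not necessarily the exact KDS procedure:

    import numpy as np
    from collections import Counter

    def joint_mi(x, y, label):
        """I((X, Y); L) for a gene pair: a predictability-style edge weight.
        Expressions are binarized at their medians (a simplifying assumption)."""
        xb = (x > np.median(x)).astype(int)
        yb = (y > np.median(y)).astype(int)
        n = len(label)
        joint = Counter(zip(xb, yb, label))    # counts of (x-bin, y-bin, class)
        pair = Counter(zip(xb, yb))            # counts of (x-bin, y-bin)
        cls = Counter(label)                   # class counts
        mi = 0.0
        for (a, b, c), cnt in joint.items():
            p_abc, p_ab, p_c = cnt / n, pair[(a, b)] / n, cls[c] / n
            mi += p_abc * np.log2(p_abc / (p_ab * p_c))
        return mi

    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, 100)           # e.g., survived vs. not
    g1 = labels + rng.normal(0, 0.5, 100)      # an informative gene
    g2 = rng.normal(0, 1, 100)                 # a noise gene
    print(joint_mi(g1, g2, labels))            # weight for the edge (g1, g2)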

The frequent itemsets (FIs) and closed frequent itemsets (CFIs) might be useful for bioinformatics problems, including association rule extraction for classification [1]. Owing to their high interpretability, the rules and patterns mined from biological data might help in understanding the mechanisms of diseases better than other machine learning tools such as SVM, Random Forest, or Neural Networks. Mining the CFIs from data first and then inferring the FIs has been considered a preferred approach for many reasons; however, the generators may be required for the inference, and mining them incurs an extra cost. In Chapter 7, we have introduced the NUCLEAR algorithm, which can enumerate the FIs and generators quickly, given a lattice of CFIs. It is based on theoretical results around the notions of "kernel" and "extendable set" that make inferring the FIs as easy and quick as enumerating all subsets of a given set. More importantly, we have shown a "necessary and sufficient condition" for an itemset to be a generator, by which the generators can also be quickly identified via the kernels. The generators can later be used, along with the CFIs, to generate the non-redundant association rules, which can be used to induce all other rules [2]. Experimental results showed that NUCLEAR is more effective than an approach utilizing a combination of some state-of-the-art algorithms. More specifically, (1) the time to generate the kernels and extendable sets by NUCLEAR is smaller than the time to mine the generators from CFIs by the MinimalGenerator algorithm [3], (2) extracting the generators for each CFI from the kernels takes almost no time, (3) extracting FIs from the kernels and extendable sets is much quicker than from the generators and CFIs in the way the GEN_ITEMSETS algorithm does [4], and (4) the total time for NUCLEAR to enumerate all FIs from the CFIs is considerably small compared to the time to mine the CFIs from data.

In general, our proposed methods in this thesis can be applied to other classification problems in cancers without significant modifications. The modifications mainly involve selecting the initial gene set for feature selection, the seed genes for subnet initialization, or the cancer-related genes for biological analysis, which should be tailored to the problem of interest.

8.2 Limitations and Future Work

We outline below some limitations of our approaches, as well as future work that can improve the current results:

• As described in Chapter 2, the results obtained by Random Forest, among other cases, showed that the models were strongly affected by the imbalance in the data during training. Some ML techniques, such as DOB-SCV or SMOTE described in Chapter 5, can deal with that. Some other reasons have prevented the approaches from finding accurate signatures. The limited number of available genes in the initial gene set is one reason, which causes more optimal signatures outside the current search space to be missed. Syed Haider argued that current clinical biomarkers, typically derived from a small number of genes, cannot recapitulate the full complexity of the disease [5]. There must be more genes relevant to the mechanisms of the therapies, since other drugs and combinations of drugs used in the therapies have not been taken into account to obtain more initial genes. Alternatively, an online tool like the HuGE Navigator [6] can suggest thousands of breast-cancer-relevant genes for consideration. As mentioned in [7], the differentially-expressed genes (DEGs) play an important role in deriving robust gene signatures. Thus, DEGs, or at least the DEGs interacting with the therapy-relevant genes, or the known breast-cancer-related genes that interact with the DEGs, should be considered as well, since they can help explain the currently known disease/therapy mechanisms or derive new consistent findings. Widening the available genes for feature selection in such ways can increase the chance of obtaining better signatures.

• Another reason for low-quality signatures is that patients who died before the survival threshold, but not from breast cancer, were not removed from the training and testing sets, even though they constitute noise. Removing these noisy samples and re-applying PAFS or other approaches with an extended initial gene set could yield biologically meaningful signatures with higher accuracy than the ones improved in Chapter 3.

• In PAFS, we have not taken the robustness (see Section 1.3) of the candidate feature subsets into account, beyond training and testing the classifier under 10-fold cross-validation. Robustness could be assessed by measuring classification performance over additional random training/testing splits (chosen in a principled way), or by returning only the robust subsets with the highest performance; a sketch of one such stability measure appears after this list.

• In addition, PAFS does not favor large feature subsets, as the number of parent sets available for generating new candidates drops quickly once the candidates grow large. Although we do not yet have a good solution to this phenomenon, allowing mutation of the features in the current candidates might partly address it. For this reason, PAFS may not work well on high-dimensional data without a feature-filtering step.

• The proposed methods are used to identify biomarkers for predicting the outcome of patients who received a specific therapy. Stratifying the datasets further by information such as subtype, hormone-receptor status, or stage would yield a new classification problem with reduced heterogeneity in the genomic profiles of the related patients, since the data samples become more condition-specific. Thus, the newly identified biomarkers could be more accurate, provided that the sub-datasets remain large and general enough to support a conclusion.

• The multi-label classification model proposed in Chapter 4 might be more valuable methodologically than in practical application. Since it is built from datasets of patients who received only one therapy among hormone therapy, radiation therapy, or surgery, it can only suggest one of the three therapies to a patient, or none of them. (For example, if a patient is predicted as "DH", i.e., died within the 5-year threshold when treated with hormone therapy, the model suggests none of the three therapies to the patient.) Additionally, a classifier trained on a small dataset of only 26 hormone-therapy patients is at high risk of over-fitting. Instead of the three therapies above, the eight combinations of chemotherapy, hormone therapy, and radiation therapy should be studied, regardless of surgery; even if radiation therapy is ignored, the problem of interest remains meaningful. Although the heterogeneity of the datasets increases, they then contain more samples for training, or for further stratification.

• We have likewise not taken the robustness of the subnetworks (see Section 1.3) into account while searching for sub-network biomarkers, beyond applying 5-fold cross-validation. Cross-validation combined with re-sampling techniques can alleviate this limitation, provided that enough data samples are available.

• Integrating other "-omics" data, such as mutations and CNAs/CNVs, can help detect robust biomarkers. Although the problem then becomes more complex, the WMAXC algorithm in [8] is an example of integrating multiple data types into a single network so that an existing algorithm can be applied to find a sub-network biomarker. Applying the ideas of this algorithm to our problems, with the other available data types, could help find new biomarkers that are biologically consistent.
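As referenced in the first bullet above, one widely used way to counter class imbalance during training is SMOTE, which synthesizes new minority-class samples by interpolating between minority-class neighbors. The minimal Python sketch below uses the imbalanced-learn package; the matrices X and y are synthetic stand-ins for an expression matrix and survivability labels, not the thesis datasets. Note that, to avoid information leakage, oversampling should be applied to the training folds only, never to the test data.

import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))         # 100 patients, 20 gene-expression features
y = np.array([0] * 90 + [1] * 10)      # heavily imbalanced survivability labels

# oversample the minority class up to the majority-class size (default behavior)
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(np.bincount(y), "->", np.bincount(y_res))    # [90 10] -> [90 90]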
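Likewise, for the robustness concern raised in the PAFS bullet, the following sketch shows one simple way to quantify the stability of a feature selector: re-run it on bootstrap resamples of the training data and average the pairwise Jaccard similarity of the selected subsets. The selector here is a hypothetical stand-in (a univariate F-score filter), not PAFS itself.

from itertools import combinations
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.utils import resample

def select_features(X, y, k=10):
    # placeholder selector: indices of the top-k features by ANOVA F-score
    mask = SelectKBest(f_classif, k=k).fit(X, y).get_support()
    return frozenset(np.flatnonzero(mask))

def selection_stability(X, y, n_runs=20, seed=0):
    # stability = mean pairwise Jaccard similarity of the subsets selected
    # across bootstrap resamples of the data
    subsets = [select_features(*resample(X, y, random_state=seed + i))
               for i in range(n_runs)]
    sims = [len(a & b) / len(a | b) for a, b in combinations(subsets, 2)]
    return float(np.mean(sims))

A stability value near 1 indicates that essentially the same subset is selected regardless of the resampling, which is the kind of robustness Section 1.3 refers to.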

References

[1] Naulaerts, S., Meysman, P., Bittremieux, W., Vu, T. N., Vanden Berghe, W., Goethals, B., & Laukens, K. (2015). A primer to frequent itemset mining for bioinformatics. Briefings in Bioinformatics, 16(2), 216-231.

[2] Tran, N. A., Tran, C. T., & Le, H. B. (2012). Structures of association rule set. In: ACIIDS'12, 361-370.

[3] Zaki, M. J. (2004). Mining non-redundant association rules. Data Mining and Knowledge Discovery, 9(3), 223-248.

[4] Tran, N. A., Duong, V. H., Tran, C. T., & Le, H. B. (2011). Efficient algorithms for mining frequent itemsets with constraint. In: KSE'11, 19-25.

[5] Haider, S., Yao, C. Q., Sabine, V. S., Grzadkowski, M., Stimper, V., Starmans, M. H., ... & Drake, C. (2018). Pathway-based subnetworks enable cross-disease biomarker discovery. Nature Communications, 9(1), 1-12.

[6] Yu, W., Gwinn, M., Clyne, M., Yesupriya, A., & Khoury, M. J. (2008). A navigator for human genome epidemiology. Nature Genetics, 40(2), 124-125.

[7] Zhang, L., Li, S., Hao, C., Hong, G., Zou, J., Zhang, Y., ... & Guo, Z. (2013). Extracting a few functionally reproducible biomarkers to build robust subnetwork-based classifiers for the diagnosis of cancer. Gene, 526(2), 232-238.

[8] Amgalan, B., & Lee, H. (2014). WMAXC: a weighted maximum clique method for identifying condition-specific sub-network. PLoS ONE, 9(8).

Appendices

Appendix A

Identifying the Gene Biomarkers of Breast Cancer Survivability from Time-Series Data

A.1 Introduction

Studying gene expression through various time intervals of breast cancer survival may provide new insights into recovery from the disease. In this work, we propose an approach that applies a hierarchical clustering method to separate out dissimilar groups of gene time-series profiles, namely those lying furthest from the rest of the profiles throughout the different time intervals. The isolated outliers might be useful for identifying potential biomarkers of breast cancer survivability.

To the best of our knowledge, this preliminary work is the first time-series model built on patients' post-treatment survival time. (As this is a preliminary result, we present it in an appendix rather than a chapter.)

A.2 Resources and Methods

We used the gene expression profiles of breast cancer patients in the METABRIC dataset from [1] to model time-series data. The framework of our approach is presented in Figure A.1.

Figure A.1: The framework of our approach.

The time-series data was created by partitioning the time axis (survival time in months) into bins of length six months, from the 0-6 up to the 49-55 month interval; then, for each gene, we averaged its expression level over all patients falling in each survival bin. The average gene expressions across these time points were cubic-spline interpolated to create a trending profile for each gene. After universally aligning the profiles to minimize the vertical area between each pair of profiles, we clustered the genes using hierarchical clustering based on minimized vertical distances [2]. An appropriate number of clusters was chosen based on the profile alignment and agglomerative clustering (PAAC) index, as well as visual inspection of the clusters [2].
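The pipeline just described is simple enough to sketch end-to-end. The following Python fragment is a simplified, hypothetical illustration using SciPy: it bins patients by survival time, averages each gene's expression per bin, fits a cubic spline through the bin means, and hierarchically clusters the resulting profiles. The alignment step and the PAAC index from [2] are deliberately omitted, and Ward linkage on Euclidean distance stands in for the alignment-based distance actually used.

import numpy as np
from scipy.interpolate import CubicSpline
from scipy.cluster.hierarchy import linkage, fcluster

def gene_profiles(expr, survival_months, bin_width=6, n_bins=9):
    """expr: genes-by-patients matrix; survival_months: survival time per patient.
    Assumes every bin contains at least one patient."""
    bins = np.minimum(survival_months // bin_width, n_bins - 1).astype(int)
    means = np.column_stack([expr[:, bins == b].mean(axis=1) for b in range(n_bins)])
    t = np.arange(n_bins) * bin_width + bin_width / 2      # bin midpoints (months)
    fine = np.linspace(t[0], t[-1], 100)                   # dense grid for the spline
    return np.vstack([CubicSpline(t, row)(fine) for row in means])

def cluster_profiles(profiles, n_clusters=5):
    # Ward linkage on Euclidean distances between interpolated profiles
    return fcluster(linkage(profiles, method="ward"), n_clusters, criterion="maxclust")

Singleton clusters produced by such a procedure correspond to the outlier profiles examined in the next section.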

A.3 Results

Figure A.2 shows the background cluster, where the majority of the profiles are grouped, i.e., it captures the dominant trend of the profiles over the time points. We identified 24 genes that were isolated as singleton clusters. Of these, prosaposin (PSAP), CD81, and EEF1A1 have been previously reported in the literature to play some role in breast cancer. For example, in [4], Kang et al. found that "the introduction of PSAP in highly metastatic cells significantly reduced the occurrence of metastases, whereas inhibition of PSAP production by tumor cells was associated with increased metastatic frequency. They suggested that prosaposin, or other agents that stimulate TP53 activity in the tumor stroma, may be an effective therapy by inhibition of the metastatic process". Figures A.3, A.4, and A.5 show the trends of the cubic-spline-interpolated expression of the genes CD81, EEF1A1, and PSAP, respectively, over the time points. These genes are candidates to be considered as potential biomarkers of breast cancer survivability. However, more computational work is needed to verify this, such as assessing how effective they are at predicting survival time, or how differentially they are expressed between groups of samples.

Figure A.2: Background cluster.

Figure A.3: Cubic-spline-interpolated expression profile of the gene CD81.

Figure A.4: Cubic-spline-interpolated expression profile of the gene EEF1A1.

Figure A.5: Cubic-spline-interpolated expression profile of the gene PSAP.

References

[1] Pereira, B., Chin, S. F., Rueda, O. M., Vollan, H. K. M., Provenzano, E., Bardwell, H. A., ... & Tsui, D. W. (2016). The somatic mutation profiles of 2,433 breast cancers refine their genomic and transcriptomic landscapes. Nature Communications, 7(1), 1-16.

[2] Alkhateeb, A., Rezaeian, I., Singireddy, S., & Rueda, L. (2015, November). Obtaining biomarkers in cancer progression from outliers of time-series clusters. In 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 889-896.

[3] Subhani, N., Li, Y., Ngom, A., & Rueda, L. (2010, July). Alignment versus variation methods for clustering microarray time-series data. In IEEE Congress on Evolutionary Computation, 1-8.

[4] Kang, S. Y., Halvorsen, O. J., Gravdal, K., Bhattacharya, N., Lee, J. M., Liu, N. W., & Yoo, S. (2009). Prosaposin inhibits tumor metastasis via paracrine and endocrine stimulation of stromal p53 and Tsp-1. Proceedings of the National Academy of Sciences, 106(29), 12115-12120.

Vita Auctoris

Pham Quang Huy was born in 1978 in Nghe An, Vietnam. He graduated from Bui Thi Xuan high school, Dalat, Vietnam, in 1996. From there he went on to the University of Dalat, Vietnam, where he obtained a Bachelor of Science in Computer Science in 2000. He received his Master's degree in Computer Science at the University of Natural Science, Ho Chi Minh City, Vietnam, in 2005. He is currently a candidate for the doctoral degree in Computer Science at the University of Windsor and hopes to graduate in May 2020. His research interests include machine learning, data mining, bioinformatics, and cancer research.
