Computational Methods for Breast Cancer Diagnosis, Prognosis, and Treatment Prediction
Total Page:16
File Type:pdf, Size:1020Kb
Computational Methods for Breast Cancer Diagnosis, Prognosis, and Treatment Prediction By Ashish Saini Bachelor of IT (Honours) Deakin University, Australia Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy Deakin University August, 2014 Acknowledgements I would like to express my sincere gratitude to my principle supervisor Prof. Wanlei Zhou and associate supervisor Dr. Jingyu Hou for their constant support, valuable suggestions and invaluable guidance. I have learnt so much from them, including the research process, how to write papers, the publication process for papers, teaching methods, and much more. Without their everlasting support, inspiring enthusiasm, and encouragement, this thesis could not have been completed. I feel fortunate to work with such wonderful supervisors and will always be grateful for their supervision and encouragement for the rest of my life. I would also like to thank the academic and general research staff of the School of Information Technology, Deakin University, Australia. They are Dr. Jingyu Hou for offering the chance to gain teaching experience, Mr. Nghia Dang for IT support, and Ms. Alison Carr, Ms. Lauren Fisher and Ms. Kathy Giulieri for their general management and secretarial work which enabled me to attend academic conferences and workshops. I would also like to thank my research colleagues who helped me in my research and daily life in general. They are Dr. Sheng Wen, Dr. Tianqing Zhu, Mr. Faizal Raid-ud-Din, Dr. Longxiang Gao, Mr. Yongqing Jiang, and Mr. Biplob Ray. For those whose names are not mentioned here, I have not forgotten you the help you have given me. Good friends are difficult to come by but their love pulled me through many difficult times, and I owe many thanks to them. Last but not the least, I am very grateful to my family, my brothers, and my parents who always believed in me and encouraged me. I am also very grateful for their long distance support and love. I could not have made it this far without their support as they did everything possible to help me. I would like to dedicate this thesis to my family, and especially to my parents for the sacrifices they have made iv for me to achieve this success. Your love and encouragement has always been and will remain a great source of inspiration in my life. Ashish Saini Burwood, VIC 3125, Australia August, 2014 v Publication List Refereed Journal Articles 1. Jingyu Hou, Ashish Saini, “Semantically Assessing the Reliability of Protein Interactions”, Mathematical Biosciences, vol. 245, 2013, pp. 226- 234. (Impact Factor: 1.45) 2. Ashish Saini, Jingyu Hou, “Progressive Clustering Based Method for Protein Function Prediction”, Bulletin of Mathematical Biology, vol. 75, 2013, pp. 331-350. (Impact Factor: 2.02) 3. Ashish Saini, Jingyu Hou, Wanlei Zhou, “Hub-Based Reliable Gene Expression Algorithm to Classify ER+ and ER- Breast Cancer Subtypes”, International Journal of Bioscience, Biochemistry and Bioinformatics, vol. 3, 2013, pp. 20-26. 4. Ashish Saini, Jingyu Hou, Wanlei Zhou, “RRHGE: A novel approach to classify the estrogen-receptor based breast cancer subtypes”, Scientific World Journal, vol. 2014, 2014. (Impact Factor: 1.73) 5. Ashish Saini, Jingyu Hou, Wanlei Zhou, “Breast Cancer Prognosis Risk Estimation using Integrated Gene Expression and Clinical Data”, Biomed Research International, vol. 2014, 2014. (Impact Factor: 2.88) Book Chapter 1. Ashish Saini, Jingyu Hou, Wanlei Zhou, “IPA: Integrated Predictive Gene Signature from Gene Expression based Breast Cancer Patient Samples”, Innovative Approach in Stem Cell Research, Cancer Biology and Applied Biotechnology (ISBN: 978-93-83083-79-4), 2014. vi Submitted Papers 1. Ashish Saini, Jingyu Hou, Wanlei Zhou, “Breast Cancer Classification: Application of Microarray Gene Expression Data”, Current Bioinformatics, Submitted May 2014. (Impact Factor: 2.02) 2. Ashish Saini, Jingyu Hou, Wanlei Zhou, “Treatment Response Prediction of Neoadjuvant Chemotherapy Regimens for Breast Cancer Patients”, BMC Cancer, Submitted July. 2014. (Impact Factor: 3.33) vii To my family viii Table of Contents Acknowledgements ............................................................................................... iv Publication List ..................................................................................................... vi Table of Contents .................................................................................................. ix List of Tables ...................................................................................................... xiii List of Figures ...................................................................................................... xv Abstract ............................................................................................................ xviii 1 Introduction ....................................................................................................... 1 1.1 Background and Motivation .......................................................................... 1 1.1.1 Clinical Covariates of Breast Cancer ..................................................... 3 1.1.2 Subtypes of Breast Cancer ..................................................................... 4 1.2 Research Objectives ....................................................................................... 7 1.3 Main Contribution ......................................................................................... 8 1.4 Thesis Organization ....................................................................................... 9 2 Preliminaries .................................................................................................... 11 2.1 Microarray Technology ............................................................................... 11 2.1.1 Issues with the Microarray Data ........................................................... 16 2.1.2 Pre-Processing of Microarray Data ...................................................... 17 2.1.3 Microarray Data Analysis .................................................................... 19 2.2 Breast Cancer Classifiers ............................................................................. 21 2.2.1 Existing Approaches ............................................................................. 24 2.2.2 Previous Popular Studies ...................................................................... 37 2.2.3 Issues with the Existing Approaches .................................................... 42 2.3 Evaluation Methods ..................................................................................... 45 2.3.1 Statistical Evaluation Methods ............................................................. 45 ix 2.3.2 Biological Validation Methods ............................................................ 50 2.4 Summary ...................................................................................................... 53 3 Semantically Assessing the Reliability of Protein Interactions ................... 56 3.1 Introduction .................................................................................................. 56 3.2 Semantic Reliability ..................................................................................... 59 3.2.1 Semantic Interaction Reliability 1 (R1) ................................................ 59 3.2.2 Semantic Interaction Reliability 2 (R2) ................................................ 61 3.2.3 Semantic Reliability (SR) Method ....................................................... 62 3.3 Results.......................................................................................................... 64 3.4 Summary ...................................................................................................... 73 4 Diagnosis of Estrogen Receptor based Breast Cancer Subtypes ................ 75 4.1 Introduction .................................................................................................. 75 4.2 Materials ...................................................................................................... 78 4.2.1 Breast Cancer Gene Expression Datasets ............................................. 78 4.2.2 Transformation of PPI datasets ............................................................ 80 4.2.3 Training and Testing sets ..................................................................... 81 4.3 Algorithm ..................................................................................................... 82 4.3.1 Reliability Metrics ................................................................................ 82 4.3.2 Gene Expression Metrics ...................................................................... 85 4.3.3 Reliable Gene Expression Metrics ....................................................... 85 4.3.4 Robust Reliability based Hub Gene Expression Algorithm (RRHGE) 86 4.3.5 Classification Process and Performance Assessment ........................... 89 4.4 Results.......................................................................................................... 90 4.4.1 Classification Performance ................................................................... 93 4.4.2 Signature Stability with Existing Gene Signatures .............................. 98 4.4.3 Biological Analysis of RRHGE Gene Signature .................................. 99 4.5 Summary ...................................................................................................