30-Days All-Cause Prediction Model for Readmissions for Heart Failure Patients: a Comparative Study of Machine Learning Approaches
30-DAYS ALL-CAUSE PREDICTION MODEL FOR READMISSIONS FOR HEART FAILURE PATIENTS: A COMPARATIVE STUDY OF MACHINE LEARNING APPROACHES

A Dissertation Presented by
Amal Abdullah Bukhari
to
The Department of Engineering
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
in the field of
Interdisciplinary Engineering

Northeastern University
Boston, Massachusetts
November 2019

Northeastern University
Graduate School of Engineering
Dissertation Signature Page

Dissertation Title: 30-Days All-Cause Prediction Model for Readmissions for Heart Failure Patients: A Comparative Study of Machine Learning Approaches
Author: Amal Bukhari
NUID: 000034724
Department: The Department of Engineering – Interdisciplinary Engineering

Approved for Dissertation Requirement for the Doctor of Philosophy Degree

Dissertation Advisor: Professor Sagar Kamarthi
Dissertation Committee Member: Professor Kal Bugrara
Dissertation Committee Member: Dr. Kamal Jethwani
Dissertation Committee Member: Dr. Stephen Agboola
Department Chair
Associate Dean of the Graduate School
Senior Associate Dean for Academic Affairs

ACKNOWLEDGMENTS

I would like to express my special appreciation and thanks to my advisor, Professor Sagar Kamarthi, for the patient guidance, encouragement, and advice he has provided throughout my time as his student.
I would also like to thank the members of my dissertation committee, Professor Kal Bugrara, Dr. Stephen Agboola, and Dr. Kamal Jethwani, for their contributions and suggestions. I gratefully acknowledge the scholarship I received from the Saudi Arabian Cultural Mission and the University of Jeddah. Lastly, I owe my deepest gratitude to my lovely family for their support and encouragement during my Ph.D. journey, and for always believing in me and encouraging me to follow my dreams.

I dedicate this dissertation to my beloved family: my father, my mother, my sister, and my brothers, for their constant support and unconditional love. I love you all dearly.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT
CHAPTER 1 INTRODUCTION AND OVERVIEW
  1.1 Heart Failure Overview
  1.2 Research goal and objectives
  1.3 Structure of the Thesis
CHAPTER 2
  2.1 Heart Failure Hospitalization
  2.2 Risk Prediction Models of Readmission for Heart Failure
    2.2.1 Risk factors
    2.2.2 Model Development and Performance
CHAPTER 3 METHODOLOGY
  3.1 Aims and objectives
  3.2 Data Mining Software Selection
  3.3 Data Description
  3.4 Inclusion / Exclusion Criteria
  3.5 Data Preprocessing
    3.5.1 Data Wrangling
    3.5.2 Data Cleaning
    3.5.3 Data Transforming
  3.6 Dataset
  3.7 Label Definition / Outcome Definition
    Definition of Index Admission
  3.8 Modeling
    3.8.1 Feature Selection
    3.8.2 Imbalanced Data / Class Imbalance
    3.8.3 Experiments and selected algorithms
  3.9 Validation Set Approach (Data Split)
  3.10 Evaluation / Performance Metrics
    3.10.1 Accuracy (Acc)
    3.10.2 Precision (p)
    3.10.3 Sensitivity or Recall (r)
    3.10.4 Specificity
    3.10.5 F-Measure (FM)
    3.10.6 Area under the ROC Curve (AUC)
  3.11 Summary
CHAPTER 4 RESULTS AND ANALYSIS
  4.1 Logistic Regression
  4.2 Decision Tree
  4.3 Random Forest
  4.4 Naïve Bayes
  4.5 Support Vector Machine
  4.6 XGBoost
  4.7 Summary
CHAPTER 5 CONCLUSION AND FUTURE WORK
REFERENCES

LIST OF TABLES
Table 1.1 Common medications
Table1.