Developing a Machine Learning Framework for 24-Hour Data Analysis Aimed at Early Detection of Cardiac Arrhythmias As a Guiding Tool for Physicians

by A. Yashar Tashakkor
M.D., University of British Columbia, 2010

Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Applied Science in the School of Engineering Science, Faculty of Applied Sciences

A. Yashar Tashakkor 2019
SIMON FRASER UNIVERSITY
Summer 2019

Copyright in this work rests with the author. Please ensure that any reproduction or re-use is done in accordance with the relevant national copyright legislation.

Approval

Name: A. Yashar Tashakkor
Degree: Master of Applied Science
Title: Developing a Machine Learning Framework for 24-Hour Data Analysis Aimed at Early Detection of Cardiac Arrhythmias as a Guiding Tool for Physicians
Examining Committee:
Chair: Michael Sjoersdma, Senior Lecturer
Andrew Rawicz, Senior Supervisor, Professor
Craig Scratchley, Supervisor, Senior Lecturer
Ash Parameswaran, Internal Examiner, Professor
Date Defended: May 20th, 2019

Abstract

Cardiovascular diseases (CVD), defined as a spectrum of disorders primarily affecting the heart and the circulatory system, account for a substantial fraction of worldwide morbidity and mortality. Electrocardiograms (ECGs) are routinely used in diagnosis, in both hospital and outpatient settings. When patients encounter medical personnel, particularly when CVD is suspected, ECGs serve as one of the primary diagnostic tools. A Holter monitor is a portable diagnostic device that records continuous ECG data, typically over 24 hours. It connects to the patient through several conductive leads placed across the chest and is "worn" on a strap over the shoulder. With the recent emergence of machine learning (ML) applications in data analysis, reliance on human expertise and the potential for human error could be reduced, and prediction accuracy could be improved considerably.
Hence, the objective of this research is to develop a machine learning framework that can eventually serve as a powerful guiding tool. It would assist physicians in their decisions when analysing the ECG information reported by Holter monitors. Furthermore, we aim to develop a computer-aided diagnostic system that can assist expert cardiologists by providing intelligent, cost-effective, and time-saving diagnosis. In this thesis, we implement a deep learning-based solution to analyse Holter monitor readings. In our proposed solution, we train neural networks to extract high-level features from the temporal signal recordings of Holter monitors. We present a supervised neural network framework that predicts the physician's final interpretation from the Holter-recorded signals. The network outputs the likelihood of four possible classes: Normal and three types of arrhythmia. The high classification performance of the proposed methodology demonstrates the potential of this framework to serve as an assisting tool alongside physicians in interpreting Holter reports.

Keywords: Holter monitor, Machine learning, ECG, Neural network

Dedication

This humble work is dedicated to my family, most especially my brother, who inspired this academic endeavour. It is also dedicated to Professor Andrew Rawicz, a gentleman and a scholar. I first met Professor Rawicz years ago, quite literally the minute I became a medical doctor; it seems that whenever he is around, I somehow inherit extra abbreviations. I sincerely thank you, Andrew.

Acknowledgements

It is with immense gratitude that I acknowledge the support and inestimable guidance of Professor Andrew Rawicz on this work. I further owe my deepest gratitude to my committee members, reviewers, colleagues, mentors, and friends, all of whom dedicated valuable hours to this dissertation.
Table of Contents

Approval
Abstract
Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures
Acronyms
1 Introduction
1.1 Motivation
1.2 Objectives
1.3 Proposed Solution
1.4 Contributions
1.5 Thesis Outline
2 Background
2.1 Introduction
2.2 Electrocardiogram
2.3 Arrhythmia Types
2.4 Heart Rate Variability
2.5 Holter Monitors
2.5.1 Holter System Description
3 Machine Learning Approaches
3.1 Introduction
3.2 Artificial Intelligence
3.3 Supervised Machine Learning
3.4 Deep Learning (DL) and Related Work
4 Methodology
4.1 Ethics
4.2 Materials
4.2.1 Block Diagram
4.3 Data
4.3.1 Data Acquisition
4.3.2 Data Description
4.3.3 Data (ROI) Selection
4.3.4 Labeling
4.4 Preprocessing
4.4.1 Data Augmentation
4.5 Network Architecture
4.5.1 Activation Function
4.5.2 Automatic Feature Learning
4.5.3 Classification Prediction (Labeling)
4.5.4 Classification Framework
4.5.5 Fine Tuning
4.5.6 Optimization
4.5.7 Implementation
5 Results and Verification
5.1 Network Analysis
5.1.1 Error Analysis
5.1.2 Network Accuracy
5.2 Classification Performance
5.3 Test Phase
5.3.1 Classification Verification
5.4 Discussion
6 Conclusion and Future Work
6.1 Summary of Contributions
6.2 Challenges and Limitations
6.3 Future Work
Bibliography
Appendix

List of Tables

Table 4.1 Network input: features derived from the Holter-generated report
Table 4.2 Output labels: the predictable Normal condition and arrhythmia types
Table 5.1 Evaluation of the network
Table 5.2 Evaluation of the network on 47 new patients

List of Figures

Figure 2.1 Schematic representation of an ECG signal: the P, QRS, and T waveforms and the standard features extracted from a single cardiac beat [45].
Figure 2.2 Definitions of ECG components and their "normal" durations [48].
Figure 2.3 Atrial fibrillation pattern on ECG. Note the absence of the characteristic P, QRS, T pattern seen on a normal ECG, and the inconsistency in the R-R interval (an ECG representing an AF waveform).
Figure 2.4 An ECG illustrating first-degree AV block. Note the prolonged PR interval compared to a normal ECG.
Figure 2.5 A vector view of the standard 12-lead ECG. The frontal leads are light blue and the precordial leads are dark blue [73].
Figure 2.6 a) Einthoven's triangle, formed by the right arm, left arm, and left leg electrodes; these three electrodes form the basis for the frontal axis [66]. b) The frontal axis of the ECG consists of six vectors derived from three electrodes: left arm (L), right arm (R), and left leg (F) [66]. c) The electrodes V1-V6 are placed on the chest roughly along a semilunar line [66].
Figure 2.7 Simplified functional block diagram of a Holter monitor [108].
Figure 2.8 Block diagram of a sample Holter recorder system [52].
Figure 2.9 a) Circuits of the ECG amplifier and the high- and low-pass filters [52], b) Suggested R-peak detection circuit [52].
Figure 3.1 Main machine learning methods used for ECG classification: (a) support vector machine, (b) random forest classification using n decision trees, (c) hidden Markov model, (d) neural network with two hidden layers [65].
Figure 3.2 A typical neural network, with the working of a single neuron explained separately [17].
Figure 4.1 An illustration of the proposed methodology for automatic arrhythmia classification of Holter reports. Main tasks include: a) data collection, b) preprocessing of the data and the machine learning framework, c) predicting results.
Figure 4.2 Distribution of Holter monitor data: a) gender distribution, b) age range.
Figure 4.3 ScotCare Holter monitor used for data collection throughout the study.
a) Chroma2 version with 5 leads, b) close view of the Holter monitor [25].
Figure 4.4 Representation of a sample report generated by the ScotCare built-in software, holterCare™.
Figure 4.5 Samples of the physician's comments on Holter reports, representing the diagnosis based on ECG readings; the keywords in the selected red boxes are extracted as labels: a) sample of a Normal reading, b) sample of a Benign reading, c) sample of an AV nodal block reading, d) sample of an AFB reading.
Figure 4.6 Schematic of the proposed network, showing nodes and layer connectivity. The number of inputs is chosen based on the information available from the reported Holter readings; therefore, the input layer contains 10 nodes. The output layer of the network includes four classes to predict: Normal, Benign, AV block, and AFB.
Figure 4.7 Network structure representation: the number of inputs and parameters for each layer, as well as the type of each layer.
Figure 4.8 Distributions of the data set used for training and testing: a) suggested methods for cross-validation and testing [56], b) the 5-fold methodology used for training and testing.
Figure 5.1 Learning curve: training and validation losses during the training and test phases.
Figure 5.2 Learning curve graph: training and validation losses during the training and test phases.
Figure 5.3 Accuracy graph showing categorical accuracy and validation categorical accuracy.
Figure 5.4 Categorical confusion matrix: original data set.
Figure 5.5 Categorical confusion matrix: test phase, 47 new patients.
Acronyms

CVD Cardiovascular Diseases
WHO World Health Organization
ECG Electrocardiogram
HMM Hidden Markov Models
ML Machine Learning
NSR Normal Sinus Rhythm
AF Atrial Fibrillation
HRV Heart Rate Variability
AI Artificial Intelligence
CNN Convolutional Neural Networks
AAMI Association for the Advancement of Medical Instrumentation
PVC Premature Ventricular Contraction
ANNs Artificial Neural Networks
CHD Coronary Heart Disease
DCNNs Deep Convolutional Neural Networks
VPB Ventricular Premature Beat or Premature Ventricular Complex
SVPB Supraventricular Premature Beats
PAF Paroxysmal Atrial Fibrillation

Chapter 1
Introduction

1.1 Motivation

Cardiovascular Diseases (CVD), defined as a spectrum of disorders primarily impacting the heart and the circulatory system, account for a substantial fraction of worldwide morbidity and mortality.
