Retinal Image Analysis and Diagnosis of Retinal Blood Vascular Diseases using Deep Learning Model

Bismita Choudhury

(Student ID: 100063702)

A thesis submitted in fulfilment of the requirements for the degree of

Doctor of Philosophy (PhD)

(Computer Science and Engineering)

Faculty of Engineering, Computing, and Science

Swinburne University of Technology Sarawak Campus

May 2019

“Research is what I’m doing when I don’t know what I’m doing.”

– Wernher von Braun

THESIS – DOCTOR OF PHILOSOPHY (PHD)

Abstract

The retina is not only an important part of the visual system; it also has the potential to indicate the general health of other parts of the human body. In addition to eye diseases, various retinal abnormalities can be indicative of other health issues. Recent studies have shown that the retinal abnormalities associated with blood vascular disease are predictive of several major diseases, viz. Diabetes, Cardiovascular diseases like Hypertension and Coronary heart disease, Kidney disease, and Stroke. Among the various blood vascular diseases, Diabetic Retinopathy (DR) and Retinal Vein Occlusion (RVO) are the two leading causes of blindness worldwide. The main causes of both of these sight-threatening retinal diseases are age and the sedentary lifestyle of patients. As these factors cannot easily be controlled, it is particularly important to detect these retinal abnormalities as early as possible and prevent visual impairment. Recent years have seen increased interest in diagnosing various diseases through Computer Aided Diagnosis (CAD) of digital images. In past decades, several such methods have been proposed for diagnosing DR. The majority of these methods face challenges in detecting DR at its earliest stage. There is also a substantial lack of research on the automatic detection of RVO, despite the fact that it is the second most common cause of vision loss and an indication of possible blockages in cardiac veins and in nerves in the human brain as well. The literature on CAD for diagnosing retinal abnormality is disease specific. Algorithms designed to detect one type of disease either cannot or fail to detect other types of disease due to the intra- and inter-class variability of similar features. Moreover, the majority of these methods depend on hand-designed feature engineering; therefore, their disease classification performance depends on the performance of the segmentation and feature extraction methods.
To overcome these problems, in this dissertation deep learning methods have been proposed to analyse retinal images and diagnose retinal abnormalities in order to detect blood vascular diseases, particularly DR and RVO. The proposed deep learning methods utilize the advantages of the Convolutional Neural Network (CNN) to analyse the retinal image and detect possible retinal blood vascular diseases associated with the abnormal appearance of the retina.

Swinburne University of Technology Sarawak Campus | Abstract i

A novel CNN architecture has been proposed to work specifically with retinal images and detect DR and RVO. A design hypothesis has been set to design the CNN from scratch for the particular problem at hand. The main focus has been on overcoming the barriers to using deep models, since the popular, widely used CNN models introduce extra complexity. Generally, deep learning models are complicated and require a huge amount of training samples, memory, and time for supervised learning. In this research, a simple CNN architecture has been carefully designed to extract and analyse the normal and abnormal features from the raw pixels of the whole colour fundus image and to detect possible retinal blood vascular disease as early as possible. The careful design, careful selection of hyperparameters, and the learning algorithm have made the proposed model simple yet capable of diagnosing retinal abnormalities without compromising performance. This CNN can detect DR at the earliest stage and grade DR into mild to moderate Non-proliferative DR (NPDR) and severe NPDR to Proliferative DR (PDR). The experiments have been conducted on two databases, viz. the Messidor and Kaggle databases. The proposed model can detect early-stage DR (mild NPDR) with an accuracy of 98.11%, sensitivity of 100%, and specificity of 96.2% on the Messidor database. On the Kaggle database, the proposed model has achieved 96.6% accuracy, 99.7% sensitivity, and 93.2% specificity. For DR severity grading, the experiments have been conducted on the STARE and Messidor databases. The proposed model can classify DR into three classes, viz. normal, mild to moderate NPDR, and severe NPDR to PDR, with an accuracy of 98.2%, sensitivity of 100%, and specificity of 98%.

The proposed model can also detect RVO and its two types, viz. Central Retinal Vein Occlusion (CRVO) and Branch Retinal Vein Occlusion (BRVO). The RVO images have been collected from two databases, viz. STARE and the Retinal Image Bank, and as per the conducted experiments, the proposed model can classify normal, BRVO, and CRVO images with 97% accuracy, 96.1% sensitivity, and 98% specificity. In addition, this model is capable of analysing similar features of different diseases; hence, it can distinguish DR from RVO irrespective of their common lesions. For this particular experiment, the images have been mixed and matched from multiple databases: the DR images (both NPDR and PDR) are collected from the STARE, DRIVE, and Messidor databases; the RVO images (both BRVO and CRVO) from STARE and the Retinal Image Bank; and the normal images from STARE, Messidor, DRIVE, and Dr. Hossain Rabbani. For the 3-class classification of DR, RVO, and normal images, the proposed model has obtained 98.8% accuracy, 100% sensitivity, and 98.3% specificity.
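The accuracy, sensitivity, and specificity figures quoted above follow the standard confusion-matrix definitions. A minimal sketch of those definitions, using made-up counts rather than the thesis's actual results:

```python
# Standard confusion-matrix definitions behind the accuracy, sensitivity,
# and specificity figures reported in this thesis. The counts below are
# hypothetical, for illustration only.
def metrics(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total          # fraction classified correctly
    sensitivity = tp / (tp + fn)          # true-positive rate (recall)
    specificity = tn / (tn + fp)          # true-negative rate
    return accuracy, sensitivity, specificity

# Hypothetical screen of 150 fundus images: 95 DR images caught, none
# missed, 52 normals correct, 3 normals wrongly flagged as DR.
acc, sens, spec = metrics(tp=95, fp=3, fn=0, tn=52)
print(round(acc, 3), round(sens, 3), round(spec, 3))  # 0.98 1.0 0.945
```

Note how a sensitivity of 100%, as reported for several experiments above, simply means no diseased image was classified as normal (fn = 0).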

This dissertation attempts to fill the gap in the research on automatic diagnosis of Retinal Vein Occlusion by detecting all of its variants, viz. Central Retinal Vein Occlusion (CRVO), Branch Retinal Vein Occlusion (BRVO), and Hemiretinal Vein Occlusion (HRVO). A Cascaded Convolutional Neural Network (CCNN) has been proposed, which is a chain of three CNNs of the same proposed configuration. This novel deep architecture carefully analyses the ambiguous features of HRVO. In this chain of three CNNs, each CNN carefully investigates the features of one pair of RVO types, and the CCNN takes the final decision on the RVO type based on the results of the internal CNNs. No algorithm has been proposed to date to detect all three types of RVO; therefore, the proposed method is the first of its kind. For the experiments, the RVO images and normal images have been collected from multiple databases, viz. DRIVE, Messidor, STARE, and Dr. Hossain Rabbani. The proposed Cascaded Convolutional Neural Network can successfully classify BRVO, CRVO, HRVO, and normal images with an accuracy of 96%, sensitivity of 97%, and specificity of 95.2%.
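The exact cascade and decision rule are detailed in Chapter 4. As a rough, hypothetical illustration of how pairwise CNN decisions can be combined into a single RVO-type verdict, a one-vs-one majority-vote combiner can be sketched as follows; the class pairing per CNN and the voting rule here are assumptions for illustration, not the thesis's exact mechanism:

```python
from collections import Counter

# Hypothetical assignment of class pairs to the three CNNs; the actual
# assignment used in the thesis is described in Chapter 4. Each CNN
# votes for one label of its pair when shown an input image.
PAIRS = {
    "cnn1": ("BRVO", "CRVO"),
    "cnn2": ("BRVO", "HRVO"),
    "cnn3": ("CRVO", "HRVO"),
}

def combine(votes, pairs=PAIRS):
    """Majority vote over pairwise decisions (one-vs-one combination).

    votes maps each CNN's name to the label it predicted, which must
    be one of that CNN's two candidate classes.
    """
    for name, label in votes.items():
        if label not in pairs[name]:
            raise ValueError(f"{name} cannot vote for {label}")
    # The label with the most pairwise wins is the final decision.
    return Counter(votes.values()).most_common(1)[0][0]

# Example: two of the three pairwise CNNs favour HRVO.
print(combine({"cnn1": "CRVO", "cnn2": "HRVO", "cnn3": "HRVO"}))  # HRVO
```

The appeal of such pairwise decomposition is that each classifier only has to separate two visually confusable classes, which is helpful when one class (here HRVO) shares features with both of the others.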

The proposed deep learning methodologies have efficiently overcome challenges at multiple levels in the fields of retinal image analysis, computer-aided automated diagnosis, and deep learning. Various experiments have been carried out on multiple publicly available databases, and both of the proposed deep learning based methods have performed outstandingly in each individual task. Therefore, the proposed model is a potential tool for diagnosing retinal blood vascular diseases and can help ophthalmologists detect DR and RVO in particular at an early stage, thereby preventing total vision loss in patients. It would be a cost-effective method, and patients in distant areas could be diagnosed remotely.


Acknowledgement

First and foremost, I would like to express my sincere gratitude to my principal supervisor, Prof. Patrick HH Then, for his continuous support, motivation, enthusiasm, and guidance, and for sharing his invaluable and vast knowledge in the field of research. His profound insights and attention to detail have been a true inspiration to my research. I could not have imagined having a better advisor and mentor for my PhD research.

I would also like to thank my co-supervisor, Dr. Valliappan Raman, for sharing his technical knowledge and for his constructive criticism during the entire span of my PhD research. His insightful comments have helped me greatly in shaping this dissertation.

My sincere thanks and gratitude go to my previous co-supervisor, Dr. Manas Kumar Haldar, and my external supervisor, Dr. Biju Issac of Northumbria University, UK, for their continuous encouragement and valuable advice on widening my research ideas from various perspectives.

I would also like to take this opportunity to thank all my friends and colleagues, Brian Loh, Wan Tze Vong, Mohmd. Yuzrie, Michelle Gian, Clement Ting, Emily Rogos, and Akshay Kakar, for their continuous help and encouragement and for the unforgettable moments that we shared over the last four years.

Last but not least, I would like to thank my parents and elder brother for all their love, blessings, encouragement, and support. I would especially like to mention my father, who trusted me and supported me throughout all of my pursuits.

Bismita Choudhury

Swinburne University of Technology, Sarawak


Author Declaration

I hereby declare that this thesis entitled “Retinal Image Analysis and Diagnosis of Retinal Blood Vascular Diseases using Deep Learning Model” is the result of my own research work, except for quotations and citations, which have been duly acknowledged. I also declare that it has not been previously or concurrently submitted for any other degree at Swinburne University of Technology Sarawak Campus.

Candidate Signature:

______

Name: Bismita Choudhury


List of Publications

Book Chapter: Choudhury B., Then P.H.H., Raman V., Automatic Detection of Retinal Vein Occlusion (RVO) Using Convolutional Neural Network (CNN), Big Data Vis. Anal., Springer, Cham (2017), pp. 1–21. https://doi.org/10.1007/978-3-319-63917-8_1

Journal Publication: Choudhury B., Then P.H.H., Raman V., Automated Diabetic Retinopathy Detection using Deep Learning, Journal of Industrial Information Technology and Application, March 2018, 2(1), pp. 39–49. DOI: 10.22664/ISITA.2018.2.1.39, http://jiita.org/v2n107/

Paper Presentation: Automated Diabetic Retinopathy Detection using Deep Learning, International Symposium on Innovation in Information Technology and Application, Kota Kinabalu, Malaysia, Jan 30-Feb 2, 2018.

Poster Presentation: Automatic Detection of Retinal Vein Occlusion using Convolutional Neural Network, Swinburne Celebrates Research Conference, 2017, Melbourne, Australia.

Others: 1. B Choudhury, P Then, V Raman, B Issac, MK Haldar, “Cancelable Iris Biometrics based on Data Hiding Schemes”, IEEE Student Conference on Research and Development (SCOReD), 2016, 1-6.

2. B Choudhury, P Then, B Issac, V Raman, MK Haldar, “A Survey on Biometrics and Cancelable Biometrics Systems”, International Journal of Image and Graphics 18 (01), 1850006.


Table of Contents

Abstract ...... i

Acknowledgement ...... iv

Author Declaration ...... v

List of Publications ...... vi

Table of Contents ...... vii

List of Figures ...... xi

List of Tables ...... xiv

List of Acronyms ...... xvi

1 Introduction ...... 1

1.1 Research Overview ...... 2

1.2 Research Gaps ...... 6

1.3 Research Challenges ...... 7

1.4 Research Hypothesis ...... 9

1.5 Research Objective ...... 11

1.6 Research Contribution ...... 12

1.7 Thesis Organization ...... 13

2 Background...... 15

2.1. Eye Anatomy ...... 15

2.2. Retina ...... 16

2.3. Retinal Image Screening Techniques ...... 18

2.4. Retinal Blood Vascular Disease ...... 21

2.4.1 Risk Factors and Symptoms ...... 22

2.5. Diabetic Retinopathy ...... 22

2.5.1. Risk Factors and Symptoms...... 23

2.5.2. Clinical Signs ...... 23


2.5.3. Types ...... 25

2.5.4. Diagnosis and Treatment ...... 27

2.6. Retinal Vein Occlusion (RVO) ...... 28

2.6.1. Risk Factors and Symptoms...... 29

2.6.2. Clinical Signs ...... 30

2.6.3. Types of RVO ...... 31

2.6.4. Diagnosis and Treatment ...... 38

2.7. Computer Aided Diagnosis (CAD) System ...... 39

2.8. Chapter Summary ...... 41

3 Literature Review ...... 43

3.1. State-of-the-Art of DR Detection ...... 43

3.1.1. Segmentation of Bright Lesions: ...... 44

3.1.2. Segmentation of Red Lesions ...... 48

3.1.3. Diabetic Retinopathy Screening Methods ...... 53

3.1.4. Deep Learning Methods ...... 57

3.1.5. Discussion ...... 58

3.2. State-of-the-Art of RVO Detection ...... 59

3.2.1. Feature Representation ...... 59

3.2.2. Texture Analysis ...... 61

3.2.3. Deep Learning Approach ...... 63

3.2.4. Discussion ...... 63

3.3. Discussion on Existing Methodologies for Retinal Blood Vascular Disease Detection ...... 65

3.4. Chapter Summary ...... 66

4 Materials and Methods ...... 67

4.1. Deep Learning ...... 67

4.2. Convolutional Neural Network (CNN) ...... 76


4.2.1. Overview ...... 76

4.2.2. Types of CNN ...... 78

4.2.3. Limitations and Challenges ...... 80

4.3. Proposed Methodology for Diagnosing Retinal Abnormality ...... 81

4.3.1. Image Pre-processing ...... 82

4.3.2. Designing CNN for Retinal Blood Vascular Diseases Classification ... 85

4.3.3. Novelty of the Proposed CNN ...... 115

4.4. Cascaded-CNN for Diagnosing Retinal Vein Occlusion (RVO) ...... 121

4.4.1. Design Strategy for Deep Cascaded Network ...... 123

4.4.2. Function of Each CNN in the Cascaded Network: ...... 126

4.4.3. Contribution ...... 130

4.4.4. Novelty of the Proposed Cascaded CNN...... 131

4.5. Contribution of This Research on Retinal Abnormality Detection ...... 136

4.6. Chapter Summary ...... 138

5 Experimental Validation ...... 140

5.1. Experimental Environment ...... 140

5.2. Databases ...... 140

5.3. Performance Measures...... 143

5.4. Performance Evaluation of the Proposed CNN ...... 145

5.4.1. Detection of DR at the Earliest Stage ...... 145

5.4.2. Grading of DR Severity ...... 151

5.4.3. Detection of RVO...... 153

5.4.4. Detection of DR and RVO ...... 154

5.5. Performance Evaluation of the Proposed Cascaded CNN ...... 156

5.5.1. Training and Testing...... 156

5.5.2. Experimental Results ...... 158


5.6. Performance Comparison of the Proposed CNN based Method with the State-of-the-Art Methods ...... 165

5.6.1. Comparison with the State-of-the-Art Methods for DR Detection ..... 166

5.6.2. Comparison with the State-of-the-Art Methods for RVO Detection .. 173

5.7. Discussion ...... 182

5.8. Chapter Summary ...... 185

6 Conclusion and Future Works ...... 187

6.1. Conclusion ...... 187

6.2. Future Works ...... 191

References ...... 192


List of Figures

Figure 2.1: Anatomy of the Human Eye ...... 16
Figure 2.2: Retina and its different components ...... 17
Figure 2.3: Organization of different layers in the Retina ...... 18
Figure 2.4: Two types of non-mydriatic retinal image cameras: (a) Canon CR6-45NM and (b) Topcon TRC-NW6S ...... 19
Figure 2.5: Digital Fluorescein Image of Retina ...... 20
Figure 2.6: Colour Fundus Photography ...... 20
Figure 2.7: Red-Free Photography ...... 21
Figure 2.8: (a) Mild NPDR, (b) Moderate NPDR, and (c) Severe NPDR ...... 26
Figure 2.9: Proliferative DR ...... 27
Figure 2.10: Non-ischemic CRVO ...... 32
Figure 2.11: Ischemic CRVO ...... 33
Figure 2.12: Non-ischemic BRVO ...... 35
Figure 2.13: Ischemic BRVO ...... 36
Figure 2.14: Non-ischemic HRVO ...... 37
Figure 2.15: Ischemic HRVO ...... 37
Figure 2.16: Block diagram of a Traditional Computer Aided Detection (CAD) System ...... 40
Figure 4.1: Machine Learning Method ...... 68
Figure 4.2: k-NN Algorithm Explanation ...... 69
Figure 4.3: Support Vector Machine (SVM) Explanation ...... 70
Figure 4.4: SVM Mapping Non-Separable Data to High-Dimensional Space ...... 71
Figure 4.5: Decision Tree ...... 72
Figure 4.6: Linear Regression ...... 73
Figure 4.7: Logistic Regression ...... 73
Figure 4.8: Artificial Neurons inspired by Human Brain Neurons ...... 74
Figure 4.9: Artificial Neural Network (ANN) ...... 75
Figure 4.10: Basic CNN Architecture ...... 78
Figure 4.11: Overview of the Proposed Computer Aided Diagnosis of Retinal Abnormality ...... 82
Figure 4.12: Pre-processing Steps for Image Quality Enhancement ...... 85
Figure 4.13: Convolution function ...... 91
Figure 4.14: The size of the receptive field increases at deeper levels ...... 91
Figure 4.15: Function of the Sub-sampling Layer ...... 93
Figure 4.16: Sigmoid Function and its derivatives (Isaac Changhau) ...... 95
Figure 4.17: Tanh Function and its derivatives (Isaac Changhau) ...... 95
Figure 4.18: ReLU function and its gradients (Isaac Changhau) ...... 96
Figure 4.19: Effect of ReLU ...... 96
Figure 4.20: Function of the Fully Connected Layer ...... 99
Figure 4.21: CNN structure with Filter Size ...... 102
Figure 4.22: Topology of the Designed CNN ...... 105
Figure 4.23: Output Data Visualization ...... 108
Figure 4.24: Gradient Descent Principle ...... 112
Figure 4.25: Effect of large and small learning rates ...... 113
Figure 4.26: The Architecture of LeNet-5 ...... 117
Figure 4.27: Pre-processing of RVO images ...... 122
Figure 4.28: Flow Chart of the Proposed Cascaded Convolutional Neural Network for RVO Detection ...... 126
Figure 4.29: Architecture of CNN-1 ...... 127
Figure 4.30: Architecture of CNN-2 ...... 128
Figure 4.31: Architecture of CNN-3 ...... 128
Figure 5.1: ROC Curve for Stage-1 DR Detection in the Messidor Database ...... 147
Figure 5.2: Normal Images Misclassified as Stage-1 DR (Messidor Database) ...... 147
Figure 5.3: Samples of the Discarded Poor-Quality Stage-1 DR Images of the Kaggle Database ...... 148
Figure 5.4: ROC Curve of Stage-1 DR Detection for the Kaggle Database ...... 149
Figure 5.5: Stage-1 DR or Mild NPDR Misclassified as Normal Image (Kaggle Database) ...... 150
Figure 5.6: Samples of Normal Images Misclassified as Stage-1 DR (Kaggle Database) ...... 150
Figure 5.7: Misclassified Moderate NPDR Image ...... 152
Figure 5.8: Misclassified Normal Images ...... 152
Figure 5.9: RVO Image Misclassified as Normal Image ...... 155
Figure 5.10: Misclassified Normal Image ...... 156
Figure 5.11: Confusion Matrix for CNN-1 ...... 159
Figure 5.12: Confusion Matrix for CNN-2 ...... 160
Figure 5.13: Confusion Matrix for CNN-3 ...... 161
Figure 5.14: CRVO Image Misclassified as BRVO Image by CNN-1 ...... 163
Figure 5.15: Normal Image Misclassified as BRVO Image by CNN-1 ...... 164
Figure 5.16: BRVO Image Misclassified as HRVO by CNN-2 ...... 164
Figure 5.17: CRVO Image Misclassified as HRVO by CNN-3 ...... 164
Figure 5.18: Confusion Matrix of CCNN for 4-class Classification ...... 165
Figure 5.19: Performance Comparison of Existing Methods and the Proposed Method for Early DR Detection ...... 169
Figure 5.20: Performance Comparison with Methods using the MESSIDOR Database ...... 169
Figure 5.21: Performance Comparison of Mild NPDR Detection ...... 170
Figure 5.22: Performance Comparison with Deep Learning Methods using the Kaggle Database ...... 171
Figure 5.23: Performance Comparison for 3-class Classification of DR ...... 171
Figure 5.24: Accuracy Comparison of State-of-the-Art Methods and the Proposed Method for BRVO Detection ...... 175
Figure 5.25: Fractal Dimension of Normal and RVO Images ...... 180
Figure 5.26: Confusion Matrix of RVO Detection using VGG16 ...... 184
Figure 5.27: Confusion Matrix of RVO Detection using Inception V3 ...... 185


List of Tables

Table 1.1: Prevalence Rate based on Age and Ethnicity in Asia (S. Rogers et al. 2010) ...... 4
Table 1.2: Prevalence Rate based on Age and Ethnicity in Australia (S. Rogers et al. 2010) ...... 5
Table 1.3: Population-based Prevalence Rate of CRVO, BRVO, and Any RVO (S. Rogers et al. 2010) ...... 5
Table 3.1: State-of-the-art for DR Detection using Exudates Segmentation ...... 48
Table 3.2: State-of-the-art for DR Detection using Red Lesions ...... 52
Table 3.3: State-of-the-art DR Screening Methods ...... 56
Table 3.4: State-of-the-Art of RVO Detection ...... 64
Table 3.5: Significant Existing Techniques for RVO Detection ...... 64
Table 4.1: The Proposed CNN Configuration for RVO Detection ...... 107
Table 4.2: LeNet-5 Architecture vs. Proposed CNN Architecture ...... 118
Table 4.3: State-of-the-Art Deep Learning Model for DR Detection vs. Proposed CNN ...... 118
Table 4.4: The CNN Model Proposed in Camino et al. vs. the Proposed CNN Model ...... 120
Table 4.5: State-of-the-art CNN Architecture used for RVO Detection vs. Proposed Cascaded CNN for RVO Detection ...... 132
Table 5.1: Details of Collected Images ...... 143
Table 5.2: Details of the Images used for Training and Testing for Detection of Stage-1 DR (Messidor Database) ...... 146
Table 5.3: Performance Evaluation for the Detection of DR at the Earliest Stage (Messidor Database) ...... 147
Table 5.4: Details of the Images used for Training and Testing for Detection of Stage-1 DR (Kaggle Database) ...... 148
Table 5.5: Performance Evaluation for the Detection of DR at the Earliest Stage (Kaggle Database) ...... 149
Table 5.6: Details of Training and Testing Images for DR Severity Detection ...... 151
Table 5.7: Performance Evaluation for 3-Class Classification of DR ...... 152
Table 5.8: Details of the Images Selected for Training and Testing ...... 153
Table 5.9: Performance Evaluation for RVO Detection ...... 153
Table 5.10: Details of the Images used for Training and Testing for Retinal Vascular Diseases ...... 155
Table 5.11: Performance Evaluation for DR and RVO Detection ...... 155
Table 5.12: Details of the Images Selected for Training and Testing ...... 157
Table 5.13: Details of the Images used for Training and Testing after Data Augmentation ...... 158
Table 5.14: Performance Evaluation of the Individual CNNs in the Designed Cascade Network ...... 161
Table 5.15: Performance Evaluation for Detection of RVO Types by the Cascade Network ...... 163
Table 5.16: The Performance Comparison of State-of-the-Art Methods and the Proposed Method for DR Diagnosis ...... 167
Table 5.17: Performance Comparison of the Designed Cascade Network of CNNs with Existing Methods ...... 174
Table 5.18: Performance Comparison for RVO Detection ...... 176


List of Acronyms

DR Diabetic Retinopathy

NPDR Non-Proliferative Diabetic Retinopathy

PDR Proliferative Diabetic Retinopathy

RVO Retinal Vein Occlusion

BRVO Branch Retinal Vein Occlusion

CRVO Central Retinal Vein Occlusion

HRVO Hemiretinal Vein Occlusion

BL Bright Lesions

RL Red Lesions

MA Microaneurysms

HA Hemorrhages

CW Cotton Wool spots

ME Macular Edema

CAD Computer Aided Diagnosis

FA Fluorescein Angiography

AI Artificial Intelligence

DL Deep Learning

ML Machine Learning

CNN Convolutional Neural Network

CCNN Cascaded Convolutional Neural Network

ReLU Rectified Linear Unit

CLAHE Contrast Limited Adaptive Histogram Equalization

PPV Positive Predictive Value

NPV Negative Predictive Value


Chapter-1

1 Introduction

In recent years, interest in the use of digital image processing for analysing medical and biomedical images for diagnosis and follow-up treatment has grown exponentially. This can be attributed to advances in camera and computer technology. Advances in camera technology have enabled biological tissues to be captured at higher resolution, while advances in computer technology have ensured that there is enough computing power for pattern recognition and image analysis processes to be applied to the images. In addition, there is increasing awareness among researchers and medical practitioners of the need for a better understanding of the underlying causes and progression of specific diseases, because a better understanding of disease progression can lead to better general health care.

Currently, manual assessment by an expert is used to extract information from the image. Generally, the expert uses his/her knowledge, memory, intuition, and diligence to recognize specific image patterns or objects of interest, and from there uses his/her reasoning skills to detect the relationship between the perceived patterns or objects and potential disease. This laborious process is subject to inter- and intra-observer variability (Bock et al. 2010; Popović & Thomas 2017) and to errors such as perceptual errors and errors in judgement (Berlin 1996; Bruno, Walker & Abujudeh 2015; Brady & Brady 2017). Factors such as attitude, beliefs, expectations, preconceptions, and fatigue are known to contribute to these errors.

Over the last two decades, several attempts have been made by researchers to automate this process. The requirement for novel healthcare solutions has pushed researchers in the biological sciences to put continuous effort into elucidating the biological bases of pathologies and to research them extensively (Coleman 2007; Hickey et al. 2019). Biological systems are highly complex, diverse, and high-dimensional; considering their intrinsic complexity and the noise contamination introduced during image acquisition, it is a huge challenge to infer meaningful conclusions from the acquired data (Marx 2013; Gligorijevic 2015; Mahmud & Vassanelli 2016). Therefore, it is essential that novel instruments be robust, reusable, reliable, and accurate in processing and analysing biological data (Li & Chen 2014; Lee et al. 2018). This has led to phenomenal progress in biological and biomedical research, as scientists from both the life science and computing science disciplines have been motivated to adopt a multidisciplinary approach to elucidating the functions and dynamics of living organisms (Wickware 2000; Cvijovic et al. 2016). Consequently, several methods of Artificial Intelligence (AI), particularly machine learning (ML), have been proposed over time to facilitate the classification, prediction, and detection of patterns in biological data (Tarca et al. 2007; Pesapane, Codari & Sardanelli 2018; Lee et al. 2018).

1.1 Research Overview

The retina is not only an important part of the visual system; it also has the potential to indicate the general health of other parts of the human body. Images of the retina provide helpful information about various eye diseases, and thus retinal image analysis is considered a traditional approach to diagnosing eye diseases. However, the different types of lesions found in eye disease, and their quantities, can also be associated with non-ocular diseases. The retina is the sole location where blood vessel-related and other specific lesions can be observed in vivo, and recent studies have shown that these retinal abnormalities are predictive of various major diseases, viz. Diabetes, Cardiovascular diseases like Hypertension and Coronary heart disease, Kidney disease, and Stroke (Besenczi, Tóth & Hajdu 2016). Retinal vascular disorders refer to the variety of eye diseases that affect the blood vessels in the eye and damage the retina. The major blood vascular diseases are Diabetic Retinopathy, Hypertensive Retinopathy, Retinal Vein Occlusion, and Central Retinal Artery Occlusion. The two most common sight-threatening disorders are Diabetic Retinopathy (DR) and Retinal Vein Occlusion (RVO).


Diabetic Retinopathy (DR) is a disease instigated by Diabetes Mellitus, which causes changes in the blood vessels of the retina. Due to the increased glucose level in the blood, the capillaries in the retina get damaged and start leaking blood and fluid (Hu 2003; Nayak et al. 2008; Lechner, O'Leary & Stitt 2017; Scanlon 2019). According to the World Health Organization (WHO), diabetes mellitus is an enduring disorder triggered either when there is insufficient production of insulin (type 1 diabetes) or when the body fails to utilize the insulin produced by the pancreas (type 2 diabetes) (Cheung, Mitchell & Wong 2010; Scanlon 2019). Since diabetes is a lifelong disease, during the first twenty years of the disease nearly all patients with type 1 diabetes and more than 60% of patients with type 2 diabetes develop retinopathy (Fong et al. 2004; Lechner, O'Leary & Stitt 2017; Scanlon 2019). The principal risk factors for diabetes are increasing age, obesity, and a sedentary lifestyle (Nayak et al. 2008; Scanlon 2019). DR is the prime cause of visual impairment among adults in the age range 20-74 years. The estimated prevalence rate of diabetes is expected to increase from 2.8% to 4.4% over the period 2000-2030, and the overall number of people suffering from diabetes is expected to increase from 171 million in 2000 to 360 million in 2030 (Wild et al. 2004; Sabanayagam et al. 2019). Retinopathy is characterized by the presence of microaneurysms, hemorrhages, hard exudates, Cotton Wool Spots, and venous loops (Nayak et al. 2008; Lechner, O'Leary & Stitt 2017); the detection of these red and bright lesions is crucial for diabetic retinopathy screening systems (Salamat, Missen & Rashid 2019; Dai et al. 2019). DR is generally classified into two types: Proliferative Diabetic Retinopathy (PDR) and Non-Proliferative Diabetic Retinopathy (NPDR). NPDR can be further sub-classified as mild, moderate, and severe NPDR. To prevent severe visual loss, it is essential for diabetic patients to attend regular diabetic eye screening programs and receive optimal treatment (Yen & Leong 2008; Lechner, O'Leary & Stitt 2017).

Retinal Vein Occlusion (RVO) is the second most common cause of vision loss after Diabetic Retinopathy (DR). RVO is a blood vascular disease caused by thrombus formation in the retinal blood vessels. There are two types of retinal veins: one central vein and many smaller branch veins. Based on the location of the blockage, RVO can be categorised into Branch Retinal Vein Occlusion (BRVO), Central Retinal Vein Occlusion (CRVO), and Hemiretinal Vein Occlusion (HRVO).


The risk factors for retinal vein occlusion include high cholesterol, diabetes, high blood pressure (hypertension), glaucoma, age-related blood vessel disorders, and certain blood disorders (Hykin 2015). According to Australian data, the prevalence rate of RVO is 0.7% for people younger than 60 years, 1.2% for people aged 60-69 years, and 2.1% for people aged 70-79 years, and it increases to 4.6% in people aged 80 years or above (Hykin 2015; Sivaprasad et al. 2015). According to the 2015-2016 report of the International Eye Disease Consortium (IEDC), about 16.4 million adults are affected by RVO across the world. The estimated incidence rate of RVO ranges between 0.53 and 1.6 per 1000 persons per year (Sivaprasad et al. 2015). The prevalence rate of RVO in Asia and Australia by age and ethnicity is shown in Table 1.1 and Table 1.2, and the population-based prevalence rates of BRVO, CRVO, and any RVO are shown in Table 1.3. Furthermore, RVO is associated with a significantly increased long-term risk of cardiovascular morbidity and mortality (Liew et al. 2011). Recent studies have shown that a blockage in a retinal vein can be an indication of possible blockages in cardiac veins and in nerves in the human brain as well (Flammer et al. 2013; K.H. Cho et al. 2017). RVO causes abrupt, painless vision loss, in one eye first; it may then spread to the other eye and lead to complete blindness over time. The clinical characteristics of RVO include dilated and tortuous veins, dot and flame-shaped hemorrhages, Cotton Wool spots, and neovascularization (Hykin 2015). However, the initial symptoms of RVO are so subtle that manually perceiving those signs in the retinal image and detecting the type of RVO is a labour-intensive and time-consuming process.

Table 1.1: Prevalence Rate based on Age, Ethnicity in Asia (S. Rogers et al. 2010)

Organization                              Age Range   Ethnicity       Standardized Prevalence (/1000)
Beijing Eye Study (China)                 40 - 101    100% Chinese    5.27
Handan Eye Study (China)                  30 - 97     100% Chinese    6.70
Funagata Study (Japan)                    34 - 96     100% Japanese   4.09
Hisayama Study (Japan)                    40 - 96     100% Japanese   10.09
Shihpai Eye Study (Taiwan)                65 - 90     100% Chinese    3.83
Singapore Malay Eye Study (Singapore)     40 - 80     100% Malay      3.56


Table 1.2: Prevalence Rate based on Age, Ethnicity in Australia (S. Rogers et al. 2010)

Organization                              Age Range   Ethnicity   Standardized Prevalence (/1000)
Blue Mountains Eye Study (Australia)      45 - 97     99% White   7.14

Table 1.3: Population based Prevalence Rate of CRVO, BRVO and Any RVO (S. Rogers et al. 2010)

Region                                       Age Range   RVO (All)   BRVO   CRVO
Australia                                    ≥ 49        1.6         1.1    0.5
China                                        ≥ 40        1.3         1.2    0.12
Europe                                       ≥ 65        0.8         0.6    0.2
Japan                                        ≥ 40        2.1         2.0    0.2
Singapore                                    40 - 80     0.7         0.6    0.2
US (Beaver Dam Eye Study)                    43 - 84     0.8         0.6    0.1
US (Multi-Ethnic Study of Atherosclerosis)   45 - 84     1.1         0.9    0.2
Global                                       ≥ 30        0.52        0.44   0.08

The direct management of these sight-threatening retinal diseases and related complications in patients mostly involves a promising treatment: intravitreal anti-vascular endothelial growth factor (anti-VEGF) drugs. The other potential strategies used by ophthalmologists include laser therapy, surgery, the administration of intravitreal steroids, and hemodilution (Ageno & Squizzato 2011; Peng et al. 2019). Timely therapy greatly improves the chances of treating the disease successfully. Moreover, stratifying patients according to the type of retinal disease, such as DR or RVO, would help manage these diseases more effectively, as initial stratification can improve the evaluation of clinical outcomes. Therefore, it is of utmost importance to detect the retinal abnormalities caused by DR and/or RVO as early as possible.


If these blood vascular diseases are detected in the initial stage, it is possible to treat them; otherwise, they can lead to permanent vision loss. Unfortunately, many parts of the world lack proper medical facilities, and medical specialists capable of detecting these diseases are unavailable. Therefore, in this research, an automated diagnosing methodology is proposed for retinal image analysis and for detecting the causes of abnormalities as early as possible. The proposed automated method can avoid human errors and thereby help the ophthalmologist provide early treatment and avoid total vision loss. This method will also make it possible to provide services to remote areas. This dissertation is mainly focused on analysing retinal images and diagnosing the early signs of retinal abnormalities caused by retinal blood vascular diseases such as DR and/or RVO. In addition, it is concerned with detecting the type of disease and grading the severity level.

1.2 Research Gaps

The literature on the detection of retinal blood vascular diseases is mostly dominated by research works on DR detection; only a few research groups have focused on the detection of other retinal blood vascular diseases. Various algorithms for Hypertensive Retinopathy detection, using blood vessels as features, have been proposed in (Ortíz et al. 2010; Agurto et al. 2014; Irshad et al. 2015; Khitran et al. 2015; Syahputra et al. 2017). Central Retinal Artery Occlusion detection methodologies are presented in (Foroozan, Savino & Sergott 2002; Riccardi, Siniscalchi & Lerza 2016).

For the past ten years, several research works have been carried out on the development of automated DR diagnosis using different clinical features such as microaneurysms, hemorrhages, exudates, blood vessels, node points, and textures. Some of the popular screening tools for DR have been developed by (Usher et al. 2003; Sinthanayothin et al. 2003; Aptel et al. 2008; Reza & Eswaran 2011; Gardner et al. 1996; Kahai, Namuduri & Thompson 2006; Osareh, Shadgar & Markham 2009; Quellec et al. 2008; Franklin & Rajan 2014; Archana et al. 2015; Felfeli et al. 2019). Most of the existing works depend mainly on segmentation and feature extraction methods for proper classification. The state-of-the-art methods are mostly 2-class classification methods, i.e., detecting a Normal or DR image. These methods are


dependent on the segmentation of the abnormal lesions that occur at a much more severe stage. Some methods have focused on grading the DR (Acharya et al. 2009; Yun et al. 2008; Usman Akram et al. 2014; Antal & Hajdu 2014; Sudhir. S. Kanade 2015). While focusing mainly on grading the severity of DR, these works overlook the importance of the early detection of DR. The majority of the existing methods have shown poor performance in detecting mild-NPDR, as they rely on segmentation of the microaneurysms, which are difficult to distinguish from tiny blood vessels. Moreover, most DR detection/screening methods depend on segmentation and feature engineering algorithms, so their overall classification performance is highly dependent on the individual performances of the segmentation and feature extraction algorithms.

In comparison to the vast body of research available for automatic detection of ocular diseases like Diabetic Retinopathy and Glaucoma, RVO remains the least explored vascular disease in the area of automated diagnostic methods. The state-of-the-art for automatic detection of RVO is very limited, and there is a lack of adequate research on automatic detection of RVO in the early stage. Currently, the available methods are mostly for BRVO detection. It is noteworthy that no method proposed to date is able to successfully detect all three types of RVO, viz., BRVO, CRVO, and HRVO.

1.3 Research Challenges

The main challenge in retinal image analysis is that the textural changes in the retina due to ocular and non-ocular diseases are similar. Similar changes can occur in the blood vessels, or similar lesions might appear for multiple diseases or other complications. The prominent lesions in an abnormal retina include red lesions and yellow-white lesions, and these lesions can be caused by a wide range of conditions. For example, the cause of yellow-white retinal lesions could be Drusen and Drusen-like conditions, Hard Exudates, Cotton Wool Spots, Retinal Infiltrate, Choroidal Neovascularization, etc. There is minimal texture difference among these whitish lesions. Similarly, the possible causes of retinal Exudates include vascular disorders such as Diabetic Retinopathy, Retinal Vein Occlusion, and Hypertensive Retinopathy, and infectious disorders such as Syphilis, Toxoplasmosis, and viral diseases


etc. Again, Cotton Wool Spots can be seen in the ischemic stage of Retinal Vein Occlusion, hypertension, diabetes, anaemia, etc. In the same way, dark red lesions can be microaneurysms and haemorrhages, which might appear due to various conditions. For example, haemorrhages are a common clinical feature of Diabetic Retinopathy (DR), Retinal Vein Occlusion (RVO), and Hypertensive Retinopathy. Besides, the appearance of all these lesions varies greatly according to the disease and its severity. Moreover, the blood vascular structure changes during Vein Occlusion, hypertension, and other vascular disorders. Therefore, for diagnosing retinal abnormalities, diagnostic systems rely greatly on the segmentation and feature extraction of these heterogeneous lesions with similar textures, and identifying or detecting a particular disease based on these collective clinical signs is elusive.

The clinical signs of DR include microaneurysms, haemorrhages, hard and soft exudates, and neovascularization. The clinical features of RVO include dot and flame-shaped haemorrhages, dilated tortuous veins, cotton wool spots, and neovascularization in the ischemic phase. The literature on automatic detection of these clinical features is quite fragmented. For example, it does not clearly explain the various types of haemorrhages and their detection methods. Numerous methods are available for the detection of microaneurysms in the case of DR, but the literature lacks methods for identifying other types of haemorrhages. The haemorrhages in RVO vary in size, color, and texture; they are either dot haemorrhages or flame-shaped haemorrhages. No dedicated algorithm is available for detecting dot and flame-shaped haemorrhages. The appearance of dot haemorrhages is quite similar to that of microaneurysms; hence, the available segmentation methods fail to distinguish between microaneurysms and dot haemorrhages. The same segmentation algorithms cannot be used for dot haemorrhages, because the existing automated microaneurysm detection processes detect dot haemorrhages as microaneurysms by default. There are plenty of methods for automatic detection of hard exudates, whereas only a few methods have discussed the detection of cotton wool spots (Niemeijer et al. 2007; Sreng et al. 2019). The state-of-the-art methods for blood vessel detection mainly use various filters, vector geometry, and various statistical distribution techniques for vessel segmentation, and then use machine


learning models for vessel detection. Again, it is computationally expensive to compute the vessel dilation and tortuosity. In all these available methods, hand-designed feature extraction algorithms or heuristic assumptions play a pivotal role in solving the particular problem. The inherent weakness of these methods is that they are not generalized to learn pattern attributes from the data itself, which makes the performance of the whole method vulnerable. Therefore, the challenges here are: how can all these clinical features be quantified to detect RVO and its types? And although there is plenty of research work available for detecting DR, the earliest possible detection of DR is still a great challenge.

1.4 Research Hypothesis

As discussed earlier, the retinal blood vasculature is rich in texture, and its texture changes when affected by any eye disease or other health condition. Therefore, texture analysis is considered suitable for analysing retinal images. It is worth mentioning that clinicians themselves depend on visual texture to identify and describe abnormalities in an image. However, there is evidence that it is difficult for the human visual system to discriminate textural information related to higher-order statistics or spectral properties of an image (Julesz et al. 1973). Therefore, automated texture analysis techniques can be a potential complement to the visual skills of clinicians, as those techniques can extract features relevant to the problem at hand that are imperceptible to the human eye.
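As an illustration of such higher-order texture statistics, a gray-level co-occurrence matrix (GLCM) records how often pairs of intensity levels co-occur at a given spatial offset, and scalar features such as contrast and homogeneity summarise it. The minimal NumPy sketch below is for illustration only; it is not the method used in this dissertation, which relies on a CNN learning such patterns directly from the data:

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Symmetric, normalised gray-level co-occurrence matrix for offset (dx, dy)."""
    h, w = img.shape
    m = np.zeros((levels, levels), dtype=float)
    for y in range(h - dy):
        for x in range(w - dx):
            a, b = img[y, x], img[y + dy, x + dx]
            m[a, b] += 1
            m[b, a] += 1  # count each pair in both directions
    return m / m.sum()

def contrast(p):
    """Second-order statistic: weighs co-occurrences by squared intensity difference."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

def homogeneity(p):
    """High when co-occurring intensities are similar (smooth texture)."""
    i, j = np.indices(p.shape)
    return float(np.sum(p / (1.0 + np.abs(i - j))))

patch = np.array([[0, 0, 1],
                  [0, 0, 1],
                  [0, 2, 2]])  # toy 3-level image patch
p = glcm(patch, levels=3)
print(contrast(p), homogeneity(p))
```

Libraries such as scikit-image provide equivalent functionality; the hand-rolled version above merely makes the second-order nature of the statistic explicit.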

One of the major challenges in analysing medical images is extracting the relevant information from the image and using it for further processing, such as quantifying a specific disease or abnormality. One way the information can be extracted is through segmentation of specific structures or objects; based on the segmented object, several parameters can be obtained and used to quantify the disease. Another way of quantifying the disease from an image is by assessing the overall appearance of the image: a unique image pattern is extracted from the image, and reasoning is made to identify the relationship between the observed patterns and the probable diagnosis. Now, how can a computer algorithm be designed that analyses the texture of the retina and understands the retinal abnormalities caused by retinal blood vascular diseases such as DR and


RVO? How can the algorithm understand the difference between the lesions appearing due to DR and those due to RVO, given that the two diseases produce similar lesions? How should structural features be chosen and extracted so that the machine can distinguish the types of DR and RVO?

Computer-Aided Detection (CAD) methods use various algorithms to process a digital image and evaluate the possible disease by highlighting suspicious regions in the image. Such algorithms usually comprise several steps, including image processing, segmentation, feature extraction, and data classification (Castellino 2005; Shiraishi et al. 2011; Yanase & Triantaphyllou 2019). However, most of these traditional automatic disease detection methods suffer from a loss of classification accuracy due to inefficient segmentation and feature extraction algorithms. These segmentation and feature extraction algorithms depend heavily on the image quality, reflections, noise present in the images, and the camera position; thus, the classification accuracy is immensely affected when anything goes wrong at the time of image acquisition. Therefore, a Deep Learning (DL) based method has been proposed to solve these issues and help clinicians detect the retinal abnormalities caused by DR and RVO as early as possible, irrespective of image quality. The fundamental conception behind deep learning is that it learns abstract representations of the data through multiple levels: in a hierarchical fashion, it extracts a more abstract representation of the image at each level. Since it allows a system to comprehend and learn a complex representation directly from the raw data, this type of tiered learning is very powerful for analysing the retinal image and carefully investigating its texture to detect any abnormality (Mahmud et al. 2018; Bengio 2009).
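The multi-stage pipeline just described can be sketched end to end. Everything below is a hypothetical, greatly simplified stand-in (the threshold, the two features, and the decision rule are invented for illustration), meant only to show why a failure in an early stage propagates to the final classification:

```python
import numpy as np

def preprocess(img):
    """Image processing stage: normalise intensities to [0, 1]."""
    img = img.astype(float)
    return (img - img.min()) / (img.max() - img.min() + 1e-9)

def segment(img, thresh=0.5):
    """Segmentation stage: crude 'lesion' mask of bright pixels."""
    return img > thresh

def extract_features(img, mask):
    """Feature extraction stage: hand-designed lesion area and mean intensity."""
    area = mask.mean()
    mean_val = float(img[mask].mean()) if mask.any() else 0.0
    return np.array([area, mean_val])

def classify(features, area_cutoff=0.2):
    """Classification stage: toy rule on the extracted features."""
    return "suspicious" if features[0] > area_cutoff else "normal"

fake_image = np.array([[10, 10, 200],
                       [10, 220, 240],
                       [10, 10, 10]])  # synthetic 3x3 'retina'
img = preprocess(fake_image)
label = classify(extract_features(img, segment(img)))
print(label)  # the final label hinges on every preceding stage
```

If `segment` misses or over-detects lesions, for instance because of poor image quality, the features and therefore the final label are wrong; this is exactly the fragility that the deep learning approach proposed here aims to remove.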

Among the various Deep Learning architectures, the Convolutional Neural Network (CNN) has been selected for this research because it is particularly well suited for texture analysis. A CNN can be an excellent model for analysing retinal images, as it learns abstract representations with a higher degree of semantics. It uses a series of learnable filters with shared weights and local connections; hence, the design of a CNN is naturally suited to texture analysis, enabling it to detect patterns at all locations in the image (Andrearczyk & Whelan 2017). Therefore, a CNN can help learn and recognise texture patterns of the various complexities and scales present in DR and RVO images.


By using a hierarchical CNN architecture combined with a powerful learning algorithm, we can dispense with separate segmentation and handcrafted feature extraction algorithms. In its deeper layers, a CNN starts to detect global structures and shapes on top of simple texture patterns. Therefore, a CNN can help analyse the different clinical signs of the retina caused by DR and RVO and detect the disease as early as possible.

1.5 Research Objective

The research challenges described in Section 1.3 are addressed to analyse the retinal abnormalities in a more effective way and detect textural changes related to RVO and DR as early as possible. The aim of this research is to develop a fully automated detection method for retinal image analysis and for detecting retinal abnormalities caused by retinal blood vascular diseases, particularly RVO and DR, using a Deep Learning approach. Therefore, the major research objectives in this dissertation are to:

1. Study the color fundus images of the retina affected by Retinal Vein Occlusion (RVO) and Diabetic Retinopathy (DR). Investigate the texture of normal retinal features and the abnormal features caused by RVO and DR in order to utilize them for early diagnosis.

2. Study the basic architecture of the deep learning based Convolutional Neural Network (CNN), and establish a hypothesis about CNN design and the relation between input size and network complexity and computational cost.

3. Design a standard Deep Learning based algorithm to diagnose RVO and its types irrespective of image quality, and compare its performance with the state-of-the-art techniques.

4. Design a standard Deep Learning based algorithm to detect DR as early as possible and grade the severity level irrespective of image quality.

5. Develop a Deep Learning based Computer Aided Detection (CAD) method for retinal blood vascular disease classification (DR and RVO) irrespective of their common visual features.


1.6 Research Contribution

This research on diagnosing retinal blood vascular diseases has the following main contributions:

1. This research will help ophthalmologists detect major vascular diseases at an early stage, thereby preventing further deterioration of retinal health that leads to visual impairment and other related health issues.

2. The fundamentals of designing a CNN have been analysed in depth, and a novel architecture has been designed based on the LeNet-5 architecture. The proposed model is a 13-layer CNN consisting of 5 Convolution layers, 3 Max-pooling layers, 3 Batch Normalization layers, 1 Rectified Linear Unit (ReLU) layer, and 1 Fully Connected layer for detecting retinal blood vascular diseases. This proposed model is computationally inexpensive and efficient in diagnosing retinal abnormalities related to DR and RVO.

3. A hypothesis for designing CNNs has been set out, explaining how image size, filter size, parameters, and hyperparameters affect the size and depth of CNN models.
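The interplay between input size, filter size, and network depth can be made concrete with the standard output-size formula (W - F + 2P)/S + 1. The pure-Python sketch below traces the feature-map size and parameter count through a small convolutional stack; the input size, filter sizes, and filter counts are illustrative assumptions, not the actual configuration proposed in this thesis:

```python
def conv_out(w, f, p=0, s=1):
    """Output width of a convolution/pooling layer: (W - F + 2P) / S + 1."""
    return (w - f + 2 * p) // s + 1

def trace(input_size, layers):
    """Walk (kind, filter_size, n_filters) specs; return final size, channels, params."""
    size, channels, params = input_size, 3, 0  # assume an RGB input image
    for kind, f, n in layers:
        if kind == "conv":                       # valid convolution, stride 1
            size = conv_out(size, f)
            params += (f * f * channels + 1) * n  # weights + one bias per filter
            channels = n
        elif kind == "pool":                     # non-overlapping max pooling
            size = conv_out(size, f, s=f)        # no learnable parameters
    return size, channels, params

# An illustrative 5-conv / 3-pool stack (batch norm and ReLU layers are omitted
# from the trace, since they do not change the spatial size):
stack = [("conv", 5, 16), ("pool", 2, 0),
         ("conv", 5, 32), ("pool", 2, 0),
         ("conv", 3, 64), ("pool", 2, 0),
         ("conv", 3, 64), ("conv", 3, 64)]
size, ch, params = trace(128, stack)
print(size, ch, params)
```

Each pooling step roughly halves the spatial size while adding no parameters, so the input size directly bounds how many convolution/pooling stages, and hence how deep a model, are feasible before the feature map collapses.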

4. A hypothesis has been set and verified, namely: it is possible to use simple yet effective CNN models for particular tasks rather than deep models; if the number of target classes is not more than 5, it is better to use a simple, task-specific CNN model to avoid complexity and computational, memory, and time costs.

5. The efficient learning algorithm can effectively detect DR at the earliest stage, which makes it a potential model for stopping the disease progression and preventing blindness among diabetic patients.

6. A novel Deep Cascaded Network has been proposed, which is a chain of three identical CNNs of the same proposed configuration. This architecture has been specially designed to diagnose RVO, and it is the first of its kind to detect all


the variants of RVO. The carefully designed learning algorithm makes it an efficient architecture for classifying the ambiguous features of HRVO.
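The idea of a cascaded classifier can be sketched abstractly: each stage either emits a final label or defers the sample to the next stage. In the sketch below, the stage ordering and the stub decision rules on a toy feature dictionary are invented purely for illustration; in the proposed method, each stage would instead be a CNN of the same configuration operating on the retinal image:

```python
def make_stage(label_if_positive, decide):
    """A stage that returns its label when its decision rule fires, else None."""
    def stage(sample):
        return label_if_positive if decide(sample) else None
    return stage

def cascade(stages, fallback):
    """Chain the stages; the first one to emit a label wins."""
    def run(sample):
        for stage in stages:
            label = stage(sample)
            if label is not None:
                return label
        return fallback
    return run

# Stub decision rules on a toy feature dict (purely illustrative):
stages = [
    make_stage("Normal", lambda s: not s["occlusion"]),     # stage 1: any RVO at all?
    make_stage("BRVO",   lambda s: s["site"] == "branch"),  # stage 2: branch occlusion?
    make_stage("CRVO",   lambda s: s["site"] == "central"), # stage 3: central occlusion?
]
classify_rvo = cascade(stages, fallback="HRVO")             # remaining cases: hemi
print(classify_rvo({"occlusion": True, "site": "hemi"}))
```

The appeal of the cascade is that each stage solves a simpler binary question than one monolithic multi-class decision, which is why ambiguous classes such as HRVO can be deferred to the end of the chain.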

7. The proposed deep learning based method provides a generalized model for diagnosing retinal blood vascular diseases, which is capable of differentiating diseases with similar lesions. While the existing retinal abnormality detection methods are disease-specific, the proposed deep learning model has the potential to diagnose multiple diseases.

1.7 Thesis Organization

This dissertation is organized as follows: Chapter 1 has briefly introduced the research overview, research gaps, research challenges, the research hypothesis, and the objectives of this dissertation. The rest of the thesis is organized into five chapters. The layout of this dissertation is as follows:

Chapter 2: Background

In this chapter, there is a brief introduction to the anatomy of the eye and retina, different blood vascular diseases, and tools for capturing retinal images. This chapter also explains more about Retinal Vein Occlusion (RVO), Diabetic Retinopathy (DR), their types, risk factors, diagnosis, and treatment. There is also a brief introduction to Computer Aided Detection (CAD) systems and Deep Learning.

Chapter 3: Literature Review

In this chapter, the state-of-the-art in automatic detection of DR and RVO is explored in two separate subsections. Along with the detailed review of the existing literature, a tabular comparison of the performances of various approaches for both DR and RVO detection is provided.

Chapter 4: Material and Methods

In this chapter, a brief introduction of Deep Learning (DL) is provided and different available Deep Learning models are explored. Mainly, the architecture of the Convolutional Neural Network (CNN) has been explored in detail. The architecture of


the basic CNN is exploited to analyse the retinal abnormality and detect DR and RVO. The challenges in detecting RVO and DR using deep learning models are evaluated. After that, the hypothesis about designing the CNN architecture is set. A simple CNN is proposed for diagnosing retinal abnormalities caused by DR and RVO. Then, a Cascaded CNN is proposed, which is a chain of three CNNs of the same configuration, designed particularly to detect all three types of RVO.

Chapter 5: Experimental Results

In this chapter, the designed CNN is evaluated through various experiments. Details about the databases used for the experiments are provided. Verification and validation of the proposed deep learning based method are performed by testing DR and RVO images individually. The proposed CNN architecture is further evaluated on whether it can efficiently distinguish DR and RVO images, as the two diseases possess similar lesions. Three training methods are elaborated for successful classification of the RVO types on the test sets. The proposed method is further evaluated by comparing it with the existing state-of-the-art for DR and RVO detection.

Chapter 6: Conclusion and Future Work

This chapter presents the conclusions derived from the proposed deep learning methodology for detecting retinal abnormalities, with emphasis on the achievements and limitations. The scope for future research is also highlighted at the end.


Chapter-2

2 Background

In this chapter, the background of the research is explained: the anatomy of the eye and retina, different imaging techniques, and retinal blood vascular diseases are described. Section 2.1 briefly describes the anatomy of the human eye. Section 2.2 describes the retina and its different layers. Section 2.3 introduces the different tools available for capturing retinal images. Section 2.4 discusses retinal blood vascular diseases and various abnormal lesions of the retina, such as microaneurysms, haemorrhages, exudates, and drusen. Section 2.5 describes Diabetic Retinopathy (DR) and its manifestations.

2.1. Eye Anatomy

The human eye is one of the most complex organs of the body, as information about the surroundings is viewed through the eyes. An eye is like a camera that gathers light and converts it into an image. The cornea is the outermost lens of the eye and the first to come into contact with light. Light is filtered and refracted by the cornea and passes through the iris and the pupil to converge the image on the retina. The iris regulates the amount of light entering the eye by contracting or dilating to adjust the pupil size. The ciliary muscles help the lens change its shape so that the lens can bring objects into focus. The lens further improves the already refined image coming through the cornea and projects it onto the retina (Gross, Blechinger & Achtner 2008; J. Anderson 2017). A cross section of the human eye anatomy is presented in Fig. 2.1. This research is mainly concerned with the retina and its texture; therefore, the other ocular structures will not be discussed any further. The structure of the retina is illustrated in depth in the following section.


Figure 2.1 Anatomy of Human Eye

2.2. Retina

The retina is a thin membrane that lines the innermost surface of the eye. It is a thin layer of light-sensitive tissue that receives the light rays passing through the cornea and lens. That light is converted into electrical impulses by the retina and sent to the brain via the optic nerve, and the brain then interprets them as an image. The cornea and lens act like a camera lens, while the retina acts like the film; therefore, the retina captures blurry images when the cornea and lens do not focus the image properly. The retina comprises two types of photoreceptors, rods and cones, responsible for visual phototransduction (Stein, Stein & Freeman 2006; J. Anderson 2017). Fig. 2.2 shows a retinal color fundus image and its different components, such as the optic disc, blood vessels, fovea, and macula.

The retina is composed of ten distinguishable layers as shown in Fig. 2.3. The different layers of the retina are organized as follows (Garhart & Lakshminarayanan 2016):

1. The Inner Limiting Membrane (ILM) is the innermost layer of the retina that separates it from the vitreous body.


2. The Nerve Fibre Layer (NFL) is formed by the ganglion cell axons, which are the expansion of the fibres of the optic disc.

3. The Ganglion Cell Layer (GCL) encompasses the ganglion cells, which project axons toward the optic nerve.

4. The Inner Plexiform Layer (IPL) contains the synaptic connections between the axons of bipolar cells and the dendrites of ganglion cells.

5. The Inner Nuclear Layer (INL) consists of bipolar cells, horizontal cells, and amacrine cells.

6. The Outer Plexiform Layer (OPL) is a layer of neuronal synapses between bipolar cells, horizontal cells, and the photoreceptors.

7. The Outer Nuclear Layer (ONL) is the light-detecting portion of the eye, consisting of the nuclei of the photoreceptors.

8. The Outer Limiting Membrane (OLM) is situated at the bases of the photoreceptors and separates their nuclei from the inner and outer segments of the photoreceptors.

9. The Photoreceptor Layer contains the rods and cones.

10. The Pigment Epithelium Layer is the outermost layer; it contains cells pigmented with melanin and is attached to the choroid, which nourishes the retinal cells.

Figure 2.2 Retina and its different components


Figure 2.3 Organization of different layers in Retina

2.3. Retinal Image Screening Techniques

A wide range of cameras is available to capture the retinal image. For obtaining retinal fundus images, the cameras are classified into mydriatic and non-mydriatic retinal cameras. To capture retinal images using a mydriatic camera, the pupil must first be dilated by administering dilating drops into the patient's eyes. This camera is used especially in cases where the pupil of the patient is ≤ 4 mm. The non-mydriatic camera is the most commonly used tool for acquiring retinal fundus images. Fig. 2.4 shows two very popular non-mydriatic cameras: (a) the Canon CR6-45NM (Canon USA, Lake Success, EEUU) and (b) the Topcon TRC-NW6S (Topcon America Corp., Paramus, EEUU). Ophthalmologists and optometrists can immediately capture ultra-high-resolution digital images of the retina using these cameras.



Figure 2.4 Two types of non-mydriatic retinal image cameras: (a) Canon CR6-45NM and (b) Topcon TRC-NW6S

Fundus retinal image acquisition techniques are roughly classified into the following types:

1. Digital Fluorescein Angiography: Digital fluorescein angiography is a retinal image acquisition technique that uses a dye-tracing method. Before capturing the retinal image, sodium fluorescein is injected into the bloodstream. The retina is then illuminated with blue light at a wavelength of 490 nanometres, and the photograph is captured as the dye fluoresces in the retinal blood vessels. The fluorescein dye stays in the bloodstream for a long period and can turn the patient's urine yellow-green for 12-24 hours afterwards. The dye helps the ophthalmologist evaluate retinal blood vessel patterns and find pathologic changes such as abnormal vessels, diabetic retinopathy, staining, and tumours. However, the use of the dye has adverse side effects such as nausea, vomiting, upset stomach, skin allergy, and headache. It is, in fact, the most invasive technique for the patient. Fig. 2.5 shows a retina image captured by the fluorescein angiography technique.


Figure 2.5 Digital Fluorescein Angiography Image of Retina

2. Digital Colour Fundus Photography: A digital color fundus photograph is acquired using a customised camera attached to a microscope with sophisticated lenses and mirrors. The camera flash passes through the cornea, pupil, and lens to focus on the retina. The ophthalmologist then visualises the back of the eye captured through the high-powered lenses. Fundus photography is used to evaluate the health of the retina and retinal components such as the optic disc, blood vessels, vitreous, and macula. Fig. 2.6 shows a digital color fundus image of the retina.

Figure 2.6 Colour Fundus Photography


3. Digital Red-Free Photography: A digital red-free photograph is acquired by focusing invisible infrared light to illuminate the retina during examination. The images are captured using a mild white xenon flash; therefore, the patient does not have to endure a blinding white light during the process. Fig. 2.7 shows a red-free photograph.

Figure 2.7 Red Free Photography

As digital technology has enabled easy data processing, storage, and telemedicine (Yannuzzi et al. 2004; Sim et al. 2015), different retinal image acquisition techniques have been introduced, including optical coherence tomography (OCT), indocyanine green angiography, and scanning laser ophthalmoscopy (SLO).

2.4. Retinal Blood Vascular Disease

Retinal blood vascular disorders refer to the range of eye disorders that affect the blood-carrying vessels in the eye. These disorders mainly restrict the blood circulation in the eye. Each of these disorders affects the vision and functions of the eye in different ways: some cause a change in the structure of the blood vasculature, some change the blood flow in the vessels, whereas some affect the consistency of the

blood itself (Joussen et al. 2007a; Semeraro et al. 2015). The major blood vascular diseases are:

• Diabetic Retinopathy (DR).
• Retinal Vein Occlusion (RVO).
• Central Retinal Artery Occlusion (CRAO).
• Hypertensive Retinopathy.

2.4.1 Risk Factors and Symptoms

Retinal blood vascular diseases are mainly triggered by various medical conditions, such as diabetes, high blood pressure, cardiovascular disorders, history of stroke, atherosclerosis or thickening of the blood vessels, bleeding, clotting, and autoimmune disorders. Some additional factors that can increase a person's likelihood of developing retinal vascular diseases are increasing age, high cholesterol, obesity, use of oral contraceptives, and smoking (Joussen et al. 2007b; Shen et al. 2016).

Depending on the type of disorder, retinal vascular diseases can be signalled by a variety of symptoms. Some of the most common symptoms are:

 Headache.
 Pain in the eye.
 Sudden painless loss of vision.
 Floaters and blurred vision – common during the early stages of diabetic retinopathy.

This dissertation is mainly concerned with Diabetic Retinopathy (DR) and Retinal Vein Occlusion (RVO), as these two vascular disorders are the leading causes of blindness worldwide. The next sections provide detailed descriptions of both diseases.

2.5. Diabetic Retinopathy

Diabetic retinopathy (DR) is a retinal blood vascular disease instigated by diabetes mellitus. Diabetes is a disease that develops when the pancreas is unable to discharge enough insulin or the body is unable to process it properly. It primarily affects the blood circulatory system of the body and then gradually affects that of the retina. As the disease progresses, the visual capability of the patient starts to degrade and ultimately leads to retinopathy. Because of the increased level of glucose in the blood, the capillaries in the retina get damaged and start leaking blood and fluid (Hu 2003; Nayak et al. 2008; Jenkins et al. 2015; Scanlon 2019). As a result of these leakages, various lesions appear, such as microaneurysms, haemorrhages, cotton wool spots, hard exudates, and venous loops. These common visual features are used to detect DR (Nayak et al. 2008; Jenkins et al. 2015). DR is the foremost cause of visual impairment among adults aged 20-74 years (Mookiah et al. 2013; Scanlon 2019).

2.5.1. Risk Factors and Symptoms

All patients suffering from diabetes are at risk of developing retinopathy, and the following factors increase that risk (Hartnett, Baehr & Le 2017):

 Duration of diabetes: the longer the duration of diabetes, the greater the risk of developing diabetic retinopathy.
 Poor control of blood sugar level.
 High cholesterol.
 High blood pressure.
 Pregnancy.
 Use of tobacco.
 Being Black, Hispanic, or Native American.

2.5.2. Clinical Signs

Diabetic retinopathy typically affects both eyes at the same time. The clinical signs of DR are described as follows (Hartnett, Baehr & Le 2017; Solomon et al. 2017):


1. Microaneurysms (MA): Hyperglycaemia damages the retinal capillaries and forms tiny bulges in the vessel walls. These tiny, round lesions are called microaneurysms and are the earliest visible sign of diabetic retinopathy (Walter et al. 2007; Pappuru et al. 2019; Jenkins et al. 2015).

2. Haemorrhages (HA): The rupture of microaneurysms in the deeper layers of the retina causes the formation of haemorrhages. A haemorrhage appears as a red spot with an asymmetrical border and/or uneven density. In DR, the haemorrhages are typically dot and blot haemorrhages, whose appearance is similar to that of microaneurysms (Jitpakdee, Aimmanee & Uyyanonvara 2012; Jenkins et al. 2015).

3. Hard exudates: As the disease progresses, the weakened vessels start leaking waxy yellow deposits, called hard exudates. These are lipoproteins and other proteins with sharp margins. They are located in the outer layer of the retina, often vary from small to large patches, and later evolve into rings or circinates (Osareh, Shadgar & Markham 2009; Jenkins et al. 2015; Sadek 2016).

4. Soft exudates or Cotton Wool Spots (CWS): With disease progression, occlusion occurs in the arterioles. Due to the occlusion, axoplasmic material starts accumulating within the retinal nerve fibre layer. This disruption of axoplasmic flow and accumulation of axoplasmic material forms fluffy, greyish-white lesions, called cotton wool spots or soft exudates. Unlike hard exudates, these lesions have indistinct borders (Niemeijer et al. 2007; Jenkins et al. 2015; Sreng et al. 2019).

5. Neovascularization (NV): In the proliferative stage of DR, new tiny, abnormal blood vessels develop on the inner surface of the retina. These new vessels are weak and leaky, leading to lipid deposition, inflammation, and scarring, and thus threaten visual acuity (Fong et al. 2004; Wan et al. 2015).

6. Macular Edema (ME): Macular edema is the condition in which leakage of fluid from the fragile blood vessels leads to swelling around the macula. It is responsible for blocking the central vision and is the main reason for vision loss in patients with diabetes (Fong et al. 2004; Bodnar, Desai & Akduman 2016).

2.5.3. Types

Diabetic retinopathy can be mainly classified into two types:

1. Non-proliferative diabetic retinopathy (NPDR): This is the early stage of DR, also known as background retinopathy. In NPDR, hyperglycaemia damages the retinal blood vessels. This weakens the capillaries and forms tiny bulges in them, called microaneurysms. Over time, the microaneurysms rupture, and the leaking fluid and blood cause the formation of haemorrhages and exudates. This leakage also causes swelling of the macula. According to severity, NPDR can be further sub-classified into three classes, viz., mild, moderate, and severe NPDR. The visual features of the mild stage range from indistinct to non-existent.

Mild NPDR: Studies have found that subtle changes in the retinal vasculature pattern are the earliest sign of retinopathy (Talu, Calugaru & Lapascu 2015; Scanlon 2019). The most common way of identifying mild NPDR is the presence of at least one microaneurysm.

Moderate NPDR: In the moderate NPDR stage, more microaneurysms and haemorrhages can be found in one to three quadrants. This stage may also include cotton wool spots, hard exudates, venous beading, and mild macular edema (Hickey et al. 2019).

Severe NPDR: In the severe stage, there are numerous microaneurysms and haemorrhages in all four quadrants, venous beading in at least two quadrants, and intra-retinal microvascular abnormalities in at least one quadrant (Paranjpe & Kakatkar 2014; Scanlon 2019).

Fig. 2.8 (a), (b), and (c) show mild, moderate, and severe NPDR, respectively.
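The grading criteria described above (at least one microaneurysm for mild NPDR; lesions in one to three quadrants for moderate; the four-quadrant, two-quadrant, one-quadrant pattern for severe) can be sketched as a simple rule-based function. This is only an illustration of the logic, assuming hypothetical lesion statistics produced by some upstream detection step; it is not a clinical grading tool.

```python
# Simplified sketch of the NPDR grading rules described above. The lesion
# statistics are assumed inputs from an upstream detection step; this
# function illustrates the logic only and is not a clinical grading tool.

def grade_npdr(ma_count, ha_quadrants, beading_quadrants, irma_quadrants):
    """Return a coarse NPDR grade.

    ma_count          -- total number of detected microaneurysms
    ha_quadrants      -- quadrants (0-4) containing haemorrhages
    beading_quadrants -- quadrants (0-4) showing venous beading
    irma_quadrants    -- quadrants (0-4) with intra-retinal
                         microvascular abnormalities
    """
    # Severe NPDR: haemorrhages in all four quadrants, venous beading in
    # at least two quadrants, or microvascular abnormalities in at least one.
    if ha_quadrants == 4 or beading_quadrants >= 2 or irma_quadrants >= 1:
        return "severe NPDR"
    # Moderate NPDR: microaneurysms plus haemorrhages in one to three quadrants.
    if ha_quadrants >= 1:
        return "moderate NPDR"
    # Mild NPDR: at least one microaneurysm.
    if ma_count >= 1:
        return "mild NPDR"
    return "no apparent retinopathy"
```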


Figure 2.8 (a) Mild-NPDR, (b) Moderate-NPDR, and (c) Severe NPDR


2. Proliferative diabetic retinopathy (PDR): This is the malignant stage of the disease. At this stage, the obstruction of blood flow creates a condition of hypoxia. Hypoxia stimulates vascular endothelial growth factor (VEGF) in the retina and triggers the formation of new, fragile blood vessels. These newly formed vessels grow along the internal surface of the retina and into the vitreous gel that fills the back of the eye, a condition known as neovascularization. Neovascularization is the hallmark of PDR. The new blood vessels leak more blood and fluid into the vitreous, causing retinal detachment, which can lead to permanent vision loss. Fig. 2.9 shows a PDR image (Nentwich 2015; Scanlon 2019).

Figure 2.9 Proliferative DR

2.5.4. Diagnosis and Treatment

The following imaging techniques are used in the diagnosis of diabetic retinopathy (Duh, Sun & Stitt 2017; Stitt et al. 2016):

 Color Funduscopy or Ophthalmoscopy: This is the most common diagnostic tool for examining the structures within the eye. It allows ophthalmologists to check the retina, optic disc, blood vessels, and any abnormal lesions caused by diabetic retinopathy.


 Fluorescein angiography (FA): As discussed in Section 2.3, the angiography image of the retina is captured by injecting a fluorescent dye. It helps to investigate the progression from NPDR to PDR, and it clearly pinpoints microaneurysms, blockages in the vessels, and leakage of blood and fluids.
 Optical coherence tomography (OCT) scanning: OCT is an instrument that uses an array of light to scan the retina, capturing its internal tissue layers. It measures the thickness of the retina and compares it with that of a healthy retina to find swelling within the retina, as well as macular edema.
 B-scan ultrasonography: This imaging technique uses a high-frequency sound wave to image the eye. The sound wave is sent to the target tissue through a transducer, and a 2-D image of the eye is constructed. It is mainly useful for diagnosing PDR.

2.6. Retinal Vein Occlusion (RVO)

Retinal Vein Occlusion (RVO) is the second leading cause of blindness after DR. A retinal vein occlusion, also referred to as an "eye stroke", is an obstruction in the small vessels that carry blood away from the retina. When these vessels are blocked, pressure builds up in the capillaries, which can no longer drain the blood. This causes internal bleeding, forming haemorrhages, and leakage of fluid. If the leakage occurs near the macula, the condition of macular edema develops. In the ischemic stage, neovascularization occurs, which can lead to neovascular glaucoma, vitreous haemorrhage, and, in later stages or severe cases, retinal detachment. Visual morbidity and blindness in RVO are the consequences of macular edema, retinal haemorrhage, macular ischaemia, and neovascular glaucoma (Hykin 2015; Sivaprasad et al. 2015; Lang & Lang 2018). RVO usually occurs in middle-aged to elderly people irrespective of gender and race. However, several aspects of the pathogenesis and management of RVO still remain indeterminate. In 2007, the Canadian Journal of Ophthalmology noted that "Research into CRVO is fraught with challenges, from accurate disease classification to its treatment; even the most prestigious trials have become controversial" (Madhusudhana & Newsom 2007). Recent studies have shown that a blockage in a retinal vein can also be an indication of possible blockage in cardiac veins and nerves in the human brain (Flammer et al. 2013; Kwan Hyuk Cho et al. 2017; Lang & Lang 2018; Park et al. 2015).

2.6.1. Risk Factors and Symptoms

Retinal vein occlusions are usually caused by hardening of the arteries and formation of a blood clot, much like a stroke. People who naturally have narrow blood vessels, or those with chronic conditions that damage the blood vessels, tend to have a higher risk of suffering from vein occlusion. Risk factors for RVO include (Hykin 2015; Lang & Lang 2018):

 Atherosclerosis, a condition where the arteries grow thick.
 High blood pressure.
 High cholesterol.
 Diabetes.
 Macular edema.
 Glaucoma, a condition where the intraocular pressure damages the optic nerve.
 Health issues that affect blood circulation.
 Age over 60.
 Smoking.
 Obesity.

The symptoms of RVO can range from subtle to prominent. It starts with painless blurring or loss of vision in just one eye. Initially, the blurring or diminished visual acuity might be insignificant, but it deteriorates over the next few hours or days. In some cases, the patient might lose vision almost instantaneously and permanently. Therefore, it is very important to visit an ophthalmologist immediately if there is any symptom of retinal vein occlusion, as it can also point to other health issues (Sivaprasad et al. 2015; Macdonald 2014; Park et al. 2015).


2.6.2. Clinical Signs

The clinical signs of RVO include dilated and tortuous vessels, blot and flame-shaped retinal haemorrhages, cotton wool spots, optic disc swelling, and macular edema (Hykin 2015; Lang & Lang 2018).

1. Dilated and Tortuous Veins: When thrombosis occurs in the retinal veins, the vessels become more tortuous than usual and some vessels become dilated. This condition is considered an early clinical sign of RVO. Many studies have reported a significant association between larger venous calibre and increased risk of stroke (Wong et al. 2005). The Atherosclerosis Risk In Communities (ARIC) study has shown that dilated venules, narrowed arterioles, or both are linked to the risk of incident stroke events and coronary heart disease events (Wong et al. 2005; Woo, Lip & Lip 2016; Lang & Lang 2018). Retinal vein dilation can also be considered an effect of hypoxia, inflammation, and endothelial dysfunction (Wong et al. 2005; Bucciarelli et al. 2017; Woo, Lip & Lip 2016). The dilated, tortuous veins are alarming, as RVO can be a possible indication of other health issues such as hypertension, diabetes, arteriosclerosis, cardiovascular disease, hyper-viscosity states, collagen vascular disease, or sickle cell disease.

2. Haemorrhages: In RVO, the haemorrhages are mostly dot and flame-shaped haemorrhages. When the veins are blocked, the pressure in the vessels causes the blood to leak, forming haemorrhages. Depending on the severity of the vessel occlusion, there may be a few scattered haemorrhages or extensive haemorrhages spreading over a large portion of the fundus (Hykin 2015; Lang & Lang 2018).

3. Cotton Wool Spots (CWS): In RVO, the fluid and lipid leaking from the veins cause cotton wool spots, through focal interruption of axoplasmic flow due to the occlusion. Cotton wool spots are, in general, visually asymptomatic; however, patients can lose vision if the fovea is involved. These are yellowish-white or greyish-white, slightly elevated lesions with a cloud-like, linear, or winding structure (Hykin 2015; Pielen, Junker & Feltgen 2016).


4. Optic Disc Swelling: The optic disc is the place where the optic nerve fibres coming from the brain connect inside the eye. Optic disc swelling can be caused by a number of conditions including papilledema. In RVO, when there is an occlusion in the main vein, the optic disc swells due to the increased pressure in the blood flow (Hykin 2015; Lang & Lang 2018).

5. Macular Edema: The increased pressure in the blood vessels due to thrombosis makes the retinal capillaries more permeable, leading to leakage of fluid and blood into the retina. Co-existent retinal ischaemia aggravates the leakage by producing growth factors. Leakage into the extracellular space stimulates the development of macular edema. As in DR, macular edema is the most common cause of visual impairment in RVO, followed by foveal ischaemia (Hykin 2015; Pielen, Junker & Feltgen 2016; Lang & Lang 2018).

2.6.3. Types of RVO

RVO is classified into three categories according to the location of the occlusion, viz., Central Retinal Vein Occlusion (CRVO), Branch Retinal Vein Occlusion (BRVO), and Hemiretinal Vein Occlusion (HRVO). CRVO occurs when the occlusion is in the central retinal vein. BRVO is due to a blockage at a more distal branch of the retinal vein. HRVO is an obstruction at the primary superior or primary inferior branch, involving almost half of the retina (Hykin 2015; Pielen, Junker & Feltgen 2016; Lang & Lang 2018).

1. Central Retinal Vein Occlusion (CRVO)

Central retinal vein occlusion (CRVO) is a thrombosis of the central retinal vein located at the lamina cribrosa (Hykin 2015). CRVO affects the entire retina and causes more severe visual loss than any other type of RVO. The diagnostic criteria for CRVO are venous dilation and increased tortuosity of the retinal veins; dot or punctate and flame-shaped retinal haemorrhages, or both, in all four quadrants of the retina; and optic disc swelling (Noma 2013; Lang & Lang 2018). CRVO is graded into ischemic and non-ischemic phases. Macular edema is the key reason for vision loss.

Swinburne University of Technology Sarawak Campus | Background 31

a) Non-ischemic CRVO: It has been reported that about 70% of the patients suffer from non-ischemic CRVO (McAllister 2012; Pielen, Junker & Feltgen 2016). Over 3 years, 34% of perfused eyes progressed to ischemic CRVO. There is a low risk of neovascularization in non-ischemic CRVO, and the retina remains relatively normal. The clinical features found in non-ischemic CRVO are as follows:

 Visual acuity >20/200.
 Dot and blot haemorrhages.
 Lower risk of neovascularization.
 No Afferent Pupillary Defect (APD).

Fig. 2.10 shows the image of the retina affected by non-ischemic CRVO.

Figure 2.10 Non-ischemic CRVO

b) Ischemic CRVO: According to fluorescein angiographic evidence, ischemic CRVO can be described as capillary non-perfusion of more than 10 disc areas on seven-field fundus fluorescein angiography (Sivaprasad et al. 2015). It is associated with an increased risk of neovascularization and has a worse prognosis (Sivaprasad et al. 2015; Hayreh, Podhajsky & Zimmerman 2011). There is almost a 30% chance that non-ischemic CRVO transforms to ischemic CRVO (Sivaprasad et al. 2015). In ischemic CRVO, the final visual acuity is 6/60 or worse for more than 90% of the patients (Hayreh et al. 2001; Lang & Lang 2018). The clinical features found in ischemic CRVO are as follows:

 Visual acuity less than 20/200.
 Extensive superficial haemorrhages.
 Venous dilation and tortuous vessels.
 Numerous cotton wool spots.
 Meagre capillary perfusion.
 Opaque, edematous, orange-colored retina.
 High risk of forming neovascularization.
 Poor prognosis.
 Relative Afferent Pupillary Defect present (+RAPD).

Fig. 2.11 shows the image of the retina affected by ischemic CRVO.

Figure 2.11 Ischemic CRVO


2. Branch Retinal Vein Occlusion (BRVO)

Branch retinal vein occlusion (BRVO) is triggered by a thrombosis sited at an arteriovenous crossing point, where a small artery and a small vein cross with a shared vascular sheath (Rehak & Rehak 2008; Hayreh & Zimmerman 2014; Pielen, Junker & Feltgen 2016). Due to atherosclerosis, the vein gets compressed and the capillaries start leaking. Macular edema is the primary cause of vision loss in BRVO as well; the other causes are macular capillary non-perfusion, vitreous haemorrhage, tractional retinal detachment, and neovascular glaucoma. BRVO can be associated with other medical conditions such as hypertension, diabetes, hyperlipidaemia, atherosclerosis, blood hyper-viscosity, and carotid artery disease (S. L. Rogers et al. 2010; Woo, Lip & Lip 2016). The incidence rate of BRVO is three times that of CRVO. BRVO can be classified according to three locations:

i) Hemispheric: the occlusion occurs before the first bifurcation.
ii) Intermediate: the occlusion occurs after the first bifurcation, the typical BRVO location.
iii) Twig: the occlusion involves the macular region only.

The diagnostic criteria for the initial stage of BRVO include either dot or flame-shaped haemorrhages at the site of the arteriovenous crossing. Later, because of these retinal haemorrhages, new vessels or collateral vessels develop (S. L. Rogers et al. 2010). Compared to CRVO, the prognosis of BRVO is better: in approximately 50–60% of untreated BRVO cases, patients retain a visual acuity ≥ 6/12 after one year (Hykin 2015; Lang & Lang 2018). Just like CRVO, BRVO can be ischemic or non-ischemic.

a) Non-ischemic BRVO: Non-ischemic BRVO accounts for 70–80% of cases. The clinical signs are as follows:

 Visual acuity > 20/200 normally.
 Blot and flame-shaped haemorrhages.
 Rare cotton wool spots.
 Collateral vessel formation.
 Mild macular edema.

Fig. 2.12 shows the retina image affected by non-ischemic BRVO.

Figure 2.12 Non-ischemic BRVO

b) Ischemic BRVO: Ischemic BRVO accounts for 20–30% of cases. It can be described as 5 disc diameters of retinal vessel non-perfusion on fluorescein angiography (Rehak & Rehak 2008; Sivaprasad et al. 2015). The clinical signs are as follows:

 Visual acuity < 20/200.
 Both blot and flame-shaped haemorrhages.
 Macular edema.
 Cotton wool spots.
 Neovascularization at the optic disc and elsewhere.
 Vitreous haemorrhage, which leads to retinal detachment and can cause permanent severe vision loss.
 Macular capillary non-perfusion.

Fig. 2.13 shows the image of the retina affected by ischemic BRVO.


Figure 2.13 Ischemic BRVO

3. Hemiretinal Vein Occlusion (HRVO)

Hemi-retinal vein occlusion (HRVO) is an occlusion that damages either the inferior or the superior retinal hemisphere, and the retinal haemorrhages are distributed almost equally across the nasal and temporal aspects of the involved hemisphere (Hykin 2015). HRVO carries features of both BRVO and CRVO. Pathophysiologically, it most closely resembles CRVO; in terms of natural history and neovascular complications, it most closely resembles BRVO. HRVO involves the superior or inferior drainage of the retina only, and it can be ischemic or non-ischemic. About 23% of patients suffer from non-ischemic HRVO, about 67% from ischemic HRVO, and about 10% from an indeterminate phase (Hykin 2015). Macular edema, infarction, and tractional retinal detachment are the typical causes of severe and permanent vision loss in patients suffering from HRVO. The diagnostic criteria for HRVO are the same as for BRVO, but involve the superior or inferior half of the retina (Sivaprasad et al. 2015).

a) Non-ischemic HRVO: In non-ischemic HRVO, the probability of developing neovascularization is very low (Hykin 2015). Fig. 2.14 shows a retina image with non-ischemic HRVO.

b) Ischemic HRVO: Ischemic HRVO has a high propensity to develop neovascularization at the optic disc or other parts of the retina, and a low to moderate risk of developing neovascularization at the iris and angle (Hykin 2015). Fig. 2.15 shows the ischemic HRVO.

Figure 2.14 Non-ischemic HRVO

Figure 2.15 Ischemic HRVO


2.6.4. Diagnosis and Treatment

RVO is diagnosed through a wide range of eye examinations. Starting with vision and pressure checks, the surfaces and the blood vasculature of the eye are examined (Macdonald 2014; Tah et al. 2015). The imaging techniques used for diagnosing retinal vein occlusion include:

 Optical coherence tomography (OCT): This instrument captures the retina image with a scanning ophthalmoscope at a resolution of 5 microns. Such images can help doctors determine the occurrence of optic nerve swelling and macular edema by calculating the thickness of the retina. The specialist can therefore refer to the OCT images as documentation and check the progression of the disease throughout the course of treatment.

 Ophthalmoscopy: Using an ophthalmoscope, the doctors can capture the color fundus image of the retina and examine the textural changes caused by RVO.

 Fluorescein angiography: The angiographic images help clinicians obtain a clear picture of the RVO progression. The injected dye travels through the entire blood vasculature and helps to check for vessel blockage or any other changes in the vasculature.

Unfortunately, there is no proven way to unblock an occluded retinal vein. Generally, treatment of RVO focuses on the issues arising from the occlusion. Sometimes the patient regains vision after treatment: around one-third of cases show some improvement, about one-third show no improvement, and about one-third show gradual improvement. However, it might take more than one year to know the final outcome of the treatment (Hykin 2015; Lang & Lang 2018). Some of the available treatments for retinal vein occlusion are as follows:

 Intravitreal injection of anti-vascular endothelial growth factor (anti-VEGF) drugs: These drugs restrain VEGF, the primary growth factor that drives macular edema.


 Intravitreal injection of corticosteroid drugs: These drugs fight the inflammatory components that lead to edema.

 Focal laser therapy: This treatment applies laser treatment to the areas of swelling and reduces edema.

 Pan-retinal photocoagulation therapy: This treatment is given to patients when neovascularization occurs due to the retinal vein occlusion.

2.7. Computer Aided Diagnosis (CAD) System

A Computer Aided Diagnosis (CAD) system is mainly designed as an assistance mechanism for medical practitioners in the assessment of medical images (Doi 2005; Shiraishi et al. 2011; Yanase & Triantaphyllou 2019). CAD is swiftly expanding in the field of radiology and consistently improving its image interpretation capacity with higher accuracy. The main goal of a CAD system is to detect the earliest symptoms of a particular disease from a medical image, which physicians can barely detect with the naked eye. In a CAD system, a set of algorithms identifies suspicious regions in an image and evaluates them to predict the possible disease. CAD systems have been used to assist radiologists in assessing various imaging modalities such as mammograms, Computed Tomography (CT), Magnetic Resonance Imaging (MRI), ultrasound imaging, and retinal fundus images.

An automated computer-aided diagnostic method involves three main phases with different technologies (Shiraishi et al. 2011; Yanase & Triantaphyllou 2019) as described below:

1. Image Processing and Segmentation: In this module, the quality of the medical image is enhanced by removing noise and improving the visuals; several image enhancement techniques are available. Then, a segmentation method is used to detect suspicious candidate lesions or patterns. Some of the common techniques used for segmentation are morphological filters, Fourier analysis, various image analysis techniques, wavelet analysis, and artificial neural networks (ANN).
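As a minimal illustration of the candidate-detection idea in this step, the sketch below thresholds a grey-level image and groups the bright pixels into 4-connected components, each a candidate lesion. Real CAD systems use far more elaborate morphological and filtering pipelines; the function name, the plain nested-list image representation, and the threshold are illustrative assumptions.

```python
# Illustrative candidate detection: threshold a grey-level image, then
# group bright pixels into 4-connected components (candidate lesions).
# This is a sketch of the general principle only, not a production method.

def bright_candidates(image, threshold):
    """image: 2-D nested list of grey levels.
    Returns a list of candidate regions, each a set of (row, col) pixels."""
    rows, cols = len(image), len(image[0])
    seen = set()
    regions = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > threshold and (r, c) not in seen:
                stack, region = [(r, c)], set()
                while stack:  # iterative flood fill of one component
                    y, x = stack.pop()
                    if (y, x) in seen or not (0 <= y < rows and 0 <= x < cols):
                        continue
                    seen.add((y, x))
                    if image[y][x] > threshold:
                        region.add((y, x))
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
                regions.append(region)
    return regions
```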


2. Feature Extraction: After the image enhancement and segmentation process, different features are quantified from the selected candidates in terms of shape, size, and contrast. Multiple features can be extracted using mathematical formulations. Initially, the observations made by the CAD system are based on the physicians' own observations: the knowledge of the physician is fed to the CAD system so that it can differentiate between abnormal features/lesions and normal features/structures.
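The quantification of size, shape, and contrast described here can be sketched for a candidate region represented as a set of (row, col) pixels. The specific feature choices (bounding-box compactness, contrast against the global mean grey level) are illustrative assumptions, not the features of any particular published CAD system.

```python
# Illustrative feature extraction for one candidate region: area (size),
# bounding-box compactness (a simple shape cue), and contrast against the
# image-wide mean grey level. Feature choices are illustrative assumptions.

def region_features(region, image):
    """region: set of (row, col) pixels; image: 2-D nested list of grey levels."""
    rows = [r for r, _ in region]
    cols = [c for _, c in region]
    area = len(region)
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    compactness = area / (height * width)  # 1.0 = region fills its bounding box
    mean_intensity = sum(image[r][c] for r, c in region) / area
    n_pixels = sum(len(row) for row in image)
    image_mean = sum(sum(row) for row in image) / n_pixels
    return {"area": area,
            "compactness": compactness,
            "contrast": mean_intensity - image_mean}
```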

3. Classification: In the last step, based on the extracted features, the data are analysed to distinguish normal and abnormal patterns. Generally, a rule-based method is applied to separate abnormal and normal features/lesions. Apart from the rule-based approach, other classification methods such as discriminant analysis, decision trees, and neural networks are often used.
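A rule-based classifier of the kind described in this step can be caricatured as a few threshold tests over extracted features. The thresholds and class names below are invented for illustration only; a real system tunes such rules, or replaces them with a learned classifier.

```python
# Illustrative rule-based classification over extracted features.
# Thresholds and class names are invented for this sketch.

def classify_candidate(features):
    if features["contrast"] < 10:           # too faint to be a lesion
        return "normal"
    if features["area"] < 50 and features["compactness"] > 0.6:
        return "possible hard exudate"      # small, compact, bright
    return "possible cotton wool spot"      # larger, fuzzier bright region
```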

The block diagram of a traditional CAD system is shown in Fig. 2.16.

Figure 2.16 Block diagram of Traditional Computer Aided Detection (CAD) System


The CAD method processes a digital image, analyses the typical appearance, and highlights any suspicious lesion or region present in the image to evaluate the possible disease. CAD focuses on detecting relatively subtle symptoms and the earliest signs of abnormality, which are difficult to observe with the naked eye or which radiologists might miss. It uses a series of pattern recognition software components to automatically detect abnormal features in the medical image. Thus, CAD acts as a support system or a "second opinion" for the radiologist and reduces false negative readings. CAD uses multifaceted pattern recognition algorithms supported by Machine Learning (ML) techniques (Shiraishi et al. 2011; Komura & Ishikawa 2018) in a supervised or unsupervised way. Supervised learning models use a set of known annotations/attributes/features to classify objects. Unsupervised learning models, on the other hand, analyse the internal structure of a pool of objects to form groups/clusters based on their similarities, and then utilize these in identifying unknown objects. Popular supervised methods (Yanase & Triantaphyllou 2019) include Support Vector Machines (Cortes et al. 1995) and linear classifiers (Yuan, Ho & Lin 2012), Bayesian statistics (Heckerman 1998), k-Nearest Neighbours (Cover & Hart 1967), Hidden Markov Models (Rabiner & Juang 1986), Decision Trees (Kohavi & Quinlan 2002), and the Artificial Neural Network (ANN) (Hopfield 1988) and its variants. Popular unsupervised methods include Autoencoders (Hinton 1989), Expectation Maximization (Dempster, Laird & Rubin 1977), Self-Organizing Maps (Kohonen 1982), k-Means (Ball & Hall 1965), and fuzzy (Dunn 1973) and density-based clustering (Hartigan 1975).
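As a concrete example of the supervised methods listed above, a toy k-Nearest Neighbours classifier can be written from scratch in a few lines: it labels a query by majority vote among the k closest training examples. The feature vectors and labels here are hypothetical.

```python
# Toy k-Nearest Neighbours classifier, one of the supervised methods listed
# above, written from scratch for illustration with hypothetical data.
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs; query: feature vector.
    Returns the majority label among the k nearest training examples."""
    neighbours = sorted((math.dist(x, query), label) for x, label in train)
    top_labels = [label for _, label in neighbours[:k]]
    return Counter(top_labels).most_common(1)[0][0]
```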

2.8. Chapter Summary

In this chapter, the background of the research has been provided, including details about the human eye anatomy, the retina, and different retina imaging techniques. A detailed introduction to Diabetic Retinopathy and Retinal Vein Occlusion has been provided, along with an illustration of their types, diagnosis methods, and available treatments. From this chapter, it can be well understood that early detection of DR and RVO is of utmost importance, as these two eye diseases are associated with other health issues apart from permanent vision loss. A detailed description of the Computer Aided Diagnosis system has also been given in this chapter, and popular machine learning models are mentioned. As CAD is very useful for diagnosing diseases in various settings, the main goal of this dissertation is to design a CAD system that diagnoses retinal blood vascular diseases, particularly detecting DR and RVO in their early stages and classifying their variants.


Chapter-3

3 Literature Review

The literature on the detection of retinal blood vascular diseases is mostly dominated by research works on Diabetic Retinopathy (DR) detection; only a few research groups have focused on the detection of other retinal blood vascular diseases. Various algorithms for Hypertensive Retinopathy detection, using blood vessels as features, have been proposed in (Ortíz et al. 2010; Agurto et al. 2014; Irshad et al. 2015; Khitran et al. 2015; Syahputra et al. 2017). Central Retinal Artery Occlusion detection methodologies are presented in (Foroozan, Savino & Sergott 2002; Riccardi, Siniscalchi & Lerza 2016). As this dissertation is mostly concerned with the two leading causes of blindness, viz., Diabetic Retinopathy and Retinal Vein Occlusion, the literature review provided here is limited to these two diseases. The state of the art for DR detection and RVO detection is explored in two subsections; in each subsection, the existing methods are discussed and summarized in tabular form.

3.1. State-of-the-Art of DR Detection

In the past ten years, several research works have been carried out to develop automated DR diagnosis methods using clinical features such as microaneurysms, haemorrhages, exudates, blood vessels, node points, and textures. These methodologies mainly depend on the segmentation of bright and red lesions and comprise several steps. Following the general structure of a CAD method, pre-processing is first carried out to improve image quality by normalizing the original retinal image (Spencer et al. 1996; Takahashi & Kajikawa 2017). Second, segmentation methods identify the anatomical components of the retina, such as the optic disc and blood vessels (Jelinek et al. 2007; Salamat, Missen & Rashid 2019). Finally, after removing the normal retinal structures, the residual features are extracted as possible DR pathologies for the classification task. This section provides an extensive survey of the available algorithms for automatic retinal image analysis to detect DR features. The methodologies are discussed individually as algorithms for bright lesion segmentation, algorithms for red lesion segmentation, DR screening systems, and deep learning methods, and are illustrated in the following subsections.

3.1.1. Segmentation of Bright Lesions

The bright lesions found in DR are mainly Hard Exudates and Cotton Wool Spots. These bright lesions are lipoprotein deposits left in the retina by vascular leakage (Ali et al. 2013; Scanlon 2019). As discussed in Chapter-2, Section 2.5.2, these bright lesions appear as yellow-whitish lesions. The hard exudates have well-defined edges, whereas the cotton wool spots have indistinct edges and are therefore also called soft exudates. The size, shape, brightness, and location of these lesions may vary among patients (Fleming et al. 2007; Scanlon 2019). Researchers have proposed various methods for the detection of exudates using morphological operations, region growing, wavelet-based methods, and neural networks. These methods can be categorized into four groups (Giancardo et al. 2012; Amin, Sharif & Yasmin 2016):

1. Region Growing Methods

Region growing segmentation divides an image into homogeneous regions of connected pixels based on similarity criteria among candidate pixels. The similarity criteria are usually shared characteristics such as intensity, color, and texture. These techniques start by initializing a seed point at a particular pixel; the image is then segmented by absorbing neighbouring pixels with similar intensity values. The process iterates until all homogeneous pixels reachable from the seed point are included in the region. A few researchers have used this technique to segment exudates (Sinthanayothin et al. 2003; Lariche, Kalkhajeh & Lodariche 2017). Sinthanayothin et al. proposed a thresholding-based method to identify the homogeneous pixels in the exudates; a recursive region growing technique then detects the exudates, as pixels with similar grey levels (defined by a manually selected threshold) are considered to belong to the same exudate region (Sinthanayothin et al. 2003).
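The seed-and-grow procedure described above can be sketched in a few lines. The following is a minimal illustration, not code from any of the cited papers; the function name `region_grow`, the tolerance parameter, and the toy image are all invented for demonstration:

```python
from collections import deque

import numpy as np

def region_grow(image, seed, tol=10):
    """Grow a region from `seed`, adding 4-connected neighbours whose
    grey level differs from the seed pixel by at most `tol`."""
    h, w = image.shape
    seed_val = int(image[seed])
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc]:
                if abs(int(image[nr, nc]) - seed_val) <= tol:
                    mask[nr, nc] = True
                    queue.append((nr, nc))
    return mask

# Toy image: a bright 3x3 "lesion" on a dark background.
img = np.zeros((5, 5), dtype=np.uint8)
img[1:4, 1:4] = 200
blob = region_grow(img, (2, 2), tol=10)
```

Here the fixed tolerance plays the role of the manually selected grey-level threshold; Sinthanayothin et al. apply the idea recursively over the whole image rather than from a single seed.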

2. Thresholding Methods

These methods segment suspicious regions using the difference between foreground and background intensity levels. Adaptive grey level analysis is the most common approach used to detect exudates. Sánchez et al. proposed a color feature-based approach, where pixel intensities are projected into a new color space by modifying the RGB model (Sánchez et al. 2006, 2008). Jaafar et al. proposed an adaptive thresholding method, where the image is divided into homogeneous regions and adaptive thresholding is then applied to each region individually (Jaafar, Nandi & Al-Nuaimy 2011b). Win et al. used a thresholding method to detect exudates after eliminating the optic disc, since the bright yellowish appearance of the optic disc resembles that of exudates. By cropping the optic disc manually, a template was created from its histogram, and histogram matching was used to remove the optic disc region from the retina image. The image was then divided into left and right halves, and Otsu's thresholding method was applied to the histogram difference of the two halves to detect exudates (Win & Choomchuay 2017). Long et al. used a combination of dynamic and global thresholding based on Fuzzy C-means clustering to detect candidate exudates. The retina image was divided into sub-images, and Fuzzy C-means clustering was used to obtain a local dynamic threshold in each sub-image, assigning each pixel to a category. The entire retina image was classified in this way to generate a global threshold matrix, and the combination yields the final threshold to segment candidate exudate regions in the color retinal images. To classify the candidate regions, six texture features are extracted, viz., mean green channel intensity, grey intensity, energy, mean hue, mean gradient magnitude, and standard deviation; these features are fed to a support vector machine to separate exudate from non-exudate regions (Long et al. 2019).
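Otsu's method, used by Win et al. above, picks the grey level that maximises the between-class variance of the histogram. The sketch below is illustrative only (real systems operate on the histogram difference described above, not a raw toy image):

```python
import numpy as np

def otsu_threshold(image):
    """Return the grey level that maximises between-class variance
    (Otsu's criterion) for an 8-bit image."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue  # one class empty: criterion undefined
        mu0 = (np.arange(t) * prob[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Bimodal toy image: dark background, bright "lesion" patch.
img = np.full((10, 10), 30, dtype=np.uint8)
img[2:5, 2:5] = 220
t = otsu_threshold(img)
mask = img >= t
```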


3. Mathematical Morphology Operations

Morphological operators are useful for detecting structures with definite shapes. Each pixel value is adjusted relative to the values of its neighbouring pixels so that the operation is sensitive to a specific shape in the image, known as the structuring element. The structuring element is encoded according to the characteristics of the target shape, and the image is processed with certain mathematical operations (e.g. erosion and dilation); morphology is therefore a popular method for segmenting particular shapes. A two-scale segmentation method for exudate detection is proposed in (Sopharak et al. 2008; X. Zhang et al. 2014). A top-hat operator is applied to the green channel of the retinal image to identify small exudates, while the large exudates are detected by filtering and thresholding after a morphological restoration in (X. Zhang et al. 2014). Ghaffar et al. proposed a morphological tree for the segmentation of exudates. Initially, a blobbing technique identifies all connected pixels as a single blob. Blobs with a very large area are discarded by passing all blobs through an area filter, and the remaining blobs are categorised into small, medium, and large. The medium blobs are pre-processed to remove strong boundaries and obtain the candidate suspected regions/locations. All candidate blobs are then passed through the morphological compact tree, a series of filters with different criteria that remove the non-exudate regions. The filters that separate exudate and non-exudate regions are: the area of the blob; the mean, minimum, and maximum hue of the blob; the minimum and maximum intensity of the grey image and of the red and green color channels; and the mean, maximum, and minimum saturation of the blob. The threshold of each of these filters is set manually for different images (Ghaffar et al. 2016).
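The white top-hat used for small-exudate detection above is the image minus its grey-level opening, which keeps only bright structures smaller than the structuring element. Below is a minimal numpy sketch using a square (rather than disc-shaped) structuring element; the helper names and the toy image are illustrative, not from the cited papers:

```python
import numpy as np

def _rank_filter(img, k, reduce_fn):
    """Apply a min or max over each k x k neighbourhood (edge-padded)."""
    pad = k // 2
    p = np.pad(img.astype(float), pad, mode='edge')
    h, w = img.shape
    windows = [p[i:i + h, j:j + w] for i in range(k) for j in range(k)]
    return reduce_fn(np.stack(windows), axis=0)

def white_tophat(img, k=5):
    """Image minus its grey-level opening (erosion then dilation):
    keeps bright details smaller than the k x k structuring element."""
    opened = _rank_filter(_rank_filter(img, k, np.min), k, np.max)
    return img.astype(float) - opened

# Toy image: one bright 2x2 "exudate" on a flat background.
img = np.full((9, 9), 50, dtype=np.uint8)
img[4:6, 4:6] = 200
th = white_tophat(img, k=5)
```

Because the 2x2 bright spot is smaller than the 5x5 structuring element, the opening flattens it to the background level and the top-hat response is non-zero only on the spot.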

4. Classification Methods:

These methods typically use a machine learning approach to classify different patterns from the input features. Some popular classifiers are the Support Vector Machine, Artificial Neural Network, Radial Basis Function network, Decision Tree, and k-Nearest Neighbour. With the help of the feature engineering processes discussed above, candidate features are fed to a classifier, which learns the internal patterns to assign them to the target classes. In the case of DR detection, various classifiers have been used to separate hard exudates from other kinds of bright lesions such as Cotton Wool Spots and drusen. In (Osareh, Shadgar & Markham 2009), candidate exudates are selected in the LUV color space and fuzzy c-means clustering is used for an efficient coarse-to-fine segmentation. García et al. applied a similar approach for the coarse segmentation of bright image regions, using a combined global and adaptive histogram thresholding method. After feature extraction, the performance is assessed with three classifiers: a multilayer perceptron, a radial basis function network, and a support vector machine (García et al. 2010).

Fleming et al. have used a multi-scale morphological process for candidate exudate detection; the candidate regions are classified as exudates, drusen, or background by an SVM (Fleming et al. 2007). Niemeijer et al. have proposed a clustering method in which pixels are grouped into exudates and non-exudates through a lesion probability map built from a probability value assigned to each pixel. Based on the cluster characteristics, each pixel is classified as belonging to an exudate or non-exudate region, and then a k-NN classifier and a linear discriminant classifier are used to classify the bright lesions into hard exudates, soft exudates, and drusen (Niemeijer et al. 2007).

Ruba et al. have used Gabor and GLCM features to capture the texture information of the exudate regions. The Gabor filter, mainly used for edge detection, has been used to extract 24 features, including the variance, mean, and standard deviation, from different orientations of the image. The Grey Level Co-occurrence Matrix (GLCM) has then been calculated to extract 12 further statistical features, viz., Entropy, Homogeneity, Dissimilarity, Energy, Cluster prominence, Cluster shade, Sum of squares, Autocorrelation, Maximum probability, Inverse Difference Moment, Contrast, and Correlation. These features are fed to an SVM to separate exudate from normal regions (Ruba & Ramalakshmi 2015).
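As a concrete illustration of the GLCM statistics listed above, the sketch below builds a co-occurrence matrix for a single horizontal offset and derives three of them (contrast, energy, homogeneity). The 4-level quantisation and the toy patch are invented for demonstration; real pipelines use more levels and several offsets/orientations:

```python
import numpy as np

def glcm(img, levels=4):
    """Grey Level Co-occurrence Matrix for the horizontal (row, col+1)
    offset, normalised to joint probabilities."""
    m = np.zeros((levels, levels))
    for a, b in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        m[a, b] += 1
    return m / m.sum()

def glcm_features(p):
    """Three of the classic Haralick-style GLCM statistics."""
    i, j = np.indices(p.shape)
    return {
        'contrast': ((i - j) ** 2 * p).sum(),
        'energy': (p ** 2).sum(),
        'homogeneity': (p / (1.0 + np.abs(i - j))).sum(),
    }

# Toy 4-level patch with four uniform quadrants.
patch = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [2, 2, 3, 3],
                  [2, 2, 3, 3]])
feats = glcm_features(glcm(patch))
```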

Xiao et al. have used a background estimation method and an SVM for classifying exudates. First, the bright lesions are extracted using background estimation. Then, the Kirsch operator is used to gather edge information, remove the optic disc, and extract the candidate hard exudate regions. Finally, an SVM detects the exudates with the help of shape features, phase features, and histogram statistics (Xiao et al. 2015). Table 3.1 summarizes the results of these methods and the databases used in each study.
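The Kirsch operator referred to above correlates the image with eight directional "compass" kernels (the standard 5/-3 masks) and keeps the maximum response at each pixel. A compact sketch; the step-edge test image is a toy example:

```python
import numpy as np

def kirsch_kernels():
    """The 8 Kirsch compass kernels, generated by rotating the
    outer ring of the base (north-facing) kernel."""
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    base = [5, 5, 5, -3, -3, -3, -3, -3]
    kernels = []
    for r in range(8):
        k = np.zeros((3, 3))
        for (i, j), v in zip(ring, np.roll(base, r)):
            k[i, j] = v
        kernels.append(k)
    return kernels

def kirsch_response(img):
    """Maximum directional response over the 8 kernels (valid region)."""
    h, w = img.shape
    out = np.full((h - 2, w - 2), -np.inf)
    for k in kirsch_kernels():
        resp = np.zeros((h - 2, w - 2))
        for i in range(3):
            for j in range(3):
                resp += k[i, j] * img[i:i + h - 2, j:j + w - 2]
        out = np.maximum(out, resp)
    return out

# Vertical step edge: strong response along the boundary, zero elsewhere.
img = np.zeros((5, 6))
img[:, 3:] = 100.0
edges = kirsch_response(img)
```

Each kernel's coefficients sum to zero, so uniform regions give a response of exactly 0 while the step edge yields the maximal value.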


Table 3.1 State-of-the-art for DR Detection using Exudates Segmentation

Author | Features | Methods | Target Class | Database | Performance
Fleming et al. (2007) | Exudates | SVM | DR, Normal | In-house database | Sensitivity = 95%, Specificity = 84.6%
Neimeijer et al. (2007) | Exudates | k-NN | DR, Normal | In-house data | Sensitivity = 95%, Specificity = 86%
Sinthanayothin et al. (2008) | Exudates | Recursive region growing segmentation | DR, Normal | In-house data | Sensitivity = 88.5%, Specificity = 99.7%
Sopharak et al. (2008) | Exudates | Morphology, Naïve Bayes, SVM | DR, Normal | Hospital database | Sensitivity = 88%, Specificity = 99.5%
Sanchez et al. (2008) | Exudates | Thresholding, Fisher | DR, Normal | In-house data | Sensitivity = 100%, Specificity = 100%
García et al. (2009) | Exudates | MLP, RBF, SVM | DR, Normal | In-house data | Sensitivity = 100%, Specificity = 92.59%
Jaafar et al. (2010) | Exudates | Thresholding | DR, Normal | DIARETDB | Sensitivity = 91.2%, Specificity = 99.3%
Zhang et al. (2014) | Exudates | Morphology | DR, Normal | e-ophtha EX database | Sensitivity = 96%, Specificity = 89%
Xiao et al. (2015) | Exudates | SVM | DR, Normal | DIARETDB, HEI-MED | Sensitivity = 97.3%, Specificity = 90%

3.1.2. Segmentation of Red Lesions

The red lesions occurring in DR are mainly microaneurysms (MAs) and haemorrhages. MAs are tiny lumps in the walls of the retinal blood vessels (Fleming et al. 2006; Solomon et al. 2017). In color fundus images, MAs appear as round red dots with a diameter in the range of 10 to 100 µm. It is difficult to distinguish MAs from dot-haemorrhages, which look similar but are slightly larger (Tang et al. 2013; Scanlon 2019). MAs are typically the initial symptom of DR, and the quantity of this retinal lesion is directly related to DR severity (Tang et al. 2013; Solomon et al. 2017).


Numerous methodologies have been proposed for the segmentation of MAs in color fundus images. Table 3.2 shows the various state-of-the-art methods for red lesion detection. Similar to the bright lesion detection methods, the red lesion detection methods can also be divided into four groups (Mookiah et al. 2013; Salamat, Missen & Rashid 2019).

1. Region Growing Methods

The general concept of region growing for red lesion detection is the same as that for bright lesion detection. Fleming et al. (2006) performed region growing on a watershed gradient image to detect candidate red lesion regions. Jelinek et al. (2006) validated an automatic MA detection method following the approach of (Cree et al. 1997) and (Spencer et al. 1996): the circular, non-connected red lesions are discriminated from blood vessels using a top-hat transformation, and the candidate lesions are then segmented using a region growing algorithm.

Li and Shan have proposed a microaneurysm detection method using region growing. In a pre-processed image, all pixels are initially marked as non-members of any region. The process starts from the pixel with the maximum intensity. The algorithm checks the 8 neighbouring pixels of the seed pixel and skips any neighbour that is already a member of a region. Otherwise, the absolute difference between the intensity of the neighbouring pixel and the seed pixel is computed and compared with a pre-defined threshold. If the difference exceeds the threshold, the pixel is not considered part of the seed pixel's region and remains a non-member; if it does not, the neighbouring pixel is added to the region. The process then repeats for each newly included pixel until no more pixels are left to add. Each clustered region is marked as either a microaneurysm or non-microaneurysm region based on the ground-truth representative points of microaneurysms. From the candidate regions, 12 features are extracted, viz., the mean and standard deviation of the three color channels, the mean and standard deviation of the candidate region, the L1 norm, the L2 norm, the mean absolute deviation, and the number of pixels in the region. The extracted 12 features are then fed to an artificial neural network (ANN) to classify microaneurysm and non-microaneurysm regions (Li & Shan 2018).

2. Mathematical Morphology Operations

The basic concept of morphological operations has been described in the previous section. For red lesion detection, a polynomial contrast enhancement operation is used as the morphological step to identify MAs in (Walter et al. 2007) and to differentiate between MAs and blood vessels in (Jaafar, Nandi & Al-Nuaimy 2011a).

Sreng et al. used the top-hat operation to segment the red lesions in DR images. First, a pre-processed image is complemented to reverse the intensity values. The background pixels are then detected using a morphological opening operation with a disc-shaped structuring element of radius 10, and the red lesions are obtained by subtracting the background pixels from the complemented image. Five features, viz., the perimeter, area, and the maximum, minimum, and mean intensity of the candidate region, are fed into an SVM to classify DR and non-DR images (Sreng et al. 2018).

3. Wavelet-based Methods

Wavelet-based methods are useful when a signal varies over time, and they include various transforms depending on the merit function used for a particular application. The wavelet transform can be discrete or continuous. The discrete wavelet transform decomposes a signal over a set of wavelet basis functions that are orthogonal under translation and scaling, producing as many wavelet coefficients as there are data points. The continuous wavelet transform generates an output larger than the input data.
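A single level of the discrete wavelet decomposition can be illustrated with the Haar wavelet, whose filters reduce to pair-wise averages and differences. The sketch below uses an un-normalised average/half-difference form rather than the orthonormal one, purely for readability:

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar wavelet transform: returns the
    approximation (LL) and detail (LH, HL, HH) sub-bands."""
    a = img.astype(float)
    # Filter along columns: pair-wise average (low) and half-difference (high).
    lo = (a[:, ::2] + a[:, 1::2]) / 2.0
    hi = (a[:, ::2] - a[:, 1::2]) / 2.0
    # Filter along rows.
    ll = (lo[::2] + lo[1::2]) / 2.0
    lh = (lo[::2] - lo[1::2]) / 2.0
    hl = (hi[::2] + hi[1::2]) / 2.0
    hh = (hi[::2] - hi[1::2]) / 2.0
    return ll, lh, hl, hh

# 2x2 toy patch with horizontal variation only.
img = np.array([[8.0, 2.0],
                [8.0, 2.0]])
ll, lh, hl, hh = haar_dwt2(img)
```

The approximation band carries the local mean (5), the horizontal-detail band `hl` carries the left-right contrast (3), and the remaining detail bands are zero because the rows are identical.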

A wavelet-based method has been proposed in (Quellec et al. 2008). Using the wavelet transform, images are decomposed into sub-bands, and the microaneurysms are then identified with a template matching approach using the complementary information in each sub-band.

Prasad et al. have used a wavelet-based approach for the detection of DR. From a pre-processed image, histogram equalization and Canny edge detection are used to segment the blood vessels, and morphological operations are then performed to detect microaneurysms and exudates. For feature extraction, the areas of the blood vessels, microaneurysms, and exudates are calculated; additionally, the mean, skewness, entropy, standard deviation, and GLCM are extracted as features from 4 sub-images. Feature selection is done using the Haar wavelet and Principal Component Analysis (PCA). The Haar wavelet decomposes the image into sub-bands and converts the 41 features into 65 features; the dimension of the feature set is then reduced by PCA to a set of 22 features, which is fed to a back propagation neural network (BPNN) (Prasad, Vibha & Venugopal 2016).
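The PCA step that reduces Prasad et al.'s feature set can be sketched with a plain eigendecomposition of the covariance matrix. The 6-D feature matrix below is synthetic (a 2-D latent signal plus a little noise), invented to show that the top components capture nearly all the variance:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components
    (eigenvectors of the sample covariance matrix)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_components]
    return Xc @ vecs[:, order]

rng = np.random.default_rng(0)
# 50 samples of a 6-D feature vector whose variance lives in 2 directions.
X = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 6)) \
    + 0.01 * rng.normal(size=(50, 6))
Z = pca_reduce(X, 2)
```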

4. Hybrid Methods

Niemeijer et al. developed a hybrid algorithm for the segmentation of red lesions (Niemeijer et al. 2005). The system combines a mathematical morphology-based algorithm for candidate selection with a pixel classification algorithm to identify red lesions. Zhang et al. evaluated a multi-scale correlation filtering method for candidate detection: the correlation between pixel intensity distributions is calculated over the entire image, and a Gaussian model is used to identify MAs (Zhang et al. 2010). The approach of Lazar and Hajdu is based on computing cross-section profiles along multiple directions to generate a multi-directional height map (Lazar & Hajdu 2013).

García et al. developed a feature classification method, where a set of features is extracted from the image and the most suitable feature subset is selected, using a feature selection algorithm, for red lesion detection. To obtain the final segmentation result, four classifiers are used through a majority voting scheme: a radial basis function network, a support vector machine, a multilayer perceptron, and a combination of the three (García et al. 2008, 2010). Sánchez et al. used a three-class Gaussian mixture model in which each pixel belongs either to the background or to foreground pixels depicting lesions, vessels, or the optic disc; pixels that fit neither class are treated as outliers (Sánchez et al. 2009). Mizutani et al. used a modified double ring filter to extract MAs along with the blood vessels. In this method, the double ring filter detects areas of the image where the mean pixel value in the inner circle is lower than the mean pixel value in the surrounding outer circle (Mizutani et al. 2009; Inoue et al. 2013).
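The majority voting scheme used by García et al. reduces to taking the per-sample mode of the classifiers' label predictions. A minimal sketch; the three classifiers and their votes below are hypothetical:

```python
import numpy as np

def majority_vote(predictions):
    """Combine label predictions from several classifiers by majority
    vote (ties broken in favour of the smallest label)."""
    preds = np.asarray(predictions)   # shape: (n_classifiers, n_samples)
    out = []
    for col in preds.T:
        labels, counts = np.unique(col, return_counts=True)
        out.append(labels[np.argmax(counts)])  # np.unique sorts labels
    return np.array(out)

# Three hypothetical classifiers voting on four candidate lesions
# (1 = red lesion, 0 = background).
votes = [[1, 0, 1, 1],
         [1, 0, 0, 1],
         [0, 0, 1, 1]]
final = majority_vote(votes)
```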


Dashtbozorg et al. have used a hybrid approach for detecting microaneurysm and non-microaneurysm regions. Candidate microaneurysm regions are extracted by applying a gradient weighting technique and an iterative thresholding method. The candidate features are then extracted using local convergence filters (LCF), and intensity-based and shape-based features are added to construct the feature set. To discriminate microaneurysm from non-microaneurysm regions, the hybrid boosting/sampling algorithm RUSBoost has been used. RUSBoost (Seiffert et al. 2010) combines the adaptive boosting classifier AdaBoost with random undersampling (RUS); for microaneurysm detection in particular, RUSBoost is coupled with a decision tree (Dashtbozorg et al. 2018).
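The random undersampling (RUS) half of RUSBoost simply discards majority-class samples until the classes balance, before boosting is applied. A standalone sketch of that sampling step; the 90/10 split below is an invented example of the typical MA class imbalance:

```python
import numpy as np

def random_undersample(X, y, seed=0):
    """Balance a dataset by randomly discarding samples from the
    larger classes until every class matches the smallest one."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        keep.extend(rng.choice(idx, size=n_min, replace=False))
    keep = np.sort(np.array(keep))
    return X[keep], y[keep]

# 90 non-MA candidates vs 10 MA candidates.
X = np.arange(100).reshape(100, 1)
y = np.array([0] * 90 + [1] * 10)
Xb, yb = random_undersample(X, y)
```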

Table 3.2 State-of-the-art for DR Detection using Red Lesions

Authors | Features | Method | Target Class | Database | Performance
Neimeijer et al. (2005) | Red lesions | Pixel classification using k-NN | DR, Normal | In-house database | Sensitivity = 100%, Specificity = 87%
Fleming et al. (2006) | Red lesions | Region growing based k-NN | DR, Normal | In-house database | Sensitivity = 85.4%, Specificity = 83.1%
Jelinek et al. (2006) | Red lesions | Top-hat transform and Naïve Bayes | DR, Normal | Hospital database | Sensitivity = 85%, Specificity = 90%
Walter et al. (2007) | Red lesions | Gaussian filtering, top-hat transform | DR, Normal | In-house database | Sensitivity = 97%
Mizutani et al. (2009) | Red lesions | Double ring filter | DR, Normal | ROC database | Sensitivity = 63.5%
García et al. (2010) | Red lesions | Neural Network | DR, Normal | In-house database | Sensitivity = 100%, Specificity = 56%
Jaafar et al. (2011) | Red lesions | Morphology based | DR, Normal | DIARETDB | Sensitivity = 98.8%, Specificity = 96.2%
Inoue et al. (2013) | Red lesions | Morphology based, PCA, ANN | DR, Normal | ROC database | Sensitivity = 72.9%
Dashtbozorg et al. (2018) | Red lesions | Thresholding, LCF, RUSBoost | DR, Normal | ROC database, MESSIDOR | Sensitivity score = 0.47, AUC = 0.798


3.1.3. Diabetic Retinopathy Screening Methods

With the help of the lesion segmentation algorithms discussed above, various research groups have been able to develop automatic DR screening systems. Depending on the type, quantity, and location of the retinal lesions, DR severity levels can be characterized, and various methods have been proposed to assess DR severity scales and automatically identify the stage of DR in a patient. The majority of the DR screening methods are evaluated for 2-class classification, i.e., DR or No-DR (Singalavanija et al. 2006; Tang et al. 2013). Using machine learning techniques, retina images are classified into DR and No-DR based on the red lesion count (Roychowdhury, Koozekanani & Parhi 2013). An NPDR grading method has been proposed in (Usman Akram et al. 2014), where NPDR is graded into mild, moderate, and severe based on the type and number of red lesions and exudates; the performance of this method has been evaluated in terms of the accuracy of classifying dark and bright lesions. Similarly, based on the number of red lesions, four different severity grades have been proposed in (Antal & Hajdu 2014). According to the red lesion count, images are graded into three classes; however, the performance of the algorithm is evaluated in terms of No-DR (R0) and DR (R1, R2, R3) images.
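Count-based grading of the kind used in (Usman Akram et al. 2014; Antal & Hajdu 2014) can be pictured as a simple rule mapping lesion counts to a severity label. The thresholds below are invented purely for illustration and are not taken from any of the cited papers:

```python
def grade_npdr(n_microaneurysms, n_haemorrhages, n_exudates):
    """Illustrative count-based NPDR grading rule.
    NOTE: the cut-off values here are hypothetical, chosen only to
    demonstrate the shape of such a rule."""
    total_red = n_microaneurysms + n_haemorrhages
    if total_red == 0 and n_exudates == 0:
        return 'normal'
    if total_red <= 5 and n_exudates == 0:
        return 'mild'
    if total_red <= 15:
        return 'moderate'
    return 'severe'

grade = grade_npdr(n_microaneurysms=3, n_haemorrhages=1, n_exudates=0)
```

A real screening system derives such cut-offs from a clinical grading protocol rather than fixing them by hand.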

A three-class classification has been proposed in (Lee et al. 2005) using features such as haemorrhages, microaneurysms, hard exudates, and soft exudates; it classifies NPDR into the mild, moderate, and severe stages with accuracies of 82.6%, 82.6%, and 88.3% respectively. In (Nayak et al. 2008), a Neural Network is used to classify three classes with the help of features such as exudates, blood vessels, and texture, obtaining a classification accuracy of 93%, specificity of 100%, and sensitivity of 90%. A Genetic Algorithm-optimized Probabilistic Neural Network (PNN) has been used in (M. R.K. Mookiah et al. 2013) to classify NPDR, PDR, and Normal images using features such as bifurcation points, area of the blood vessels, exudates, texture, and entropies. This method classifies normal images with an accuracy of 92.88%, NPDR with 96.97%, and PDR with 100% accuracy; the sensitivity and specificity of the proposed system are 96.27% and 96.08% respectively. Dupas et al. have provided a method in which the severity of DR is determined based on the presence of red lesions: according to the type of red lesions and their corresponding numbers, the retina images are graded into grades 0, 1, 2, and 3.


Moreover, the risk of macular edema (ME) is evaluated by measuring the distance between the exudates and the fovea (Dupas et al. 2010).

Carrera et al. have developed a computer aided diagnosis method using an SVM classifier, attempting to detect and grade NPDR by extracting features of the blood vessels, microaneurysms, and exudates. For segmenting the blood vessels, the RGB color image is converted to a CMY image, and morphological operations such as erosion, dilation, and opening are applied to the magenta component to segment the blood vasculature. To count the microaneurysms, the green channel is extracted and a dilation operation is performed using a disc structuring element; edge detection and hole filling algorithms are then used to detect the possible microaneurysms. For exudate detection, the magenta component of the CMY image is used: based on the standard deviation of the magenta component, a thresholding method is applied to extract the exudate regions. Using the circular Hough transformation, the optic disc is segmented and removed, and the density of the exudates is computed by applying dilation and erosion operations. In total, 8 features are selected, viz., the blood vessel density, the actual number of microaneurysms, the possible number of microaneurysms, the density of the hard exudates, the entropy of the green channel, and the standard deviations of the red, green, and blue channels. These features are fed to an SVM to classify NPDR and grade it into grades 1, 2, and 3 (Carrera & Carrera 2017).

Methods are also available for four-class classification of DR stages, viz., normal, moderate NPDR, severe NPDR, and PDR. Acharya et al. have presented an automatic DR detection system using a combination of texture features and obtained an accuracy of 85.2% (Acharya et al. 2012). Similarly, Yun et al. have proposed a method using a Neural Network as the classifier and the blood vessel area as the feature, achieving an accuracy of 84%, sensitivity of 90%, and specificity of 100% (Yun et al. 2008).

For the classification of five classes, viz., normal, mild NPDR, moderate NPDR, severe NPDR, and PDR, Acharya U et al. proposed a method using the bi-spectral invariant features of higher-order spectra techniques; with an SVM classifier, the method attained an average accuracy of 82%, sensitivity of 82%, and specificity of 88% (Acharya U et al. 2008). The same group in (Acharya et al. 2009) used the blood vessels, exudates, microaneurysms, and haemorrhages as features and an SVM as the classifier to obtain an accuracy of 85%, sensitivity of 82%, and specificity of 86%.


Many of these automatic Computer Aided Diagnostic (CAD) methods are commercially available as retinal image analysis systems (Sim et al. 2015), for example Retinalyze System®, iGradingM®, IDx-DR®, and RetmarkerDR®. Retinalyze System® can detect the red lesions and classify DR and Normal retina images (Larsen et al. 2003; Hansen et al. 2004). Similarly, iGradingM® performs image enhancement and detects DR and No-DR images based on the count of microaneurysms (Philip et al. 2007). Other systems proposed in (Fleming et al. 2010; Soto-Pedre et al. 2015) have achieved sensitivity above 90% for detecting referable retinopathy. One study shows that considering more than one type of lesion for detecting DR does not make any significant difference in sensitivity; however, it might increase the false positives (Goatman et al. 2011).

Another popular DR screening system is IDx-DR® (Abràmoff et al. 2010, 2013; Sánchez et al. 2011). This system utilizes several algorithms for detecting the clinical signs of DR, such as microaneurysms and haemorrhages (Quellec et al. 2008; Niemeijer et al. 2005), hard exudates and cotton wool spots (Niemeijer et al. 2007), and neovascularization (Tang et al. 2013). The system has been validated for referable DR detection on a database of 1748 fovea-centred images (Abràmoff et al. 2013).

RetmarkerDR® is a software system developed at the University of Coimbra that combines image quality control with red lesion detection (Oliveira et al. 2011). It can classify DR and No-DR images and compare an image with that of the patient's previous visit, and it is capable of reducing the human workload by 48.42% (Ribeiro et al. 2015). Another useful system is the Telemedical Retinal Image Analysis and Diagnosis Network®, a web-based service that allows automatic evaluation of retinal image quality (Chaum et al. 2008; Karnowski et al. 2011).

These CAD systems are effective for DR screening and can identify retinas affected by DR or referable DR. However, they are unable to identify high-risk DR or the presence of Diabetic Macular Edema (DME) (Sim et al. 2015). Zhou et al. have developed a CAD method that combines clinical features such as red lesions, bright lesions, and blood vessels to grade DR and DME (Zhou, Wu & Yu 2018). Table 3.3 summarizes the different DR screening methods.


Table 3.3 State-of-the-art for DR Screening Methods

Author | Features | Methods | Target Class | Database | Performance
Nayak et al. (2008) | Exudates, vessels, and contrast | Neural Network | Normal, NPDR, PDR | Hospital database | Sensitivity = 90%, Specificity = 100%, Accuracy = 93%
Yun et al. (2008) | Blood vessels | Morphological transform, Neural Network | Normal, moderate NPDR, severe NPDR, PDR | Hospital database | Sensitivity = 90%, Specificity = 100%, Accuracy = 84%
Dupas et al. (2010) | Red lesions, exudates | k-NN | DR, Normal, Macular Edema | MESSIDOR | Sensitivity = 83.9%, Specificity = 72.7%
Mookiah et al. (2012) | Blood vessels, bifurcation points, exudates, global texture, and entropies | GA optimized PNN classifier | Normal, NPDR, PDR | Hospital database | Sensitivity = 96.27%, Specificity = 96.08%, Accuracy = 96.15%
Acharya et al. (2012) | Co-occurrence matrix and run length matrix | SVM | Normal, moderate NPDR, severe NPDR, PDR | Hospital database | Sensitivity = 98.9%, Specificity = 89.5%, Accuracy = 100%
Tang et al. (2013) | Red lesions, exudates | Hidden Markov Model, heterogeneous ensemble classifier | DR, Normal | Hospital database | Sensitivity = 92.2%, Specificity = 90.4%
Akram et al. (2014) | Red lesions, exudates | Hybrid classifier of m-Mediods and Gaussian mixture | Normal, mild, moderate, severe NPDR | DIARETDB | Sensitivity = 99.17%, Specificity = 97.07%
Antal et al. (2014) | Red lesions | Ensemble classifier | Normal, mild NPDR, DR | MESSIDOR | Sensitivity = 90%, Specificity = 91%
Roychowdhury et al. (2014) | Red lesions | Gaussian mixture model, k-NN, SVM | Normal, DR | MESSIDOR | Sensitivity = 100%, Specificity = 53.16%
Carrera et al. (2017) | Blood vessels, microaneurysms, hard exudates | Morphological operation, SVM | Normal, NPDR | MESSIDOR | Accuracy = 92.4%, Sensitivity = 87.3%, Specificity = 97.4%
Carrera et al. (2017) | Blood vessels, microaneurysms, hard exudates | Morphological operation, SVM | Normal, mild, moderate, severe NPDR | MESSIDOR | Accuracy = 85.1%, Sensitivity = 49.4%, Specificity = 88.4%
Zhou et al. (2018) | Blood vessels, red lesions, bright lesions | Morphological operation, Fisher discriminant analysis | Normal, mild, moderate, severe NPDR | MESSIDOR | Sensitivity = 96.46%, Specificity = 98.56%


3.1.4. Deep Learning Methods

Recent advances in deep learning have opened the path to a new methodology for retinal image analysis. In particular, the Convolutional Neural Network (CNN) has caught the attention of researchers. Most of these works have used popular CNN architectures for analyzing DR images. Gulshan et al. used the Inception V3 architecture for detecting moderate-or-worse diabetic retinopathy, which is considered referable diabetic retinopathy, referable diabetic macular edema, or both. They conducted experiments on the EyePACS-1 database, consisting of 9963 images, and the Messidor-2 database, consisting of 1748 images. An image is defined as referable diabetic retinopathy when it fulfils the criteria of (1) moderate, severe, or proliferative DR, (2) referable diabetic macular edema, or both. A single network was trained to make binary predictions about each of these conditions. For detecting referable DR, the algorithm obtained an area under the receiver operating characteristic curve (AUC) of 0.991 for EyePACS-1 and 0.990 for Messidor-2. For EyePACS-1, the sensitivity achieved is 97.5% and the specificity is 93.4%. For the Messidor-2 database, the sensitivity is 96.1% and the specificity is 93.9% (Gulshan et al. 2016).

Doshi et al. have used deep convolutional neural networks for the automated diagnosis and classification of the five severity stages of Diabetic Retinopathy. They designed a CNN architecture comprising five sets of convolution, pooling, and dropout layers in sequence, followed by two sets of fully connected hidden and pooling layers. A single trained network achieved a score of 0.386 on the quadratic weighted kappa metric. An ensemble of three such identical models yielded a kappa score of 0.3996 on the EyePACS dataset (Doshi et al. 2016).

Haloi has provided a deep learning based automated method for microaneurysm detection. Each pixel of the image is classified as either microaneurysm or non-microaneurysm using a deep neural network with a maxout activation function. The architecture contains five layers, including convolutional, max pooling, and Softmax layers, with an additional dropout layer. The presented method is evaluated on the publicly available Retinopathy Online Challenge (ROC) and Diaretdb1v2 databases. For the ROC database, the method achieved an AUC of 0.98. For the Diaretdb1v2 database, the method achieved a sensitivity of 97%, a specificity of 95%, and an AUC of 0.98. For the DR versus no-DR case, the method achieved a sensitivity of 97%, a specificity of 96%, and an accuracy of 96% on the Messidor dataset (Haloi 2015).

Chandore et al. have used a deep CNN model of 15 layers to classify DR and non-DR images. Their CNN architecture contains thirteen convolutional layers and two fully connected layers, arranged as groups of three successive convolutional layers followed by a max-pooling layer. A ReLU layer is placed after each convolutional layer and fully connected layer, and dropout layers are used to deal with overfitting. They experimented on 35,126 labelled high-resolution images from the Kaggle dataset. The model obtained an accuracy of 45% for detecting no-DR images and 40% for detecting DR images, with a precision of 0.88 for detecting DR images (Chandore Vishakha 2017).

Lam et al. have carried out experiments on deep learning models for DR grading. They performed DR screening experiments using AlexNet, VGG16, and GoogLeNet. These three models were trained and tested on the Kaggle and Messidor Mild DR datasets for binary classification (DR and no-DR). Among these models, GoogLeNet performed best: it achieved 95% sensitivity and 96% specificity on the Kaggle dataset, and 90% sensitivity and 71% specificity on the Messidor dataset. For multi-class classification (5 classes), GoogLeNet obtained 75% sensitivity, 74% positive predictive value, and a Kappa value of 0.536 (Lam et al. 2018).

Gargeya & Leng have used deep learning for diagnosing DR. Their model attained an AUC of 0.97, 94% sensitivity, and 98% specificity, validated by a 5-fold cross-validation scheme on a local dataset. They also tested the method against the MESSIDOR 2 and E-Ophtha databases and attained AUC scores of 0.94 and 0.95, respectively (Gargeya & Leng 2017).

3.1.5. Discussion

From the extensive elaboration of research methods and algorithms for DR detection/classification, it can be observed that most of the methods follow the general machine learning pipeline. These methods are mostly dependent on the segmentation and feature extraction of various clinical lesions/features of DR. The algorithms available for DR lesion detection are quite heterogeneous. For all these methods, the validation methods are not uniform and the test databases are not standardized. Therefore, it is difficult to make any direct comparison of performance among them. Although these algorithms show promising performance, challenges still remain. Additional improvement of these proposed CAD methods is necessary in order to reduce the workload of ophthalmologists efficiently. Detection of DR at the earliest stage is still challenging. The work developed by Singalavanija et al., 2006, achieved good sensitivity and specificity for classifying normal and DR fundus images; however, the method is not sensitive enough to detect the early stage of non-proliferative DR (NPDR). The majority of the works are focused on discriminating DR and no-DR images. The methods available for grading different types/stages of DR show poor performance in detecting the earliest signs of DR, or mild NPDR. Recent works on deep learning for DR grading are promising. However, most of the CNN architectures used are highly complex and require high-end hardware to support such large networks. Moreover, they require huge amounts of data, time, and memory to train.

3.2. State-of-the-Art of RVO Detection

The automatic detection of RVO is relatively novel in the field of Computer Aided Detection (CAD) of retinal blood vascular disease. The literature provides plenty of work on the automatic detection of ocular diseases like Diabetic Retinopathy, Glaucoma (Bock et al. 2010), and Age-related Macular Degeneration (AMD) (Burlina et al. 2011). However, only a few research works are available that automatically diagnose and detect the RVO types. The state-of-the-art methods for RVO detection are mostly feature representation based, i.e., they use handcrafted segmentation and feature extraction algorithms for analysing blockage in the blood vessels.

3.2.1. Feature Representation

In the feature representation technique, various features of the retina are processed and extracted. The extracted features are then fed to a classifier to detect/classify a disease. H. Zhang et al. have used the Hierarchical Local Binary Pattern (HLBP) to characterize the abnormal features of blood vessels to detect BRVO in fluorescein angiography (FA) images. They took inspiration from the convolutional neural network and proposed a hierarchical combination of the Local Binary Pattern (LBP) and the max-pooling operation. There are two levels in HLBP, each containing a max-pooling layer and an LBP-coding layer. In the first level, feature map-1 is generated by applying a max-pooling operation on the FA image. Then, LBP is performed on the first-level features, i.e. feature map-1, to generate an LBP1 feature map. In the second level, max pooling and LBP are again performed on the LBP1 feature map to generate an LBP2 map. Then, the histograms of LBP1 and LBP2 are combined to produce the final feature vector. For testing, the FA images were collected from a private hospital. A total of 670 images were gathered from 200 people, where 570 images are of BRVO and 100 images are of normal eyes. Each image is of size 768 × 576. The BRVO images were collected from 100 patients, where every subject donated 4-11 images. A 10-fold scheme is used for performance evaluation, where each fold contains 57 BRVO images and 10 normal images. During the training session, nine folds are used for training and the remaining fold is used for testing. In total, 10 test results are obtained and the overall accuracy of the method is calculated by taking the average of the 10 results. For classification, a Support Vector Machine (SVM) with a linear kernel is used, attaining a mean accuracy of 96.1% (H. Zhang et al. 2014).
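The two-level max-pool + LBP pipeline described above can be sketched in a few lines. The pooling window, the basic 8-neighbour LBP, and the 256-bin histograms below are illustrative assumptions; the exact parameters of H. Zhang et al. are not reproduced here.

```python
import numpy as np

def max_pool(img, k=2):
    """Non-overlapping k x k max-pooling; crops the image to a multiple of k."""
    h, w = (img.shape[0] // k) * k, (img.shape[1] // k) * k
    img = img[:h, :w]
    return img.reshape(h // k, k, w // k, k).max(axis=(1, 3))

def lbp_code(img):
    """Basic 8-neighbour LBP code for every interior pixel."""
    c = img[1:-1, 1:-1]
    neighbours = [img[:-2, :-2], img[:-2, 1:-1], img[:-2, 2:], img[1:-1, 2:],
                  img[2:, 2:], img[2:, 1:-1], img[2:, :-2], img[1:-1, :-2]]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbours):
        # Set the bit when the neighbour is at least as bright as the centre pixel.
        code += (n >= c).astype(np.uint8) * (2 ** bit)
    return code

def hlbp_features(img):
    """Two levels of (max-pool -> LBP); the two LBP histograms are concatenated."""
    lbp1 = lbp_code(max_pool(img))
    lbp2 = lbp_code(max_pool(lbp1))
    h1, _ = np.histogram(lbp1, bins=256, range=(0, 256))
    h2, _ = np.histogram(lbp2, bins=256, range=(0, 256))
    return np.concatenate([h1, h2])
```

The resulting 512-dimensional feature vector would then be passed to a linear-kernel SVM, as in the cited work.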

Anitha et al. have developed an automatic eye disease detection system in which CRVO is one of four considered eye diseases. Fuzzy C-means clustering is applied for feature extraction from the images, and a Back Propagation Neural Network (BPNN) or a minimum distance classifier is used for classification. The database used for the experiments, collected from a private hospital, consists of 205 images, each of size 1504 × 1000. The database constitutes four types of images, namely non-proliferative diabetic retinopathy (NPDR) images, CRVO images, choroidal neo-vascularisation membrane (CNVM) images, and central serous retinopathy (CSR) images. The network is trained using 20 images of each of the 4 disease types, and testing is performed over 32 CNVM, 27 CRVO, 30 NPDR, and 36 CSR images. The BPNN-based scheme achieved 92% accuracy, and the minimum distance classifier achieved 64% accuracy (Anitha, Selvathi & Hemanth 2009).


3.2.2. Texture Analysis

In a texture analysis method, various retinal features, such as blood vessels and suspicious regions/lesions, are analysed to extract meaningful structural or statistical properties. Most of the existing work attempting RVO detection is based on the textural analysis of retinal blood vessels. Gayathri et al. have used the blood vessels as features to diagnose possible blockage in the vein. At first, the blood vasculature is segmented. Then, the Completed Local Binary Pattern (CLBP) is computed to extract the texture of the blood vessels. CLBP starts from a centre pixel, and then a local difference sign-magnitude transform (LDSMT) is applied to extract the sign and magnitude components of CLBP. The sign and magnitude components are combined to generate the histogram. Then, a neural network is used to analyse the histogram and classify the abnormal retinal images. With regression plots, they showed the feasibility of their method for detecting retinal blood vascular diseases like RVO (Gayathri et al. 2014).

Fazekas et al. have proposed a fractal analysis based method for detecting possible occlusion in the retinal blood vessels. Fractal analysis helps to understand a shape or pattern in data that is difficult to describe using simple geometry; it approximates the fractals in an image or data. Fractals refer to recurring patterns that can exist amidst chaotic data. The fractal properties can be estimated by computing the fractal dimension, which provides a statistical index of the complexity of a fractal pattern and quantifies, as a ratio, how the detail in the pattern changes with the scale at which it is measured. Fazekas et al., 2015, applied fractal analysis to two blood vessel segmentation methods to learn the normal and abnormal blood vessels. The fractal properties of the segmented blood vessels are evaluated to differentiate normal retina images from RVO affected images. For the fractal analysis, the box-counting method is used to compute the fractal dimension, where the box-dimension of the vessels is calculated. In the box-counting method, a grid is laid over the image and the number of boxes covering part of the object of interest (here, the blood vasculature) is counted. By repeatedly shrinking the grid size, the pattern of a particular object can be extracted more accurately. If a fractal object F is divided into N homogeneous objects with a scaling factor s, the fractal dimension of F can be simply defined as follows (Mandelbrot & Wheeler 1983; Burn & Mandelbrot 2007):

$\dim F = \dfrac{\log N}{\log(1/s)}$    (3.1)

Now, the box-counting dimension does not exist for all sets; therefore, the lower and upper box-counting dimensions are defined. If F is a bounded subset of m-dimensional Euclidean space, the lower and upper box-counting dimensions of the subset can be calculated as follows (Mandelbrot & Wheeler 1983; Burn & Mandelbrot 2007; Fazekas et al. 2015):

$\underline{\dim}_B F = \liminf_{r \to 0} \dfrac{\log N_r(F)}{-\log r}; \quad \overline{\dim}_B F = \limsup_{r \to 0} \dfrac{\log N_r(F)}{-\log r}$    (3.2)

When the lower and upper box-counting dimensions produce the same value, the common value is referred to as the box-counting dimension of F and is denoted as (Mandelbrot & Wheeler 1983; Burn & Mandelbrot 2007; Fazekas et al. 2015):

$\dim_B F = \lim_{r \to 0} \dfrac{\log N_r(F)}{-\log r}$    (3.3)

where $N_r(F)$ can be one of the following: “(i) the smallest number of closed balls (i.e., disks, spheres) of radius r that cover F; (ii) the smallest number of cubes of side r that cover F; (iii) the number of r-mesh cubes that intersect F; (iv) the smallest number of sets of diameter at most r that cover F; (v) the largest number of disjoint balls of radius r with centres in F.” The fractal dimension calculations show a large difference between the fractal dimension of a healthy image and that of an RVO affected image. On this basis, the authors argued for the possibility of detecting different types of RVO using fractal analysis (Fazekas et al. 2015).
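The box-counting estimate described above can be sketched as follows: overlay grids of shrinking box size r on a binary vessel mask, count the boxes that touch the foreground, and take the slope of log N_r(F) against log(1/r). The grid sizes below are arbitrary illustrative choices.

```python
import numpy as np

def box_count(mask, r):
    """Number of r x r grid boxes that contain at least one foreground pixel."""
    h = (mask.shape[0] // r) * r
    w = (mask.shape[1] // r) * r
    blocks = mask[:h, :w].reshape(h // r, r, w // r, r)
    return int(blocks.any(axis=(1, 3)).sum())

def fractal_dimension(mask, sizes=(2, 4, 8, 16, 32)):
    """Slope of log N_r(F) versus log(1/r) approximates the box-counting dimension."""
    counts = [box_count(mask, r) for r in sizes]
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope
```

As a sanity check, a filled square yields a dimension close to 2 and a straight line yields a dimension close to 1, matching the classical box-counting behaviour.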

Similarly, Zode has used fractal analysis for detecting BRVO. Three methods were used to calculate the fractal dimension: the Box-Counting method, the Density-Density Correlation method, and the Mass-Radius method. Of these three, the Box-Counting and Mass-Radius methods achieved more precise results. In the Mass-Radius method, a relationship is defined between the size of an object of a certain radius and the area within that radius. This dimension analysis can be initiated from different points and with different radii. The fractal dimension is calculated from the log-log plot of the area as a function of the radius. Zode constructed a series of concentric circles centred at the optic disc. Then, the mass of the occupied circles within a given radius is calculated. The fractal dimension is estimated from the plot of the log of the pixel mass within the circle of a given radius versus the log of the radius of the circle (Zode 2017).

3.2.3. Deep Learning Approach

For BRVO detection, the deep learning approach was first exploited by Zhao et al. They used a classical CNN model to classify BRVO and normal color fundus images via image-based and patch-based approaches. In the patch-based approach, the image is divided into small patches, which are labelled according to the presence of BRVO or normal features. Only the patches having evident BRVO features are labelled as BRVO; ambiguous patches are rejected. In the image-based scheme, along with the original image, three extra images are created by adding noise to, flipping, and rotating the pre-processed image. The final classification decision for a test image is made on the basis of the classification results of these four images. They used the same database and 10-fold scheme as in (H. Zhang et al. 2014), where the images in one fold are tested while the images in the other nine folds are used for training. The only difference is that they used 100 BRVO and 100 normal images; therefore, every fold is constructed with 10 BRVO and 10 normal images. They obtained mean accuracies of 98.5% and 97% for the patch-based and image-based methods, respectively (Zhao, Chen & Chi 2015).
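The image-based scheme above (the original plus noisy, flipped, and rotated copies, combined by voting) can be sketched as follows. The noise level and rotation angle are placeholders, as the paper's exact settings are not given here, and the classifier itself is omitted.

```python
import numpy as np

def augment(img, rng):
    """Return the original plus three variants: noisy, flipped, rotated (kept as float)."""
    noisy = np.clip(img + rng.normal(0.0, 5.0, img.shape), 0, 255)  # assumed noise level
    flipped = np.fliplr(img)
    rotated = np.rot90(img)  # assumed 90-degree rotation
    return [img, noisy, flipped, rotated]

def majority_vote(labels):
    """Final decision: the class predicted most often across the four variants."""
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]
```

Each of the four variants would be classified by the CNN and `majority_vote` applied to the four predicted labels.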

Table 3.4 summarizes the state-of-the-art for RVO detection. Table 3.5 shows the details of significant research works on RVO detection.

3.2.4. Discussion

In the case of automatic detection of RVO, the literature provides very limited information, unlike DR detection. Most of the state-of-the-art techniques are focused on the detection of BRVO. The majority of these methods rely on the segmentation of blood vessels. Some methods provided only the possibility of detecting RVO, without any performance evaluation (Fazekas et al. 2015; Gayathri et al. 2014). From the limited literature, it can be observed that RVO is still the least exploited disease in the area of Computer Aided Detection methods for blood vascular disease, despite the fact that it is the second most common cause of vision loss after DR. Since there is a lack of clinical studies on RVO, there is also a lack of


Table 3.4 State-of-the-Art of RVO Detection

| Author | Target Class | Method | Remarks |
|---|---|---|---|
| J. Anitha et al. (2009) | CRVO | Fuzzy C-means clustering, Back Propagation Neural Network (BPNN) | CRVO is one of 4 eye diseases for multiclass classification |
| Gayathri et al. (2014) | – | Completed Local Binary Pattern, Neural Network | R = 0.98 under regression plot, R = 0.69 for testing |
| Zhang et al. (2014) | BRVO | Hierarchical Local Binary Pattern, Support Vector Machine | Accuracy 96.1% |
| Zhao et al. (2015) | BRVO | Convolutional Neural Network | Accuracy 97% (image-based), 98.5% (patch-based) |
| Fazekas et al. (2015) | CRVO, BRVO, HRVO | Fractal properties of blood vessels | No performance evaluation |
| Zode et al. (2017) | BRVO | Fractal analysis | No performance evaluation |

Table 3.5 Significant Existing Techniques for RVO Detection

| Author | Target Class | Method | Database | Training Images | Testing Images | Accuracy |
|---|---|---|---|---|---|---|
| Zhang et al. (2014) | BRVO, Normal | Hierarchical Local Binary Pattern and Support Vector Machine | 570 BRVO + 100 Normal = 670 images | 603 | 67 | 96.1% |
| Zhao et al. (2015) | BRVO, Normal | Whole image and Convolutional Neural Network (LeNet) | 100 BRVO + 100 Normal = 200 images | 180 | 20 | 97% |
| Zhao et al. (2015) | BRVO, Normal | Patch image and Convolutional Neural Network (LeNet) | 100 BRVO + 100 Normal = 200 images | 180 | 20 | 98.5% |
| J. Anitha et al. (2009) | NPDR, CRVO, CNVM, CSR | Fuzzy C-means clustering and Back Propagation Neural Network (BPNN) | 205 images | 80 | 125 | 92% |
| J. Anitha et al. (2009) | NPDR, CRVO, CNVM, CSR | Fuzzy C-means clustering and Minimum Distance Classifier | 205 images | 80 | 125 | 64% |


adequate research on automatic detection of RVO in the early stage as well (Ageno & Squizzato 2011). No algorithm proposed to date can detect all three types of RVO. The possible reason behind the lack of research on the automatic detection of all RVO types could be the complexity of representing all the clinical features of RVO. For example, haemorrhages can differ in shape, size, and texture, and the literature does not offer research work for detecting all these haemorrhage types. Similarly, the state-of-the-art for detecting cotton wool spots and dilated tortuous veins is also very limited. Moreover, there is no standard set of rules for identifying the RVO types in an automated screening process.

3.3. Discussion on Existing Methodologies for Retinal Blood Vascular Disease Detection

From the literature, the state-of-the-art methods for retinal blood vascular disease detection can be divided mainly into traditional methods and deep learning methods. Among the traditional machine learning methods, the majority of the works, whether for DR or RVO, rely on hand-designed segmentation and feature extraction algorithms for abnormal features such as microaneurysms, haemorrhages, hard exudates, cotton wool spots, and blood vessels to classify the disease. The problem here is that the classification decision relies on the success of the segmentation and feature extraction methods. On the other hand, the deep learning methods are free from hand-designed feature extraction: deep models can extract the features from the image pixels themselves. However, the popular deep learning models are highly complex and demand a huge amount of training data, memory, and training time, and tuning their parameters and hyperparameters is still an issue. On the DR detection side, most of the existing methods show poor performance in detecting DR at the earliest stage, as the textural change in the retina during mild NPDR is indistinct.

There are two possible approaches for the automatic detection of retinal blood vascular diseases such as DR and RVO. One can follow the traditional method and use hand-designed segmentation and feature extraction algorithms to individually extract the abnormal features, and then pass those features through a classifier to recognize the type of retinal disease (DR or RVO / DR types / RVO types). Otherwise, the abnormal features can be extracted from the appearance of the whole image, and machine learning classifiers can be used to discriminate between DR and RVO. Detecting all types of RVO requires compound pattern recognition techniques. Again, it is quite challenging to design the algorithm in such a way that it can identify the common lesions and still distinguish the different diseases. Moreover, the performance of the final classification for such conventional CAD methods depends on the quality of the acquired retina image. Factors such as inter-image and intra-image contrast, color variation, and luminosity make it challenging to identify the abnormal features.

A possibly effective way of diagnosing retinal blood vascular disease is the deep learning approach, because with deep learning it is possible to avoid the burden of complex, multifaceted hand-designed feature extraction methods. Therefore, it can significantly reduce the overall complexity of a CAD method and also improve the performance effectively. The only things to be taken care of are the inherent complexity of the deep learning models and their huge resource requirements. It is necessary to find a way to set the hypothesis for designing simple, effective deep learning models suitable for the particular task at hand.

3.4. Chapter Summary

In this chapter, a thorough literature survey has been provided. Various state-of-the-art techniques for DR and RVO detection have been discussed. The available works on DR analysis are explained in detail. The state-of-the-art methods are divided into methods for red lesion detection, bright lesion detection, DR screening tools, and deep learning methods. For each category, the existing methods are summarized and compared in individual tables. Similarly, the state-of-the-art methods for RVO detection are described and compared in tabular form. However, compared to the literature on DR detection, the literature on RVO detection is quite limited. After elaborating the state-of-the-art methods for DR and RVO, those methods are discussed in detail in separate discussion sections. Moreover, a discussion of the available methods for retinal blood vascular diseases is provided, along with the possibilities to improve the methods and the remaining challenges.


Chapter-4

4 Materials and Methods

In this chapter, the materials and methods employed for retinal blood vascular disease diagnosis are explained meticulously. Initially, some of the popular machine learning methods/algorithms used for retinal blood vascular disease detection are briefly discussed. Then, a brief introduction to deep learning is provided and different models are explored. The architecture of the basic Convolutional Neural Network (CNN) is briefly discussed in order to explain why it is useful for retinal image analysis. This chapter explains the hypothesis behind designing a CNN for retinal image analysis and retinal blood vascular disease detection. The design of the CNN for retinal abnormality detection is explained thoroughly, starting from its foundation. In this chapter, a novel CNN architecture, based on the basic CNN, is proposed for diagnosing retinal blood vascular diseases such as DR and RVO. The proposed CNN is an efficient model to detect DR at the earliest stage and to grade DR according to the severity level. Moreover, the Cascaded CNN, a novel architecture, is proposed to particularly diagnose RVO and detect all its three types. These proposed methods are discussed in different subsections.

4.1. Machine Learning Methods

Machine Learning (ML) is the essence of Artificial Intelligence (AI); it makes a machine/computer learn by itself from the given data without being explicitly programmed. We feed the input data and the output to a machine, and the machine learning algorithm builds its own logic to learn from the data, which we can evaluate by testing the program. The learning can be supervised, semi-supervised, unsupervised, or reinforcement learning.


Figure 4.1: Machine Learning Method

- Supervised Learning: In supervised learning, the model is trained with labelled data and a target class. The model generates a program to interpret the logic for predicting the target class from the given labelled data.
- Unsupervised Learning: In unsupervised learning, no labelled data is provided to the model. The model generates an inferred function that learns the underlying hidden pattern. It sorts the data into groups according to similarities, differences, and patterns without any guidance.
- Semi-Supervised Learning: In semi-supervised learning, the model is provided with both labelled and unlabelled data. Typically, less labelled data and more unlabelled data are provided in order to improve the learning accuracy.
- Reinforcement Learning: In this case, the model interacts with the environment and takes actions according to its experience. For each action, the model is either rewarded or penalized. From a series of actions and consequences, the model learns to make correct decisions.

A brief description of various machine learning algorithms has been provided in the following sub-sections.

4.1.1. K-Nearest Neighbour Algorithm

It is one of the most popular machine learning algorithms used for classification. It preserves all the available samples and classifies new samples based on a similarity measure. An object is assigned to a particular class based on the majority vote of its k nearest neighbours, measured by a distance function (e.g., Euclidean, Manhattan, Hamming) (Peterson 2009; Karaca & Cattani 2018). If k = 1, the object is simply assigned to the class of its single nearest neighbour. In Figure 4.2, if k = 3, the star is assigned to class B; if k = 6, the star is assigned to class A.

Figure 4.2: k-NN Algorithm Explanation

The algorithm for k-NN classifier can be stated as below:

Step 1: Load the data.
Step 2: Initialize a positive integer value of k.
Step 3: To classify a test example, iterate from 1 to N (the total number of training samples):
  3.1. Calculate the Euclidean distance between the test example and each training example.
  3.2. Sort the distance values in ascending order.
  3.3. Select the top k rows from the array.
  3.4. Take the majority vote of these rows and select the most frequent class.
  3.5. Return the predicted class.
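The steps above translate directly into a short NumPy sketch:

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3):
    """Classify x_test by majority vote among its k nearest training samples."""
    # Step 3.1: Euclidean distance from the test example to every training example.
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # Steps 3.2-3.3: indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    # Steps 3.4-3.5: return the most frequent class among those neighbours.
    values, counts = np.unique(y_train[nearest], return_counts=True)
    return values[np.argmax(counts)]
```

For example, with two clusters of training points, a test point near one cluster is assigned that cluster's class.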


4.1.2. Support Vector Machine (SVM)

An SVM is a popular classification method, as it can handle both linear and non-linear data. An SVM can be defined by a hyperplane in an N-dimensional space, where N is the number of features, that distinctly separates two classes by maximizing the distance margin between them (shown in Figure 4.3). To discriminate two classes, there could be many possible separating hyperplanes. The objective of the SVM is to find the optimal hyperplane that has the maximum margin between the data points of both classes.

Figure 4.3: Support Vector Machine (SVM) Explanation

In the case of non-linearly separable data points, the SVM maps the data points into another dimension using a mathematical operation, known as a kernel, in such a way that a linearly separating hyperplane can be drawn. The kernel function can be linear, sigmoid, polynomial, or a Radial Basis Function (RBF). Figure 4.4 shows how the SVM maps linearly non-separable data into another dimension using a kernel ϕ and rearranges the data in order to make it linearly separable (Cortes et al. 1995; Marwala 2018).


Figure 4.4: SVM Mapping Non-Separable Data to High Dimensional Space.
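The effect of the kernel mapping in Figure 4.4 can be illustrated with a toy example: one-dimensional points that no single threshold can separate become linearly separable after an illustrative feature map ϕ(x) = (x, x²). The data and the map are assumptions for demonstration, not part of the cited works.

```python
import numpy as np

# 1-D points: class 0 sits between the class-1 points, so no threshold on x separates them.
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([1, 1, 0, 0, 0, 1, 1])

# Kernel-style feature map phi(x) = (x, x^2): in 2-D the classes become linearly separable.
phi = np.column_stack([x, x ** 2])

# The horizontal line x^2 = 2 now separates the two classes perfectly.
pred = (phi[:, 1] > 2.0).astype(int)
```

A real SVM never builds ϕ explicitly; the kernel function evaluates inner products in the mapped space directly, but the geometric effect is the same.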

4.1.3. Decision Tree

A decision tree is a supervised learning algorithm mostly used for the classification of both categorical and continuous data. Here, the data points are split into two or more homogeneous groups. A decision tree can be viewed as a directed graph that starts from a root node and splits into two or more leaf nodes that represent the classes/categories the tree can classify (shown in Figure 4.5). A node is split into sub-nodes based on different decision tree criteria. Chi-square is a popular splitting criterion for decision trees. It computes the statistical significance of the differences between the parent node and the sub-nodes. It can be measured as follows (Flora 1982; Willard 2019):

$\chi^2 = \sum \dfrac{(O - E)^2}{E}$    (4.1)

where O is the observed class count in a sub-node and E is the count expected from the parent node's class proportions.

Other than Chi-square, there are the Gini, reduction in variance, and information gain approaches for splitting the nodes in a decision tree.
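A hedged sketch of the Chi-square splitting criterion: for each candidate split, the observed class counts in each sub-node are compared with the counts expected from the parent's class proportions. The helper below is illustrative, not taken from the cited sources.

```python
def chi_square(parent, children):
    """Chi-square statistic for a candidate split: sum over sub-nodes and classes
    of (observed - expected)^2 / expected, with expected counts following the
    parent node's class proportions."""
    total = sum(parent)
    stat = 0.0
    for child in children:
        n = sum(child)
        for cls, parent_count in enumerate(parent):
            expected = n * parent_count / total
            observed = child[cls]
            stat += (observed - expected) ** 2 / expected
    return stat
```

A perfectly separating split yields a large statistic, while a split that merely reproduces the parent's class mix yields zero, so the split with the largest value is preferred.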


Figure 4.5: Decision Tree

4.1.4. Linear and Logistic Regression

Linear regression defines a simple relationship between the input and the output: for every change in the input there is a proportional change in the output. It therefore predicts a real-valued output based on the given input value. A linear equation applies a scale factor, or coefficient, c to each input value. Additionally, it adds another coefficient, called the bias or intercept, in order to provide an additional degree of freedom. For a single independent variable (input) x and a dependent variable (output) y, simple linear regression can be expressed as below (Davis & Offord 1997; Goodman 2017):

$y = cx + b$    (4.2)

where b is the bias or intercept and c is the coefficient or slope of the line defining the relationship between the input and the output. Figure 4.6 shows the linear regression line.
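The linear model above can be fitted by least squares; the synthetic data below (slope 2, intercept 1) is purely illustrative.

```python
import numpy as np

# Noisy samples of the line y = 2x + 1 (illustrative values: c = 2, b = 1).
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, x.size)

# Least-squares fit of y = c*x + b recovers the slope and intercept.
c, b = np.polyfit(x, y, 1)
```

With such low noise, the fitted c and b land very close to the true values of 2 and 1.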

The logistic regression, on the other hand, puts a limit on the output no matter how large the input value is: it squashes the output into the range 0 to 1 (shown in Figure 4.7). While linear regression predicts a continuous numerical value, logistic regression acts as a binary classifier that classifies data into category and no-category. Therefore, logistic regression is widely used for classification tasks in machine learning and is useful for non-linear data.

Figure 4.6: Linear Regression

Figure 4.7: Logistic Regression


4.1.5. Artificial Neural Network

An Artificial Neural Network (ANN) is inspired by the human brain and consists of a set of algorithms that recognize a pattern from the given labelled input data. An ANN contains a series of calculation units, called neurons, which are connected by synapses or simply weight values, as shown in Figure 4.8.

Figure 4.8: Artificial Neurons inspired by Human Brain Neurons.

A series of neurons in an ANN is also called a layer. Therefore, an ANN contains an input layer, hidden layers, and an output layer (shown in Figure 4.9). For a given labelled dataset, the neurons in the hidden layer perform computations using an activation function to learn the pattern in the data and classify the data into the target classes. An activation function triggers when it finds a particular pattern, which can be expressed as follows:

$y = f\left( \sum_i w_i x_i + b \right)$ (4.3)

Where, the output y is a function of the input x and can be calculated by adding a bias b to the weighted sum of the input set $\{x_i\}$ with weights $\{w_i\}$.
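A minimal sketch of the computation in Eq. (4.3), assuming a tiny two-input network with hand-picked (untrained) weights and tanh as the activation function:

```python
import numpy as np

def neuron_output(x, w, b, activation=np.tanh):
    """Eq. (4.3): y = f(sum_i w_i * x_i + b)."""
    return activation(np.dot(w, x) + b)

# One hidden layer of 3 neurons followed by a single output neuron.
# The weights below are illustrative, not trained.
x  = np.array([0.5, -1.0])
W1 = np.array([[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]])   # 3 hidden neurons
b1 = np.array([0.0, 0.1, -0.1])
hidden = neuron_output(x, W1, b1)        # shape (3,): one value per neuron
w2 = np.array([0.6, -0.2, 0.3])
y  = neuron_output(hidden, w2, 0.05)     # final scalar output in (-1, 1)
```

In a real ANN the weights would be learned from the labelled data (e.g. by backpropagation) rather than set by hand.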


Figure 4.9: Artificial Neural Network (ANN)

4.2. Deep Learning

Deep learning is a method mainly inspired by the anatomy and functions of the human brain that can learn to differentiate between different inputs by automatically extracting discriminative features (Lecun, Bengio & Hinton 2015). Deep learning is a branch of the broad family of machine learning methods, and the main intention behind introducing it is to bring machine learning closer to one of its original goals, i.e., artificial intelligence. Learning can be supervised, partially supervised, or unsupervised. Three major advantages of utilizing deep learning are:

 It reduces one of the most time-consuming parts of machine learning practice, i.e., feature engineering.

 It shows the best performance in classification problems across multiple domains and significantly outperforms other solutions.

 It has a special architecture, which is easily adaptable to new problems, e.g., vision, time series, language, etc.

In recent years, Deep Learning (DL) methods and architectures have shown outstanding performance in several complicated tasks, such as image understanding, image classification, speech recognition, end-to-end machine translation, etc.


(Angelov & Sperduti 2016). The fundamental idea of deep learning is to learn data representations through hierarchical levels of abstraction: in the lower levels, less abstract representations are defined; then, gradually, in the higher levels, more abstract representations are learned. This type of powerful method helps a system to understand and learn intricate representations from the raw data itself. Therefore, this hierarchical method is beneficial in many disciplines (Bengio 2009; Goodfellow et al. 2016). There are several DL architectures available in the literature, such as the Deep Neural Network (DNN) (Saxe, McClelland & Ganguli 2013), Recurrent Neural Network (RNN) (Pascanu et al. 2013), Convolutional Neural Network (CNN) (Wiatowski & Bölcskei 2018; LeCun & Bengio 1995), Deep Auto-encoder (DA) (Vincent et al. 2010), Deep Boltzmann Machine (DBM) (Salakhutdinov & Hinton 2009), Deep Belief Network (DBN) (Hinton, Osindero & Teh 2006), Deep Residual Network (Han, Kim & Kim 2016), etc.

In this research, a Deep Learning (DL) method has been proposed for analyzing the retinal images and extracting the relevant information from an image, which can be used for quantifying the disease of interest. Specifically, the main focus is on exploiting architecture of the Convolutional Neural Network (CNN) to analyze the retina image and interpret the possible abnormality. By performing textural analysis on the features extracted from the whole retina image, it is possible to detect and grade the retinal disease. Deep learning can help to analyse the texture of the image from the abstract level and predict the blood vascular disease in a fully automated way.

4.3. Convolutional Neural Network (CNN)

4.3.1. Overview

The Convolutional Neural Network (CNN) is a deep learning model inspired by the biological functions of the visual cortex. The visual cortex consists of numerous specialized small cells looking for specific characteristics. Thus, a CNN consists of specialized components similar to the neuronal cells within the visual cortex, and those are sensitive to some particular regions of the visual field, called the receptive field. The receptive field behaves as local filters over the input space. This


notion of having specialized modules inside of a system performing specific tasks is the basis behind CNN. By assembling multiple different layers, complex architectures of CNN can be fabricated for various classification problems (LeCun et al. 1998).

Deep convolutional neural networks have provided state-of-the-art classification and regression results on various high-dimensional problems (Krizhevsky, Sutskever & Hinton 2012). CNNs are designed in such a way that they can process data in any form, including multi-array data; e.g., an RGB colour image has three 2D arrays of pixel intensities. Various types of data come in the form of multiple arrays: 1D for signals and sequences, 2D for images and audio spectrograms, and 3D for video and volumetric images. This model is beneficial for data with an internal structure, like images, where it is required to discover invariant features. Typically, there are four pivotal notions behind every CNN: ‘local connections’, ‘shared weights’, ‘pooling’ and ‘multiple layers’, which exploit the properties of natural signals (LeCun, Bengio & Hinton 2015). The classic architecture of a CNN is presented in Figure 4.10. Initially, there are mainly two types of layers in the network: convolutional layers and downsampling layers. A convolution layer generates feature maps, which are the units of the layer. Each unit in the convolution layer is attached to a local patch in the feature maps generated by the previous layer through a set of weights, called a filter bank. The outcome of this local weighted sum is then passed through a non-linear layer, e.g., ReLU (Rectified Linear Unit). All units of a feature map share the same filter bank, and different feature maps in a layer use different filter banks. The motive behind this kind of architecture is that, in array data like images, neighbouring pixel values are often highly correlated and form distinct local motifs, which can be easily detected. Moreover, the local statistics of images and other types of signals are invariant to location. This means that if a motif or feature can appear in one region of the image, it could appear in other regions of the image as well. Hence the idea of using units at different positions that share the same weights and detect the same pattern at different locations of the array (Lecun, Bengio & Hinton 2015). The filtering operation mathematically executed by a feature map is a discrete convolution, hence the name CNN.


Figure 4.10 Basic CNN Architecture

4.3.2. Types of CNN

A Convolutional Neural Network (CNN) is an advanced version of the multilayer perceptron (MLP) network with a special topology comprising multiple hidden layers (Krizhevsky, Sutskever & Hinton 2012). CNNs are mostly applied to object recognition, speech recognition, and handwritten character recognition, as they are capable of automatically extracting distinguishing features inside their layers from the raw input data, without any explicit normalization. There are several popular architectures of convolutional networks. The most common architectures are:

 LeNet: It is the first convolutional network, developed by Yann LeCun in the 1990s. The LeNet architecture was successfully used for reading digits, zip codes, etc. (LeCun et al. 1998).

 AlexNet: AlexNet is the first work, developed by Alex Krizhevsky, Ilya Sutskever and Geoff Hinton, that made CNN popular in the field of computer vision. The architecture of this network is very similar to LeNet but bigger and deeper. In this network, the convolutional layers are arranged on top of each other like a stack. In 2012, it was submitted to the ImageNet ILSVRC


challenge and won the challenge with outstanding performance, achieving a top-5 error of 16% compared to the 26% error obtained by the runner-up (Krizhevsky, Sutskever & Hinton 2012).

 ZF Net: The convolutional network developed by Matthew Zeiler and Rob Fergus is known as ZFNet (short for Zeiler & Fergus Net). This network was developed by improving the AlexNet architecture: the hyperparameters of AlexNet were tweaked by expanding the size of the middle convolution layers and choosing a smaller filter size and stride in the first layer. It was the winner of ILSVRC 2013 (Zeiler & Fergus 2014).

 GoogLeNet: Szegedy et al. from Google developed this CNN, which was the winner of ILSVRC 2014. The introduction of the Inception module in this CNN dramatically reduced the number of parameters in the network: in comparison with the 60M parameters of AlexNet, GoogLeNet contains only 4M parameters. In addition, average pooling has been used instead of fully connected layers at the top of the network, which eliminates a large number of insignificant parameters. There are also several follow-up versions of GoogLeNet; the most recent is Inception-v4 (Szegedy et al. 2015).

 VGGNet: It is developed by Karen Simonyan and Andrew Zisserman and was the runner-up in ILSVRC 2014. The main contribution of VGGNet is the notion that the depth of the network plays a critical role in achieving good performance. VGGNet contains 16 convolutional and fully connected layers and uses only 3×3 convolution filters and 2×2 pooling filters from the beginning to the end of the network, which makes the architecture extremely homogeneous. The disadvantage of VGGNet is that it is extremely expensive, as it requires a huge amount of memory and a large number of parameters. It uses 140M parameters, most of them in the first fully connected layer. These fully connected layers can be removed and replaced with different classifiers without degrading the performance, which significantly reduces the number of necessary parameters. The pre-trained


model of VGGNet is available for plug-and-play use in Caffe (Simonyan & Zisserman 2014).

 ResNet: The Residual Network, developed by Kaiming He et al., is the winner of ILSVRC 2015. It has the special feature of skip connections and makes extensive use of batch normalization. In this architecture, there are no fully connected layers at the end of the network. This network is available for experiments in Torch. As of May 2016, ResNets are the state-of-the-art convolutional neural network models and a popular default choice for using CNNs in practice (He et al. 2016).

4.3.3. Limitations and Challenges

Deep learning models require an immense amount of training data. The classification accuracy of a deep learning classifier mostly depends on the size and quality of the dataset. This mainly concerns the computational efficiency of the classifier. Currently, the popular deep models carry a significant computational burden to achieve performance equivalent to the state-of-the-art methods on medium to large datasets, especially in off-line environments (Angelov & Sperduti 2016). At testing time, these deep learning models are extremely time-consuming and memory-demanding, which makes them unsuitable for deployment on mobile platforms with limited resources. Therefore, it is essential to investigate how to reduce the complexity of the architecture and obtain fast-to-execute models without losing accuracy (Gu et al. 2018).

In the field of medical imaging, it is difficult to gain access to acquired images due to various factors, such as cost, security, and privacy. Therefore, in the medical imaging field, the unavailability of datasets is one of the major hurdles to the successful implementation of deep learning methods. In addition, developing a large dataset of medical images with annotations is quite a challenging task, because it requires a large amount of time from medical experts. Moreover, annotating the images requires opinions from multiple experts to avoid human error. Sometimes, annotations are nearly impossible when there is a lack of qualified experts or insufficient data in the case of rare diseases. Another major


issue with deep learning models is the imbalance of data. If the training sets do not contain an almost equal amount of data for each target class, deep learning models tend to under-fit the minority classes. Data imbalance is a very common issue in the health sector, especially for rare diseases. On account of being rare, such diseases are insufficiently represented in the datasets, and if this issue is not handled carefully, it results in class imbalance (Mahmud et al. 2018; Lippi 2017).

One of the major challenges of employing a CNN on a new task is that it demands substantial skill and experience to determine suitable hyperparameters, such as the number of layers, the kernel sizes of the convolutional layers, the learning rate, etc. These hyperparameters have internal dependencies, which makes them more expensive to tune. Therefore, tuning the hyperparameters of these models is still an open issue. Moreover, there is an inadequate understanding of how to choose structural features efficiently. Although various computational units have been proposed based on their mathematical properties, current research on selecting hyperparameters is mostly experimental and ad hoc (Angelov & Sperduti 2016).

4.4. Proposed Methodology for Diagnosing Retinal Abnormality

The ultimate goal of this research is to build a powerful model that can diagnose and detect retinal blood vascular diseases, particularly Diabetic Retinopathy (DR) and Retinal Vein Occlusion (RVO), as early as possible, detect their types, and also distinguish them irrespective of their common clinical features. Therefore, a simple CNN architecture has been proposed with reduced complexity, which can efficiently detect retinal blood vascular diseases individually and can also classify the different diseases regardless of their common features. The proposed CNN analyses the retinal abnormality in an effective way and detects the abnormality, specifically related to RVO and DR, as early as possible. In this image-based method, the designed CNN assesses the whole input retina image at once. The complete process can be divided into two main phases: image pre-processing and classification. The overview of the proposed methodology for the automated detection of retinal abnormality is shown in Figure 4.11.


Figure 4.11 The Overview of the Proposed Computer Aided Diagnosis of Retinal Abnormality

4.4.1. Image Pre-processing

Image pre-processing is an imperative stage for a classification task, especially for medical images. Medical images, due to different image acquisition techniques, tend to have more noise and illumination problems. Before supplying the retinal images to the CNN, pre-processing is performed on the input images in order to enhance the quality and remove noise. For pre-processing, first, the green channel is extracted from the colour fundus images, as it offers more distinct visual features of the retina than the other two channels (blue and red). To eradicate the noise present in the image, an averaging filter of size 5×5 is applied. Afterwards, Contrast-Limited Adaptive Histogram Equalization (CLAHE) is applied to enhance the contrast of the grayscale retina image. CLAHE operates on small areas of an image and enhances the contrast of each small area individually. After the


pre-processing, all the images are resized to 60×60. The algorithm for CLAHE and the image pre-processing steps are explained below (Zuiderveld 1994):

(a) Algorithm for CLAHE

Step 1: Divide the original image into non-overlapping sub-regions of size M×N.

Step 2: Calculate the histogram of each sub-region.

Step 3: Clip the histogram of each sub-region. The number of pixels in each sub-region is equally distributed to each gray level; the average number of pixels per gray level is calculated as

$N_{avg} = \frac{N_x \times N_y}{N_{gray}}$ (4.4)

Where, $N_{avg}$ = average number of pixels per gray level

$N_x$ = number of pixels in the x-dimension of the sub-region

$N_y$ = number of pixels in the y-dimension of the sub-region

$N_{gray}$ = number of gray levels in the sub-region

The actual clip limit is calculated based on Eq. (4.4) as,

$N_{CL} = N_{clip} \times N_{avg}$ (4.5)

Where, $N_{CL}$ = actual clip limit

$N_{clip}$ = normalized clip limit in the range [0, 1]

The pixels in the original histogram are clipped when the number of pixels is greater than $N_{CL}$. So, the average pixel distribution in each gray level is computed using the total number of clipped pixels $N_{\Sigma clip}$:

$N_{acp} = \frac{N_{\Sigma clip}}{N_{gray}}$ (4.6)

Using the above equations, the contrast-limited histogram can be calculated using the following clipping rule:

$H_{CL}(i) = \begin{cases} N_{CL}, & \text{if } H(i) > N_{CL} \\ N_{CL}, & \text{if } H(i) + N_{acp} > N_{CL} \\ H(i) + N_{acp}, & \text{otherwise} \end{cases}$

Where, $H(i)$ and $H_{CL}(i)$ are the histograms of the original image and the clipped image for each sub-region at the i-th gray level.

Step 4: After the above distribution, redistribute the remaining clipped pixels. The step size of the redistribution is given by,

$Step = \frac{N_{gray}}{N_{remain}}$ (4.7)

Where, $N_{remain}$ is the number of remaining clipped pixels. The method scans from the minimum to the maximum gray level to count all the pixels. If the number of pixels at a gray level is less than $N_{CL}$, the program distributes one pixel to that gray level. If not all pixels have been distributed at the end of the scan, a new step size is calculated using Eq. (4.7) and a new scan starts until all the pixels are redistributed.

Step 5: The intensity value in each sub-region is enhanced using histogram equalization. Finally, the pixels in the sub-regions are mapped using bilinear interpolation.
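The clip-and-redistribute arithmetic of Eqs. (4.4)–(4.6) can be illustrated on a toy histogram for a single sub-region (the clip factor and histogram values below are assumptions for illustration, not the thesis settings):

```python
import numpy as np

# Sketch of CLAHE's clip-and-redistribute step for one sub-region.
# This is a pure-histogram toy, not a full CLAHE implementation.
hist = np.array([50, 3, 2, 1, 0, 0, 2, 6], dtype=float)  # 8 gray levels
n_pixels, n_gray = hist.sum(), hist.size
n_avg = n_pixels / n_gray                 # Eq. (4.4): average pixels per level
clip_limit = 2.0 * n_avg                  # assumed clip factor of 2 (Eq. 4.5)

excess = np.maximum(hist - clip_limit, 0).sum()   # total clipped pixels
clipped = np.minimum(hist, clip_limit)            # clip the tall bins
redistributed = clipped + excess / n_gray         # Eq. (4.6): spread evenly
```

Note that redistribution preserves the total pixel count; a full CLAHE would then equalize each sub-region's clipped histogram and blend neighbouring sub-regions with bilinear interpolation, as in Step 5.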

(b) Algorithm for image pre-processing

Input: Colour fundus retinal image.

Output: Enhanced grayscale image of size 60×60.

Step 1: Select the coloured retina portion of the fundus image as the Region of Interest (ROI).

Step 2: Crop the ROI from the whole fundus image.

Step 3: Separate the red, blue, and green channels of the cropped image.

Step 4: Select the green channel of the image.

Step 5: Apply an average filter of size 5×5 to remove any noise present.

Step 6: Apply Contrast-Limited Adaptive Histogram Equalization (CLAHE) to enhance the contrast of the retina image using the above-mentioned algorithm.

Step 7: Resize the image to 60×60.

Step 8: Write the image in TIF format.


Figure 4.12 shows the pre-processing steps on sample colour fundus images.

Normal Retina Green Channel Enhanced Image

DR affected Retina Green Channel Enhanced Image

Figure 4.12 Pre-processing Steps for Image Quality Enhancement

4.4.2. Designing CNN for Retinal Blood Vascular Diseases Classification

Popular deep CNNs like GoogLeNet and VGGNet entail extra complexity and computational cost. As per the discussion on the limitations of these deep networks in Section 4.3.3, these models also require a huge amount of training data. Therefore, the main goal here is to design the CNN in such a way that the network is simple yet effective on retinal fundus images, even for smaller training datasets. Various architectures and advantages of CNNs have been exploited to design a novel CNN for learning retinal images, so that it can extract the normal retinal features, such as blood vessels, optic disc, fovea, etc., and the abnormal features related to different diseases. The designed convolutional network is based on the LeNet-5 architecture (LeCun et al. 1998). The general structure of LeNet-5 is as follows:

Input=>Conv=>Pool=>Conv=>Pool=>FC=>ReLU=>FC=>Output


The proposed CNN architecture comprises five types of layers, viz., convolution layers, subsampling layers, non-linear layers, a normalization layer, and a fully connected layer. The following explains how the network has been designed and how it works. The network design is explained from the basic level, i.e., from the perceptron itself. Then, the hypothesis for designing the entire CNN is set gradually, and the functionalities of the chosen layers are explained.

Perceptron:

In the basic model of the human brain, the dendrites carry the signals to the cell body or neuron, where all those signals get summed. When the final summation value exceeds a certain threshold, the neuron fires and sends a spike along its axon. In case of an artificial neural network, a perceptron is an artificial neuron that accepts multiple binary inputs, and produces a single binary output as:

[Diagram: a perceptron with three binary inputs $x_1$, $x_2$, $x_3$, weights $w_1$, $w_2$, $w_3$, and a single binary output]

In the example shown above, the perceptron has three inputs $x_1$, $x_2$, and $x_3$, and $w_1$, $w_2$, and $w_3$ are the weights. The output of the neuron is either 0 or 1, depending on whether the weighted sum $\sum_j w_j x_j$ is greater than or less than a certain threshold value. The output of a neuron can be expressed as (Hopfield 1988):

$output = \begin{cases} 0 & \text{if } \sum_j w_j x_j \le \text{threshold} \\ 1 & \text{if } \sum_j w_j x_j > \text{threshold} \end{cases}$ (4.8)

Now, if the weighted sum is represented as a dot product, i.e., $w \cdot x \equiv \sum_j w_j x_j$, where $w$ and $x$ are vectors representing the weights and inputs, respectively, then the threshold can be moved to the other side of the inequality and replaced by the perceptron's bias, $b \equiv -\text{threshold}$. The bias is a measure of how easily the


perceptron can produce an output equal to 1. By using the bias in place of the threshold, the rule of the perceptron can be rewritten as (Hopfield 1988):

$output = \begin{cases} 0 & \text{if } w \cdot x + b \le 0 \\ 1 & \text{if } w \cdot x + b > 0 \end{cases}$ (4.9)
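A minimal sketch of the perceptron rule in Eq. (4.9), using hand-chosen weights that implement a logical AND (the weights are illustrative, not learned):

```python
import numpy as np

def perceptron(x, w, b):
    """Eq. (4.9): fire (output 1) iff w.x + b > 0, else output 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# With weights (1, 1) and bias -1.5, the perceptron fires only when
# both binary inputs are 1, i.e., it computes logical AND.
w, b = np.array([1.0, 1.0]), -1.5
outputs = [perceptron(np.array(x), w, b) for x in
           [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Changing the bias to -0.5 would instead give logical OR, showing how the bias controls how easily the perceptron outputs 1.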

Taking the concept of the perceptron, the CNN has been designed level by level with five different types of layers, as mentioned earlier. The functionality of each of these layers is explained below:

1) Convolution Layer:

Convolution is a mathematical operation where two signals are overlapped to form a third signal. If $f$ and $g$ are two discrete functions, the convolution of these two functions can be expressed as follows (Smith 2003):

$(f * g)(n) = \sum_{m} f(m)\, g(n - m)$ (4.10)

Now, in the convolution layer of a CNN, small filters are convolved over the image. Let $I$ be an image and $K$ a filter convolving over the image; mathematically, this can be expressed as follows (Smith 2003):

$(I * K)(i, j) = \sum_{m} \sum_{n} I(m, n)\, K(i - m, j - n)$ (4.11)

The kernel or filter slides over the image and calculates a new pixel as a weighted sum of the pixels it floats over in every position. To understand the function of a convolution layer in the neural network, let's consider a 1-dimensional convolution layer with inputs $\{x_n\}$ and outputs $\{y_n\}$ (Zurada 1992).
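A direct (and deliberately naive) NumPy sketch of the 2-D 'valid' convolution in Eq. (4.11); the function name and the identity filter are illustrative:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Direct 'valid' 2-D convolution as in Eq. (4.11)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    flipped = kernel[::-1, ::-1]          # true convolution flips the kernel
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # weighted sum of the pixels under the kernel at this position
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

# A 5x5 image convolved with a 3x3 filter yields a 3x3 feature map.
image  = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[0., 0., 0.],
                   [0., 1., 0.],          # identity filter: passes the
                   [0., 0., 0.]])         # centre pixel through unchanged
feat = conv2d_valid(image, kernel)
```

With the identity filter, the output is simply the 3×3 centre crop of the image, which makes the sliding-window mechanics easy to verify by hand.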


The outputs can be described in terms of the inputs as (Zurada 1992):

$y_n = A(x_n, x_{n+1})$ (4.12)

where each unit $A$ looks at a small window of neighbouring inputs.

Instead of multiple neurons A, let's consider a single neuron for the moment. A standard neuron in a neural network can be expressed as (Hopfield 1988):

$\sigma(w_0 x_0 + w_1 x_1 + w_2 x_2 + \cdots + b)$ (4.13)

Where $x_0, x_1, x_2, \ldots$ are the inputs and $w_0, w_1, w_2, \ldots$ are the weights. The weights describe how the neuron is connected to the given inputs. A negative weight means that an input prevents the neuron from firing; on the other hand, a positive weight encourages the neuron to fire. The weights control the behaviour of the neurons, and multiple neurons are identical when their weights are the same.

The convolution layer handles this wiring of neurons, recording all the weights and which of them are identical. All the neurons in a layer can be described using the weight matrix as follows (Hopfield 1988):

$y = \sigma(Wx + b)$ (4.14)

Where $\sigma$ is an activation function and $b$ is the bias. In a convolutional layer, there exist several duplicates of the same neuron; therefore, many weights appear in multiple locations.


This corresponds to the equations (Hopfield 1988):

$y_0 = \sigma(W_0 x + b)$ (4.15)

$y_1 = \sigma(W_1 x + b)$ (4.16)

So while, generally, a weight matrix links every input to every neuron with different weights (Hopfield 1988):

$W = \begin{bmatrix} w_{0,0} & w_{0,1} & w_{0,2} & \cdots \\ w_{1,0} & w_{1,1} & w_{1,2} & \cdots \\ w_{2,0} & w_{2,1} & w_{2,2} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}$ (4.17)

In the case of the weight matrix for a convolution layer, the same weights appear in several different locations, and since the neurons do not connect to many of the possible inputs, there are plenty of zero entries. Therefore, it can be expressed as (Hopfield 1988):

$W = \begin{bmatrix} w_0 & w_1 & 0 & 0 & \cdots \\ 0 & w_0 & w_1 & 0 & \cdots \\ 0 & 0 & w_0 & w_1 & \cdots \\ \vdots & & & & \ddots \end{bmatrix}$ (4.18)


Now, multiplying by the above matrix is the same as convolving with the filter $[\ldots, 0, w_1, w_0, 0, \ldots]$. The filter sliding to different locations corresponds to having neurons at those locations.
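This equivalence can be checked numerically: multiplying by the banded shared-weight matrix of Eq. (4.18) gives the same result as sliding a two-tap filter over the input (all values are illustrative):

```python
import numpy as np

# The shared-weight matrix of Eq. (4.18): multiplying by this banded
# matrix is the same as sliding the filter [w0, w1] over the input.
w0, w1 = 2.0, -1.0
x = np.array([1.0, 2.0, 3.0, 4.0])

W = np.array([[w0, w1, 0., 0.],
              [0., w0, w1, 0.],
              [0., 0., w0, w1]])
via_matrix = W @ x

# Same result by explicitly sliding the two-tap filter over x.
via_sliding = np.array([w0 * x[i] + w1 * x[i + 1] for i in range(3)])
```

The zeros in `W` encode the local connections and the repeated `w0, w1` pairs encode the shared weights, which is exactly why a convolution layer has far fewer free parameters than a fully connected one.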

Now, each filter in the convolution layer works as a feature identifier. Features here mean the simplest characteristics that most images have in common with each other, e.g., simple colours, straight edges, and curves. Let's consider an input image of size $W \times W \times D$ convolved with K kernels/filters of size $F \times F \times D$ each. D represents the depth of the image; thus, the depth of the filter has to be the same. When an input convolves with one kernel, one output feature map is generated. Consequently, the convolution of the input image with K individual kernels generates K feature maps. Starting from the top-left corner, each kernel spans the input image from left to right and top to bottom until it reaches the bottom-right corner. Along every stride, the elements of the kernel are multiplied with the elements of the input. This element-by-element multiplication performed between the values of the kernel and the receptive field of the input generates one value at each position of the kernel. Thus, $F \times F \times D$ multiply-accumulate operations are required to produce one element of one output feature map. For stride S and amount of zero padding Z, the spatial output size of a convolution layer is calculated as follows (Samer, Rishi & Rowen 2015):

$O = \frac{W - F + 2Z}{S} + 1$ (4.19)
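A quick sanity check of the output-size formula, using the 5×5 input with a 3×3 filter from Figure 4.13 and, as an illustrative assumption, a 60×60 pre-processed retina image with a 5×5 filter, no padding, and stride 1:

```python
def conv_output_size(w, f, z, s):
    """Eq. (4.19): spatial output size O = (W - F + 2Z)/S + 1."""
    return (w - f + 2 * z) // s + 1

# 5x5 input, 3x3 filter, no padding, stride 1 (the Figure 4.13 setup)
fig_case = conv_output_size(5, 3, 0, 1)       # 3x3 feature map
# illustrative: 60x60 pre-processed image, 5x5 filter, stride 1, no padding
size_after = conv_output_size(60, 5, 0, 1)    # 56x56 feature map
```

The same formula applied layer by layer determines how the spatial size shrinks through a CNN stack.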

Thus, in a CNN, the n-th convolution layer generates feature maps using the equation (LeCun et al. 1998):

$y_j^{(n)} = f\left( \sum_{i=1}^{N} w_{ij}^{(n)} * x_i^{(n)} + b_j^{(n)} \right)$ (4.20)

where $x_i^{(n)}$ is the input to the convolution layer, $w_{ij}^{(n)}$ is a weight of the convolution filters of size $F \times F$ for layer n, $b_j^{(n)}$ is a bias, $y_j^{(n)}$ is the output feature map, and N is the total number of inputs used to produce the output $y_j^{(n)}$. The function of the


convolution layer has been shown in Figure 4.13. In the figure, the input is of size 5×5 and the filter size is 3×3; after convolving over receptive fields of size 3×3 from left to right and top to bottom, the generated feature map is of size 3×3. At deeper levels, the size of the receptive field increases while the input size keeps decreasing, so that filters with a large receptive field can build meaningful features from the most significant features or activation values. Figure 4.14 shows how the receptive field of the input image increases at each level while the size of the input decreases.

Figure 4.13: Convolution function

Figure 4.14: The size of the receptive field increases at the deeper levels.


2) Subsampling Layer:

A subsampling or pooling layer is used to downsample the features generated by each convolution layer. This layer significantly reduces the spatial dimensions of the image, i.e., the length and the width, without changing the depth. It also makes the features robust against distortion and noise. The intuition behind the pooling layer is that once an explicit feature is observed in the original input image, the exact location of that feature can be disregarded; once the network learns the high activation values, it only cares about the relative location of those particular features to the other features. There are two main purposes of the pooling layer: 1) it reduces the number of parameters or weights by 75% (for 2×2 pooling with stride 2), resulting in a lower computational cost; 2) it can control overfitting. Pooling can be of two types: max-pooling and average pooling (Samer, Rishi & Rowen 2015). Average pooling compresses the activation values generated by the convolution layer by calculating the mean value within each block, whereas max-pooling compresses the activation map by selecting the highest activation value within each block. Figure 4.15 shows how max-pooling and average pooling reduce the spatial dimensions in the sub-sampling layer.

Figure 4.15: Function of Sub-sampling Layer

The max-pooling is sensitive to the existence of a particular pattern in the pooled region, whereas the average pooling measures the mean value of the existence of the pattern for the given region. Max-pooling is more sensitive to the most important features like edges, whereas average pooling makes the internal representation of the features smooth and blurry. For this research, the objective is to analyse the retinal image and detect the abnormalities related to blood vascular diseases. It is crucial to detect the structural changes in the blood vasculature, and for that, edge detection is important. The clinical signs associated with DR and RVO are prominent. A max-pooling operation takes the highest activation value, which depicts the most significant feature within the block. In addition to that, it makes the features “translation invariant”; that means the same pooled features within the hidden layer remain active even when the image undergoes a certain translation. Therefore, the max-pooling operation has been used to down-sample the features in the designed CNN for retinal image analysis. The max-pooling layer basically contains a group of filters, which is about the same length as the stride taken. The filters convolve over the input volume (the output of a convolution layer) and select the maximum value in every sub-region that the filters convolve around using the following equation (Krizhevsky, Sutskever & Hinton 2012):

$y_j = f\left( w_j \cdot \max_{i \in R_j} x_i + b_j \right)$ (4.21)

Where, $w_j$ is a weight, $b_j$ is a bias, and $R_j$ is the pooled region. The spatial output size of a pooling layer can be calculated by the following equation (Samer, Rishi & Rowen 2015):

$O = \frac{W - F}{s} + 1$ (4.22)

Where, W is the size of the input image, F is the size of the filter, and s is the stride. The pooling layer helps to gradually increase the size of the receptive field of the convolution layers at the deeper levels.
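A minimal NumPy sketch of the max-pooling operation with a 2×2 filter and stride 2, where Eq. (4.22) gives the output size (the activation map is illustrative):

```python
import numpy as np

def max_pool(x, f=2, s=2):
    """Max-pooling with filter size f and stride s.

    The output size follows Eq. (4.22): O = (W - F)/s + 1.
    """
    out_size = (x.shape[0] - f) // s + 1
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # keep only the strongest activation within each block
            out[i, j] = x[i * s:i * s + f, j * s:j * s + f].max()
    return out

# A 4x4 activation map shrinks to 2x2: 75% fewer values,
# but each block's most significant feature survives.
a = np.array([[1., 3., 2., 1.],
              [4., 6., 5., 0.],
              [7., 2., 9., 8.],
              [1., 0., 3., 4.]])
p = max_pool(a)
```

Replacing `.max()` with `.mean()` would turn this into average pooling, which smooths the block instead of keeping its strongest edge response.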

3) Non-Linear Layer:

A network requires non-linear activation functions to trigger certain features: an activation signals the distinctive identification of prospective features on each hidden layer. Another main purpose of a non-linear layer is to add non-linearity to the data so that the model becomes comparable to the non-linear nature of real-world data. Non-linearity can be added using mainly three types of activation function: Sigmoid, Hyperbolic Tangent (Tanh), and ReLU (Rectified Linear Unit). The Sigmoid activation function is a special case of the logistic function and can be defined as follows (LeCun et al. 2012):

    σ(x) = 1 / (1 + e^(−x))    (4.23)

Sigmoid has an S-shaped curve and “squashes” real-valued numbers into the range 0 to 1. Figure 4.16 shows the sigmoid function and its derivatives. It can be observed that the gradient is high within the range [−3, 3]: within this range, any small change in the value x brings about a large change in the value of y. Therefore, the function essentially pushes the y values towards the extremes, which is a desirable and advantageous quality when the values must be classified into a specific class. However, Sigmoid kills the gradient: from Figure 4.16 it can be observed that the function is flat beyond +3 and −3, so once the function approaches that zone, the gradients become very small. At this point the network stops learning, as the gradient approaches zero. Another problem is that the values of the function range only from 0 to 1, which means the sigmoid function is asymmetric around the origin and all the output values are positive. This is an undesirable property, as the neurons in the subsequent layers of the CNN would receive data that is not zero-centred.
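A minimal numerical sketch of Eq. 4.23 and its derivative (hypothetical helper names, not thesis code) makes the saturation behaviour concrete:

```python
import math

def sigmoid(x):
    # Eq. 4.23: squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # derivative: sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(0.0))        # 0.5
print(sigmoid_grad(0.0))   # 0.25 -- the largest possible gradient
print(sigmoid_grad(6.0))   # ~0.002 -- gradient has nearly vanished
```

The gradient peaks at the origin and collapses outside roughly [−3, 3], which is the vanishing-gradient zone described above.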

The Tanh function is similar to the Sigmoid function, except that its value ranges from −1 to 1. It can be defined as follows (LeCun et al. 2012):

    tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))    (4.24)

Figure 4.17 shows the tanh function and its derivatives. It can be observed that tanh has stronger gradients: since the data is centred on 0, the derivatives are higher, and tanh evades bias in the gradients. However, just like the Sigmoid function, the tanh function also suffers from the vanishing gradient problem.

Figure 4.16: Sigmoid Function and its derivatives (Isaac Changhau)

Figure 4.17: Tanh Function and its derivatives (Isaac Changhau)

The Rectified Linear Unit (ReLU) is a simple non-linear activation function whose value ranges from 0 to ∞. In comparison to the sigmoid or tanh functions, which involve expensive operations (exponentials, etc.), the ReLU can be applied by simply thresholding a matrix of activations at zero. A ReLU performs the function (Glorot, Bordes & Bengio 2011):

    f(x) = max(0, x)    (4.25)

i.e. f(x) = 0 if x < 0 and f(x) = x if x ≥ 0. Figure 4.18 shows the ReLU activation and its derivatives. Since the gradient has a constant value when x > 0, ReLU reduces the likelihood of a vanishing gradient, unlike the sigmoid and tanh functions. The effect of ReLU activation is shown in Figure 4.19.

Figure 4.18: ReLU function and its gradients (Isaac Changhau)

Figure 4.19: Effect of ReLU


In the designed CNN, ReLU has been chosen as the non-linear layer. To begin the learning process, the weights of the network are randomly initialized. Continuous non-linear activation functions such as sigmoid and tanh cause almost all the neurons to fire in an analog way, so that all the activations contribute to describing the output of the network at the same time; this makes the activations dense and costly for the network. On the other hand, ReLU allows roughly 50% of the network to yield zero activations: this activation function turns every negative activation in the feature map into zero, which helps the network generate a sparse representation of the features. The sparsity results in a concise model of the features that often has better predictive power and less overfitting. This is an essential quality, because it makes it more likely that the neurons are processing the meaningful aspects of the problem. Hence, it can help the network build more meaningful features from the retina image using a smaller number of training examples. One of the major objectives of this research is to design a simple CNN model for retinal image analysis, and the use of ReLU makes the network less complicated and also helps it train faster. However, ReLU has the problem of dead neurons: some activations drop to zero and never recover, so some of the neurons die during training.
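The zeroing of negative activations and the resulting sparsity can be sketched as follows (a hedged toy example; the feature-map values are made up):

```python
import numpy as np

def relu(x):
    # Eq. 4.25: f(x) = max(0, x), applied element-wise
    return np.maximum(0.0, x)

fmap = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])   # toy activations
out = relu(fmap)
print(out)                        # [0.  0.  0.  1.5 3. ]
sparsity = float(np.mean(out == 0.0))
print(sparsity)                   # 0.6 -- fraction of zeroed activations
```

Every negative entry becomes zero while positive entries pass through unchanged, producing the sparse representation discussed above.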

4) Normalization Layer:

Additionally, a normalization layer is selected for the designed CNN, as it handles internal covariate shift and helps to train the network faster. It also helps to avoid overfitting. A batch normalization layer normalizes each input channel over a mini-batch. For a mini-batch, the activations of each input channel are first normalized by subtracting the mean of the mini-batch and dividing by its standard deviation; then the input is shifted by a learnable offset β and scaled by a learnable factor γ. The idea is to normalize the inputs of each layer in such a way that they have a mean output activation of zero and a standard deviation of one (Ioffe & Szegedy 2015a). For an input mini-batch B = {x_1, …, x_m}, γ and β are the parameters to learn. The output of the layer is given by (Ioffe & Szegedy 2015b):

    y_i = γ x̂_i + β    (4.26)

where

    x̂_i = (x_i − μ_B) / √(σ_B² + ε),   μ_B = (1/m) Σ_{i=1}^{m} x_i,   σ_B² = (1/m) Σ_{i=1}^{m} (x_i − μ_B)²
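A hedged NumPy sketch of normalising one channel over a mini-batch as in Eq. 4.26 (the helper name and toy batch are assumptions; ε avoids division by zero):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # normalise over the mini-batch, then apply learnable scale/offset
    mu = x.mean()                     # mini-batch mean
    var = x.var()                     # mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

batch = np.array([2.0, 4.0, 6.0, 8.0])
y = batch_norm(batch)
print(round(float(y.mean()), 6))   # ~0.0  (zero-centred)
print(round(float(y.std()), 3))    # ~1.0  (unit standard deviation)
```

With γ = 1 and β = 0 the output is exactly the normalised activations; during training the network learns γ and β to restore any representational power the normalisation removes.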

5) Fully Connected Layer:

The final layer is usually a fully connected layer. A fully connected layer is a multilayer perceptron that searches for the high-level features most strongly associated with a particular class by calculating probabilities for all the available classes. The neurons in a fully connected layer have full connections to all activations generated by the previous layers. As explained for the Perceptron, these activations are calculated through matrix multiplication followed by a bias offset. The activation maps generated by the previous layers, mainly convolution and pooling layers, signify high-level features of the input image. Let x_m be the m-th input map of the output layer; then the linear combination of the output can be defined by:

    y = Σ_{m=1}^{M} w_m x_m + b    (4.27)

where M = 1024 for the proposed CNN, as shown in L5M3R1 of Fig. 4.4. The main objective of the fully connected layer is to analyse these high-level features and classify the input image into the target classes based on the training dataset. Basically, the fully connected layer takes the output of the preceding convolution, ReLU, or pooling layer and creates a K-dimensional vector, where K is the number of target classes (shown in Figure 4.20). An activation function is utilized to look for the high-level features that are explicitly correlated to a particular class through particular weights. The activation function ensures that the product of the output generated by the previous layer and the weights produces the correct probabilities for the different classes. Different activation functions can be used in the fully connected layer for computing the probabilities, such as Sigmoid, Tanh, and Softmax. The details of the Sigmoid and Tanh activations have already been discussed under the non-linear layer. In this research, a Softmax classifier has been employed in the output layer of the fully connected layer of the proposed CNN architecture to predict the probability that the input image belongs to a particular label. The Softmax function compresses an input vector of arbitrary real-valued scores into a vector of values between zero and one that sum to 1. The probability distribution of the input data over K different classes is predicted by the softmax function as (Simonyan et al. 2016):

    p_k = e^{s_k} / Σ_{j=1}^{K} e^{s_j}    (4.28)

Figure 4.20: Function of Fully Connected Layer


The Softmax activation is particularly selected for the proposed model because Softmax is useful for a multiclass problem in which the classes are mutually exclusive. In this research, one of the major objectives is to classify the different types of RVO, and the three types of RVO are mutually exclusive; in particular, HRVO possesses the features of both CRVO and BRVO, which makes the classification task challenging. The sigmoid function outputs marginal probabilities and can therefore also be used for multiple-class classification, but only when the classes are not mutually exclusive. Additionally, the softmax layer is resilient to outliers. A problem with sigmoids is that when they saturate (values get close to 1 or 0), the gradients vanish, which is detrimental to optimization speed; softmax does not have this problem. The tanh function possesses similar limitations to the sigmoid.
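A minimal sketch of Eq. 4.28 (the helper name is an assumption; subtracting the maximum score is a standard numerical-stability trick not stated in the equation):

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))   # stability: shift by the max score
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))    # raw class scores
print(np.round(p, 3))   # [0.659 0.242 0.099] -- probabilities
print(p.sum())          # sums to 1 (up to float rounding)
```

Raw scores of arbitrary magnitude become a proper probability distribution, with the largest score receiving the largest probability.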

4.4.2.1. Design Hypothesis

Initially, the LeNet structure was explored and exploited for diagnosing retinal blood vascular diseases. In the general LeNet structure, there are 2 convolution layers, 2 max-pooling layers, 1 non-linear layer, and 2 fully connected layers. According to the “no-free-lunch theorem”, a learning algorithm that works well on one kind of data may work poorly on other types of data. Therefore, to work with the retina image datasets, the parameters and hyperparameters have to be changed or adjusted. The input image is resized to a fixed square size in pre-processing, and to process the whole input image, the receptive field size is kept the same in each convolution layer. After the first convolution, an activation map (feature map) is generated using Eq. 4.19. The second layer is a max-pooling layer, whose input is the output of the first convolution layer. Using Eq. 4.22, the spatial size of the generated output volume becomes half of the input volume, while its depth remains the same. Keeping the filter size the same, the number of filters is increased to 490 for the second convolution, after which another activation map is generated and further downsampled by the second max-pooling layer. In the fully connected layer, 1024 filters are used. The architecture of this CNN is shown in Figure 4.21. However, this network failed to diagnose the diseases properly: it is efficient for simple binary classification only. It can distinguish DR from No-DR images and RVO from No-RVO images, but it shows poor performance in detecting mild NPDR and in detecting the types of DR and RVO. The main objective here is to detect the retinal disorders at the earliest stage and to detect their variants. Again, the use of a very deep network is not desirable, as discussed in Section 4.3.2. Therefore, the hypothesis for designing the CNN can be stated as follows:

While designing a network, it is important to know the target problem. The problem to solve can be generalized into a hypothesis space, where the aim is to find the best hypothetical solution for the target problem. Each hypothesis in the space has a certain dimension, and the learning model adjusts these variables to find the best hypothesis in the space. If the number of dimensions increases, the effective size of the hypothesis space increases exponentially, so it is troublesome to search a higher-dimensional hypothesis space (Dulek 2013). The continuous hypothesis space leads to the Hughes phenomenon, which states that “the accuracy of the resulting hypothesis is substantially reduced as the dimensionality increases and the number of training examples is kept constant” (Hughes 1968). Therefore, to find the best hypothesis, either more training examples have to be acquired or some dimensions have to be reduced. In the medical field, it is difficult to obtain a large number of labelled training examples, so the feasible solution is to design an optimized learning model that finds the best hypothesis with fewer parameters or dimensions. To design such a network, it is very important to choose the parameters and hyper-parameters carefully: how many layers to use, how many convolution layers are required, what the filter size should be, and what values of stride and padding should be taken. There is no standard set of rules for choosing the parameters, hyper-parameters, and organization of layers to fulfil the application requirement, because the network depends mostly on the type of data in a particular application. The data can be diverse in terms of size, the complexity of the image, the type of image processing task, the type of acquired image, and more. Therefore, these factors must be decided empirically so that the designed CNN can correctly extract the desired features and detect the retinal abnormality.


Figure 4.21: CNN structure with Filter Size

The general rule to improve the performance of a network is to increase its size, which includes both increasing the number of working levels and the number of units at each level (Szegedy et al. 2014). The major drawback of this rule is that, as the size increases, the number of parameters increases too. Consider a network with N levels; let k be the number of kernels in a convolution layer, m the size of each kernel, and d the depth of the input image. Then the number of weights in the convolution layer is k·m²·d. With one bias per kernel, the total number of parameters after a convolution layer is k(m²d + 1). Similarly, for the next convolution layer with l kernels, the number of parameters is l(m²k + 1). Therefore, the number of parameters P for a network can be calculated as:

    P = Σ_{n=1}^{N} k_n (m_n² k_{n−1} + 1),  with k_0 = d    (4.29)
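Applying the per-layer term of Eq. 4.29 to the convolution layers of the network eventually proposed in Table 4.1 gives a concrete feel for the growth. This is a hedged sketch: the helper name is an assumption, and only convolution layers are counted.

```python
def conv_params(k, m, d):
    # Eq. 4.29 per-layer term: k kernels of size m x m over depth d,
    # plus one bias per kernel
    return k * (m * m * d + 1)

layers = [            # (kernels k, kernel size m, input depth d)
    (60,   5, 1),     # CL1
    (300,  5, 60),    # CL2
    (512,  5, 300),   # CL3
    (1024, 4, 512),   # CL4
]
total = sum(conv_params(k, m, d) for k, m, d in layers)
print(conv_params(60, 5, 1))   # 1560 parameters in CL1 alone
print(total)                   # 12682004 -- already over 12 million
```

Most of the parameters sit in the deeper layers, where the input depth is large, which is why increasing depth inflates the parameter count so quickly.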


The depth of the CNN is directly proportional to the number of parameters, which can be expressed as follows:

    P ∝ N    (4.30)

Large networks are prone to overfit, especially when the number of labelled training examples is limited. In this research, one of the main objectives is to diagnose RVO and detect all its variants; since RVO detection is a new field, only a limited number of labelled RVO data are available. In addition, the increased network size also increases the computational cost. The depth of the CNN depends on the input size, the filter size, the amount of padding, and the stride, and the choice of these parameters is mostly heuristic, depending on the application requirement.

The ultimate goal of the network is to decompose the input image and generate a 1-dimensional vector, in order to calculate the probabilities of the features and classify the image into the target class. While designing a CNN, the two most important factors are the input size and the filter size; to design an optimized network, the input size, receptive field size, and filter size should be taken care of. The input size plays the pivotal role, as it decides the dimension of the hypothesis space. If the input image is very large, a high-dimensional hypothesis space is generated. For example, if the input size is 1980×2800, then we are dealing with a 5,544,000-dimensional space. Along the network, the dimension of this feature space increases further with the number of filters in each layer. It can be imagined how large the dimension of the hypothesis space can grow throughout the network; this leads to the curse of dimensionality, and this is the point where we can relate to the Hughes phenomenon mentioned above. A large feature space improves the performance of a network or classifier, which is the main principle of deep learning networks; however, when the training samples are limited, the performance degrades after reaching a certain optimal point. It is difficult to get rid of the curse of dimensionality completely in deep learning, but an effort can be made to avoid the situation as much as possible. Therefore, to design an optimal learning model, a smaller, square-sized input image is preferred. The input size should also be divisible by 2, as this makes it convenient to downsample the feature set in the later stages. If the input size is very large and the filter size is very small, then a larger stack of convolution layers will be needed to span the entire image, so the number of layers, and hence the number of parameters, will increase. Conversely, if the size of the filter is large, only a small stack of large filters will be stacked. Therefore, it is important to balance the size of the input and the filter in order to obtain the desired depth of the network. This can be expressed using the following rule:

    Rule:  N is large, if W ≫ F (a deep stack of small filters)
           N is small, if W ≈ F (a shallow stack of large filters)    (4.31)

In this research, by following the rule, the input size is chosen to be 60×60. The length and the width of the input image are kept the same, giving a square size, because the receptive field's dimensions are always square. If the length and the width were different, some of the pixels might not be covered by the filter, and some pixel features might be left out during the convolution process. If the size of the input image is too large, a larger filter size must be defined in order to maintain the depth of the whole network, but it is preferable to stack small filters: stacking convolution layers with small filters, in contrast to having one convolution layer with big filters, helps extract more powerful features from the input and requires fewer parameters. Therefore, the convolution filter size is kept 5×5, neither too small nor too large. Now, to decompose the input image of size 60×60 into a 1×1 vector using these filters, the number of convolution layers required is 5. In order to reduce the size of the feature maps, 3 max-pooling layers have been used, after the 1st, 2nd, and 3rd convolution layers. Normalization layers are included after the first 3 convolution layers to handle the internal covariate shift. A Rectified Linear Unit (ReLU) is included after the 4th convolution layer to add non-linearity to the network by discarding all negative-valued activations. For the 5th convolution layer, the filter size is taken to be the same as its input size, to finally break down the input retina image into a 1-dimensional vector. Lastly, the fully connected layer reads all the features generated by the previous layers and calculates the probability score using the Softmax activation function.
Therefore, the final designed CNN is 13 layers deep, comprising 5 convolution layers, 3 max-pooling layers, 3 batch normalization layers, 1 Rectified Linear Unit (ReLU) layer, and 1 fully connected layer. The topology of the proposed CNN, shown in Figure 4.22, is described below, and Table 4.1 describes the proposed CNN configuration.
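The layer-by-layer size reduction described above can be checked with a short sketch of Eq. 4.19 and Eq. 4.22 (stride 1, no padding; the helper names are assumptions):

```python
def conv(w, f):
    # Eq. 4.19 with stride 1, no padding: output width = w - f + 1
    return w - f + 1

def pool(w):
    # Eq. 4.22 with a 2x2 window and stride 2: output width halves
    return w // 2

w = 60
w = pool(conv(w, 5))   # CL1 + M1: 60 -> 56 -> 28
w = pool(conv(w, 5))   # CL2 + M2: 28 -> 24 -> 12
w = pool(conv(w, 5))   # CL3 + M3: 12 -> 8  -> 4
w = conv(w, 4)         # CL4:      4  -> 1
print(w)               # 1 -- the image is reduced to a 1x1 vector
```

The intermediate widths (56, 28, 24, 12, 8, 4, 1) match the feature-map sizes listed in Table 4.1.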


Figure 4.22 Topology of the Designed CNN

4.4.2.2. Network Topology

• Input Layer: The input layer takes a pre-processed abnormal/normal retina image of size [60×60×1].

• Hidden Layer 1: The first hidden layer consists of one convolution layer, one pooling layer, and a batch normalization (BN) layer. The convolution layer contains 60 convolution filters of size [5×5], and the max-pooling (MP) layer contains filters of size [2×2]. This layer transmutes the input data into a size of [28×28×60] after convolving and down-sampling using Eq. (4.19) and Eq. (4.22). In this layer, the network learns filters that are activated by simple visual features, such as an edge of the blood vessels or a blotch of the optic disc. Thus, the first hidden layer extracts simple low-level features such as edges, curves, and simple intensity variation in the grayscale retinal image.


• Hidden Layer 2: The second hidden layer is composed of a convolution layer with 300 Conv filters of size [5×5], a max-pooling layer of size [2×2], and batch normalization. This layer transforms the output of the first hidden layer into [12×12×300] higher-level features with the help of Eq. (4.19) and Eq. (4.22). The input for this layer is the set of activation maps produced by the previous layer; each map of the input describes certain low-level features appearing at various locations in the original image. When a set of filters is applied over them, the resultant activations represent higher-level features: the previously extracted low-level features are combined to represent slightly higher-level features, such as semicircles (a combination of a straight edge and a curve) or squares (a combination of several straight edges). Thus, after the second hidden layer, features such as semicircles of the optic nerve, haemorrhages, cotton wool spots, and different shapes of blood vessels are extracted.

• Hidden Layer 3: The third hidden layer is devised with 512 Conv filters of size [5×5], a max-pooling layer with filter size [2×2], and a batch normalization layer. This layer transforms the output generated by the second hidden layer into [4×4×512] high-level features using Eq. (4.19) and Eq. (4.22). The convolution layer in this hidden layer extracts more complex features, e.g. the shape of the optic disc, different haemorrhages, bright lesions, and the blood vascular structure, from the activation map generated after the second hidden layer.

• Hidden Layer 4: The fourth hidden layer comprises 1024 convolutional filters of size [4×4] and a Rectified Linear Unit (ReLU) layer, to add non-linearity to the system and make the training process faster. This layer changes the activation map generated by the previous layer into a [1×1×1024] feature map using Eq. (4.19) and Eq. (4.25). The number of kernels is further increased so that more detailed features can be extracted. By the end of this hidden layer, there are filters that activate when there is a bright spot in the image, filters that see the large circular optic disc, filters that activate when they see dark lesions, and filters that activate when they see the different shapes of the blood vessels.

• Hidden Layer 5: The final hidden layer, which acts as the fully connected layer, is formed by convolving over the activation map of the previous layer using U convolutional filters of size [1×1×1024], where U is the number of classes. This layer contains only one map of neurons, which denotes the classes/types of images. It creates a network with full connections to the input data and computes the class scores using Eq. (4.27). A Softmax loss function is used here as the classifier to predict the label using the softmax activation shown in Eq. (4.28). It looks at the activation maps of high-level features output by the preceding hidden layer and determines which features correlate most strongly to the target class of retinal abnormality. For example, if the algorithm is predicting normal retinal images, it will have high values in the activation maps that represent high-level features such as the optic disc, the blood vasculature structure, and the macula region. Similarly, when the program is predicting retina images with a particular abnormality (DR/RVO), it will have high values in the activation maps that represent high-level features such as microaneurysms, haemorrhages, dilated tortuous veins, bright lesions, and abnormalities in the blood veins.

Table 4.1 The Proposed CNN Configuration for RVO Detection

Layer Types                        | No. of Filters     | Size of Feature Map | Size of Kernel | Stride | Padding
Input Layer                        | -                  | 60*60*1             | -              | -      | -
CL1 (Convolution layer-1)          | 60                 | 60*60*1             | 5*5            | 1*1    | 0*0
BN1 (Batch Normalization layer-1)  | -                  | 56*56*60            | -              | 1*1    | 0*0
M1 (Max-pooling layer-1)           | 1                  | 56*56*60            | 2*2            | 2*2    | 0*0
CL2 (Convolution layer-2)          | 300                | 28*28*60            | 5*5            | 1*1    | 0*0
BN2 (Batch Normalization layer-2)  | -                  | 24*24*300           | -              | 1*1    | 0*0
M2 (Max-pooling layer-2)           | 1                  | 24*24*300           | 2*2            | 2*2    | 0*0
CL3 (Convolution layer-3)          | 512                | 12*12*300           | 5*5            | 1*1    | 0*0
BN3 (Batch Normalization layer-3)  | -                  | 8*8*512             | -              | 1*1    | 0*0
M3 (Max-pooling layer-3)           | 1                  | 8*8*512             | 2*2            | 2*2    | 0*0
CL4 (Convolution layer-4)          | 1024               | 4*4*512             | 4*4            | 1*1    | 0*0
R1 (ReLU layer-1)                  | -                  | 1*1*1024            | -              | -      | -
CL5 (Convolution layer-5)          | U (no. of classes) | 1*1*1024            | 1*1            | 1*1    | 0*0
Softmax Layer                      | -                  | U*1                 | -              | -      | -


In the network, each filter in the convolution layers acts as a feature extractor. Therefore, Conv-1 extracts 60 features, Conv-2 extracts 300, Conv-3 extracts 512, and Conv-4 extracts 1024 features. Figure 4.23 shows the visualization of the output data for CL1, M1, CL2, M2, CL3, and M3 using the first 8 filters. The data visualization of Conv-4 is not shown, as the output of Conv-4 has a spatial size of only 1×1. From Figure 4.23 it can be seen how each filter extracts different features from the image pixels. In addition, it can be observed how the size of the data is reduced in each layer, and how the fully connected layer forms a 1-dimensional array at the end of the network, as mentioned in Section 4.3.2.1. After that, the classifier predicts the class based on the learned features (activations) and classifies the input image into the target classes using the highest probability values.

Figure 4.23 Output Data Visualization


4.4.2.3. Learning Algorithm

The most important aspects of a supervised learning model are the training and a proper learning algorithm. Initially, the filters in the first convolution layer do not know to look for edges and curves, and the filters in the higher layers do not know to look for the optic disc, blood vessels, and abnormal lesions. It is essential that the designed network learns how to make the filters in each hidden layer activate for the desired features, how to quantify all the clinical features to detect either DR or RVO and their types, how to understand the difference between DR and RVO, and how to correctly identify the disease irrespective of their common lesions and similar changes in the retina. Therefore, a backpropagation method is used so that the proposed CNN learns to look for particular filter values and updates them when needed. For that, the network is trained with training sets containing images of each target class (e.g., DR and normal retina; RVO and normal retina; DR, RVO and normal retina), where each image has a proper label specifying the type of the retinal image. The backpropagation process can be divided into four distinct phases: forward pass, loss function, backward pass, and weight update. The whole classification process relies on two key components: a score function and a loss function. The score function maps the raw image pixels to class scores, and the loss function quantifies the difference between the predicted scores and the ground-truth labels. Training is then treated as an optimization problem, where the main objective is to minimize the loss function with respect to the parameters of the score function.

During the forward pass, the training images are passed through the entire network. Initially, the weights of the filters are randomly initialized. The fully connected layer generates a predicted class score by mapping the raw image pixels.

Consider the i-th example: for a given input image x_i, the label is y_i. If f is the score function, the classifier computes the vector s of class scores, which can be given by (Simonyan et al. 2016):

    s = f(x_i; W)    (4.32)


As mentioned in the previous section, in the proposed CNN a Softmax classifier is exploited to predict the probability that the input image belongs to a particular label. Thus, the Softmax classifier calculates the predicted score using Eq. (4.28). In the first batch, the output does not give preference to any particular class: with the initial weights, the network is unable to look for the low-level features, and that is why it cannot make any rational conclusion about the prediction result.

Now, the loss function part of backpropagation handles the next situation. Since the training data have both images and labels, the loss function measures the quality of a particular set of parameters (weights) based on how well the predicted scores match the ground-truth labels. When the predicted label does not match the training label, an error/loss occurs. To calculate the error in prediction, the Softmax classifier uses the cross-entropy loss, which can be expressed as follows (Simonyan et al. 2016):

    L_i = −log( e^{s_{y_i}} / Σ_j e^{s_j} )    (4.33)

where s_j is the j-th element of the vector of class scores s, normalized by the softmax function as shown in Eq. (4.28). The full data loss for the dataset is the average of L_i over all training examples, say N, and can be expressed as follows (Simonyan et al. 2016):

    L = (1/N) Σ_{i=1}^{N} L_i    (4.34)

Again, the cross entropy between a ground-truth distribution p and an estimated distribution q can be defined as (Simonyan et al. 2016):

    H(p, q) = −Σ_x p(x) log q(x)    (4.35)
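A minimal numerical sketch of Eq. 4.33 (hypothetical helper, toy scores) shows how the loss penalises a confident wrong prediction far more than a confident correct one:

```python
import math

def cross_entropy_loss(scores, true_class):
    # Eq. 4.33: L = -log(softmax probability of the true class)
    exps = [math.exp(s) for s in scores]
    p = exps[true_class] / sum(exps)
    return -math.log(p)

scores = [5.0, 0.0, 0.0]                        # toy class scores
print(round(cross_entropy_loss(scores, 0), 4))  # ~0.0134 (correct class)
print(round(cross_entropy_loss(scores, 1), 4))  # ~5.0134 (wrong class)
```

The loss is near zero when the softmax probability of the true class is near one, and grows without bound as that probability approaches zero.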

The loss is evaluated for the individual batches during the forward pass. For the first few training images, the loss is very high. The objective is to reach a point where the label predicted by the convolutional network is the same as the training label, which means the designed network is able to make a correct prediction. To attain that point, the amount of loss must be minimized. The Softmax classifier tries to minimize the cross entropy in Eq. (4.35) between the estimated class probabilities, given by Eq. (4.33), and the ground truth. For that, it is important to find out which inputs, i.e. which weights, contributed most to the loss (error) of the network. To determine which weights caused the most loss, and to find ways to adjust them to reduce the loss, a backward pass is performed. To optimize the loss, the Gradient Descent algorithm (LeCun et al. 1998; Simonyan et al. 2016) is used. The principle of the Gradient Descent algorithm is that it gradually proceeds towards a local minimum of the function by taking steps proportional to the negative of the gradient of the function at the current position (shown in Figure 4.24). To find the error or loss, the derivative (gradient) of the cost function is calculated with respect to the weights, and then each weight is changed by a small increment in the direction opposite to the gradient. The gradient can be mathematically expressed as ∂L/∂W, where W denotes the weights at a particular layer. The error function can simply be expressed as:

    L(W) = (1/2) Σ_i (t_i − y_i)²    (4.36)

so that its gradient with respect to the weights is

    ∂L/∂W = −Σ_i (t_i − y_i) ∂y_i/∂W

where t_i is the target value and y_i the network output. To reduce the error/loss L by gradient descent, the weights are moved or incremented in the direction opposite to the gradient ∂L/∂W. After computing this derivative, the weight update is the final step, where all the weights of the kernels are updated so that they move along the negative gradient direction.

Here, the weights are updated using the Widrow-Hoff Learning Rule, or the Delta learning rule:



Figure 4.24: Gradient Descent Principle

w_new = w_old − η (∂L/∂w)     (4.37)

where w_new is the updated weight, w_old is the initial weight, and η is the learning rate. The weight changes need to be applied repetitively for every weight present in all the layers of the network and for every training image in the training set. When all the weights for the whole training set have been passed through the entire network, it is called one epoch of training. After several epochs, all the weight updates become zero as the outputs of the network match all the target training patterns, and the training process terminates. Then, it can be said that the training process has converged to a solution. In this research, for training the network, the mini-batch gradient descent algorithm is used to compute the error and update the weights. In mini-batch gradient descent, the training set is split into mini batches of 10 and the gradients are computed over each mini batch for the entire training set. Now, the learning rate plays a pivotal role. This hyper-parameter controls how much the weights of the network are adjusted with respect to the loss gradient. The effect of the learning rate can be seen in Figure 4.25. If the learning rate is set too low, the gradient descent is slow; training will progress very slowly as the network keeps making very small updates to the weights. However, if the learning rate is set very high, the gradient



Figure 4.25: Effect of large and small learning rate

descent might overshoot the local minimum and cause undesirable divergent behavior in the loss function (LeCun et al. 1998). Therefore, for training the proposed network, the learning rate is empirically set to 0.0001.

This algorithm repeats the cycle of forward pass, loss calculation, backward pass, and weight update for 2000 epochs over the mini-batches of 10 training images in each individual task. Once the parameter update is finished on the last training example, the network has been trained well enough to tune the weights of each layer correctly and learn the features of retina images for predicting the possible retinal blood vascular diseases.
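The mini-batch training loop described above can be sketched as follows. This is a toy illustration of the update rule on a linear least-squares model with synthetic data, not the CNN itself; the data, learning rate, and epoch count are hypothetical values scaled for the toy problem:

```python
import numpy as np

# Toy illustration of the mini-batch gradient-descent loop (NOT the thesis CNN).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))            # 100 toy samples, 4 features each
true_w = np.array([1.0, -2.0, 0.5, 3.0]) # ground-truth weights to recover
t = X @ true_w                           # noise-free targets

w = np.zeros(4)                          # weights to be learned
lr = 0.01                                # learning rate (the thesis uses 0.0001)
batch_size = 10                          # mini batches of 10, as in the thesis
epochs = 500                             # the thesis trains for 2000 epochs

for epoch in range(epochs):
    for start in range(0, len(X), batch_size):
        xb, tb = X[start:start + batch_size], t[start:start + batch_size]
        err = xb @ w - tb                # forward pass: prediction error
        grad = xb.T @ err / batch_size   # gradient of the batch MSE w.r.t. w
        w -= lr * grad                   # delta-rule update: w_new = w_old - lr * grad
```

Because the toy targets are exactly realizable, the repeated epochs drive every batch gradient towards zero and the loop converges to the true weights, mirroring the convergence behaviour described above.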

At the end of the training process, the filters in the first hidden layer have learned to activate on visual features such as an edge of the blood vessels or a curve of the optic disc. So, the first hidden layer extracts simple low-level features like edges, curves, and simple intensity variation in the grayscale retinal image. The input to the second hidden layer is the set of activation maps produced by the first hidden layer. After the second hidden layer, features such as semicircles of the optic nerve, haemorrhages, cotton wool spots, different shapes of blood vessels, etc. are extracted. The convolution layer in the third hidden layer extracts more complex features, e.g., the shape of the optic disc, different haemorrhages, bright lesions, and the blood vascular structure, from the activation maps generated after the second hidden layer. By the end of the fourth hidden layer, some filters activate when there are bright spots in the image, some notice the large circular optic disc, some activate on bright or red lesions, and some activate on the different shapes of the blood vessels. The final hidden layer looks at the high-level features that are most strongly associated with a particular class of DR and/or RVO, and has weights such that the product of the weights and the previous layer's activation values generates the correct probabilities for the different classes.

Now, the theorems for deep learning can be stated as follows (Haeffele & Vidal 2017):

Assumptions:

 For a fixed weight W, Φ(X, W) is the function which associates the inputs X of a network with its output. The deep network mapping Φ and the regularization loss Θ are both summations of positively homogeneous functions of the same degree.

 ℓ is the loss function and, in a compact set S, the loss function is convex and once differentiable in x.

Then:

 Any local minimum of ℓ(Y, Φ(X, W)) + λΘ(W) such that a subnetwork of the network has zero weights is a global minimum (Theorem 1).

 Above a critical network size, from any initialization, local descent will always converge to a global minimum (Theorem 2).

The designed CNN is composed of convolution layers, max-pooling, ReLU, and a fully connected layer, where the ReLU and the linear layers are positively homogeneous functions. The theorems only hold if Θ is positively homogeneous in W of the same degree as Φ. The degree of Φ in a CNN increases with the number of layers. Now, the employed batch normalization corresponds to a positively homogeneous regularization function of the same degree as Φ. Again, Theorem 1 explains how the dying-neuron situation in ReLU is a feature rather than an issue. Under the hypothesis of Theorem 1, if the learning algorithm has reached a minimum


and a full subnetwork has died, then the algorithm has provably found a global minimum. Therefore, the dead neurons only remove the overhead of unwanted neurons.
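The positive-homogeneity property that these theorems rely on can be checked numerically for ReLU. A minimal sketch with arbitrary illustrative values:

```python
import numpy as np

# A function f is positively homogeneous of degree 1 if f(a*x) = a*f(x) for
# every a > 0. ReLU satisfies this (as do linear conv/FC layers); Sigmoid does
# not, which is one reason the guarantees apply to ReLU networks. Example x only.
relu = lambda x: np.maximum(0.0, x)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

x = np.array([-1.5, 0.0, 2.0, 3.7])
a = 2.5                                                       # any positive scaling factor
relu_holds = np.allclose(relu(a * x), a * relu(x))            # True for ReLU
sigmoid_holds = np.allclose(sigmoid(a * x), a * sigmoid(x))   # False for Sigmoid
```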

4.4.3. The Novelty of the Proposed CNN

The novelty of the proposed method for retinal abnormality detection lies in the network design. The proposed CNN is a novel architecture specifically designed for retinal blood vascular diseases, such as Diabetic Retinopathy and Retinal Vein Occlusion, for fast-track treatment. In Section 4.2.3., the challenges in using deep learning models have already been discussed. The popular CNNs used in state-of-the-art retinal abnormality detection are computationally expensive and require a large amount of memory and time. Therefore, to overcome those previously mentioned challenges, in this research a novel CNN has been proposed by following the Occam's Razor principle: "use the least complicated algorithm that can address your needs and only go for something more complicated if strictly necessary". The main effort has been put into the design strategies and training algorithm, and a novel network based on LeNet-5 has been built. The hypothesis of building a simple CNN has been set, which suggests the careful selection of the input image size, filter size, and organization of layers by considering the dataset and the objective of the particular task at hand. The relations among input size, filter size, network size, and the number of parameters have been formulated. Considering the dataset of retinal images of DR and RVO, one way to think about how to choose the hyper-parameters is to find the combination that generates abstractions of the image at an appropriate scale. Therefore, the empirically chosen parameters/hyper-parameters and the right combination of layers have made the proposed CNN a novel architecture for diagnosing retinal disease. The proposed CNN architecture is an efficient, simple model that can analyse retinal images and detect disease with high accuracy using fewer layers, less memory, and fewer training samples.
The input size for the proposed CNN is small; therefore, it takes fewer hidden layers to reduce the input image to a 1×1 vector at the fully connected layer, and thus less time to train and test in comparison to the existing state-of-the-art techniques. Because of the selected low learning rate, the network can be trained more thoroughly and extract minute details of the features. As there are fewer hidden layers and the input is of smaller size, better


performance can be achieved with this smaller and faster network. Therefore, the proposed CNN architecture is a computationally inexpensive novel model for diagnosing retinal blood vascular diseases.

Compared to the LeNet-5 architecture (shown in Figure 4.26), which contains 7 layers, the proposed CNN contains 13 layers. LeNet-5 uses 3 convolution layers to extract features; the proposed CNN, on the other hand, extracts more detailed features using 5 convolution layers. Instead of the average pooling used in LeNet-5, max-pooling is used in the designed CNN to down-sample the feature maps generated by the convolution filters, because max-pooling selects the features with the highest activation values within a sub-block. When a feature appears in a particular location, it is less likely that it will appear again within the small neighbourhood. Therefore, once a maximum value is generated for a particular feature, there is no point in averaging the neighbouring values. Thus, the internal visualization of the feature representation using max-pooling becomes more distinct, unlike average pooling, which makes the internal representation blurry as it takes the mean value within the sub-block. In LeNet-5, Sigmoid activation is used, whereas in the designed CNN, ReLU (Rectified Linear Unit) activation is used as the non-linear layer. As mentioned in Section 4.3.2., ReLU can be defined by the function f(x) = max(0, x). One major benefit of ReLU is that it reduces the likelihood of the vanishing gradient: for x > 0 the gradient has a constant value, which helps in faster training. In contrast, the gradient of the Sigmoid activation gradually decreases with the increase in the absolute value of x. The other advantage of ReLU is sparsity. Sparsity occurs when x ≤ 0, where the unit outputs zero. Therefore, the existence of more such units in a layer increases the sparsity of the resulting representation. Sigmoid functions, on the other hand, are always likely to generate some non-zero values, which leads to dense representations; sparse representations are more advantageous than dense ones.
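The gradient contrast between ReLU and Sigmoid described above can be verified numerically. A minimal sketch with illustrative input values only:

```python
import numpy as np

# Gradients of ReLU vs. Sigmoid, illustrating the vanishing-gradient contrast:
# ReLU's gradient is a constant 1 for x > 0 (and 0 otherwise, giving sparsity),
# while the Sigmoid gradient s(x)(1 - s(x)) shrinks as |x| grows.
def relu_grad(x):
    return (x > 0).astype(float)         # 1 for x > 0, 0 otherwise

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)                 # bounded by 0.25, vanishes for large |x|

x = np.array([0.5, 2.0, 10.0])           # hypothetical pre-activation values
rg = relu_grad(x)                        # constant 1 for all positive inputs
sg = sigmoid_grad(x)                     # decreases towards 0 as x grows
```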
In the output layer of LeNet-5, Radial Basis Function (RBF) units are used for classification. The Tanh activation is used to squash real-valued numbers into the range [-1, 1]. Then, the RBF unit computes the Euclidean distance between the input vector and its estimated vector. In probabilistic terms, the output of the RBF can be considered the un-normalized negative log-likelihood of a Gaussian distribution in the space of the feature representation of the output layer. For a given input pattern, the Mean Square Error



Figure 4.26 The Architecture of LeNet-5

(MSE) is used as the loss function to get the representation of the output layer as close as possible to the estimated vector of the RBF corresponding to the desired class of the pattern. However, the problem with the Tanh activation is the same as that of the Sigmoid activation: it also suffers from vanishing gradients. On the other hand, the Softmax loss function is used in the designed CNN, which is a simple function that calculates the probability of the predicted value. The Softmax activation squashes real-valued numbers into the range [0, 1], and the sum of all values is always 1. This is helpful to obtain a probability distribution and hence the best answer. Therefore, here the Cross Entropy Loss (CEL) is used for error measurement. Cross-entropy is a specific error measurement that is valid when it is required to calculate a probability distribution, which means all probabilities sum to 1. MSE gives too much emphasis to the incorrect outputs, whereas CEL gives more emphasis to the correct output. The detailed comparison of LeNet-5 and the proposed CNN is provided in Table 4.2.

The proposed CNN is further compared with the other CNN models used for DR detection in Table 4.3. From the table, it can be seen that the existing models used for DR diagnosis are mainly inspired by the VGG-16 model. These models (Doshi et al. 2016; Chandore 2017) are mainly fine-tuned for their


respective objectives. The input size, layer organization, activation functions, parameters, and hyper-parameters such as the size of the filters, padding, and learning rates have made each architecture novel in its respective application. Similarly, the proposed architecture is novel for its particular application. Moreover, it can be observed how the input size and filter size determine the depth of a network as

Table 4.2 LeNet-5 Architecture vs. Proposed CNN Architecture

| Item | LeNet-5 Architecture | Proposed CNN Architecture |
| Total No. of Layers | 7 | 13 |
| No. of Convolution Layers | 3 | 5 |
| No. of Pooling Layers | 2 | 3 |
| Type of layers | Convolution layer, Pooling layer, Non-linear layer, Fully connected layer, Output layer | Convolution layer, Pooling layer, Normalization layer, Non-linear layer, Fully connected layer, Softmax layer |
| Type of Pooling | Average pooling | Max-pooling |
| Type of Non-linear layer | Sigmoid | ReLU |
| Activation function in FC layer | tanh | Softmax |
| Output layer | RBF | Softmax loss |
| Loss function | Mean Square Error (MSE) | Cross-entropy loss |

Table 4.3 State-of-the-Art Deep Learning Models for DR Detection vs. Proposed CNN

| Authors | Input Size | Conv Layers | Kernel Size | Pooling Layers | FC Layers | Activation Functions | Learning Rate | Inspired Model |
| Doshi et al. | 512×512×1 | 12 | 5×5, 3×3 | 5 | 2 | LReLU, Softmax | 0.003, 0.0003 | VGG-16 |
| Haloi et al. | 129×129×3 | 3 | 5×5 | 3 | 2 | Maxout, Softmax | - | - |
| Chandore et al. | 448×448×1 | 13 | 5×5, 3×3 | 4 | 2 | ReLU, Maxout, Softmax | 0.0005 | VGG-16 |
| Proposed | 60×60×1 | 5 | 5×5 | 3 | 1 | ReLU, Softmax | 0.0001 | LeNet-5 |



discussed in Section 4.3.2.1. The complexity of a model can be analysed by the number of layers and the number of nodes or neurons. Since the state-of-the-art models use more convolution layers, the number of nodes and parameters increases rapidly, making these models highly complex; they consume more time and memory. On the other hand, the proposed model performs the same task (DR detection) with equivalent or better performance using relatively fewer hidden layers and neurons.
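The complexity comparison above can be made concrete with a rough per-layer parameter count. The helper below is a generic sketch; the channel counts in the examples are hypothetical and not taken from any of the models in Table 4.3:

```python
# Rough parameter count for one convolution layer:
# (kernel_h * kernel_w * in_channels + 1) * out_channels,
# where the +1 accounts for each filter's bias term.
def conv_params(kh, kw, in_ch, out_ch):
    return (kh * kw * in_ch + 1) * out_ch

small = conv_params(5, 5, 1, 32)      # a 5x5 conv on a grayscale input: 832 parameters
large = conv_params(3, 3, 256, 512)   # a deep VGG-style layer: over a million parameters
```

Stacking many such wide layers, as the VGG-inspired models do, multiplies the parameter budget layer by layer, which is the cost the proposed shallower network avoids.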

Recently, a deep learning model has been proposed for segmentation of preserved photoreceptors on en face OCT images (Camino et al. 2018). The method involves pre-processing of the OCT images, manual segmentation of the preserved ellipsoid zone, extraction of patches of a certain size from the OCT images, training of a CNN using the patches, and post-processing. The CNN employed for classifying B-scan patches consists of three convolution layers, three pooling layers, four ReLU layers, and two fully connected layers. The architecture of their CNN is similar to the proposed CNN for blood vascular disease detection in terms of layer organization. However, the two CNNs differ in how the input image is processed and in the meta-parameters used. Their CNN processes patches of size 33×33. The three convolution layers use 32, 32, and 64 filters of size 5×5. Of the three pooling layers, one is a max-pooling layer with window size 3×3 and the other two are average pooling layers with window size 3×3. Then, a Softmax classifier performs the binary classification. The network has been trained with learning rates from 0.05 to 0.0005 with batch size 100. The proposed CNN model has five convolution layers, three max-pooling layers, three batch normalization layers, one ReLU, and one fully connected layer. For the 60×60 input image, 5×5 filters are used in the convolution layers and 2×2 filters are used in the max-pooling layers. The differences between this CNN and the proposed CNN are shown in Table 4.4. The merits of the proposed CNN are that it is more robust in the classification task, more practical as it analyses the entire retina image, and flexible across different tasks. The CNN model proposed by Camino et al. uses four ReLU layers. Although ReLU activation adds non-linearity and helps in faster training, it also loses some neurons during training. Therefore, the four ReLUs might deactivate a significant number of neurons while training.
The use of batch normalization in the proposed CNN helps in handling the internal covariate shift during training and speeds up training. Since



Table 4.4: The CNN Model Proposed in Camino et al. vs. Proposed CNN Model

| Authors | Input Size | Conv Layers | Kernel Size | Pooling Layers | FC Layers | Non-linear Layers | Activation Functions | Learning Rate |
| Camino et al. | 33×33×1 | 3 | 5×5 | 3 (1 max-pooling, 2 average pooling, filter size 3×3) | 2 | 4 ReLU | ReLU, Softmax | 0.05, 0.0005 |
| Proposed | 60×60×1 | 5 | 5×5 | 3 (max-pooling, filter size 2×2) | 1 | 1 ReLU | ReLU, Softmax | 0.0001 |

normalization layers help to avoid overfitting, the proposed CNN can work with a relatively smaller number of training samples. Again, the learning rate is kept at 0.0001 so that the network can carefully extract the minute features from the training images. As discussed before, the architecture and the meta-parameters of a CNN depend on the application requirements and on experimentation. Hence, the proposed CNN model is a novel architecture in the field of detecting retinal blood vascular diseases.

This proposed simple CNN structure is powerful with proper training. It has the following functionalities:

1. It can detect DR at the earliest stage to provide early treatment and prevent further deterioration of retinal health. In addition, it can also grade the severity of DR into mild NPDR, moderate NPDR, and severe NPDR to PDR.

2. It can diagnose RVO at the earliest stage and can detect two types of RVO, viz. BRVO and CRVO.

3. It can discriminate between DR and RVO irrespective of their similar visual features and classify them with high accuracy. It is powerful enough to learn the inter-class and intra-class variability and is capable of being a stand-alone model for multiple blood vascular disease detection.



4.5. Cascaded-CNN for Diagnosing Retinal Vein Occlusion (RVO)

Another major concern of this dissertation is the automatic diagnosis of Retinal Vein Occlusion (RVO) covering all its three types, viz. BRVO, CRVO, and HRVO. The current state of the art lacks algorithms for detecting all three types of RVO. The single end-to-end trained proposed CNN architecture (Section 4.3) shows poor performance in detecting HRVO. The ambiguity in HRVO features makes it difficult for the single trained network to discriminate it from BRVO and CRVO; thus, HRVO is detected as either BRVO or CRVO. To solve this issue, an ensemble approach has been attempted, where 3 CNNs are chained together to improve the feature learning process. The focus is on the strategy of combining more than one CNN to individually learn the different aspects of HRVO features that can help to discriminate it from BRVO and CRVO. For the ensemble approach, it is not desirable to simply combine different networks and average the classification decisions, because it has already been established that a single end-to-end trained network provides poor results in terms of HRVO detection. Again, using a deeper network is not preferable as it would further increase the complexity. Therefore, the aim is to design an optimized ensemble network with little parameter overhead. Now, the ultimate goal is to detect all three variants of RVO, and the main issue is in detecting HRVO. Since HRVO possesses features of both BRVO and CRVO, the idea is to learn the mutual features of these three classes. For this purpose, a single network can be utilized for binary classification; hence, one network can learn the features of BRVO and HRVO, or of CRVO and HRVO. With this notion, two individual networks can perform the feature learning of BRVO–HRVO and CRVO–HRVO separately. Since the designed CNN is efficient for detecting CRVO, BRVO, and the Normal class, this network can be used as the master network to receive the input image.
With this visualization of using a separate working unit for each individual task, three networks can be chained together with a single objective. The general ensemble classifier uses different classifiers trained with different subsamples of the training set, and the classification decision is taken based on the mean classification result of all the classifiers. For the particular task of detecting RVO, the ensemble should be built in such a way that the classifiers have co-dependencies. Therefore, the key idea is to combine the same learning algorithm trained over different subsets of the target classes and the training samples. After this analysis, a novel ensemble


architecture has been designed containing three Convolutional Neural Networks of the same configuration. As explained in Section 4.3.1, in the first step, pre-processing is done on the input RVO images before feeding them into the network. The pre-processing of sample images is shown in Figure 4.27.

This deep learning approach learns the normal features of the retina and the abnormal features caused by RVO, and thereby the CNN detects its variants. For detecting all types of RVO, a novel architecture of cascaded convolutional neural networks has been proposed, named the Cascaded Convolutional Neural Network (CCNN). The proposed CCNN is composed of three designed Convolutional Neural Networks (CNNs) of the same configuration as described in Section 4.3.2. In this research, an attempt

Figure 4.27: Pre-processing of RVO images.



has been made to ameliorate the issue of subjectivity-induced bias in feature representation by training three Convolutional Neural Networks (CNNs) using raw color fundus images to discriminate BRVO, CRVO, HRVO, and Normal retina images. In the CCNN, the initial CNN learns the features of Normal and RVO-affected retina images from a set of Normal, BRVO, and CRVO training images. When this initial CNN confirms normal features, the CCNN gives the classification result Normal. When it confirms features of either BRVO or CRVO, another CNN checks the decision by comparing the features with HRVO. When either of those CNNs confirms BRVO or CRVO features, the CCNN makes the decision of BRVO or CRVO. The other 2 CNNs are meant for learning the ambiguous features of HRVO to distinguish it from CRVO and BRVO. Each designed CNN in the CCNN has 5 convolution layers, 3 max-pooling layers, 1 ReLU layer, 3 normalization layers, and a Softmax loss layer, and each of them is trained independently. The initial CNN is trained with Normal, CRVO, and BRVO images; one CNN is trained with BRVO and HRVO images; and the other CNN is trained with CRVO and HRVO images.

4.5.1. Design Strategy for Cascaded Network

The general ensemble classifier uses Condorcet's jury theorem, where N individual voters provide individual decisions for binary classification and, based on majority voting, the final prediction P is made, where

P = ∑_{i=⌊N/2⌋+1}^{N} (N choose i) p^i (1 − p)^{N−i},

and p is the probability of a correct decision by an individual voter. As mentioned in the previous sections, the key idea here is not to simply combine multiple CNNs and take the majority vote of the classification decisions, but to combine the same learning algorithm trained over different subsets of the target classes and the training samples. The proposed CCNN contains three CNNs, which can be considered as three classifiers. Each classifier is individually trained over different classes; hence, the training sets are different for each of them. Each CNN analyses the training images, builds abstractions of the feature sets, and provides a classification result. However, these internal networks possess co-dependency while providing the classification decision of the CCNN. The algorithm of the learning rule for the CCNN can be expressed as follows:
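The majority-voting probability from Condorcet's jury theorem can be sketched as follows; the values N = 3 and p = 0.7 are illustrative only and are not tied to the CCNN:

```python
from math import comb

# Condorcet's jury theorem: with N independent voters, each correct with
# probability p, the majority vote is correct with probability
#   P = sum_{i = floor(N/2)+1}^{N} C(N, i) * p**i * (1 - p)**(N - i).
def majority_vote_prob(n, p):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n // 2 + 1, n + 1))

P = majority_vote_prob(3, 0.7)   # 0.784: the majority beats a single 0.7 voter
```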



Algorithm:

CCNN Training:

Step 1: Train network-1 using data samples with target classes T1, T2, and T3.

Step 2: Train network-2 using data samples with target classes T2, T4.

Step 3: Train network-3 using data samples with target classes T3, T4.

CCNN Decision Rule:

Input set {x_1, ..., x_n}, Targets = T1, T2, T3, T4
for i = 1 to n
    Network-1:
        if (network-1 classifies x_i as T1)
            Decision = T1; break
        else if (network-1 classifies x_i as T2)
            N1_decision = T2; go to network-2 and network-3
        else if (network-1 classifies x_i as T3)
            N1_decision = T3; go to network-2 and network-3
    Network-2:
        if (network-2 classifies x_i as T2) N2_decision = T2
        else N2_decision = T4
    Network-3:
        if (network-3 classifies x_i as T3) N3_decision = T3
        else N3_decision = T4
    if N1_decision = N2_decision = T2
        Decision = T2; break
    else if N1_decision = N3_decision = T3
        Decision = T3; break
    else if N2_decision = N3_decision = T4
        Decision = T4
end
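The decision rule above can be sketched in code. Here, net1, net2, and net3 are hypothetical classifier callables standing in for the trained CNN-1, CNN-2, and CNN-3, each returning one of the labels 'Normal' (T1), 'BRVO' (T2), 'CRVO' (T3), or 'HRVO' (T4):

```python
# Sketch of the CCNN decision rule; the three network arguments are
# hypothetical stand-ins for the trained CNNs, not the thesis implementation.
def ccnn_decide(image, net1, net2, net3):
    n1 = net1(image)                   # master network: Normal / BRVO / CRVO
    if n1 == 'Normal':
        return 'Normal'                # no further processing needed
    n2 = net2(image)                   # BRVO-vs-HRVO specialist
    n3 = net3(image)                   # CRVO-vs-HRVO specialist
    if n1 == n2 == 'BRVO':
        return 'BRVO'                  # CNN-1 and CNN-2 agree
    if n1 == n3 == 'CRVO':
        return 'CRVO'                  # CNN-1 and CNN-3 agree
    if n2 == n3 == 'HRVO':
        return 'HRVO'                  # both specialists confirm HRVO
    return n1                          # otherwise fall back to the master network's decision
```

The final fallback reflects the mutual-decision rule described later for the case where the two specialist networks disagree.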

The architecture of the Cascaded Network is hierarchically organized with three CNNs to learn and extract the features of the different types of RVO. The initial CNN accepts the input retinal image and learns to distinguish between the normal features of the retina and the abnormal features of RVO. This CNN is trained with Normal, BRVO, and CRVO images to carefully build the hierarchies of features using the convolution filters in each convolution layer. If the Cascaded Network finds the normal features, then it confirms the input image as Normal; otherwise, the input image is sent to the other two CNNs to determine the type of RVO (BRVO, CRVO, or HRVO). When this initial CNN finds the features of BRVO or CRVO in the input image, the other two CNNs further diagnose the input image to check for HRVO features. The 2nd and 3rd CNNs are mainly for learning the ambiguous features of HRVO. One CNN is trained with BRVO and HRVO images to learn the distinguishing features of BRVO and HRVO, and the other CNN is trained with CRVO and HRVO images to learn the distinctive features of CRVO and HRVO. When both of these CNNs confirm the HRVO features, the Cascaded Network classifies the image as HRVO. Otherwise, it classifies the image as either CRVO or BRVO based on the classification results of the 2nd and 3rd CNNs. The flow chart for the architecture of the Cascaded Convolutional Neural Network is shown in Figure 4.28. The function and feature extraction method of each network is described in the following section.



Figure 4.28 The Flow Chart for Proposed Cascaded Convolutional Neural Network for RVO Detection

4.5.2. Function of Each CNN in the Cascaded Network

CNN-1: CNN-1 takes the input retina image and checks for normal or RVO features. For RVO features, it mainly checks for CRVO and BRVO features. As CNN-1 is meant for 3-class classification, it is trained with CRVO, BRVO, and Normal images. During training, the convolution layers of CNN-1 learn the normal features and the clinical features of CRVO and BRVO. Therefore, after feature extraction, the fully connected layer classifies the input image into one of the three target classes. Figure 4.29 shows the architecture of CNN-1.

CNN-2: The main function of CNN-2 is to classify the image into either BRVO or HRVO. To learn the discriminating features of BRVO and HRVO, CNN-2 is trained with BRVO and HRVO images. When CNN-1 finds BRVO features, the image is


further sent to CNN-2 for confirmation by comparing with HRVO features. The architecture of CNN-2 is shown in Figure 4.30.

CNN-3: The purpose of CNN-3 is to learn the distinguishable features of CRVO and HRVO. For that, CNN-3 is trained with CRVO and HRVO images. When CNN-1 finds CRVO features, CNN-3 confirms whether the features are actually CRVO or HRVO. After extracting the features, CNN-3 classifies the image as either CRVO or HRVO. The architecture of CNN-3 is shown in Figure 4.31.

Figure 4.29 Architecture of CNN-1



Figure 4.30 Architecture of CNN-2

Figure 4.31 Architecture of CNN-3



All three networks are collectively known as the Cascaded Convolutional Neural Network (CCNN). Even though there are three individual CNNs, each trained independently, the final decision of the CCNN is based on their collective decision. When an input is given to the CCNN, it can be any image: BRVO, CRVO, HRVO, or Normal. CNN1, which is the master network, receives the input first. Since it is trained with 3 types of images, it can classify the image as Normal, CRVO, or BRVO. If it declares the image Normal, then the cascaded network provides the decision Normal. If the image is not normal and it finds either BRVO or CRVO features, it sends the input image to the other 2 networks. Then, CNN2 and CNN3 further verify the features. CNN2 is trained with BRVO and HRVO images; therefore, it will check whether the input image is really a BRVO image, as per CNN1, or whether it could be HRVO. If CNN2 also finds BRVO, then the Cascaded Network will give the decision BRVO, as both CNN1 and CNN2 have provided the same decision. If CNN2 declares HRVO, then the Cascaded Network waits for CNN3. Now, CNN3 is trained with CRVO and HRVO; therefore, if the input image is HRVO, then CNN3 will confirm HRVO. Thus, when both CNN2 and CNN3 declare HRVO, the Cascaded Network will provide the decision HRVO. Similarly for CRVO: if both CNN1 and CNN3 declare CRVO, then the Cascaded Network will provide the decision CRVO. Let us elaborate this working rule of the CCNN considering the 4 test cases below:

Case 1: Test image is Normal

CNN1 receives the image first. If it correctly identifies the image as Normal, then the CCNN provides the decision Normal; there is no further processing of the test image by CNN2 and CNN3.

Case 2: Test image is BRVO

If CNN1 correctly classifies the image as BRVO, then the CCNN waits for the results of the other two networks. If CNN2 also correctly classifies the image as BRVO, then the CCNN provides the final decision BRVO, ignoring the classification result of CNN3. Since CNN3 is trained with CRVO and HRVO images, it will classify the BRVO image as either CRVO or HRVO depending on the closest learned probability distribution.

Case 3: Test image is CRVO



Similarly, if both CNN1 and CNN3 correctly classify the image as CRVO, then the CCNN classifies the image as CRVO, ignoring the decision of CNN2, because CNN2, being trained on BRVO and HRVO images, will classify the image as one of those two types.

Case 4: Test image is HRVO

Since CNN1 is trained with two types of RVO, viz. CRVO and BRVO, it will classify an HRVO image as either CRVO or BRVO depending on the closest probability value. Now, the test image goes to both CNN2 and CNN3. If both CNN2 and CNN3 correctly classify the image as HRVO, then the CCNN classifies the image as HRVO. If either of these two networks misclassifies the image, then the CCNN will check the decision of CNN1. If the decision of CNN1 matches the decision of either of the other two networks, then the CCNN will provide that mutual decision as the final decision.

When none of these conditions is satisfied, the CCNN misclassifies the target image.
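The decision rule spelled out in the four cases above can be condensed into a short sketch. This is illustrative Python only, not the thesis implementation (the actual work used MATLAB/MatConvNet); the assumption is that each trained network can be called as a function returning its predicted label.

```python
# Illustrative sketch of the CCNN decision rule described in Cases 1-4.
# cnn1, cnn2, cnn3 are assumed callables returning a label string:
#   cnn1 -> {"Normal", "BRVO", "CRVO"}   (master network)
#   cnn2 -> {"BRVO", "HRVO"}
#   cnn3 -> {"CRVO", "HRVO"}

def ccnn_decision(cnn1, cnn2, cnn3, image):
    d1 = cnn1(image)
    if d1 == "Normal":
        return "Normal"              # Case 1: no further processing
    d2, d3 = cnn2(image), cnn3(image)
    if d1 == "BRVO" and d2 == "BRVO":
        return "BRVO"                # Case 2: CNN1 and CNN2 agree
    if d1 == "CRVO" and d3 == "CRVO":
        return "CRVO"                # Case 3: CNN1 and CNN3 agree
    if d2 == "HRVO" and d3 == "HRVO":
        return "HRVO"                # Case 4: CNN2 and CNN3 agree
    # Fallback: any decision that agrees with the master network CNN1
    if d1 in (d2, d3):
        return d1
    return "Unclassified"            # no mutual decision: misclassification
```

The fallback branch corresponds to the mutual-decision rule of Case 4; when even that fails, the cascade has no agreeing pair and the output is unreliable.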

4.5.3. Contribution

The proposed methodology for RVO detection fills the gap in the literature for RVO detection. No method proposed to date is able to successfully detect all three types of RVO images, i.e. HRVO, BRVO, and CRVO, along with Normal images. This work can therefore be considered one of the first of its kind that can be used for automatic detection of RVO and normal images. The proposed Cascaded Convolutional Neural Network architecture is novel. It is carefully designed with 3 identical CNNs, where 2 CNNs are strategically used to learn the ambiguous features of HRVO. The advantage of the proposed Deep Cascaded Network (CCNN) is that it can further analyse the ambiguous features of HRVO, so that the network can discriminate all three types of RVO efficiently. In the literature, different ensemble-based methods are available for various other recognition/classification problems. Generally, for such ensemble-based methods, the final classification result is calculated by averaging the independent classification results of each classifier (Antal & Hajdu 2014; Maji et al. 2016; Doshi et


al. 2016). On the other hand, the proposed Cascaded Convolutional Neural Network does not produce its classification result simply by averaging the results of the individual networks. Instead, the CCNN uses the individual networks for better feature understanding, building a different feature representation in each internal CNN. The two extra CNNs are used only to learn and extract the distinguishing features of HRVO. The CCNN provides the classification decision on the type of RVO based on the results of two internal CNNs.

4.5.4. Novelty of the Proposed Cascaded CNN

The novelty of the proposed Cascaded CNN lies in the design itself. Ensembles of CNNs have been used before for other recognition/classification problems as well, such as DR detection (Doshi et al. 2016) and retinal blood vessel detection (Maji et al. 2016). However, in those works the independent outputs of the CNNs are averaged to obtain the final classification result. In contrast, the proposed Cascaded Convolutional Neural Network uses its internal CNNs for a better understanding of ambiguous features and for simultaneous detection of the RVO types. The CCNN builds the feature representations of each CNN in such a way that it can identify the intra-class variability and separate HRVO from BRVO and CRVO with significantly less error. The CCNN provides the classification decision on the types of RVO based on the results of 2 internal CNNs.

The state-of-the-art CNN architecture used for RVO detection employed 3 convolution layers to extract Normal and BRVO features, using 3, 6, and 9 filters in the subsequent convolution layers. In that image-based scheme, three extra images are created alongside the original by adding noise, flipping, and rotating the pre-processed image, and the final decision for an input test image is made from the classification results of these four images. In the proposed Cascaded CNN, 4 convolution layers are used to extract Normal, BRVO, and CRVO features using 60, 300, 512, and 1024 filters in the subsequent layers. The Cascaded CNN confirms the final BRVO features by carefully learning the features that distinguish BRVO from HRVO in another CNN with 4 convolution layers and the same number of filters as the first CNN. That means BRVO is detected using a total of 3,792 convolution filters in 8 convolution layers. For a test image, the final decision on BRVO is made depending


on the classification results of 2 CNNs. CRVO is detected similarly. For detecting HRVO, the HRVO features are learned using 2 CNNs of the same configuration. Each of these 2 CNNs learns to discriminate HRVO features from BRVO and from CRVO separately. The Cascaded Convolutional Neural Network makes the final decision on HRVO for a test image when both of these CNNs confirm HRVO. Table 4.5 compares the proposed CNN based method with the existing CNN based method for RVO detection.

Table 4.5 State-of-the-art CNN Architecture used for RVO detection vs. Proposed Cascaded CNN for RVO detection

Existing CNN based Method:
1. Single end-to-end trained CNN.
2. Used LeNet-5 network.
3. Detects BRVO and Normal images.
4. 3 convolution layers for extracting features of BRVO.

Proposed Cascaded CNN:
1. Cascade of 3 end-to-end trained CNNs.
2. Network designed from scratch based on LeNet-5.
3. Detects BRVO, CRVO, HRVO and Normal images.
4. Total 8 convolution layers for extracting BRVO, CRVO and HRVO features.
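As a quick check of the filter arithmetic stated above (60, 300, 512, and 1024 filters of size 5×5 per CNN; two CNNs jointly confirm BRVO), the spatial bookkeeping can be sketched in plain Python. 'Same' padding is an assumption made here so that the 60×60 input survives the stack; the thesis does not state the padding scheme.

```python
# Feature-map size bookkeeping for one internal CNN, using the numbers in the
# text: 60x60 input, 5x5 filters, 2x2 pooling, filter counts 60/300/512/1024.
# Padding assumption: 'same'-padded convolutions (not stated in the thesis).

def conv_same(size, stride=1):
    """Spatial size after a 'same'-padded convolution."""
    return (size + stride - 1) // stride

def pool(size, window=2):
    """Spatial size after non-overlapping max pooling."""
    return size // window

filters = [60, 300, 512, 1024]
size = 60                        # 60x60 input image
for i, f in enumerate(filters, start=1):
    size = conv_same(size)
    if i <= 3:                   # the design uses 3 max-pooling layers
        size = pool(size)
    print(f"after conv{i} ({f} filters): {size}x{size}")

per_cnn = sum(filters)           # 1,896 filters in one CNN
print("filters confirming BRVO across 2 CNNs:", 2 * per_cnn)  # 3792
```

The total of 3,792 filters across 8 convolution layers quoted in the text is exactly twice the 1,896 filters of a single internal CNN.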

Recently, some deep learning networks using more than one CNN have been proposed for object recognition in different fields (Schlegl et al., 2018; Wolf et al., 2019). The US patent application publication (Schlegl et al., 2018) proposed a computerized device to detect at least one object, between intraretinal cystoid fluid (ICF) and subretinal fluid (SRF), in optical coherence tomography (OCT) images. To improve the accuracy and speed of cyst detection in OCT images, the authors proposed a model in which two CNNs are stacked together. The computerized system consists of a receiving unit, a providing unit, and a processing unit. The processing unit has been configured with CNNs to automatically segment and discriminate normal retina, ICF, and SRF in an OCT image. The model has been

trained with patches of size 35×35 and 71×71 extracted from the OCT image, centred at the same point. The patches of size 35×35 have been analysed with a sequence of three convolution and max-pooling pairs, and the patches of size 71×71 with a sequence of two convolution and max-pooling pairs. The outputs of the last max-pooling layers of the two CNN stacks are densely connected to the first joint fully connected layer, and the output is then processed through a second fully connected layer. Along with the outputs of the second fully connected layer, the spatial location information of the patches with respect to the fovea has been fed to the classification layer. A cascaded convolutional neural network has also been proposed in (Wolf et al., 2019) for face detection. The authors pitched the idea that for detecting a face one could use n face descriptors, and for each of the n descriptors a convolutional neural network can be used as an object detector. Each object detector detects at least one object from a pre-determined window of at least one face image and is associated with a respective down-sampling ratio with respect to one image. Again, the object detectors are associated with the same respective image window size, which defines a scale detector. When the scale detectors show the same configuration as the object detectors and the downsamplers, and the CNNs involved with the object detectors show groups of layers with identical characteristics, the object detectors are trained such that they can share common layers. For a particular scale of the input image, there can be multiple object detectors and downsamplers. Each object detector in the group of object detectors has its respective CNN and classifier. If there are L object detectors, then there are L CNNs with an equal or increasing number of layers.
The outputs of each object detector are coupled with the respective downsampler that downsamples the input image with a particular window size. When multiple scale detectors are employed, there can be multiple object detectors associated with the same window size. Therefore, only one object detector can be used for detecting and classifying a particular object. However, if the classification confidence level of the detection is not sufficient, another object detector, or a CNN with more layers, can be employed. The other object detector processes only those windows where the probability of the particular object being present exceeds the predetermined value. The outputs of each CNN are fed to the respective classifier, and the classifiers then decide on the object in a binary manner, e.g. face or no face.


The proposed cascaded CNN model for RVO detection is not analogous to the above-mentioned CNN based models for object detection, as the purpose and application of these models are different. However, an attempt has been made at a theoretical comparison. The CNN based model proposed in (Schlegl et al., 2018) is for object detection in OCT images and consists of two CNNs, whereas the proposed CNN based model is for detecting the types of RVO in fundus images and consists of three CNNs. Generally, the OCT image and the color fundus image are totally different in terms of texture, structure, capture method, and nature. While color fundus images provide details of the internal structure of the entire retina, OCT images show the layers of the retina and help ophthalmologists measure the thickness of those layers. In (Schlegl et al., 2018), two CNNs are used to analyse OCT image patches of different sizes; the two CNNs are trained with OCT image patches of size 35×35 and 71×71. Therefore, the size of each CNN has been decided by meta-parameters chosen by the authors according to the application requirement. The patches of size 35×35 have been analysed with a CNN consisting of three convolution layers followed by three max-pooling layers; the convolution layers consist of 32, 64, and 128 convolution filters of size 6×6, 4×4, and 3×3 respectively, and the downsampling size is kept at 2×2. The patches of size 71×71 have been analysed with a CNN consisting of two convolution layers followed by two max-pooling layers; the convolution layers consist of 32 and 64 filters of size 8×8 and 5×5 respectively, and the pooling filters are of size 4×4 and 2×2. The output of each CNN is concatenated and fed to the common fully connected layers. The output is then fed to the classification layer along with the location information of the patches with reference to the fovea.
The first fully connected layer consists of 2048 filters and the second fully connected layer consists of 64 filters; all these values were empirically selected by the authors. The first and major difference of this model compared to the proposed CNN based model is that the proposed model is meant for disease classification (BRVO, CRVO, and HRVO), whereas the former is for object detection (ICF and SRF). The proposed CNN based model consists of three CNNs of identical configuration and analyses whole retinal fundus images of size 60×60. Each CNN consists of 5 convolution layers, 3 max-pooling layers, and 1 fully connected layer. The convolution layers consist of 60, 300, 512, and 1024 convolution filters of size 5×5, and the pooling filter size is 2×2. Another major difference is in how these two models process their respective input images. The

model proposed by Schlegl et al. processes image patches of the OCT image; the two CNNs are trained with patches of two different sizes. Both CNNs share the common fully connected layers, and the location information of the patches is provided, since for object detection location information is important for finding the particular object in the image. On the other hand, in the proposed cascaded model, each CNN processes different types of images, since the main goal is to classify images by the type of disease. Because one of the target diseases (HRVO) possesses clinical features ambiguous with the other two types of RVO (BRVO and CRVO), two CNNs are dedicated to differentiating the disease with ambiguous features from each of the other two types separately. The final classification result depends on the individual classification results of each CNN. The cascaded network provides the final classification decision when two of the CNNs provide the same classification decision, with the first CNN taking a pivotal role in decision making. The practical advantage of the proposed method is that it processes the entire retina image rather than patches, which is more practical in a real-time scenario. One possible disadvantage of the CNN based model by Schlegl et al. is that the patches of size 71×71 are processed with a smaller network than the patches of size 35×35. If a large image is processed by a smaller network using large filters, there is a possibility that some important features are missed. Further, the first fully connected layer has 2048 neurons whereas the second has only 64, meaning more than half of the neurons are effectively unused; in other words, a higher-dimensional space has been shrunk into a lower-dimensional space. Therefore, many of the meaningful features may be lost before the classifier can correctly identify the intended object, even though the additional spatial location information is provided.
The advantage of the proposed cascaded model is that the features are extracted and analysed by each individual CNN, and the classification decision is taken by confirmation from at least two CNNs. Therefore, in total 2048 neurons provide the confidence level of an image being one type of disease.

Now, considering the cascaded convolutional neural network proposed by Wolf et al. (2019), it can be observed that it is basically the idea that a plurality of CNNs can be arranged in parallel, where each CNN works as an object detector for face detection. For identifying face objects, they have suggested that an image can be scaled to different sizes, and for each scaled image a plurality of down-samplers can down-sample the scaled image into patches of different sizes. For each pre-determined image


window, a CNN can be used as a single object detector. For identifying a particular object there can be a group of object detectors associated with a CNN. If the confidence level of one CNN is not good enough to detect the particular object, the size of the network can be increased by adding layers so that at least one network reaches a confidence level matching the pre-determined value. Therefore, depending on the application requirement, there can be N scaled images of one face image; for each scaled image there can be M downsamplers associated with respective window sizes; and for each downsampler there can be multiple CNNs serving as object detectors. The authors have not mentioned any specific configuration for the CNNs, as their size and structure depend on the application requirement. Again, the number of layers is increased if a particular CNN fails to attain the pre-determined confidence level. Each CNN is trained for detecting a particular object associated with a particular window size, downsampler, and scale factor. The authors have not provided any experimentation for this concept. In a nutshell, this cascaded convolutional neural network is extremely complex. In comparison, the proposed cascaded convolutional neural network is much simpler: only three CNNs are involved in the classification task. For dealing with ambiguous features, two CNNs are specifically utilized to differentiate the ambiguous feature set from the two other feature sets sharing its pathological and natural history. The proposed cascaded model is more beneficial for classification tasks where the entire image is processed, whereas the cascaded model concept in (Wolf et al., 2019) is more beneficial for detecting multiple objects from patches of a single image.
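The escalation mechanism described above, trying a deeper detector only when the confidence of the current one falls short of a pre-determined value, can be sketched as follows. The detector behaviour here is mocked, since the patent gives no concrete configuration; the function and its interface are hypothetical.

```python
# Hedged sketch of the confidence-escalation idea in (Wolf et al., 2019):
# try progressively larger detectors until one reaches the required confidence.

def detect_with_escalation(detectors, window, threshold=0.9):
    """detectors: list of callables ordered by increasing depth/size.
    Each returns (label, confidence) for an image window."""
    for detect in detectors:
        label, conf = detect(window)
        if conf >= threshold:
            return label, conf          # confident enough: stop escalating
    return label, conf                  # fall back to the deepest detector

# Mock detectors: a shallow CNN with low confidence, a deeper one that passes.
small = lambda w: ("face", 0.7)
large = lambda w: ("face", 0.95)
print(detect_with_escalation([small, large], window=None))  # ('face', 0.95)
```

This captures only the control flow of the escalation; the real proposal additionally ties each detector to a scale, a window size, and a downsampler.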
The purposes of these two cascaded convolutional networks are different and, as mentioned earlier, the architecture of a deep model or CNN depends solely on the application requirement. Therefore, the proposed cascaded CNN is a novel architecture for diagnosing RVO.

4.6. Contribution of This Research on Retinal Abnormality Detection

This research has the following contributions:

1. This research has provided an efficient method to diagnose retinal blood vascular diseases and detect two prime disorders causing blindness in middle-aged to elderly people. The proposed methodology would help radiologists and ophthalmologists detect DR and RVO at the initial stage itself and provide fast-track treatment. In this way, the retinal health of the patients would not deteriorate any further. Moreover, it would also help to prevent other possible health issues, such as blockages in the cardiac veins or the nerves of the brain.

2. The fundamentals of designing a CNN have been analysed in depth, and a 13-layer CNN has been designed comprising 5 convolution layers, 3 max-pooling layers, 3 batch normalization layers, 1 Rectified Linear Unit (ReLU) layer, and 1 fully connected layer. The main objective has been to design a simple yet effective CNN architecture rather than using popular deep CNNs like GoogLeNet or VGGNet, as they introduce extra complexity, computational cost, and overfitting problems when training samples are few. The proposed CNN architecture has successfully justified the objective of reducing complexity and the number of layers while gaining higher accuracy for retinal image analysis.

3. Development of an application-effective network to extract the desired features for detecting different retinal abnormalities. The CNN has been carefully designed so that it can effectively and efficiently learn the discriminative features irrespective of the application in retinal image analysis. The same network can be used for diagnosing DR and RVO individually, and it can classify each disease according to its type. Moreover, the CNN can discriminate the similar features of different diseases; therefore, it can distinguish DR and RVO effectively irrespective of their common lesions. Hence, it is a powerful automated method for retinal abnormality detection, especially of RVO and DR, which are the top two causes of visual impairment.

4. While the methods in the literature on retinal abnormality detection are individualistic, the proposed deep learning based method provides a generalized model for diagnosing retinal blood vascular disease. The majority of the existing CAD methods are for diagnosing a particular eye disease, e.g., methods for detecting DR, methods for detecting Glaucoma, or methods for detecting BRVO, and so on. Each of these methods is not applicable or useful for detecting other types of eye disease. However, the proposed CNN has the potential to detect multiple retinal diseases at the same time with proper learning and training.

5. The proposed Deep Cascaded Network is a novel architecture for classifying types of disease with ambiguous features. This architecture is carefully designed so that it can effectively and efficiently learn the discriminative features of Normal, BRVO, CRVO, and HRVO from the whole retina fundus image. A single end-to-end trained CNN shows poor performance in detecting HRVO, as HRVO shares the etiological and clinical features of both CRVO and BRVO (Sivaprasad et al., 2015). Therefore, the strategy of combining multiple designed CNNs has been analysed in order to effectively learn the ambiguous features of HRVO and build up the distinctive feature set of HRVO to differentiate it from the other two types.

6. The research has also focused on the training strategy of the network, so that the CNN can extract features carefully and identify the disease as required. The learning algorithm for the proposed CNN has been developed carefully so that it can capture the interclass and intra-class variability of the features. The training strategy for the CCNN has likewise been carefully analysed so that each CNN in the cascaded network learns to extract the desired features for its individual task. Each CNN has been trained individually so that each can learn the task-specific features from the labelled training images irrespective of image number, quality, lighting condition, camera angle, etc.

4.7. Chapter Summary

In this chapter, the proposed method for diagnosing retinal abnormality using deep learning has been discussed in detail. At the beginning of the chapter, a brief introduction to deep learning, the Convolutional Neural Network, and its different architectures has been provided. Then, extensive detail has been provided on designing a CNN from the root level. A hypothesis has been set for designing a simple CNN for a particular application, and a novel CNN architecture has been proposed for diagnosing retinal abnormality. The design rationale has been explained for choosing hyperparameters, layer organization, and filters. It has


been elaborated how input size and filter size affect the size and depth of the whole CNN, and why these are important factors in designing any network. The proposed CNN also overcomes the major limitations and challenges of using deep models, particularly for retinal image analysis. The proposed architecture is an efficient model for multiple retinal disease detection. This CNN can be used to detect DR and RVO at the earliest stage. It can also grade the severity of DR and detect BRVO and CRVO with an appropriate learning and training approach.

In this chapter, a Deep Cascaded Network (CCNN) has also been proposed, which works specifically on RVO-affected retina images. Using the proposed simple CNN architecture as a base network, the cascaded network has been designed to detect all three types of RVO. The CCNN can efficiently extract the ambiguous features of HRVO and classify BRVO, CRVO, HRVO, and Normal images simultaneously using three inbuilt CNNs of the same configuration. No method in the current state of the art can detect all three types of RVO.

In this chapter, all the research objectives mentioned in Chapter-1 have been addressed. The evaluation of the proposed methods is discussed in the next chapter.


Chapter-5

5 Experimental Validation

In this chapter, the experimental validations are provided for the proposed deep learning based method for retinal vascular disease detection. The performances of the proposed CNN and the Deep Cascaded Network (CCNN) have been evaluated individually. The information regarding the experimental setup and databases is provided in detail. The experiments on the proposed CNN have been conducted with three different objectives. First, the proposed CNN has been tested on detecting DR at the earliest stage and on classifying DR according to severity: mild NPDR, moderate NPDR, and severe NPDR to PDR. Second, the network has been tested on detecting RVO and its two types, BRVO and CRVO. Lastly, the network has been evaluated on whether or not it can discriminate between DR and RVO, as both have similar clinical characteristics. Then, experimental results are provided for the proposed Cascaded CNN to detect all three types of RVO, viz., CRVO, BRVO, and HRVO. The experiments conducted for the different tasks are provided in separate sections.

5.1. Experimental Environment

For the experimental setup, MATLAB® (R2014b) and the MatConvNet-1.0-beta24 tool (Vedaldi & Lenc 2014), along with a 2GB NVIDIA® GeForce 840M graphics card, have been used for designing and training the proposed CNN based method.

5.2. Databases

In this study, the images have been collected from various publicly available databases. For the experiment, the numbers of the collected DR and RVO images

from all six databases are shown in Table-5.1. During the experiments, all the collected images have been pooled and used for each individual task. The details of the databases are provided below:

1. STARE Database: The STARE (STructured Analysis of the Retina) Project was introduced by Michael Goldbaum at the University of California, San Diego in 1975. The database contains 400 raw retinal images, all in TIF format, with diagnostic results. The images range from normal retinas to retinas affected by various diseases, such as Diabetic Retinopathy, Retinal Vein Occlusion, Coat’s disease, Hypertensive Retinopathy, Retinal Artery Occlusion, Choroidal Neovascularization, Hollenhorst Emboli, Arteriosclerotic Retinopathy, etc. It also includes the experts’ annotations of the manifestations and ground-truth values for blood vessel segmentation and optic disk segmentation (Hoover & Goldbaum 2003) (http://cecas.clemson.edu/~ahoover/stare/index.html).

2. MESSIDOR Database: Methods to evaluate segmentation and indexing techniques in the field of retinal ophthalmology (MESSIDOR) is a research program funded by the French Ministry of Research and Defence within the 2004 TECHNO-VISION program. The database contains 1200 eye fundus color images of the posterior pole acquired using a color video 3CCD camera on a Topcon TRC NW6 non-mydriatic retinograph with a 45-degree field of view. All the images are 8-bit images of size 1440×960, 2240×1488, or 2304×1536, in TIF format. For each image, two types of disease diagnoses have been provided by the medical experts: Retinopathy grade and Risk of macular edema (Decencière et al. 2014) (http://www.adcis.net/en/Download-Third-Party/Messidor.html).

3. DRIVE Database: The DRIVE (Digital Retinal Images for Vessel Extraction) database provides a platform for comparative studies on segmentation of retinal blood vessels. The retinal color fundus images in the DRIVE database are 8-bit images of size 768×584, captured using a Canon CR5 non-mydriatic 3CCD camera with a 45-degree field of view (FOV). It contains 40 JPEG-compressed photographs, of which 7 images show signs of mild NPDR and the remaining 33 are normal images. All these images were acquired from 400 diabetic subjects between 25 and 90 years of age in a


diabetic retinopathy screening program held in the Netherlands (Staal et al. 2004) (https://www.isi.uu.nl/Research/Databases/DRIVE/).

4. Retina Image Bank: This is a project of the American Society of Retina Specialists. Launched in August 2012, the Retina Image Bank® platform collects a wide range of retinal images, which are used for clinical disease documentation, anatomy, education, and treatment. It contains 23,337 retinal images with different diagnoses. The images are of varying sizes and formats. (http://imagebank.asrs.org/)

5. Hossein Rabbani Dataset: The website of Hossein Rabbani hosts multiple datasets. Most of them are Optical Coherence Tomography (OCT) and Fluorescein Angiography (FA) images for Diabetic Macular Edema (DME) and other retinal diseases. One dataset contains 24 FA videos of size 768×768 and late FA images of DME eyes. Another dataset contains 50 normal images of the left and right eye acquired by color funduscopy and OCT; the color fundus images are in JPEG format and the OCT images are in MAT format. Another available database contains 22 pairs of images acquired from eyes with different types of retinal diseases; for each pair, there is one color fundus image and one OCT image captured by a Topcon 3D OCT-1000 instrument. The OCT volumes are of size 650 × 512 × 128 (Mahmudi et al. 2014) (https://sites.google.com/site/hosseinrabbanikhorasgani/datasets-1)

6. Kaggle Database: Kaggle, the data scientists’ community, arranged a machine learning competition for automatic detection of Diabetic Retinopathy, sponsored by the California Healthcare Foundation. A large database of high-resolution retinal images taken under different imaging conditions was provided on the public platform; all the images were provided by EyePACS. The database contains a total of 35127 images with proper labels. The images were graded by a clinician according to the severity of DR on a scale of 0 to 4. The database contains 25810 Normal images, 2443 mild NPDR images, 5292 moderate NPDR images, 873 severe NPDR images, and 708 PDR images. (https://www.kaggle.com/c/diabetic-retinopathy-detection/data)


Table 5.1 Details of Collected Images

Name of Database      Collected Images   Normal   DR Images                                       RVO Images
STARE                 180                37       NPDR: 72, PDR: 22                               BRVO: 11, CRVO: 26, HRVO: 12
MESSIDOR              1200               547      Stage-1: 153, Stage-2: 246, Stage-3: 254        -
Retina Image Bank     261                -        -                                               BRVO: 91, CRVO: 82, HRVO: 88
DRIVE                 40                 33       NPDR: 7                                         -
Dr. Hossein Rabbani   100                100      -                                               -
Kaggle                35127              25810    Mild: 2443, Mod.: 5292, Sev.: 873, PDR: 708     -
Total                 36908              26527    10070                                           310
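The per-database counts in Table 5.1 can be cross-checked against the reported totals with a few lines of arithmetic; this sketch simply transcribes the table's values.

```python
# Consistency check for Table 5.1: summing the per-database counts reproduces
# the totals reported in the table (values transcribed from the table).
normal = {"STARE": 37, "MESSIDOR": 547, "DRIVE": 33, "Rabbani": 100,
          "Kaggle": 25810}
dr = {"STARE": 72 + 22,                        # NPDR + PDR
      "MESSIDOR": 153 + 246 + 254,             # stages 1-3
      "DRIVE": 7,
      "Kaggle": 2443 + 5292 + 873 + 708}       # mild/moderate/severe NPDR + PDR
rvo = {"STARE": 11 + 26 + 12,                  # BRVO + CRVO + HRVO
       "ImageBank": 91 + 82 + 88}

print(sum(normal.values()))  # 26527
print(sum(dr.values()))      # 10070
print(sum(rvo.values()))     # 310
```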

5.3. Performance Measures

The accuracy of diagnosis is essential in the medical field, as doctors take decisions based on the diagnostic result and provide the appropriate medical treatment and medication. Different parameters are available to measure the attributes of diagnostic tests; based on these attributes, the doctor can select the best test for a given disease condition. The widely used statistics to evaluate a diagnostic test are Accuracy, Sensitivity, and Specificity. These three measures quantify how accurate and reliable a test is. Sensitivity evaluates the capability of the test to detect disease when it is present. Specificity estimates the capability of the test to identify healthy patients without the disease. From sensitivity and specificity, together with the disease prevalence, the accuracy of a diagnostic test can be determined. These parameters depend on the following outcomes:

True positive (TP) = the number of diseased cases correctly identified as diseased

False positive (FP) = the number of healthy cases incorrectly identified as diseased

True negative (TN) = the number of healthy cases correctly identified as healthy

False negative (FN) = the number of diseased cases falsely identified as healthy


The performance of the proposed deep learning based method for diagnosing retinal blood vascular diseases is evaluated using the following five parameters:

1. Accuracy: The accuracy of a test is its ability to correctly distinguish between diseased and healthy cases. Mathematically, this can be stated as:

Accuracy = (TP + TN) / (TP + TN + FP + FN) (5.1)

2. Sensitivity: It is the likelihood of a positive test given that the patient actually has the disease. Sensitivity measures the probability of correct diagnosis for people who actually have the disease. It is given by the following equation:

Sensitivity = TP / (TP + FN)        (5.2)

3. Specificity: It is the likelihood of a negative test given that the patient is healthy. Specificity measures the probability of correctly diagnosing the people who do not have the disease, and can be calculated by the following equation:

Specificity = TN / (TN + FP)        (5.3)

4. Positive Predictive Value (PPV): PPV is the likelihood that a subject with a positive screening test actually has the disease. Mathematically it can be expressed as:

PPV = TP / (TP + FP)        (5.4)


5. Negative Predictive Value (NPV): It is the likelihood that a subject with a negative screening test actually does not have the disease. NPV can be calculated using the following equation:

NPV = TN / (TN + FN)        (5.5)
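Once the four outcome counts are known, all five measures follow mechanically. The sketch below is illustrative (the function name is our own, not from the thesis) and is checked against the stage-1 DR experiment on MESSIDOR reported in Section 5.4.1, where all 53 DR images are detected (TP = 53, FN = 0) and 2 of 53 normal images are flagged (TN = 51, FP = 2).

```python
# Sketch of Eqs. (5.1)-(5.5), computed from the four outcome counts.

def diagnostic_metrics(tp, fp, tn, fn):
    """Return Accuracy, Sensitivity, Specificity, PPV, and NPV in percent."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn) * 100   # Eq. (5.1)
    sensitivity = tp / (tp + fn) * 100                    # Eq. (5.2)
    specificity = tn / (tn + fp) * 100                    # Eq. (5.3)
    ppv         = tp / (tp + fp) * 100                    # Eq. (5.4)
    npv         = tn / (tn + fn) * 100                    # Eq. (5.5)
    return accuracy, sensitivity, specificity, ppv, npv

# Stage-1 DR on MESSIDOR (Section 5.4.1): TP=53, FP=2, TN=51, FN=0.
acc, sens, spec, ppv, npv = diagnostic_metrics(tp=53, fp=2, tn=51, fn=0)
# acc ~ 98.11, sens = 100, spec ~ 96.2, matching Table 5.3.
```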

5.4. Performance Evaluation of the Proposed Deep Learning Method for Diagnosing Retinal Blood Vascular Diseases

The performance of the proposed CNN architecture has been evaluated in terms of its capability to detect DR as early as possible and grade its severity, to detect two types of RVO, and to distinguish DR from RVO despite their common visual features. Then, the proposed Cascaded Convolutional Neural Network (CCNN) has been evaluated for detecting all three types of RVO. Therefore, the performance of the proposed deep learning methods and learning algorithm for retinal abnormality detection has been evaluated in terms of the following tasks:

1. Detect Diabetic Retinopathy at the earliest stage.
2. Grade the DR according to its severity.
3. Detect Retinal Vein Occlusion and classify its types.
4. Classify DR-affected and RVO-affected retina images.

5.4.1. Detection of DR at the Earliest Stage

To provide immediate treatment, it is crucial to detect DR as early as possible. Detection of DR at the earliest stage is treated as a binary classification task, where the objective of the proposed CNN is to distinguish mild-NPDR from Normal images. Therefore, the proposed network is trained with mild-NPDR and Normal images. From the MESSIDOR database, a total of 153 stage-1 DR images and 547 normal images have been collected. For training, 400 normal images and 100 stage-1 DR images are selected. For a deep learning model, it is important to have a balanced training set for each target class. Therefore, for balanced training, data augmentation has been performed on the mild-NPDR images by applying rotations of 90°, 180°, and 270°. Thereby, a total of 400 images each from the Normal and Stage-1 DR classes are used for training. 53 stage-1 DR images and 53 Normal images are used for testing. The details of the images used for training and testing are shown in Table-5.2.
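The rotation augmentation described above can be sketched as follows; images are represented as plain 2-D lists purely for illustration (a real pipeline would operate on image arrays):

```python
# Sketch: each training image yields three extra copies rotated by
# 90, 180, and 270 degrees, quadrupling the class.

def rotate90(img):
    """Rotate a 2-D pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def augment_with_rotations(img):
    """Return the original image plus its 90/180/270-degree rotations."""
    r90 = rotate90(img)
    r180 = rotate90(r90)
    r270 = rotate90(r180)
    return [img, r90, r180, r270]

image = [[1, 2],
         [3, 4]]
augmented = augment_with_rotations(image)
# Four images per original: 100 mild-NPDR images become 400, as in Table 5.2.
```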

Table 5.2 Details of the Images used for Training and Testing for Detection of Stage-1 DR (Messidor Database)

                                        Stage-1 DR   Normal
No. of Images Collected                     153        547     (Total = 700)
No. of Training and Validation Images       400        400     (Total = 800)
No. of Testing Images                        53         53     (Total = 106)

Thus, a total of 800 pre-processed grayscale images of size 60×60 are used to train the CNN for detecting DR at the earliest stage. For each class, 90% of the images (360 of each type) are used for training, and the remaining 10% (40 images) are used for validation. After careful investigation with higher and lower numbers of epochs, 2000 epochs have been adopted as standard; the batch size is kept at 10 and the learning rate is fixed at 0.0001. Thus, for a batch size of 10, on average 9 images are used for training and 1 image for validation. The program sequentially takes the images from the stored database for training and validation. The time required for training and testing is 4 minutes and 5 seconds.
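A minimal sketch of the per-class 90:10 split described above (file names are hypothetical, and the ordering of images before the split is not specified in the thesis):

```python
# Sketch: 400 images per class, split 90:10 into training and validation,
# consumed in batches of 10.

def split_per_class(images, train_frac=0.9):
    """Split one class's image list into training and validation parts."""
    cut = int(len(images) * train_frac)
    return images[:cut], images[cut:]

normal = [f"normal_{i}.tif" for i in range(400)]   # hypothetical file names
stage1 = [f"stage1_{i}.tif" for i in range(400)]

train_n, val_n = split_per_class(normal)   # 360 training, 40 validation
train_d, val_d = split_per_class(stage1)   # 360 training, 40 validation

# 800 images in batches of 10 gives 80 batches per epoch.
n_batches = (len(train_n) + len(train_d) + len(val_n) + len(val_d)) // 10
```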

The performance is evaluated in terms of Accuracy, Sensitivity, Specificity, Positive Predictive Value (PPV), and Negative Predictive Value (NPV) as explained in the previous section. The proposed CNN can detect DR at its earliest stage with a high accuracy of 98.11%. In the experiment, the CNN has achieved a Sensitivity of 100%, meaning all the stage-1 DR images are detected correctly. Out of 53 normal images, 2 are incorrectly detected as DR-affected (shown in Fig. 5.2); hence, the obtained Specificity is 96.2%. The CNN has obtained a PPV of 96.3% and an NPV of 100%. The summary of the performance of the CNN for early DR detection is shown in Table-5.3. The Receiver Operating Characteristic (ROC) curve is plotted in Fig. 5.1; the obtained AUC (Area Under the Curve) is 0.989.
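The ROC curve in Fig. 5.1 can be produced by sweeping a decision threshold over the CNN's output scores; a minimal sketch with made-up toy scores (not the experiment's actual outputs):

```python
# Sketch: build ROC points by thresholding scores, then integrate
# TPR over FPR with the trapezoidal rule to get the AUC.

def roc_points(pos_scores, neg_scores):
    """Return (fpr, tpr) pairs for thresholds at every observed score."""
    thresholds = sorted(set(pos_scores + neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    points.append((1.0, 1.0))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

pos = [0.95, 0.90, 0.85, 0.60]   # toy scores for diseased images
neg = [0.40, 0.30, 0.20, 0.70]   # toy scores for normal images
curve = roc_points(pos, neg)
# One normal score (0.70) outranks one diseased score, so AUC < 1 here.
```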


Table 5.3 Performance Evaluation for the Detection of DR at the Earliest Stage (Messidor Database)

Accuracy   Sensitivity   Specificity   PPV      NPV
98.11%     100%          96.2%         96.3%    100%

Figure 5.1: ROC Curve for Stage-1 DR Detection in Messidor Database

Figure 5.2: Normal Images Misclassified as Stage-1 DR (Messidor Database)


The proposed model has been tested on the Kaggle database as well. Since the Kaggle database contains a large number of images, no data augmentation has been done. The Kaggle database contains 25810 normal images and 2443 stage-1 DR (mild NPDR) images. 62 mild-NPDR images are of very poor quality and are discarded; some of these discarded stage-1 DR images are shown in Fig. 5.3. For balanced training, the proposed CNN model has been trained with 2000 normal images and 1938 stage-1 DR images. The details of the images used for training and testing are given in Table-5.4.
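The balancing step above can be sketched as undersampling the majority class; `random.sample` stands in for whatever selection procedure was actually used, and the identifiers are hypothetical:

```python
# Sketch: draw 2000 of the 25,810 normal images to roughly match the
# 1938 usable stage-1 DR images (Table 5.4).
import random

def undersample(majority, n, seed=0):
    """Draw n items from the majority class without replacement."""
    rng = random.Random(seed)
    return rng.sample(majority, n)

normal_pool = [f"norm_{i}" for i in range(25810)]   # hypothetical identifiers
balanced_normals = undersample(normal_pool, 2000)
```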

Table 5.4 Details of the Images used for Training and Testing for Detection of Stage-1 DR (Kaggle Database)

                                        Normal   Stage-1 DR
No. of Images Collected                 25810       2443     (Total = 28253)
No. of Training and Validation Images    2000       1938     (Total = 3938)
No. of Testing Images                     400        443     (Total = 843)

Figure 5.3: Samples of the Discarded Poor-Quality Stage-1 DR Images of the Kaggle Database


With similar settings of 2000 epochs, batch size 10, learning rate 0.0001, and a 9:1 training-validation ratio, a total of 3938 pre-processed grayscale images are used to train the CNN for detecting the earliest stage of DR, i.e. mild NPDR. For the Kaggle dataset, the proposed CNN architecture has achieved an Accuracy of 96.6%. The Sensitivity of detecting stage-1 DR is 99.7% and the Specificity is 93.2%. The ROC curve is shown in Fig. 5.4; the obtained AUC is 0.937. Only 1 mild-NPDR image is misclassified as a Normal image (shown in Fig. 5.5); it can be observed that its early symptoms are very subtle and quite close to a normal image. Samples of the Normal images misclassified as stage-1 DR are shown in Fig. 5.6; these images crossed the decision threshold the CNN learned for mild NPDR. The obtained PPV is 94.2% and the NPV is 99.7%. The whole training and testing is completed in 14 minutes and 1 second. Table-5.5 shows the performance evaluation of stage-1 DR detection for the Kaggle dataset.

Table 5.5 Performance Evaluation for the Detection of DR at the Earliest Stage (Kaggle Database)

Accuracy   Sensitivity   Specificity   PPV      NPV
96.6%      99.7%         93.2%         94.2%    99.7%

Figure 5.4: ROC Curve of Stage-1 DR Detection for Kaggle Database


Figure 5.5: The Stage-1 DR (Mild NPDR) Image Misclassified as Normal (Kaggle Database)

Figure 5.6: Samples of Normal Images Misclassified as Stage-1 DR (Kaggle Database)


5.4.2. Grading of DR Severity

The performance of the designed CNN has been further evaluated by grading DR into a mild-to-moderate NPDR stage and a severe NPDR-to-PDR stage. In this 3-class classification task, the network has been trained with mild-to-moderate NPDR, severe NPDR-to-PDR, and Normal images. From the STARE database, 72 NPDR images, 22 PDR images, and 37 Normal images are collected. From the MESSIDOR database, 399 mild-to-moderate NPDR, 254 severe NPDR-to-PDR, and 547 Normal images are collected. For training, 400 mild-to-moderate NPDR images, 250 severe NPDR-to-PDR images, and 400 normal images are selected. Then, 71 mild-to-moderate NPDR, 26 severe NPDR-to-PDR, and 70 Normal images are selected for testing. As in the previous experiment, another 3 sets of images are generated for each training class by rotating the images by 90°, 180°, and 270°. The details of the images used for training and testing are summarized in Table-5.6.

Table 5.6 Details of Training and Testing Images for DR Severity Detection

                                             Mild-Moderate   Severe       Normal
                                             NPDR            NPDR-PDR
No. of Images Selected                            400           250         400    (Total = 1050)
No. of Training Images after Augmentation        1600          1000        1600    (Total = 4200)
No. of Testing Images                              71            26          70    (Total = 167)

Just like the previous experiment, a 9:1 ratio is kept for training and validation, and the network has been trained with batch size 10 and learning rate 0.0001 for 2000 epochs. In this 3-class classification, the proposed CNN has obtained 98.2% accuracy, 100% sensitivity, 98% specificity, 97.5% PPV, and 100% NPV. The performance evaluation is shown in Table-5.7. Fig. 5.7 shows a mild-to-moderate NPDR image misclassified into the severe NPDR-to-PDR class, and Fig. 5.8 shows Normal images misclassified as mild-to-moderate NPDR. It can be seen that the misclassified moderate NPDR image is almost at the edge of transforming into severe NPDR; since the exudates are very close to the fovea, the CNN misclassified it into the severe grade. In the case of the misclassified normal images, it can be seen that one of the images has tortuous veins. Some people may have tortuous veins genetically; however, the CNN classified the image as DR because tortuous veins are an alarming symptom.

Table 5.7 Performance Evaluation for 3-Class Classification of DR

Accuracy Sensitivity Specificity PPV NPV

98.2% 100% 98% 97.5% 100%

Figure 5.7: Misclassified Moderate NPDR image

Figure 5.8: Misclassified Normal Images


5.4.3. Detection of RVO

For RVO detection, fewer images are available compared to DR. The CNN architecture is first evaluated for detecting the BRVO and CRVO types. From the collected image datasets, 50 images from each of the three classes, viz. BRVO, CRVO, and Normal, are selected for balanced training. Using the same data augmentation method as for DR detection, a total of 200 images are created for each class, so the network has been trained with a total of 600 images. For testing, 88 normal retinal images, 58 CRVO images, and 52 BRVO images are used. The details are shown in Table 5.8.

Table 5.8 Details of the Images Selected for Training and Testing

                                             BRVO   CRVO   Normal
No. of Images Selected                         50     50      50    (Total = 150)
No. of Training Images after Augmentation     200    200     200    (Total = 600)
No. of Testing Images                          52     58      88    (Total = 198)

Following the same 9:1 validation ratio, learning rate 0.0001, 2000 epochs, and batch size 10, the CNN successfully distinguishes the two types of RVO (BRVO and CRVO) and normal images with accuracy 97%, sensitivity 96.15%, specificity 98%, PPV 94.34%, and NPV 98.62% (shown in Table-5.9).

Table 5.9 Performance Evaluation for RVO Detection

Accuracy   Sensitivity   Specificity   PPV       NPV
97%        96.15%        98%           94.34%    98.62%

The CNN has also been tested for detecting all three types of RVO, viz. BRVO, CRVO, and HRVO, together with Normal retina images. However, the proposed CNN fails to detect HRVO properly because its features are similar to those of BRVO and CRVO; most of the HRVO images are misclassified as either CRVO or BRVO.


5.4.4. Detection of DR and RVO

The proposed network has been further evaluated to check its capability to detect two different blood vascular disorders, viz. DR and RVO, that share similar clinical features. For classifying DR, RVO, and Normal images, the images are mixed and matched from multiple databases: 94 DR images from STARE, 7 DR images from DRIVE, and 109 DR images from MESSIDOR, giving a total of 210 DR images (both NPDR and PDR); 37 RVO images from STARE and 173 RVO images from the Retinal Image Bank, giving a total of 210 RVO images (both BRVO and CRVO); and a total of 210 Normal images (37 from STARE, 100 from MESSIDOR, 33 from DRIVE, and 40 from the Dr. Hossain Rabbani database). For balanced training, 160 images from each class are selected for training and validation, and the remaining 50 images per class are reserved for testing. After data augmentation, a total of 1920 images are used for training. The details are tallied in Table-5.10.

For this multiple-disease detection experiment, with learning rate 0.0001, batch size 4, 2000 epochs, and a 9:1 validation ratio, the proposed CNN has attained a classification accuracy of 98.8%. The obtained sensitivity, specificity, PPV, and NPV are 100%, 98.3%, 96.6%, and 100% respectively (shown in Table-5.11). The misclassified RVO and Normal images are shown in Figure 5.9 and Figure 5.10 respectively. The RVO image is misclassified as a normal image because its symptoms are subtle. Here, the CNN mainly looked for the lesions common to RVO and DR and focused on analysing those shared features to discriminate the diseases; therefore, the tortuous-vein symptom of RVO has been overlooked and the RVO image misclassified as normal. For the misclassified normal image, it can be observed that the image has been captured from a different angle than the others; at this angle the optic nerve head appears larger than normal, so the CNN interpreted it as optic disc swelling and misclassified the image as RVO.


Table 5.10 Details of the Images used for Training and Testing for Retinal Vascular Diseases

                                             DR     RVO    Normal
No. of Images Selected                       160    160     160    (Total = 480)
No. of Training Images after Augmentation    640    640     640    (Total = 1920)
No. of Testing Images                         50     50      50    (Total = 150)

Table 5.11 Performance Evaluation for DR and RVO Detection

Accuracy   Sensitivity   Specificity   PPV      NPV
98.8%      100%          98.3%         96.6%    100%

Figure 5.9: RVO Image misclassified as Normal Image


Figure 5.10: Misclassified Normal Image

5.5. Performance Evaluation of the Proposed Cascaded CNN

The proposed Deep Cascaded Network is specifically designed to detect all three types of RVO, as the single trained CNN fails to distinguish HRVO from BRVO and CRVO. To evaluate the performance of the Cascaded CNN, the RVO images have been collected from the STARE database, the DRIVE database, the dataset of Dr. Hossain Rabbani, and the Retinal Image Bank. These constitute a total of 138 normal retinal images, 108 CRVO images, 102 BRVO images, and 100 HRVO images.

5.5.1. Training and Testing

For RVO detection, 26 CRVO, 11 BRVO, and 12 HRVO images are taken from STARE, and 82 CRVO, 91 BRVO, and 88 HRVO images are collected from the Retinal Image Bank. 100 normal images are collected from the Dr. Hossain Rabbani database, 30 normal images from STARE, and 8 normal images from DRIVE. From the collected image datasets, 50 images from each of the four classes are used for training and validation, and the remaining 88 normal retinal images, 58 CRVO images, 52 BRVO images, and 50 HRVO images are used for testing. The details are shown in Table-5.12. The reason for combining images from different databases for training and testing is that the proposed Cascaded Network requires a certain number of samples, and relatively few RVO images are available in any single retina image database compared to DR images. Images from different databases, covering all possible varieties, are useful for training and help the network learn more; training and testing with only a single database may produce erroneous results. Initially, all the images are converted to grayscale TIF format of size 60×60 as a standard size and format (described in Chapter-4), as the images from different databases are of different sizes and formats.
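The standardisation step above (grayscale conversion and rescaling to 60×60) can be sketched as follows; the luminance weights and nearest-neighbour resampling are our illustrative choices, since the thesis does not specify the exact conversion routine:

```python
# Sketch: convert an RGB pixel grid to grayscale, then resample to 60x60.

def to_grayscale(rgb):
    """Convert an HxW grid of (R, G, B) tuples to luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb]

def resize_nearest(img, out_h=60, out_w=60):
    """Nearest-neighbour resample of a 2-D grid to out_h x out_w."""
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w]
             for j in range(out_w)]
            for i in range(out_h)]

# Toy 120x180 single-colour "image", standardised to 60x60 grayscale.
rgb = [[(100, 150, 200)] * 180 for _ in range(120)]
standard = resize_nearest(to_grayscale(rgb))
```

In practice an imaging library would handle both steps; the point is only that every database's images end up in one common 60×60 grayscale representation before training.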

Table 5.12 Details of the Images Selected for Training and Testing

Types of Images   No. of Training Images   No. of Testing Images
CRVO              50                       58
BRVO              50                       52
HRVO              50                       50
Normal            50                       88

Using the same data augmentation process as explained in the previous section, another 3 sets of images for each class are created by rotating the training images by 90°, 180°, and 270°. Hence, the first CNN has been trained with a total of 600 pre-processed grayscale images of size 60×60 (200 images each from the Normal, BRVO, and CRVO classes). For each class, 90% of the images (180 of each type) are used for training, and the remaining 10% (20 images) are used for validation. For training all three networks, 2000 epochs have been chosen as standard after careful investigation with higher and lower numbers of epochs; the batch size is kept at 10 and the learning rate is fixed at 0.0001. The second CNN has been trained with 200 BRVO and 200 HRVO images, while the third CNN has been trained with 200 CRVO and 200 HRVO images; the numbers of training and validation images are kept the same as for the first CNN. For performance testing of the proposed Cascaded Convolutional Neural Network, the first network has been tested with 88 Normal, 58 CRVO, and 52 BRVO images; the second CNN with 52 BRVO and 50 HRVO images; and the third CNN with 58 CRVO and 50 HRVO images. The training and testing arrangement of each CNN in the cascade network is shown in Table-5.13.

Table 5.13 Details of the Images used for Training and Testing after Data Augmentation

          No. of Images for Training         No. of Images for Testing
CNN-1     BRVO: 200, CRVO: 200, Normal: 200  BRVO: 52, CRVO: 58, Normal: 88
CNN-2     BRVO: 200, HRVO: 200               BRVO: 52, HRVO: 50
CNN-3     CRVO: 200, HRVO: 200               CRVO: 58, HRVO: 50
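The decision routing implied by this arrangement can be sketched as follows; the three predict functions are stand-in stubs for the trained networks, not their actual implementations:

```python
# Sketch: CNN-1 separates Normal/BRVO/CRVO; images it calls BRVO are
# re-checked against HRVO by CNN-2, and images it calls CRVO by CNN-3.

def cnn1(image):   # Normal vs BRVO vs CRVO (stub standing in for the network)
    return image["true"] if image["true"] != "HRVO" else image["looks_like"]

def cnn2(image):   # BRVO vs HRVO (stub)
    return "HRVO" if image["true"] == "HRVO" else "BRVO"

def cnn3(image):   # CRVO vs HRVO (stub)
    return "HRVO" if image["true"] == "HRVO" else "CRVO"

def cascaded_predict(image):
    first = cnn1(image)
    if first == "Normal":
        return "Normal"
    if first == "BRVO":
        return cnn2(image)   # confirm BRVO or reclassify as HRVO
    return cnn3(image)       # confirm CRVO or reclassify as HRVO

# An HRVO image that superficially resembles BRVO is caught by CNN-2.
sample = {"true": "HRVO", "looks_like": "BRVO"}
pred = cascaded_predict(sample)
```

This routing is why HRVO, which the single CNN confuses with the other two types, can still be recovered at the second stage.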

5.5.2. Experimental Results

The performance of the proposed Cascaded CNN has been evaluated using the same parameters as for the proposed CNN, viz. Accuracy, Sensitivity, Specificity, PPV, and NPV. Following the state-of-the-art, the performance of the designed CNN is first evaluated for detecting BRVO. For this 2-class classification, i.e. BRVO versus Normal, the designed CNN configuration has obtained 99.3% accuracy: only one normal image has been misclassified as BRVO and all BRVO images are correctly classified, giving a sensitivity of 100% and a specificity of 98.9%. Similarly, the performance for CRVO detection has been evaluated, and the proposed CNN can separate CRVO and normal images with 98.6% accuracy: all normal images are correctly classified and only two CRVO images are misclassified as normal, giving a sensitivity of 96.5% and a specificity of 100%.

As mentioned earlier, the single trained CNN model fails to detect all three types of RVO and shows poor performance in the 4-class classification (BRVO, CRVO, HRVO, and Normal), mainly because of the ambiguous features of HRVO images. Therefore, the Cascaded Convolutional Neural Network (CCNN) has been designed especially for this 4-class classification. The performance of each CNN in the cascaded network has been evaluated individually. The function of the initial CNN, or CNN-1, is to separate Normal and RVO images; therefore, CNN-1 has been trained and tested with Normal, BRVO, and CRVO images. The confusion matrix for the training and testing session of CNN-1 is shown in Figure 5.11, where the diagonal cells represent the number and percentage of images correctly classified by the trained network. The confusion matrix shows that out of 53 BRVO predictions, 94.3% are correct and 5.7% are wrong, i.e. 50 test images are correctly classified as BRVO, corresponding to 25.3% of all 198 test images. Out of 57 CRVO predictions, 96.5% are correct and 3.5% incorrect; similarly, out of 88 Normal predictions, 98.9% are correct and 1.1% wrong. Out of 52 BRVO cases, 96.2% are correctly predicted as BRVO and 3.8% as CRVO. Similarly, out of 58 CRVO cases, 94.8% are correctly classified as CRVO and 5.2% as either BRVO or Normal, whereas out of 88 Normal cases, 98.9% are correctly classified as Normal and only 1.1% as BRVO. Overall, 97% of the first CNN's predictions are correct and 3% is the error rate. Thus, for the 3-class classification, CNN-1 has achieved an accuracy of 97%, specificity of 98%, sensitivity of 96.15%, PPV of 94.34%, and NPV of 98.62%.

Figure 5.11: Confusion Matrix for CNN-1


The aim of CNN-2 is to distinguish BRVO and HRVO features; it is therefore trained and tested with BRVO and HRVO images. The confusion matrix shown in Fig. 5.12 corresponds to the predictions made by the second CNN. Out of 54 BRVO predictions, 92.6% are correct, and out of 48 HRVO predictions, 95.8% are correct. On the other hand, out of 52 BRVO cases, 96.2% are correctly predicted as BRVO and the rest as HRVO, whereas out of 50 HRVO cases, 92% are correctly classified as HRVO. Overall, 94.1% of the second CNN's predictions are correct. The obtained sensitivity is 96.15%, specificity 92%, PPV 92.59%, and NPV 95.83% for this 2-class classification.

Figure 5.12: Confusion Matrix for CNN-2

Similarly, the goal of CNN-3 is to distinguish CRVO and HRVO features, and it is thus trained with CRVO and HRVO images. The confusion matrix shown in Fig. 5.13 corresponds to the third CNN. It shows that out of 58 CRVO cases, 98.3% are correctly predicted as CRVO, whereas out of 50 HRVO cases, 96% are correctly classified as HRVO and the rest as CRVO. Overall, the third CNN's predictions are 97.2% correct. Therefore, for this 2-class classification, CNN-3 has attained accuracy 97.2%, specificity 96%, sensitivity 98.28%, PPV 96.61%, and NPV 98%. Table-5.14 shows the performance of the individual CNNs in the proposed deep cascaded network.

Figure 5.13: Confusion Matrix for CNN-3

Table 5.14 Performance Evaluation of the Individual CNNs in the Designed Cascade Network

              CNN-1     CNN-2     CNN-3
Accuracy      97%       94.1%     97.2%
Sensitivity   96.15%    96.15%    98.28%
Specificity   98%       92%       96%
PPV           94.34%    92.59%    96.61%
NPV           98.62%    95.83%    98%


Finally, the performance of the proposed Cascaded Convolutional Neural Network has been evaluated for detecting all three types of RVO, viz. CRVO, BRVO, and HRVO. CNN-1 detects BRVO with an accuracy of 96.2% and CRVO with an accuracy of 94.8%. None of the BRVO test images are falsely detected as Normal, so the sensitivity for BRVO detection is 100%; one CRVO image is falsely detected as Normal, so the sensitivity for CRVO detection is 98.2%. One normal image is falsely detected as BRVO, so the specificity for BRVO detection is 98.9%; none of the Normal images are falsely detected as CRVO, so the specificity for CRVO detection is 100%. CNN-2 discriminates BRVO and HRVO features and has gained an accuracy of 96.2% for BRVO detection and 92% for HRVO detection. A total of 50 images are correctly classified as BRVO, giving a sensitivity of 96.2%, and 2 images are falsely detected as HRVO, so the specificity becomes 92%. Out of 50 HRVO images, 46 are correctly detected; hence the sensitivity for HRVO detection is 92% and the specificity is 96.2%. CNN-3 detects CRVO with an accuracy of 98.3% and HRVO with an accuracy of 96%. 57 CRVO images are correctly classified, achieving 98.2% sensitivity and 96% specificity for CRVO detection, while for HRVO detection 2 images are misclassified as CRVO, giving a sensitivity of 96% and a specificity of 98.2%.

To evaluate the performance of the Cascaded Network for detection of all three types of RVO, the average of the classification results of the two relevant internal CNNs has been taken: the performance of CNN-1 and CNN-2 has been averaged for BRVO recognition; the performance of CNN-1 and CNN-3 has been averaged for CRVO recognition; and the performance of CNN-2 and CNN-3 has been averaged for HRVO detection. Thereby, the CCNN has successfully detected BRVO with an accuracy of 96.2%, sensitivity of 98.1%, and specificity of 95.45%. For CRVO detection, the CCNN has achieved an accuracy of 96.5%, sensitivity of 98.2%, and specificity of 98%. For HRVO detection, the CCNN has obtained an accuracy of 94%, sensitivity of 94%, and specificity of 97.2%. Table 5.15 shows the accuracy, sensitivity, and specificity for detection of BRVO, CRVO, and HRVO by the proposed Cascaded Convolutional Neural Network (CCNN). Fig. 5.14 shows a CRVO image misclassified as BRVO by CNN-1; Fig. 5.15 shows a normal image misclassified by CNN-1; Fig. 5.16 shows a BRVO image misclassified as HRVO by CNN-2; and Fig. 5.17 shows a CRVO image misclassified as HRVO by CNN-3.

Table 5.15 Performance Evaluation for Detection of RVO Types by Cascade Network

RVO Types Accuracy Sensitivity Specificity

BRVO 96.2% 98.1% 95.45%

CRVO 96.5% 98.2% 98%

HRVO 94% 94% 97.2%

Figure 5.14: CRVO Image Misclassified as BRVO Image by CNN-1


Figure 5.15: Normal Image Misclassified as BRVO Image by CNN-1

Figure 5.16: BRVO Image Misclassified as HRVO by CNN-2

Figure 5.17: CRVO Image Misclassified as HRVO by CNN-3

Lastly, the performance of the CCNN has been evaluated by feeding it a set of test images containing all four target classes, viz. BRVO, CRVO, HRVO, and Normal. The confusion matrix of the CCNN for this 4-class classification is shown in Figure 5.18. Out of 248 testing images, 10 are misclassified by the proposed CCNN: out of 52 BRVO images, 1 is misclassified as CRVO; out of 58 CRVO images, 2 are misclassified as BRVO and 1 as Normal; out of 50 HRVO images, 3 are misclassified as BRVO and 2 as CRVO; and out of 88 normal images, 1 is misclassified as BRVO. When a whole set of images is tested, the CCNN directly provides the final decision, and therefore it is difficult to track the performance of the individual networks within the CCNN for the particular test set.
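The overall 4-class accuracy implied by these counts can be checked directly:

```python
# Check: 10 of 248 test images are misclassified, so the cascade's overall
# 4-class accuracy follows immediately from the counts above.

test_images = {"BRVO": 52, "CRVO": 58, "HRVO": 50, "Normal": 88}
misclassified = 1 + 3 + 5 + 1          # BRVO, CRVO, HRVO, Normal errors
total = sum(test_images.values())      # 248 test images in all
overall_accuracy = (total - misclassified) / total * 100   # ~95.97%
```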

Figure 5.18: Confusion matrix of CCNN for 4-class classification

5.6. Performance Comparison of the Proposed CNN based Method with the State-of-the-Art Methods

In this section, the proposed CNN based method for retinal blood vascular disease detection is compared with the state-of-the-art methods. The comparison is made with existing methods for DR detection and for RVO detection in individual sub-sections.


5.6.1. Comparison with the State-of-the-Art Methods for DR Detection

The proposed CNN based method for DR detection proves to be efficient at detecting DR at the earliest stage, and thereby can help the ophthalmologist prevent further deterioration of retinal health. The performance comparison with some of the existing methods is shown in Table-5.16. From the table, it can be observed that the proposed CNN based method outperforms most of the existing methods for DR detection, and shows outstanding performance for detecting DR at the earliest stage, when the changes in the retina are still quite vague. In addition, the proposed CNN has shown outstanding performance in grading DR severity as well.

It is difficult to make a direct comparison of the proposed method with the existing methods, as the databases used by the various methods differ. Most researchers used in-house or hospital databases. Multiple public retinal image databases are available for DR detection; however, their provided labels or ground truth values serve varied objectives. For example, the DIARETDB and ROC databases aim at detecting DR-related lesions, such as microaneurysms, exudates, haemorrhages, and cotton wool spots; the ground truth values for such databases therefore concern particular lesions rather than the severity of the disease. The state-of-the-art methods for detecting DR at the earliest stage, i.e. mild NPDR, are based on detecting microaneurysms, as these are the early signs of DR; such methods (Mizutani et al. 2009; Jaafar, Nandi & Al-Nuaimy 2011a; Inoue et al. 2013) mainly used the DIARETDB and ROC databases. Similarly, existing methods detecting DR from bright lesions, such as exudates and cotton wool spots (Jaafar, Nandi & Al-Nuaimy 2011b; X. Zhang et al. 2014), have used the DIARETDB and E-Ophtha EX databases in addition to hospital databases. The MESSIDOR and Kaggle databases are mostly used by DR screening methods to detect the different stages of DR; various researchers have used the MESSIDOR database for classifying DR severity (Dupas et al. 2010; Antal & Hajdu 2014; Roychowdhury, Koozekanani & Parhi 2013; Lachure et al. 2015). The Kaggle database has been recently


Table 5.16 The Performance Comparison of State-of-the-Art Methods and the Proposed Method for DR Diagnosis

Nayak et al. (2008): Features: exudates, blood vessel area, and contrast; Methodology: Neural Network; Database: hospital; Target classes: Normal, NPDR, PDR; Performance: Sensitivity 90%, Specificity 100%, Accuracy 93%.

Quellec et al. (2008): Features: red lesions; Methodology: wavelet based; Database: hospital; Target classes: DR, Normal; Performance: Sensitivity 89.62%, Specificity 89.5%.

Acharya et al. (2009): Features: blood vessel area, exudates, MA and HA; Methodology: morphological transform, SVM; Database: hospital; Target classes: Normal, mild NPDR, moderate NPDR, severe NPDR, PDR; Performance: Sensitivity 82%, Specificity 86%, Accuracy 85.9%.

García et al. (2010): Features: red lesions; Methodology: Neural Network; Database: in-house; Target classes: DR, Normal; Performance: Sensitivity 100%, Specificity 56%.

Mookiah et al. (2012): Features: blood vessels, exudates area, bifurcation points, global texture, and entropies; Methodology: GA-optimised PNN classifier; Database: hospital; Target classes: Normal, NPDR, PDR; Performance: Sensitivity 96.27%, Specificity 96.08%, Accuracy 96.15%.

Zhang et al. (2014): Features: exudates; Methodology: morphology; Database: E-Ophtha EX; Target classes: DR, Normal; Performance: Sensitivity 96%, Specificity 89%.

Gargeya et al. (2017): Features: image pixels; Methodology: Convolutional Neural Network; Databases: MESSIDOR-2, E-Ophtha; Target classes: DR, Normal; Performance: AUC 97%, Sensitivity 94%, Specificity 98%.

Proposed Method: Features: image pixels; Methodology: Convolutional Neural Network;
    Stage-1 DR vs Normal (MESSIDOR): Accuracy 98.21%, Sensitivity 100%, Specificity 96.2%;
    Stage-1 DR vs Normal (Kaggle): Accuracy 96.6%, Sensitivity 99.7%, Specificity 93.2%;
    Mild-Moderate NPDR vs Severe NPDR-PDR vs Normal (MESSIDOR, STARE): Accuracy 97.92%, Sensitivity 100%, Specificity 98%.


The Kaggle database has been released recently, particularly for deep learning models (Gulshan et al. 2016; Doshi et al. 2016; Chandore Vishakha 2017).

In terms of performance, the proposed deep learning methodology for DR detection has performed better than most of the existing traditional methods, irrespective of the different databases. Figure 5.19 shows the Sensitivity and Specificity comparison of the proposed deep learning method with the existing methods using popular classifiers such as k-NN (Dupas et al. 2010), SVM (Lachure et al. 2015), Naïve Bayes (Jelinek et al. 2006), Neural Network (García et al. 2010), and Hidden Markov Model (Tang et al. 2013). From the graph, it can be seen that the proposed method has achieved higher sensitivity than most of the existing methods. The specificity is slightly lower than that of the method using SVM (Lachure et al. 2015); however, the overall performance, taking sensitivity into account, is better than the method explained in (Lachure et al. 2015). The proposed method has obtained the same 100% sensitivity as the method using a Neural Network, but the specificity of that existing method is relatively low (García et al. 2010). Considering all the existing methods tested on MESSIDOR for DR detection, the proposed method still outperforms the state-of-the-art methods. Figure 5.20 shows the performance comparison of the proposed method with the state-of-the-art methods for 2-class classification using the MESSIDOR database. Since the discussed existing methods are traditional machine learning methods, they are highly dependent on the segmentation of the targeted DR lesions. Therefore, the performance of the classifiers is affected by the performance of the segmentation and feature design processes. The co-dependency of the subsequent processes in the traditional machine learning methods makes them complicated. On the other hand, the proposed method is based on deep learning and acts as an autonomous model, which can efficiently extract features from the image pixels and perform the classification.
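The Sensitivity, Specificity, and Accuracy values compared throughout this section follow the standard confusion-matrix definitions. A minimal sketch (the counts below are illustrative only, not taken from the thesis experiments):

```python
def sensitivity(tp, fn):
    """True positive rate: proportion of diseased images correctly flagged."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: proportion of normal images correctly passed."""
    return tn / (tn + fp)

def accuracy(tp, tn, fp, fn):
    """Overall proportion of correct classifications."""
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative counts for a hypothetical 200-image test set.
tp, fn, tn, fp = 99, 1, 93, 7
print(round(sensitivity(tp, fn), 3))   # 0.99
print(round(specificity(tn, fp), 3))   # 0.93
```

A method can thus trade specificity for sensitivity, which is why the text above weighs the two jointly rather than in isolation.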

The existing methods for binary classification are mainly aimed at discriminating DR from Normal images, where images of all DR severities are used. The proposed method, on the other hand, discriminates the early stage of DR, i.e. mild NPDR, from Normal images. Given this more difficult condition, the proposed method is more efficient than the state-of-the-art methods. Figure 5.21 shows the performance comparison of the proposed CNN model for mild NPDR detection in terms of accuracy. Here, all the methods except the method described in (Acharya et al. 2009) used the MESSIDOR database.


Figure 5.19: Sensitivity and Specificity Comparison of Existing Methods (k-NN, SVM, NN, Naïve Bayes, HMM) and the Proposed Method for Early DR Detection

Figure 5.20: Sensitivity and Specificity Comparison with Methods using the MESSIDOR Database


Figure 5.21: Accuracy Comparison of Mild NPDR Detection (Acharya et al., Lam et al., Antal et al., Haloi et al., and the Proposed Method)

Now, if compared with the state-of-the-art methods using the Kaggle database, the proposed method has obtained better performance for detecting stage-1 DR. Gargeya & Leng (2017) proposed a CNN model for detecting DR, which achieved 94% sensitivity and 98% specificity in classifying Normal and DR images. The proposed CNN model has achieved 99.7% sensitivity and 93.2% specificity for detecting Normal and mild NPDR images. Lam et al. (2016) used GoogLeNet to detect Normal and mild NPDR images, and Gulshan et al. (2016) used the Inception V3 model to detect DR and Normal images. Figure 5.22 shows the performance comparison of the proposed method using the Kaggle database. It can be noticed that the proposed CNN model has successfully achieved a higher sensitivity than the other deep learning CNN models.

Now, if 3-class classification is considered, the proposed deep learning model has attained outstanding results as well. There are a few methods available for 3-class classification grading DR severity, which used hospital databases (Nayak et al. 2008; Mookiah et al. 2013). The performance comparison graph for 3-class classification is shown in Figure 5.23.


Figure 5.22: Sensitivity and Specificity Comparison with Deep Learning Methods (Gargeya et al., Gulshan et al., Lam et al.) using the Kaggle Database

Figure 5.23: Accuracy, Sensitivity, and Specificity Comparison for 3-class Classification of DR (Nayak et al., Mookiah et al., and the Proposed Method)


Now, if the proposed method is evaluated in terms of running time, the proposed method is considerably faster than the other traditional methods. Since the traditional methods mostly involve segmentation and feature extraction steps, each step has its individual completion time. Moreover, the total time taken by a DR detection method using more than one type of lesion differs from that of methods relying on a single type of lesion, and is usually longer. Dupas et al. (2010) used clinical features such as microaneurysms, haemorrhages, and exudates for DR detection. The running time of the whole process is the sum of the times required for the individual segmentation and feature extraction of the microaneurysms, haemorrhages, and exudates. Therefore, it is straightforward to observe that it is a time-consuming process. Lachure et al. (2015) used microaneurysms and haemorrhages for DR detection. The total time taken by that method is the sum of the time taken for the detection of each lesion, the extraction of features, and the classification task using SVM. The segmentation and feature extraction of the lesions take 20 minutes, and training and testing the SVM takes 3 minutes; therefore, the total time taken by the whole DR detection process on the Messidor database is 23 minutes. The proposed method performs both feature extraction and classification, and its total running time is 4 minutes and 5 seconds on the Messidor database. For popular deep learning models such as VGGNet-16 and Inception V3, the running times on the Kaggle database are 17 minutes and 16 minutes 6 seconds, respectively. On the other hand, the running time of the proposed network is 14 minutes and 1 second. For BRVO detection, the method proposed by Zhang et al. (2014), using SVM and HLBP, takes 9 minutes and 4 seconds. The running time of the CNN used by Zhao et al. (2015) is 2 minutes and 5 seconds, while the proposed method takes 3 minutes to complete the BRVO classification task.
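The stage-wise accounting above (e.g. 20 minutes of segmentation and feature extraction plus 3 minutes of SVM training and testing giving 23 minutes in total) can be reproduced with a simple per-stage timer. The stage functions here are hypothetical placeholders, not the thesis pipeline:

```python
import time

def run_pipeline(stages, data):
    """Run named stages in order, timing each; returns (result, per-stage times, total)."""
    timings = {}
    out = data
    for name, fn in stages:
        start = time.perf_counter()
        out = fn(out)
        timings[name] = time.perf_counter() - start
    return out, timings, sum(timings.values())

# Placeholder stages standing in for lesion segmentation, feature extraction, classification.
stages = [
    ("segmentation", lambda d: [x * 2 for x in d]),
    ("feature_extraction", lambda d: sum(d)),
    ("classification", lambda s: "DR" if s > 10 else "Normal"),
]
result, per_stage, total = run_pipeline(stages, [1, 2, 3])
print(result)  # DR
```

The total is, by construction, the sum of the stage times, which is exactly why multi-lesion pipelines accumulate running time while an end-to-end CNN pays a single forward-pass cost.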
Therefore, the proposed model has proved to be a relatively faster method than the other methods used for DR and RVO detection. It is to be noted that the running time also depends on the configuration of the system. All the experiments have been done on a system with 4GB RAM and a 2GB NVIDIA® GeForce 840M graphics card. On a system with a more powerful configuration, the program execution will be faster. Therefore, the running time of any method or program is tentative, depending on the system configuration. However, it is important to mention that the proposed method is generally a simpler and faster method than the other traditional methods involving extra feature engineering, regardless of the system configuration. For deep learning models, the difference between the running time of the proposed model and the other popular models may sometimes be negligible, depending on the size of the training samples and the ability of the popular models to skip nodes; however, skipping nodes can cause an underfitting problem.

5.6.2. Comparison with the State-of-the-Art Methods for RVO Detection

The proposed Cascaded CNN is capable of detecting all three types of RVO and shows outstanding performance. While the existing methods are available for detecting either BRVO or CRVO only, the proposed Cascaded CNN successfully detects all three types, BRVO, CRVO, and HRVO, irrespective of their ambiguous features. Table 5.17 shows the performance comparison of the proposed cascade network of three CNNs with the state-of-the-art RVO detection methods. Since no other research work is available to compare with the proposed method for the simultaneous detection of three types of RVO, the existing work on single-RVO detection has been listed alongside both the single-RVO and three-RVO detection results of the proposed work, to give an overall picture of the research results obtained.

The proposed method is compared with the benchmark methods for RVO detection, and the comparison has revealed that the methods proposed by Zhang et al. (using Fluorescein Angiography (FA) images) (H. Zhang et al. 2014) and Zhao et al. (using colour fundus images) (Zhao, Chen & Chi 2015) are meant for BRVO detection only. On the other hand, the proposed method has been specially designed for detecting all three types of RVO from colour fundus images, viz., CRVO, BRVO, and HRVO. The drawbacks of these two benchmark methods are: 1) the CNN used by Zhao et al. is a basic CNN, which is unable to detect all three types of RVO; 2) the SVM-based method used by Zhang et al. is a traditional machine learning method, where the classification depends on successful feature extraction using the Hierarchical Local Binary Pattern (HLBP). Both of these methods fail to detect HRVO, as HRVO shares ambiguous features with BRVO and CRVO. The proposed method is specifically designed to differentiate the ambiguous HRVO features from those of the other two types.
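The idea of combining several per-disease detectors into one multi-class decision can be sketched as follows. The threshold, the stub scorers, and the exact combination rule are illustrative assumptions for exposition, not the thesis's actual cascade wiring:

```python
def cascade_predict(image, detectors, threshold=0.5):
    """detectors: list of (label, scorer) pairs, each scorer returning P(label | image).
    Returns the highest-scoring disease label, or 'Normal' if none is confident."""
    scores = {label: scorer(image) for label, scorer in detectors}
    best_label, best_p = max(scores.items(), key=lambda kv: kv[1])
    return best_label if best_p >= threshold else "Normal"

# Stub scorers standing in for three trained binary CNNs.
detectors = [
    ("BRVO", lambda img: 0.91),
    ("CRVO", lambda img: 0.22),
    ("HRVO", lambda img: 0.13),
]
print(cascade_predict(None, detectors))  # BRVO
```

The value of such an arrangement is that the ambiguous HRVO case is resolved by comparing the detectors' outputs against each other rather than asking one binary model to draw the boundary alone.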


Table 5.17: Performance Comparison of the Designed Cascade Network of CNNs with the Existing Methods

| Authors | Database | No. of Training Images | No. of Testing Images | Target Class | Methodology | Performance |
|---|---|---|---|---|---|---|
| Zhang et al. | Private hospital (570 BRVO + 100 Normal = 670 images) | 603 | 67 | BRVO, Normal | HLBP, SVM | Accuracy 96.1% |
| Zhao et al. | Private hospital (100 BRVO + 100 Normal = 200 images) | 180 | 20 | BRVO, Normal | Whole image, CNN (LeNet) | Accuracy 97% |
| Zhao et al. | (same database) | 180 | 20 | BRVO, Normal | Patch image, CNN (LeNet) | Accuracy 98.5% |
| Proposed Method | STARE, DRIVE, Retinal Image Bank, Dr Hossein Rabbani Database (108 CRVO + 102 BRVO + 100 HRVO + 138 Normal = 448 images) | 400 | 140 | BRVO, Normal | Whole image, Designed CNN | Accuracy 97.8%, Sensitivity 98.1%, Specificity 97.7% |
| Proposed Method | (same databases) | 400 | 146 | CRVO, Normal | Whole image, Designed CNN | Accuracy 98.6%, Sensitivity 96.5%, Specificity 100% |
| Proposed Method | (same databases) | 800 | 248 | CRVO, BRVO, HRVO, Normal | Whole image, Cascade Network of three Designed CNNs | Accuracy 96%, Sensitivity 97%, Specificity 95.2% |

The methods proposed by Zhang et al. and Zhao et al. have been validated on a database from a private hospital, where all images have the same quality. Moreover, the numbers of images used for training and testing are small. In contrast, the proposed model has been validated using more images, collected from multiple publicly available databases, where the images are of different quality and are captured at different positions and angles. Therefore, the proposed method can be considered a more reliable and consistent method for RVO detection. Figure 5.24 shows the performance comparison of the proposed method with the existing BRVO detection methods. However, the performances of these methods are not directly comparable, as the methods have been validated using different databases. To make a fair comparison, the methods proposed by (H. Zhang et al. 2014; Zhao, Chen & Chi 2015) have been implemented and tested on the datasets collected from the publicly available databases.


Figure 5.24: Accuracy Comparison of State-of-the-Art Methods (HLBP+SVM: 96%; CNN: 97%) and the Proposed Method (97.8%) for BRVO Detection

Since RVO detection is a new research area, it is difficult to make direct comparisons with the few existing methods. However, an effort has been made to compare the performance of the proposed method with the state-of-the-art methods for RVO detection using the same datasets. Therefore, the existing methods (H. Zhang et al. 2014; Zhao, Chen & Chi 2015; Anitha, Selvathi & Hemanth 2009; Gayathri et al. 2014; Fazekas et al. 2015; Zode 2017) for RVO detection have been implemented and tested using the collected publicly available databases, and their performances are shown in Table 5.18. From the table, it can be observed that the proposed method has outperformed the existing methods for RVO detection.

The method proposed by Gayathri et al. used CLBP (Complete Local Binary Pattern) and a Neural Network to discriminate the normal retina from a retina affected by blood vascular disease. Following their method, the images are pre-processed by image enhancement and normalisation. After that, CLBP is performed, a CLBP histogram is generated, and these features are fed to the Neural Network. The disadvantage of this method is that CLBP generates a rather long histogram by combining three components: the difference of pixel values, the sign, and the magnitude. It is a rotation-invariant method; however, the abundant feature vector contains redundant features that decrease the performance of the classifier.

Table 5.18: Performance Comparison for RVO Detection

| Methods | Target Class | Method Used | Database | No. of Training Images | No. of Testing Images | Performance/Remarks |
|---|---|---|---|---|---|---|
| Zhang et al. | BRVO, Normal | HLBP + SVM | STARE, DRIVE, Retinal Image Bank, Dr Hossein Rabbani Database | 180 | 20 | Accuracy 94.1% |
| Zhao et al. | BRVO, Normal | CNN | (same databases) | 180 | 20 | Accuracy 51.28% |
| Gayathri et al. | RVO, Normal | CLBP + ANN | (same databases) | 255 | 45 | Accuracy 77.4% |
| Anitha et al. | BRVO, CRVO, HRVO, Normal | Fuzzy C-means clustering + ANN | (same databases) | 170 | 30 | Accuracy 79.1% |
| Fazekas et al. | BRVO, CRVO, HRVO, Normal | Fractal Analysis (Box Counting) | (same databases) | – | – | Fractal dimension difference between Normal and HRVO is higher than that between Normal and BRVO or CRVO |
| Zode et al. | BRVO, Normal | Fractal Analysis (Box-counting, Radius-mass) | (same databases) | – | – | Helpful process, but not significant enough to identify images correctly, as the method is affected by the segmentation method |
| Proposed Method | BRVO, Normal | Proposed CNN | (same databases) | 200 | 70 | Accuracy 98.5% |
| Proposed Method | BRVO, CRVO, HRVO, Normal | Deep Cascaded CNN | (same databases) | 800 | 248 | Accuracy 96.1% |


Anitha et al. used Fuzzy C-means clustering and a neural network for classifying 4 different types of diseases, viz., CRVO, NPDR, Choroidal Neo-Vascularisation Membrane (CNVM), and Central Serous Retinopathy (CSR). Fuzzy C-means clustering is an efficient method for representing ambiguous features, as it allows image pixels to belong to multiple classes. For this research, the method has been implemented to classify CRVO, BRVO, HRVO, and Normal images. Following the method, the images are pre-processed using histogram equalisation and a Gaussian filter on the three colour channels individually. Then, using the Fuzzy C-means algorithm, the centroids are calculated from the red, green, and blue channels of the image. These features are trained and tested with an ANN using a single hidden layer of 15 neurons. On the collected database, the method has obtained 79.1% accuracy. Out of 200 images, 70% are used for training, 15% for validation, and 15% for testing. In their paper, the diseases are of distinctly different types; therefore, the ambiguity of the features was lower than that among the three types of RVO. Since the RVO features are highly ambiguous, the performance of the method degraded.
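The Fuzzy C-means update used above (memberships from relative distances to the centroids, centroids as membership-weighted means) can be sketched for one-dimensional features. The data, the deterministic initialisation, and the iteration count are illustrative assumptions, not Anitha et al.'s implementation:

```python
def fuzzy_c_means(xs, c=2, m=2.0, iters=50):
    """Minimal 1-D Fuzzy C-means. Returns (centroids, memberships)."""
    lo, hi = min(xs), max(xs)
    # Deterministic initialisation: centroids spread evenly over the data range.
    centroids = [lo + (k + 0.5) * (hi - lo) / c for k in range(c)]
    u = []
    for _ in range(iters):
        # Membership of point i in cluster k: 1 / sum_j (d_k / d_j)^(2/(m-1)),
        # where d_k, d_j are the point's distances to centroids k and j.
        u = []
        for x in xs:
            d = [abs(x - v) or 1e-12 for v in centroids]
            u.append([1.0 / sum((dk / dj) ** (2.0 / (m - 1)) for dj in d) for dk in d])
        # Centroid update: v_k = sum_i u_ik^m * x_i / sum_i u_ik^m.
        centroids = [
            sum(u[i][k] ** m * xs[i] for i in range(len(xs)))
            / sum(u[i][k] ** m for i in range(len(xs)))
            for k in range(c)
        ]
    return centroids, u

# Two well-separated "channel intensity" clusters.
centroids, u = fuzzy_c_means([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
```

The soft memberships are what make the method attractive for ambiguous features: a pixel near two cluster centres keeps a graded membership in both instead of being forced into one.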

Fractal analysis has been performed to calculate the fractal dimension of the RVO-affected retina image and the normal retina image in (Fazekas et al. 2015; Zode 2017). Following their method, segmentation is performed first to extract the blood vasculature, using morphological operations. Then, the ImageJ and FracLac software have been used to calculate the fractal dimension using the Box-Counting and Mass-Radius methods. The fractal analysis is performed by calculating the difference between the fractal dimension of the normal image and that of the RVO images. The mean difference in fractal dimension between normal and BRVO images is 0.0106, between normal and CRVO images is 0.0283, and between normal and HRVO images is 0.2065. The different segmentation methods have less effect on the fractal dimension of the normal retina image compared to the RVO-affected images (Fazekas et al. 2015). The main limitation of the fractal analysis method is that it is highly dependent on the segmentation method. If the segmentation method fails to segment the blood vasculature properly, then the fractal analysis fails too. Again, in the early stage of RVO, the only symptoms are the tortuous veins, and in that case, the fractal dimensions of RVO images are almost the same as those of normal images. Then, in the severe stage, the haemorrhages are large in size, and the segmentation methods often misclassify the edges of the haemorrhages as parts of blood veins. Therefore, the difference between the fractal dimensions of normal and RVO-affected images mostly lies in the range 0.01 to 1. Figure 5.25 shows the fractal dimensions of BRVO, CRVO, HRVO, and Normal images. It can be observed that the fractal dimension is almost the same for all these images.
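The box-counting procedure described above can be sketched directly: cover the segmented vessel pixels with boxes of increasing size and fit the slope of log N(s) against log(1/s). This is a minimal stand-in for the ImageJ/FracLac computation, not the software the cited studies used:

```python
import math

def box_count_dimension(pixels, sizes=(1, 2, 4, 8)):
    """Estimate the fractal dimension of a set of (x, y) pixels.
    N(s) = number of s-by-s boxes touched; the dimension is the slope of
    log N(s) versus log(1/s), fitted by least squares."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(len({(x // s, y // s) for x, y in pixels})) for s in sizes]
    n = len(sizes)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)

# A straight 64-pixel "vessel" has dimension ~1; a filled 16x16 patch ~2.
line = [(i, 0) for i in range(64)]
patch = [(x, y) for x in range(16) for y in range(16)]
print(round(box_count_dimension(line), 2))   # 1.0
print(round(box_count_dimension(patch), 2))  # 2.0
```

The sketch also makes the stated limitation visible: the dimension is computed from whatever pixel set the segmentation produced, so a segmentation error changes the input point set and hence the estimate.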

There are two significant methods available for BRVO detection (Zhao et al. 2015; H. Zhang et al. 2014). From Table 5.18, it can be observed that the proposed method has achieved higher accuracy for BRVO detection than the methods of Zhang et al. and Zhao et al. in terms of whole-image analysis. Following the method of Zhang et al., 2 levels of max-pooling and LBP (Local Binary Pattern) have been performed on the datasets to generate the HLBP (Hierarchical Local Binary Pattern) features. After that, an SVM is used for classification with 10-fold cross-validation. The SVM has been trained with 100 BRVO images and 100 Normal images; under the 10-fold validation scheme, 9 folds are used for training and 1 fold for testing, with 10 BRVO images and 10 Normal images in each fold. The average accuracy obtained is 94.1%. Following the method of Zhao et al., a CNN with 3 convolution layers containing 3, 6, and 9 filters of size 5×5 has been trained with 100 BRVO images and 100 Normal images. Replicating their image-based method, 3 other sets of images are generated by adding noise, flipping, and rotation. The final decision for a test image is made when the classifications of the 3 extra image sets agree; otherwise, the test image is classified according to the original image. As the authors have not provided any details about the learning rate, pooling filter size, number of strides, or batch size, the model has been trained with a learning rate of 0.001 and a batch size of 10. The proposed method, using the designed CNN, has been trained with 100 BRVO images and 100 Normal images. Keeping the 9:1 validation ratio, a total of 70 images (50 Normal and 20 BRVO) are tested, achieving 98.5% accuracy. The performances of the proposed method and the existing methods on the same datasets are shown in Table 5.18. From the table, it can be seen that the proposed method has outperformed the existing methods by a large margin.
Figure 5.25: Fractal Dimensions of Normal and RVO Images (Normal FD=1.466; BRVO FD=1.4885; CRVO FD=1.4467; HRVO FD=1.4266)

While the method proposed by Zhang et al. was able to maintain its performance, the method proposed by Zhao et al. failed badly. The main reason for this failure is that the basic CNN used by Zhao et al. has been tuned in such a way that it works for their application with their particular database. As mentioned in Chapter-1 and Chapter-3, tuning deep models for a particular application is expensive and is one of the open issues. Again, the problem with the 10-fold validation scheme for the collected image dataset is that it selects the 10 sets randomly. In the collected database used for this research, there are multiple copies of the same retina captured at different angles. Thus, similar images might end up in both the training and testing sets when images are selected randomly. Therefore, it is important to select the testing images carefully so that none of them appears in the training sets; otherwise, the model tends to overfit and its performance might not be reliable. Thus, for the proposed method, the test sets have been kept separate from the training sets.
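The leakage problem described above, where near-duplicate images of the same retina land in both training and test folds, is avoided by splitting at the retina (group) level rather than the image level. A minimal sketch with hypothetical image and group identifiers:

```python
from collections import defaultdict

def group_split(images, groups, test_fraction=0.2):
    """Split images so that all images of one retina (group) fall on the same side."""
    members = defaultdict(list)
    for img, g in zip(images, groups):
        members[g].append(img)
    ordered = sorted(members)              # deterministic here; shuffle groups in practice
    n_test = max(1, int(len(ordered) * test_fraction))
    test_groups = set(ordered[:n_test])
    train = [img for img, g in zip(images, groups) if g not in test_groups]
    test = [img for img, g in zip(images, groups) if g in test_groups]
    return train, test, test_groups

# Hypothetical identifiers: "r1_a" and "r1_b" are two captures of the same retina r1.
images = ["r1_a", "r1_b", "r2_a", "r3_a", "r3_b", "r4_a", "r5_a", "r5_b", "r5_c", "r6_a"]
groups = ["r1", "r1", "r2", "r3", "r3", "r4", "r5", "r5", "r5", "r6"]
train, test, test_groups = group_split(images, groups)
```

Because whole groups move together, no retina can contribute images to both sides of the split, which is the property plain random 10-fold selection fails to guarantee.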

Zhang et al. have used LBP for feature extraction. LBP acts as a descriptor that characterises the local distribution of the gray levels of an image. The LBP operator takes the surroundings of a pixel and calculates the difference between each neighbouring pixel and the centre pixel. If the difference is positive, the neighbouring pixel is assigned 1; if the difference is negative, it is assigned 0. After generating the 8-bit representation of the neighbouring pixels, the LBP code is generated by summing the weights of the binary code, i.e. LBP = Σp sp·2^p, where p is the bit position from the LSB (Least Significant Bit) to the MSB (Most Significant Bit) and sp is the bit value, for each block of the image. Now, the disadvantage of LBP is that the extracted features are not rotation invariant. Moreover, since LBP uses only the sign of the pixel difference and not the magnitude information, the structural information captured by LBP is limited. In the proposed method, the convolution filters extract the features. A feature is generated when a filter convolves over the image and computes the sum of the element-wise products of the filter values and the image pixels in the 8-neighbourhood. Each filter extracts different features from the image pixels. The values of these filters or kernels are not pre-defined but learned during the training process. Therefore, the convolution filters in a CNN can extract meaning from images that humans and human-designed filters might not be able to find. The random initialisation of the filters ensures that each filter extracts different features. In the successive layers, the number of filters has been increased to build the feature representation hierarchically. Again, the HLBP method proposed in (H. Zhang et al. 2014) has used a max-pooling layer along with the LBP layer. A max-pooling operation is an efficient way to reduce the spatial volume. However, Zhang et al. have used max-pooling before applying LBP at the 1st level. Since max-pooling is applied to the input image, the spatial dimension of the input gets reduced before any features are extracted. Therefore, there is a high chance that important features are lost during this process. This further limits the capability of LBP to extract important features, as LBP is applied to the output image generated by the max-pooling layer. In the proposed deep learning method, max-pooling is used to reduce the spatial volume of the high-dimensional feature space, with a max-pooling layer placed after each convolution layer. Since the features are extracted by the convolution layer first, the exact locations of those features can be disregarded; hence, reducing the spatial volume of the feature space does not cause a loss of important features. The method proposed by Zhang et al. is a traditional machine learning approach, and therefore a classifier is used after feature extraction to classify the images into the target classes. The performance of the SVM classifier depends on the performance of the HLBP feature extraction method: if HLBP fails to extract meaningful features, the SVM cannot classify the images properly. The SVM provides score functions and is mostly limited to linearly separable classes. It is a straightforward classifier: if the application is BRVO detection, it separates BRVO images from all other images and ignores any newly added images or classes. That makes the SVM an inflexible model for future applications. On the other hand, the proposed method uses a Softmax classifier, which is well suited to both binary and multiclass problems and provides probability values for each class. Therefore, if other images or images of a different class are added to the training set, the Softmax classifier recalculates and updates the probability values. In this way, it keeps learning during the training process.
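The LBP coding described above can be made concrete for a single 3×3 patch. The clockwise neighbour ordering used here is one common convention and may differ from Zhang et al.'s implementation; ties (neighbour equal to centre) are mapped to 1, another common choice:

```python
def lbp_code(patch):
    """8-bit LBP code of the centre pixel of a 3x3 patch.
    Each neighbour compared with the centre gives one bit (1 if >= centre),
    and the bits are summed with weights 2**p from LSB to MSB."""
    centre = patch[1][1]
    # clockwise from the top-left neighbour
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = [1 if patch[r][c] >= centre else 0 for r, c in order]
    return sum(bit << p for p, bit in enumerate(bits))

print(lbp_code([[9, 9, 9], [9, 5, 9], [9, 9, 9]]))  # 255: all neighbours brighter
print(lbp_code([[1, 1, 1], [1, 5, 1], [1, 1, 1]]))  # 0: all neighbours darker
```

Note that both example patches produce their codes purely from the signs of the differences; the magnitudes (9 vs. 5, or 1 vs. 5) are discarded, which is exactly the information loss discussed above.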

The proposed deep learning method is a continuous learning model. The convolution layers extract the abstract features and build the feature representation of the image hierarchically; the max-pooling layer reduces the spatial dimension of the feature space; the non-linear layer adds non-linearity to cope with real-world data; batch normalisation deals with the covariate shifts in the hidden layers and allows each layer to learn independently of the other layers; and the Softmax classifier provides probabilities for each target class. Using the learning algorithm, the network learns to extract features and classify test images into the target classes. During the training session, it keeps learning and builds better feature maps in each cycle. When new images or new classes are added, the network re-evaluates all the processes and learns the effects of the newly added images or classes. All these features make the proposed deep learning method a better method for RVO detection than the existing state-of-the-art methods. Again, the existing methods for RVO detection are limited to one type of RVO only. From the above discussion, it is clear that the existing methods are not suitable for separating such mutually ambiguous classes. The proposed Deep Cascaded Network is specially designed to solve the issue of detecting the ambiguous features of HRVO. Therefore, the proposed method fills this research gap and provides an efficient method to detect all three types of RVO.

5.7. Discussion

From the elaborated experimental analysis, it can be seen that the proposed deep learning method based on a CNN is a powerful and efficient method for diagnosing retinal blood vascular diseases. In Chapter-1, it was mentioned that analysing retinal images and diagnosing blood vascular diseases requires multifaceted pattern recognition algorithms in order to detect the various symptoms of a particular disease. In addition, these disease-specific or lesion-specific algorithms fail to differentiate between diseases possessing similar visual features or lesions. With those traditional segmentation or feature extraction algorithms, the classifiers require the fusion of multiple knowledge-based rules in order to identify different diseases with similar symptoms. The proposed deep learning based method has successfully overcome this problem and removes the need for multiple complex algorithms for blood vascular disease detection. The hierarchical feature abstraction of the proposed CNN model synthesises the minute details from the image pixels, which enables the model to deal with intra- and inter-class variation. This helps to analyse the texture of the retinal image from the pixel level itself. Therefore, the proposed CNN model can detect the early signs of vascular diseases such as DR and RVO, can detect the types of each disease, and can differentiate between the two diseases as well. Compared to the state-of-the-art methods for blood vascular disease detection, the proposed methodology has proven to be a better and more efficient method for diagnosing retinal diseases, particularly DR and RVO.

It has been discussed earlier (Chapter-1, Chapter-3, and Chapter-4) that deep learning models are generally complex and require large training sets, time, and memory. These drawbacks tend to mark deep learning models as practically infeasible. One of the research hypotheses of this dissertation is that it is possible to design simple CNN architectures and use them for specific tasks without compromising performance. Fine-tuning and transfer learning are the easiest ways to utilise deep learning models for different applications. However, fine-tuning can be expensive and requires experience and profound knowledge. Using fine-tuning and transfer learning, popular deep learning models such as Inception V3 (Gulshan et al. 2016), AlexNet, VGG16, and GoogLeNet (Lam et al. 2018) have been used for DR detection (binary classification). In this research, a simple CNN architecture has been designed for diagnosing retinal blood vascular diseases. The proposed CNN model is based on the LeNet-5 architecture and consists of 13 layers, as explained in Chapter-4. From Figure 5.22, it can be seen that the proposed CNN model outperforms the GoogLeNet and Inception V3 models in detecting DR using the Kaggle database. Lam et al. (2016) experimented with AlexNet, VGG16, and GoogLeNet for detecting DR in the Kaggle and Messidor databases. GoogLeNet performed best among the three models, obtaining 95% sensitivity and 96% specificity in detecting normal and mild NPDR. Inception V3 in (Gulshan et al. 2016) obtained 97.5% sensitivity and 93.4% specificity for detecting normal and referable DR (moderate NPDR, severe NPDR, and PDR). On the other hand, the proposed CNN obtained 99.7% sensitivity and 93.2% specificity for detecting normal and mild NPDR. Clearly, the proposed CNN model has outperformed the popular deep models in detecting DR or mild NPDR. Therefore, this evaluation has validated the hypothesis.

For RVO detection, fine-tuning and transfer learning have also been tested using VGG16 and Inception V3. For detecting CRVO, BRVO, and normal images, the confusion matrices of the two models are shown in Figure 5.26 and Figure 5.27, respectively. VGG16 obtained an accuracy of 77.67% and Inception V3 an accuracy of 83.33%. In contrast, the proposed CNN architecture achieved an accuracy of 97% for detecting CRVO, BRVO, and normal images; its confusion matrix is shown in Figure 5.11. One of the main reasons VGG16 and Inception V3 performed poorly is that the available RVO images are limited, and those samples are not enough to train such deep models.
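The accuracies compared here are read off the confusion matrices as the fraction of correctly classified test images, i.e. the diagonal sum divided by the total. A small sketch with a hypothetical 3-class CRVO/BRVO/Normal matrix (illustrative counts only, not the matrices in Figures 5.11, 5.26, or 5.27):

```python
def overall_accuracy(confusion: list[list[int]]) -> float:
    """Overall accuracy = sum of the diagonal (correct predictions)
    divided by the total number of test images."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Hypothetical 3x3 matrix, rows = true class, cols = predicted class:
cm = [[95, 3, 2],    # CRVO
      [4, 94, 2],    # BRVO
      [1, 2, 97]]    # Normal
acc = overall_accuracy(cm)  # 286/300, approximately 0.9533
```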

Therefore, all the comparisons regarding DR and RVO detection validate the other hypothesis of this research: that it is possible to design simple CNN models for particular tasks, rather than complex ones, and still attain equivalent performance. Although the proposed Cascaded Network for detecting BRVO, CRVO, HRVO, and normal images comprises three identical 13-layered CNNs, it still requires fewer parameters, less memory, and fewer data samples than the 48-layered Inception V3 model. Therefore, from this research, it can be stated that if a classification task involves fewer than 5 classes, it is better to use a simple CNN model than a deep model designed to classify more than 10 classes. Carefully designed task-specific CNN models can be more effective and provide better time and memory management. Moreover, simple models do not demand a large number of training samples. An efficient pre-processing algorithm can reduce the need for large training sets. If the training set contains fewer samples than ideally required, but all of them are good-quality images, the pre-processing can help the CNN build a meaningful feature representation during training. With such a database, the CNN can perform better than with a database containing a large number of bad-quality images.

Figure 5.26: Confusion Matrix of RVO detection using VGG16

Figure 5.27: Confusion Matrix of RVO detection using Inception V3
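The parameter-economy argument above can be made concrete: a convolutional layer holds kh*kw*cin weights per filter plus one bias per output channel. A sketch with hypothetical LeNet-5-style layer shapes (illustrative only, not the exact configuration of the proposed 13-layer CNN):

```python
def conv_params(kh, kw, cin, cout):
    """Weights (kh*kw*cin per filter) plus one bias per output channel."""
    return kh * kw * cin * cout + cout

def dense_params(n_in, n_out):
    """Fully connected layer: weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out

# Hypothetical shallow, LeNet-5-style stack (illustrative shapes only):
shallow = (conv_params(5, 5, 3, 6)
           + conv_params(5, 5, 6, 16)
           + dense_params(16 * 5 * 5, 120)
           + dense_params(120, 84)
           + dense_params(84, 4))        # 4 output classes
print(f"shallow stack: {shallow:,} parameters")  # tens of thousands
# Inception V3, by comparison, has roughly 23-24 million parameters,
# so even three such stacks in a cascade stay orders of magnitude smaller.
```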

5.8. Chapter Summary

In this chapter, the experimental analysis of the proposed deep learning based method for retinal blood vascular disease detection has been provided in detail. The details of the databases used for the experiments have been elaborated, and the training and testing methods for each task have been explained in depth. From the experimental results and the comparison with the state-of-the-art methods, it has been shown that the proposed CNN architecture is an efficient method to detect DR at the earliest stage and to grade DR according to its severity. Moreover, the proposed CNN is powerful enough to discriminate between DR and RVO irrespective of their common lesions. The proposed Cascaded CNN for RVO detection is the first of its kind to detect RVO and classify all three of its types, and it shows high performance in detecting BRVO, CRVO, and HRVO. From the experimental validation, it can be concluded that the proposed deep learning based method has satisfied all the research objectives discussed in Chapter-1.



Chapter-6

6 Conclusion and Future Works

6.1. Conclusion

In this dissertation, retinal image analysis has been performed using deep learning in order to diagnose retinal abnormalities and detect possible retinal blood vascular diseases, particularly Diabetic Retinopathy (DR) and Retinal Vein Occlusion (RVO). The general architecture of the Convolutional Neural Network (CNN) has been exploited to design a simple, computationally inexpensive CNN tailored to retinal images and the diagnosis of retinal blood vascular diseases.

In Chapter-1, the motivation behind the research on retinal image analysis and detection of retinal abnormalities, the problem statement, and the objectives of the research have been explained. The importance of automated methods for diagnosing retinal abnormalities has been discussed, and it has been explained how vision loss and other associated health issues can be prevented by detecting eye diseases at an initial stage.

In Chapter-2, a detailed illustration of the research’s background has been provided. In this chapter, a brief introduction has been given about the eye, retina, and retinal vascular diseases. The symptoms, types, risk factors, and available treatments for DR and RVO have been elaborated. This chapter has also discussed the various imaging techniques for capturing retinal structure.

In Chapter-3, an extensive literature review has been provided on the state-of-the-art for DR and RVO detection individually. The advantages and limitations of the available methods have been discussed, and from this review the research gaps have been identified in order to establish a clear research direction.



In Chapter-4, the proposed deep learning based methods for retinal image analysis have been illustrated. After discussing the popular deep learning models and their limitations, the idea of designing a simple yet effective deep learning model has been motivated. In this chapter, two deep learning models are proposed, both designed using Convolutional Neural Networks (CNNs) and inspired by the LeNet-5 architecture. One CNN model is designed to analyse the retinal color fundus image, extract features from the image, and predict the possible retinal blood vascular disease. Compared to other deep learning models, the proposed CNN model is less complex and requires a smaller training dataset and less training time; hence, it is computationally inexpensive. The hypothesis for designing the CNN has been set out and the learning algorithm has been explained. This proposed CNN is capable of multiple tasks: it can detect DR at the earliest stage and grade the severity of DR, it can detect two types of RVO, and it can discriminate between RVO-affected and DR-affected retinal images irrespective of their common clinical lesions. Taking this CNN as the base model, a Deep Cascaded Convolutional Neural Network (Cascaded CNN) has been proposed, particularly for diagnosing RVO and detecting all three of its types. The Cascaded CNN is a chain of three CNNs where each CNN carefully learns the ambiguous features of HRVO and helps in taking the final decision about the type of RVO. This Cascaded CNN overcomes the problem of a single CNN not being able to detect HRVO due to its features being similar to those of BRVO and CRVO. Unlike other ensemble methods, the Cascaded CNN does not take the average of the individual results of the three CNNs; on the contrary, each CNN has its own individual task, and based on their individual decisions the Cascaded CNN takes the final decision.
In this chapter, the advantages of both CNN models have been discussed and the novelty of the proposed methods has been explained.
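The cascade's decision rule, individual stage decisions rather than score averaging, can be sketched schematically. The stage tasks below are illustrative assumptions, not the exact configuration from Chapter-4; the point is that a sample exhibiting both BRVO-like and CRVO-like features is routed to HRVO:

```python
def cascade_decision(brvo_detected: bool, crvo_detected: bool) -> str:
    """Combine the individual decisions of illustrative stage classifiers.

    HRVO presents features of both BRVO and CRVO, so a sample flagged by
    both stages is routed to the HRVO class instead of averaging scores.
    (Schematic only; the actual stage tasks are defined in Chapter-4.)
    """
    if brvo_detected and crvo_detected:
        return "HRVO"
    if brvo_detected:
        return "BRVO"
    if crvo_detected:
        return "CRVO"
    return "Normal"

print(cascade_decision(True, True))  # HRVO
```

This routing is what allows HRVO, which shares features with both other types, to be separated without averaging away the conflicting evidence.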

In Chapter-5, the experimental analysis of the proposed deep learning based method for retinal abnormality detection has been provided in detail. The experimental set-up, the databases used, and the performance measures have been discussed, and the details of training and testing have been provided for each individual task. The performance of the proposed CNN has been evaluated on four individual tasks: detecting DR at the earliest stage, grading the severity of DR, detecting BRVO and CRVO, and discriminating between DR, RVO, and normal images. For each individual task, the proposed CNN achieved accuracy above 97%. The proposed Cascaded Network has achieved an accuracy of 96.1%, a sensitivity of 97%, and a specificity of 95.3% for diagnosing RVO-affected retinal images and detecting BRVO, CRVO, and HRVO.

This research has great potential for diagnosing retinal blood vascular diseases, especially the two prime retinal disorders causing blindness, viz., DR and RVO. In comparison to the existing traditional and machine learning methods, the proposed method is more accurate and robust for early detection of DR and RVO. From Chapter-5, Section 5.6, Table 5.16, it can be observed that the proposed method for DR detection has outperformed most of the existing machine learning and other traditional methods in early DR detection and grading the severity level of DR, albeit the retinal image databases used in the experiments differ. The proposed deep learning based method has proven to be a better method for diagnosing DR than other machine learning models such as k-Nearest Neighbor, Support Vector Machine, Neural Network, Hidden Markov Model, and Naïve-Bayes classifier (Figure 5.19). Since the majority of the existing methods (Table 5.16 and Figure 5.19) have used in-house or hospital databases, while the proposed method has been tested on multiple publicly available databases, it can be inferred that the proposed model is more robust, reliable, and versatile. However, for a fair comparison, the proposed method and some of the existing methods have been further tested on common publicly available retinal image databases. On the Messidor database, again, the proposed method has outdone existing machine learning methods in terms of sensitivity and specificity, as can be observed in Figure 5.20. Apart from that, on the Kaggle database, a huge database mainly used for deep learning models, the proposed CNN model has successfully outperformed other CNN-based deep learning models for DR detection (Gargeya & Leng 2017; Gulshan et al. 2016; Lam et al. 2018), achieving 100% sensitivity and 93% specificity, as can be observed in Figure 5.22.
The most attractive feature of the proposed model is that it can detect DR at the early stage, i.e. mild NPDR, with a superior accuracy of 98.20%, outperforming the few other available methods as shown in Figure 5.21; hence, it has contributed towards solving the big challenge of detecting DR at the early stage (mild NPDR). Moreover, it can grade the severity of DR into mild-to-moderate NPDR and severe NPDR to PDR, and it has also outperformed other 3-class classification methods, as can be seen in Figure 5.23. An additional feature of the proposed CNN model is that it can detect two types of RVO, viz., BRVO and CRVO.


While the existing methods for RVO detection mainly detect one type of RVO, the proposed CNN model can classify two types. Another significant functionality of the proposed CNN model is its ability to discriminate between different retinal disorders irrespective of their similar clinical features and common lesions. It can successfully discriminate between two major diseases having common clinical features/lesions, DR and RVO, with 98.8% accuracy, 100% sensitivity, and 98.3% specificity.

The most important contribution of this research is the proposed Cascaded Convolutional Neural Network, which has made a significant contribution to the field of RVO detection. It is the first of its kind to detect all three types of RVO, viz., CRVO, BRVO, and HRVO, successfully, overcoming the barrier of HRVO detection, where HRVO possesses features of both CRVO and BRVO. It has outdone all the available BRVO detection methods and has achieved an accuracy of 98.5% on multiple publicly available databases, as can be seen in Table 5.18. It has obtained an accuracy of 96.1% for classifying Normal, BRVO, CRVO, and HRVO. The parallel way in which the three CNNs extract features under the proposed configuration makes the Cascaded CNN unique and different from other ensemble methods and CNN models.

In the end, the conclusion can be drawn that all the objectives of this research have been fulfilled by the proposed methodology with notable novelty. This research has provided a significant contribution to the medical health sector, especially for retinal image analysis. It would reduce the workload of ophthalmologists and help them diagnose retinal abnormalities faster. With the help of the proposed methodology, eye specialists can detect the earliest textural changes caused by DR or RVO and, thereby, provide fast-track treatment to avoid further degradation of retinal health. By detecting RVO, the specialist can also prevent other health issues, such as a possible blockage in a cardiac or neural vein. This research can also help patients in remote areas, where ophthalmologists or eye specialists are not easily available; nowadays, remote diagnosis is an increasing trend in telemedicine. In addition, this research has contributed to the field of deep learning itself: it has set out the hypothesis for designing CNNs, which will help other researchers build CNNs for particular tasks instead of using complicated deep models. The proposed computationally inexpensive CNN has the potential to diagnose and detect multiple retinal blood vascular diseases with similar visual features and can serve as a stand-alone automated detection model. Again, the proposed Deep Cascaded Network has provided the most significant contribution towards the new field of automatic diagnosis of RVO. This proposed model can detect all variants of RVO and has filled a gap in the literature.

6.2. Future Works

This research has tremendous scope for future work. In the future, the focus will be on detecting multiple eye diseases and showing the location of the retinal abnormality, which requires improving the algorithm. With an efficient model and an improved learning algorithm, the exact location of the abnormality can be pointed out. The focus will be on the design, as it is preferred to design models such that they can serve as a stand-alone diagnostic methodology to detect most of the well-known retinal diseases. If the prediction of the possible retinal disease can be done correctly, it would be very helpful for the ophthalmologist to provide fast treatment and prevent vision loss, and it will also be beneficial for telemedicine. Apart from supervised learning, future work will also concern unsupervised and reinforcement learning. At present, it is nearly impossible to design a deep learning model without labelled training data. As mentioned earlier, in the medical field obtaining labelled data is a time-consuming and challenging task due to various factors, as discussed in Chapter-4. However, researchers have taken steps towards designing deep models that can learn to detect objects via unsupervised or reinforcement learning. Therefore, designing fully unsupervised deep models and deep reinforcement learning models will be the major areas to explore in the future.



References

Abràmoff, MD, Folk, JC, Han, DP, Walker, JD, Williams, DF, Russell, SR, Massin, P, Cochener, B, Gain, P, Tang, L, Lamard, M, Moga, DC, Quellec, G & Niemeijer, M 2013, ‘Automated analysis of retinal images for detection of referable diabetic retinopathy’, JAMA Ophthalmology, vol. 131, no. 3, pp. 351– 357, DOI: http://doi.org/10.1001/jamaophthalmol.2013.1743.

Abràmoff, MD, Reinhardt, JM, Russell, SR, Folk, JC, Mahajan, VB, Niemeijer, M & Quellec, G 2010, ‘Automated Early Detection of Diabetic Retinopathy’, Ophthalmology, vol. 117, no. 6, pp. 1147–1154, DOI: https://doi.org/10.1016/j.ophtha.2010.03.046.

Acharya, UR, Chua, CK, Ng, EYK, Yu, W & Chee, C 2008, ‘Application of higher order spectra for the identification of diabetes retinopathy stages’, Journal of Medical Systems, vol. 32, no. 6, pp. 481–488, DOI: https://doi.org/10.1007/s10916-008-9154-8.

Acharya, UR, Lim, CM, Ng, EYK, Chee, C & Tamura, T 2009, ‘Computer-based detection of diabetes retinopathy stages using digital fundus images’, Proceedings of the Institution of Mechanical Engineers, Part H: Journal of Engineering in Medicine, vol. 223, no. 5, pp. 545–553, DOI: https://doi.org/10.1243/09544119JEIM486.

Acharya, UR, Ng, EYK, Tan, JH, Sree, SV & Ng, KH 2012, ‘An integrated index for the identification of diabetic retinopathy stages using texture parameters’, Journal of Medical Systems, vol. 36, no. 3, pp. 2011–2020, DOI: https://doi.org/10.1007/s10916-011-9663-8.

Ageno, W & Squizzato, A 2011, ‘Retinal vein occlusion: Time for action has come’, Internal and Emergency Medicine, vol. 6, no. 4, pp. 293–295, DOI: https://doi.org/10.1007/s11739-011-0616-5.

Agurto, C, Joshi, V, Nemeth, S, Soliz, P & Barriga, S 2014, ‘Detection of hypertensive retinopathy using vessel measurements and textural features’, Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 5406–5409, DOI: https://doi.org/10.1109/EMBC.2014.6944848.

Ali, S, Sidibé, D, Adal, KM, Giancardo, L, Chaum, E, Karnowski, TP & Mériaudeau, F 2013, ‘Statistical atlas based exudate segmentation’, Computerized Medical Imaging and Graphics, vol. 37, no. 5–6, pp. 358–368, DOI: https://doi.org/10.1016/j.compmedimag.2013.06.006.

Amin, J, Sharif, M & Yasmin, M 2016, ‘A Review on Recent Developments for Detection of Diabetic Retinopathy’, Scientifica, vol. 2016, DOI: http://dx.doi.org/10.1155/2016/6838976.

Andrearczyk, V & Whelan, PF 2017, ‘Deep Learning in Texture Analysis and Its Application to Tissue Image Classification’, Biomedical Texture Analysis, Elsevier, pp. 95–129, DOI: https://doi.org/10.1016/B978-0-12-812133-7.00004-1.

Angelov, P & Sperduti, A 2016, ‘Challenges in Deep Learning’, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, April, pp. 27–29.

Anitha, J, Selvathi, D & Hemanth, DJ 2009, ‘Neural Computing Based Abnormality Detection in Retinal Optical Images’, 2009 IEEE International Advance Computing Conference, March, pp. 630–635, DOI: https://doi.org/10.1109/IADCC.2009.4809085.

Antal, B & Hajdu, A 2014, ‘An ensemble-based system for automatic screening of diabetic retinopathy’, Knowledge-Based Systems, vol. 60, pp. 20–27, DOI: https://doi.org/10.1016/j.knosys.2013.12.023.

Aptel, F, Denis, P, Rouberol, F & Thivolet, C 2008, ‘Screening of diabetic retinopathy: Effect of field number and mydriasis on sensitivity and specificity of digital fundus photography’, Diabetes and Metabolism, vol. 34, no. 3, pp. 290–293, DOI: https://doi.org/10.1016/j.diabet.200.

Archana, DN, Padmasini, N, Yacin, SM & Umamaheshwari, R 2015, ‘Detection of abnormal blood vessels in diabetic retinopathy based on brightness variations in SDOCT retinal images’, ICETECH 2015 - 2015 IEEE International Conference on Engineering and Technology, March, DOI: https://doi.org/10.1109/ICETECH.2015.7275040.

Ball, GH & Hall, DJ 1965, ‘ISODATA, a novel method of data analysis and pattern classification’, Stanford Research Institute, Technical Report AD 699616.

Bengio, Y 2009, ‘Learning Deep Architectures for AI’, Foundations and Trends® in Machine Learning, vol. 2, no. 1, pp. 1–127, DOI: https://doi.org/10.1561/2200000006.

Berlin, L 1996, ‘Errors in judgment’, American Journal of Roentgenology, vol. 166, no. 6, pp. 1259–1261, DOI: https://doi.org/10.2214/ajr.166.6.8633426.

Besenczi, R, Tóth, J & Hajdu, A 2016, ‘A review on automatic analysis techniques for color fundus photographs’, Computational and Structural Biotechnology Journal, vol. 14, Elsevier, pp. 371–384, DOI: https://doi.org/10.1016/j.csbj.2016.10.001.

Bock, R, Meier, J, Nyúl, LG, Hornegger, J & Michelson, G 2010, ‘Glaucoma risk index: Automated glaucoma detection from color fundus images’, Medical Image Analysis, vol. 14, no. 3, pp. 471–481, DOI: https://doi.org/10.1016/j.media.2009.12.006.

Bodnar, ZM, Desai, A & Akduman, L 2016, ‘Diabetic macular edema’, Spectral Domain Optical Coherence Tomography in Macular Diseases, pp. 117-127, DOI: https://doi.org/10.1007/978-81-322-3610-8_8.

Brady, AP 2017, ‘Error and discrepancy in radiology: inevitable or avoidable?’, Insights into Imaging, vol. 8, pp. 171–182, DOI: http://dx.doi.org/10.1007/s13244-016-0534-1.

Bruno, MA, Walker, EA & Abujudeh, HH 2015, ‘Understanding and Confronting Our Mistakes: The Epidemiology of Error in Radiology and Strategies for Error Reduction’, RadioGraphics, vol. 35, no. 6, pp. 1668–1676, DOI: https://doi.org/10.1148/rg.2015150023.



Bucciarelli, P, Passamonti, SM, Gianniello, F, Artoni, A & Martinelli, I 2017, ‘Thrombophilic and cardiovascular risk factors for retinal vein occlusion’, European Journal of Internal Medicine, vol. 44, pp. 44-48, DOI: https://doi.org/10.1016/j.ejim.2017.06.022.

Burlina, P, Freund, DE, Dupas, B & Bressler, N 2011, ‘Automatic screening of age- related macular degeneration and retinal abnormalities’, Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, DOI: https://doi.org/10.1109/IEMBS.2011.6090984.

Burn, RP & Mandelbrot, BB 2007, ‘The Fractal Geometry of Nature’, The Mathematical Gazette, vol. 68, no. 443, pp. 71-72, ISBN 0-7167-1186-9 , DOI: https://doi.org/10.2307/3615422.

Cortes, C & Vapnik, V 1995, ‘Support-Vector Networks’, Machine Learning, vol. 20, no. 3, pp. 273–297, DOI: https://doi.org/10.1023/A:1022627411411.

Camino, A, Wang, Z, Wang, J, Pennesi, ME, Yang, P, Huang, D, Li, D & Jia, Y 2018, ‘Deep learning for the segmentation of preserved photoreceptors on en face optical coherence tomography in two inherited retinal diseases’, Biomedical Optics Express, vol. 9, no. 7, p. 3092, DOI: https://doi.org/10.1364/BOE.9.003092.

Carrera, EV & Carrera, R 2017, ‘Automated detection of diabetic retinopathy using SVM’, IEEE International Conference on Electronics, Electrical Engineering and Computing (INTERCON), pp. 6–9, DOI: https://doi.org/10.1109/INTERCON.2017.8079692.

Castellino, RA 2005, ‘Computer aided detection (CAD): An overview’, Cancer Imaging, vol. 5, no. 1, pp. 17–19, DOI: https://doi.org/10.1102/1470-7330.2005.0018.

Chandore, V & Asati, S 2017, ‘Automatic Detection of Diabetic Retinopathy using Deep Convolutional Neural Network’, IJAR, vol. 3, no. 4, pp. 633–641.

Chaum, E, Karnowski, TP, Govindasamy, VP, Abdelrahman, M & Tobin, KW 2008, ‘Automated diagnosis of retinopathy by content-based image retrieval.’, Retina (Philadelphia, Pa.), vol. 28, no.10, pp. 1463-1477, DOI: https://doi.org/10.1097/IAE.0b013e31818356dd.



Cheung, N, Mitchell, P & Wong, TY 2010, ‘Diabetic retinopathy’, The Lancet, vol. 376, no. 9735, pp. 124–136, DOI: https://doi.org/10.1016/S0140-6736(09)62124-3.

Cho, K.H., Kim, CK, Oh, K, Oh, S-W, Park, KH & Park, SJ 2017, ‘Retinal vein occlusion as the surrogate marker for premature brain aging in young patients’, Investigative Ophthalmology and Visual Science, vol. 58, no. 6, pp. BIO82– BIO87, DOI: https://doi.org/10.1167/iovs.17-21413.

Coleman, W 2007, ‘Biology in the Nineteenth Century: Problems of Form, Function and Transformation’, Cambridge University Press, vol. 44, no. 1, pp. 118–130, ISBN 052129293X.

Cover, T & Hart, P 1967, ‘Nearest neighbor pattern classification’, IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21–27, DOI: https://doi.org/10.1109/TIT.1967.1053964.

Cree, MJ, Olson, JA, Mchardy, KC, Sharp, PF & Forrester, J V. 1997, ‘A fully automated comparative microaneurysm digital detection system’, Eye, vol. 11, no. 5, pp. 622–628, DOI: https://doi.org/10.1038/eye.1997.166.

Cvijovic, M, Höfer, T, Jure, A, Alberghina, L, Almaas, E, Besozzi, D & Blomberg, A 2016, ‘Strategies for structuring interdisciplinary education in Systems Biology: a European perspective’, npj Systems Biology and Applications, vol. 2, DOI: https://doi.org/10.1038/npjsba.2016.11.

Dai, Y, Zhu, C, Shan, X, Cheng, Z & Zou, B 2019, ‘A Survey on Intelligent Screening for Diabetic Retinopathy’, Chinese Medical Sciences Journal, vol. 34, no. 2, pp. 120–132, DOI: https://doi.org/10.24920/003587.

Dashtbozorg, B, Zhang, J, Huang, F & Romeny, BMTH 2018, ‘Retinal Microaneurysms Detection Using Local Convergence Index Features’, IEEE Transactions on Image Processing, vol. 27, no. 7, IEEE, pp. 3300–3315 , DOI: https://doi.org/10.1109/TIP.2018.2815345.

Davis, LJ & Offord, KP 1997, ‘Logistic regression’, Journal of Personality Assessment, vol. 68, no. 3, pp. 497–507, DOI: https://doi.org/10.1207/s15327752jpa6803_3.

Decencière, E, Zhang, X, Cazuguel, G, Laÿ, B, Cochener, B, Trone, C, Gain, P, Ordóñez-Varela, JR, Massin, P, Erginay, A, Charton, B & Klein, JC 2014, ‘Feedback on a publicly distributed image database: The Messidor database’, Image Analysis and Stereology, vol. 33, no. 3, pp. 231–234, DOI: https://doi.org/10.5566/ias.1155.

Dempster, AP, Laird, NM & Rubin, DB 1977, ‘Maximum Likelihood from Incomplete Data via the EM Algorithm’, Journal of the Royal Statistical Society. Series B, vol. 39, no. 1, pp. 1-38, https://www.jstor.org/stable/2984875.

Doi, K 2005, ‘Current status and future potential of computer-aided diagnosis in medical imaging’, British Journal of Radiology, vol. 78, no. 1, pp. S3–S19, DOI: https://doi.org/10.1259/bjr/82933343.

Doshi, D, Shenoy, A, Sidhpura, D & Gharpure, P 2016, ‘Diabetic Retinopathy Detection using Deep Convolutional Neural Networks’, 2016 International Conference on Computing, Analytics and Security Trends (CAST), DOI: https://doi.org/10.1109/CAST.2016.7914977.

Duh, EJ, Sun, JK & Stitt, AW 2017, ‘Diabetic retinopathy: current understanding, mechanisms, and treatment strategies’, JCI Insight, vol. 2, no.14, DOI: https://doi.org/10.1172/jci.insight.93751.

Dulek, R 2013, ‘Properties of the Hypothesis Space and their Effect on Machine Learning’, pp. 1–23.

Dunn, JC 1973, ‘A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters’, Journal of Cybernetics, vol. 3, no. 3, pp. 32–57, DOI: https://doi.org/10.1080/01969727308546046.

Dupas, B, Walter, T, Erginay, A, Ordonez, R, Deb-Joardar, N, Gain, P, Klein, JC & Massin, P 2010, ‘Evaluation of automated fundus photograph analysis algorithms for detecting microaneurysms, haemorrhages and exudates, and of a computer-assisted diagnostic system for grading diabetic retinopathy’, Diabetes and Metabolism, vol. 36, no. 3, pp. 213-220, DOI: https://doi.org/10.1016/j.diabet.2010.01.002.



Fazekas, Z, Hajdu, A, Lázár, I, Kovács, G, Csákány, B, Calugaru, DM, Shah, R, Adam, EI & Talu, S 2015, ‘Influence of Using Different Segmentation Methods on the Fractal Properties of the Identified Retinal Vascular Networks in Healthy Retinas and in Retinas with Vein Occlusion’, Proc. of KÉPAF 2015, pp. 360– 373.

Felfeli, T, Alon, R, Merritt, R & Brent, MH 2019, ‘Toronto tele-retinal screening program for detection of diabetic retinopathy and macular edema’, Canadian Journal of Ophthalmology, vol. 54, no. 2, Elsevier, pp. 203–211, DOI: https://doi.org/10.1016/j.jcjo.2018.07.004.

Flammer, J, Konieczka, K, Bruno, RM, Virdis, A, Flammer, AJ & Taddei, S 2013, ‘The eye and the heart’, European Heart Journal, vol. 34, no. 17, pp. 1270– 1278, DOI: https://doi.org/10.1093/eurheartj/eht023.

Fleming, AD, Goatman, KA, Philip, S, Williams, GJ, Prescott, GJ, Scotland, GS, McNamee, P, Leese, GP, Wykes, WN, Sharp, PF & Olson, JA 2010, ‘The role of haemorrhage and exudate detection in automated grading of diabetic retinopathy’, British Journal of Ophthalmology, vol. 94, no. 6, pp. 706-711, DOI: https://doi.org/10.1136/bjo.2008.149807.

Fleming, AD, Philip, S, Goatman, KA, Olson, JA & Sharp, PF 2006, ‘Automated microaneurysm detection using local contrast normalization and local vessel detection’, IEEE Transactions on Medical Imaging, vol. 25, no. 9, pp. 1223– 1232, DOI: https://doi.org/10.1109/TMI.2006.879953.

Fleming, AD, Philip, S, Goatman, KA, Williams, GJ, Olson, JA & Sharp, PF 2007, ‘Automated detection of exudates for diabetic retinopathy screening’, Physics in Medicine and Biology, vol. 52, no. 24, pp. 7385–7396, DOI: https://doi.org/10.1088/0031-9155/52/24/012.

Flora, JD 1982, ‘Chi-Square Test’, Annals of Thoracic Surgery, vol. 18, no. 10, pp. 701–705, DOI: https://doi.org/10.1097/00005373-197810000-00003.

Fong, DS, Aiello, L, Gardner, TW, King, GL, Blankenship, G, Cavallerano, JD, Ferris, FL & Klein, R 2004, ‘Retinopathy in Diabetes’, Diabetes Care, vol. 27, no. 1, pp. S84–S87, DOI: https://doi.org/10.2337/diacare.27.2007.s84.



Foroozan, R, Savino, PJ & Sergott, RC 2002, ‘Embolic central retinal artery occlusion detected by orbital color doppler imaging’, Ophthalmology, vol. 104, no. 4, pp. 744-747, DOI: https://doi.org/10.1016/s0161-6420(01)01011-9.

Franklin, SW & Rajan, SE 2014, ‘Diagnosis of diabetic retinopathy by employing image processing technique to detect exudates in retinal images’, IET Image Processing, vol. 8, no. 10, pp. 601–609, DOI: https://doi.org/10.1049/iet-ipr.2013.0565.

García, M, López, MI, Álvarez, D & Hornero, R 2010, ‘Assessment of four neural network based classifiers to automatically detect red lesions in retinal images’, Medical Engineering and Physics, vol. 32, no. 10, pp. 1085–1093, DOI: https://doi.org/10.1016/j.medengphy.2010.07.014.

García, M, Sánchez, CI, López, MI, Díez, A & Hornero, R 2008, ‘Automatic detection of red lesions in retinal images using a multilayer perceptron neural network’, Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 5425–5428.

Gardner, GG, Keating, D, Williamson, TH & Elliott, AT 1996, ‘Automatic detection of diabetic retinopathy using an artificial neural network: a screening tool’, British Journal of Ophthalmology, vol. 80, no. 11, pp. 940–944, DOI: https://doi.org/10.1136/bjo.80.11.940.

Gargeya, R & Leng, T 2017, ‘Automated Identification of Diabetic Retinopathy Using Deep Learning’, Ophthalmology, vol. 124, no. 7, pp. 962–969 , DOI: https://doi.org/10.1016/j.ophtha.2017.02.008.

Garhart, C & Lakshminarayanan, V 2016, ‘Anatomy of the eye’, Handbook of Visual Display Technology, pp. 73-83, ISBN 978-3-540-79566-7, DOI: https://doi.org/10.1007/978-3-319-14346-0_4.

Gayathri, R, Vijayan, R, Prakash, JS & Chandran, NS 2014, ‘CLBP for Retinal Vascular Occlusion Detection’, International Journal of Computer Science, vol. 11, no. 2, pp. 204–209.

Ghaffar, F, Uyyanonvara, B, Sinthanayothin, C, Ali, L & Kaneko, H 2016, ‘Detection of exudates from retinal images using morphological compact tree’, 2016 13th International Joint Conference on Computer Science and Software Engineering, JCSSE 2016, IEEE, pp. 1–5, DOI: https://doi.org/10.1109/JCSSE.2016.7748858.

Giancardo, L, Meriaudeau, F, Karnowski, TP, Li, Y, Garg, S, Tobin, KW & Chaum, E 2012, ‘Exudate-based diabetic macular edema detection in fundus images using publicly available datasets’, Medical Image Analysis, vol. 16, no. 1, pp. 216-226, DOI: https://doi.org/10.1016/j.media.2011.07.004.

Gligorijevic, V 2015, ‘Methods for biological data integration : perspectives and challenges’, Journal of Royal Society Interface, vol. 12, no. 112, DOI: https://doi.org/10.1098/rsif.2015.0571.

Glorot, X, Bordes, A & Bengio, Y 2011, ‘Deep Sparse Rectifier Neural Networks’, AISTATS ’11: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics.

Goatman, K, Charnley, A, Webster, L & Nussey, S 2011, ‘Assessment of automated disease detection in diabetic retinopathy screening using two-field photography’, PLoS ONE, vol. 6, no.12, e27524, DOI: https://doi.org/10.1371/journal.pone.0027524.

Goodfellow, I, Bengio, Y & Courville, A 2016, Deep Learning, MIT Press, ISBN: 9780262035613.

Goodman, MS 2017, ‘Logistic regression’, Biostatistics for Clinical and Public Health Research, ISBN: 1138196355.

Gross, H, Blechinger, F & Achtner, B 2008, Human Eye, Handbook of Optical Systems, Volume 4, Survey of Optical Instruments, DOI: https://doi.org/10.1002/9783527699247.ch1.

Gu, J, Wang, Z, Kuen, J, Ma, L, Shahroudy, A, Shuai, B, Liu, T, Wang, X, Wang, L, Wang, G, Cai, J & Chen, T 2018, ‘Recent Advances in Convolutional Neural Networks’, Pattern Recognition, vol. 77, pp. 354-377, DOI: https://doi.org/10.1016/j.patcog.2017.10.013.

Gulshan, V, Peng, L, Coram, M, Stumpe, MC, Wu, D, Narayanaswamy, A, Venugopalan, S, Widner, K, Madams, T, Cuadros, J, Kim, R, Raman, R, Nelson, PC, Mega, JL & Webster, DR 2016, ‘Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs’, JAMA, vol. 316, no. 22, pp. 2402-2410, DOI: https://doi.org/10.1001/jama.2016.17216.

Haeffele, BD & Vidal, R 2017, ‘Global optimality in neural network training’, Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, vol. 2017-January, no. 3, pp. 4390–4398, DOI: https://doi.org/10.1109/CVPR.2017.467.

Haloi, M 2015, ‘Improved Microaneurysm Detection using Deep Neural Networks’, http://arxiv.org/abs/1505.04424.

Han, D, Kim, Jiwhan & Kim, Junmo 2016, ‘Deep Residual Networks’, ICML Tutorials.

Hansen, AB, Hartvig, N V, Jensen, MS, Borch-Johnsen, K, Lund-Andersen, H & Larsen, M 2004, ‘Diabetic retinopathy screening using digital non-mydriatic fundus photography and automated image analysis.’, Acta ophthalmologica Scandinavica, vol. 82, no. 6, pp. 666-672, DOI: https://doi.org/10.1111/j.1600-0420.2004.00350.x.

Hartigan, JA 1975, Clustering Algorithms, John Wiley & Sons, ISBN: 047135645X.

Hartnett, ME, Baehr, W & Le, YZ 2017, ‘Diabetic retinopathy, an overview’, Vision Research, vol. 139, pp. 1-6, DOI: https://doi.org/10.1016/j.visres.2017.07.006.

Hayreh, SS, Podhajsky, PA & Zimmerman, MB 2011, ‘Natural history of visual outcome in central retinal vein occlusion’, Ophthalmology, vol. 118, no. 1, pp. 119-133, DOI: https://doi.org/10.1016/j.ophtha.2010.04.019.

Hayreh, SS, Zimmerman, B, McCarthy, MJ & Podhajsky, P 2001, ‘Systemic diseases associated with various types of retinal vein occlusion.’, American journal of ophthalmology, vol. 131, no. 1, pp. 61-77, DOI: https://doi.org/10.1016/s0002-9394(00)00709-1.

Hayreh, SS & Zimmerman, MB 2014, ‘Branch retinal vein occlusion: Natural history of visual outcome’, JAMA Ophthalmology, vol. 132, no. 1, pp. 13-22, DOI: https://doi.org/10.1001/jamaophthalmol.2013.5515.

He, K, Zhang, X, Ren, S & Sun, J 2016, ‘Identity mappings in deep residual networks’, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9908 LNCS, pp. 630–645, DOI: https://doi.org/10.1007/978-3-319-46493-0_38.

Heckerman, D 1998, ‘A Tutorial on Learning with Bayesian Networks’, Jordan M.I. (eds) Learning in Graphical Models. NATO ASI Series (Series D: Behavioural and Social Sciences) Springer Netherlands, vol. 89, pp. 301-354, DOI: https://doi.org/10.1007/978-94-011-5014-9_11.

Hickey, KT, Bakken, S, Byrne, MW, Bailey, D (Chip) E, Demiris, G, Docherty, SL, Dorsey, SG, Guthrie, BJ, Heitkemper, MM, Jacelon, CS, Kelechi, TJ, Moore, SM, Redeker, NS, Renn, CL, Resnick, B, Starkweather, A, Thompson, H, Ward, TM, McCloskey, DJ, Austin, JK & Grady, PA 2019, ‘Precision health: Advancing symptom and self-management science’, Nursing Outlook, vol. 67, no. 4, pp. 462-475, DOI: https://doi.org/10.1016/j.outlook.2019.01.003.

Hinton, GE 1989, ‘Connectionist learning procedures’, Artificial Intelligence, vol. 40, no. 1-3, pp.185-234, DOI: https://doi.org/10.1016/0004-3702(89)90049-0.

Hinton, GE, Osindero, S & Teh, Y-W 2006, ‘A Fast Learning Algorithm for Deep Belief Nets’, Neural Computation, vol. 18, no.7, pp. 1527-1554, DOI: https://doi.org/10.1162/neco.2006.18.7.1527.

Hoover, A & Goldbaum, M 2003, ‘Locating the optic nerve in a retinal image using the fuzzy convergence of the blood vessels’, IEEE Transactions on Medical Imaging, vol. 22, no. 8, pp. 951–958, DOI: https://doi.org/10.1109/TMI.2003.815900.

Hopfield, JJ 1988, ‘Artificial neural networks’, IEEE Circuits and Devices Magazine, vol. 4, no.5, pp. 3-10, DOI: https://doi.org/10.1109/101.8118.

Hu, FB 2003, ‘Sedentary lifestyle and risk of obesity and type 2 diabetes’, Lipids, vol. 38, no. 2, pp. 103–108, DOI: https://doi.org/10.1007/s11745-003-1038-4.

Hughes, GF 1968, ‘On the Mean Accuracy of Statistical Pattern Recognizers’, IEEE Transactions on Information Theory, vol.14, no.1, pp. 55-63, DOI: https://doi.org/10.1109/TIT.1968.1054102.

Hykin, P 2015, ‘Retinal Vein Occlusion (RVO) Guidelines’, The Royal College of Ophthalmologists, no. July, pp. 4–35.

Inoue, T, Hatanaka, Y, Okumura, S, Muramatsu, C & Fujita, H 2013, ‘Automated microaneurysm detection method based on eigenvalue analysis using hessian matrix in retinal fundus images’, Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, pp. 5873–5876, DOI: https://doi.org/10.1109/EMBC.2013.6610888.

Ioffe, S & Szegedy, C 2015a, ‘Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift’, Proceedings of the 32nd International Conference on Machine Learning (ICML'15), vol. 37, pp. 448-456.

Ioffe, S & Szegedy, C 2015b, ‘Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift’, arXiv preprint arXiv:1502.03167.

Irshad, S, Salman, M, Akram, MU & Yasin, U 2015, ‘Automated detection of Cotton Wool Spots for the diagnosis of Hypertensive Retinopathy’, Proceedings of the 7th Cairo International Biomedical Engineering Conference, CIBEC 2014, pp. 121–124.

Anderson, AJ 2017, ‘Eye movements’, Handbook of Visual Optics, Volume One: Fundamentals and Eye Optics, ISBN: 9781482237856.

Jaafar, HF, Nandi, AK & Al-Nuaimy, W 2011a, ‘Automated detection of red lesions from digital colour fundus photographs’, Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, pp. 6232- 6235, DOI: https://doi.org/10.1109/IEMBS.2011.6091539.

Jaafar, HF, Nandi, AK & Al-Nuaimy, W 2011b, ‘Decision support system for the detection and grading of hard exudates from color fundus photographs’, Journal of Biomedical Optics, vol. 16, no. 11, p. 116001, DOI: https://doi.org/10.1117/1.3643719.

Jelinek, HF, Cree, MJ, Leandro, JJG, Soares, JVB, Cesar, RM & Luckie, A 2007, ‘Automated segmentation of retinal blood vessels and identification of proliferative diabetic retinopathy’, Journal of the Optical Society of America A, vol. 24, no. 5, p. 1448, DOI: https://doi.org/10.1364/JOSAA.24.001448.

Jelinek, HJ, Cree, MJ, Worsley, D, Luckie, A & Nixon, P 2006, ‘An automated microaneurysm detector as a tool for identification of diabetic retinopathy in rural optometric practice’, Clinical and Experimental Optometry, vol. 89, no. 5, pp. 299–305, DOI: https://doi.org/10.1111/j.1444-0938.2006.00071.x.

Jenkins, AJ, Joglekar, M V., Hardikar, AA, Keech, AC, O’Neal, DN & Januszewski, AS 2015, ‘Biomarkers in diabetic retinopathy’, Review of Diabetic Studies, vol. 12, no. 1-2, pp. 159-195, DOI: https://doi.org/10.1900/RDS.2015.12.159.

Jitpakdee, P, Aimmanee, P & Uyyanonvara, B 2012, ‘A survey on hemorrhage detection in diabetic retinopathy retinal images’, 2012 9th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, ECTI-CON 2012, pp. 12–15, DOI: https://doi.org/10.1109/ECTICon.2012.6254356.

Joussen, A, Gardner, T, Kirchhof, B & Ryan, S 2007a, Retinal Vascular Disease, Springer, ISBN: 978-3-540-29542-6.

Joussen, A, Gardner, T, Kirchhof, B & Ryan, S 2007b, Retinal Vascular Disease, Springer.

Julesz, B, Gilbert, EN, Shepp, LA & Frisch, HL 1973, ‘Inability of humans to discriminate between visual textures that agree in second order statistics: revisited’, Perception, vol. 2, no. 4, pp. 391–405, DOI: https://doi.org/10.1068/p020391.

Kahai, P, Namuduri, KR & Thompson, H 2006, ‘A decision support framework for automated screening of diabetic retinopathy’, International Journal of Biomedical Imaging, vol. 2006, pp. 1–8, DOI: https://doi.org/10.1155/IJBI/2006/45806.

Karaca, Y & Cattani, C 2018, ‘k-Nearest neighbor algorithm’, Computational Methods for Data Analysis, ISBN: 978-3-11-049635-2.

Karnowski, TP, Aykac, D, Giancardo, L, Li, Y, Nichols, T, Tobin, KW & Chaum, E 2011, ‘Automatic detection of retina disease: Robustness to image quality and localization of anatomy structure’, Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, pp. 5959-5964, DOI: https://doi.org/10.1109/IEMBS.2011.6091473.

Khitran, S, Akram, MU, Usman, A & Yasin, U 2015, ‘Automated system for the detection of hypertensive retinopathy’, 2014 4th International Conference on Image Processing Theory, Tools and Applications, IPTA 2014, pp. 14-17, DOI: https://doi.org/10.1109/IPTA.2014.7001984.

Kohavi, R & Quinlan, JR 2002, ‘Data mining tasks and methods: Classification: decision-tree discovery’, Handbook of data mining and knowledge discovery, pp. 267-276, ISBN: 0-19-511831-6.

Kohonen, T 1982, ‘Self-organized formation of topologically correct feature maps’, Biological Cybernetics, vol. 43, no.1, pp. 59-69, DOI: https://doi.org/10.1007/BF00337288.

Komura, D & Ishikawa, S 2018, ‘Machine Learning Methods for Histopathological Image Analysis’, Computational and Structural Biotechnology Journal, vol. 16, pp. 34-42, DOI: https://doi.org/10.1016/j.csbj.2018.01.001.

Krizhevsky, A, Sutskever, I & Hinton, GE 2012, ‘ImageNet Classification with Deep Convolutional Neural Networks’, Advances in Neural Information Processing Systems 25 (NIPS 2012), pp. 1097–1105, DOI: https://doi.org/10.1145/3065386.

Lachure, J, Deorankar, A V., Lachure, S, Gupta, S & Jadhav, R 2015, ‘Diabetic Retinopathy using morphological operations and machine learning’, Souvenir of the 2015 IEEE International Advance Computing Conference, IACC 2015, DOI: https://doi.org/10.1109/IADCC.2015.7154781.

Lam, C, Guo, M, Lindsey, T, Yi, D & Bedi, R 2016, ‘Automated Detection of Diabetic Retinopathy using Deep Learning’, pp. 2818–2826.

Lam, C, Yi, D, Guo, M & Lindsey, T 2018, ‘Automated Detection of Diabetic Retinopathy using Deep Learning.’, AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science, vol. 2017, American Medical Informatics Association, pp. 147–155.

Lang, GE & Lang, SJ 2018, ‘Retinal Vein Occlusions’, Klinische Monatsblatter fur Augenheilkunde, vol. 235, no. 11, pp. 1297-1315, DOI: https://doi.org/10.1055/a-0662-1197.

Lariche, MJ, Kalkhajeh, SG & Lodariche, MJ 2017, ‘Diabetic retinopathy detection based on region growing and bat algorithm’, Arvand Journal of Health & Medical Sciences, vol. 2, no. 1, pp. 29–34, DOI: https://doi.org/10.22631/ajhms.2017.82128.1006.

Larsen, N, Godt, J, Grunkin, M, Lund-Andersen, H & Larsen, M 2003, ‘Automated detection of diabetic retinopathy in a fundus photographic screening population’, Investigative Ophthalmology and Visual Science, vol. 44, pp. 767-771, DOI: https://doi.org/10.1167/iovs.02-0417.

Lazar, I & Hajdu, A 2013, ‘Retinal microaneurysm detection through local rotating cross-section profile analysis’, IEEE Transactions on Medical Imaging, vol. 32, no. 2, pp. 400–407, DOI: https://doi.org/10.1109/TMI.2012.2228665.

Lechner, J, O’Leary, OE & Stitt, AW 2017, ‘The pathology associated with diabetic retinopathy’, Vision Research, vol. 139, Pergamon, pp. 7–14, DOI: https://doi.org/10.1016/j.visres.2017.04.003.

LeCun, Y & Bengio, Y 1995, ‘Convolutional networks for images, speech, and time series’, The handbook of brain theory and neural networks, MIT press.

Lecun, Y, Bengio, Y & Hinton, G 2015, ‘Deep learning’, Nature, vol. 521, no. 7553, pp. 436–444, DOI: https://doi.org/10.1038/nature14539.

LeCun, Y, Bottou, L, Bengio, Y & Haffner, P 1998, ‘Gradient-based learning applied to document recognition’, Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, DOI: https://doi.org/10.1109/5.726791.

LeCun, YA, Bottou, L, Orr, GB & Müller, KR 2012, ‘Efficient backprop’, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7700, pp. 9-48, DOI: https://doi.org/10.1007/978-3-642-35289-8_3.

Lee, A, Sandvei, M, Asmussen, HC, Design, MA & Skougaard, M 2018, ‘The Development of Complex Digital Health Solutions: Formative Evaluation Combining Different Methodologies’, JMIR Res Protoc, vol. 7, no. 7, pp. 1–10, DOI: https://doi.org/10.2196/resprot.9521.

Lee, SC, Lee, ET, Wang, Y, Klein, R, Kingsley, RM & Warn, A 2005, ‘Computer classification of nonproliferative diabetic retinopathy’, Archives of Ophthalmology, vol. 123, no. 6, pp. 759–764, DOI: https://doi.org/10.1001/archopht.123.6.759.

Li, L & Shan, J 2018, ‘Automated Microaneurysm Detection in Fundus Images through Region Growing’, Proceedings - 2017 IEEE 17th International Conference on Bioinformatics and Bioengineering, BIBE 2017, vol. 2018-January, pp. 125–130, DOI: https://doi.org/10.1109/BIBE.2017.00-67.

Li, Y & Chen, L 2014, ‘Big Biological Data: Challenges and Opportunities’, Genomics, Proteomics & Bioinformatics, vol. 12, no. 5, pp. 187–189, DOI: https://doi.org/10.1016/j.gpb.2014.10.001.

Liew, G, Mitchell, P, Rochtchina, E, Wong, TY, Hsu, W, Lee, ML, Wainwright, A & Wang, JJ 2011, ‘Fractal analysis of retinal microvasculature and coronary heart disease mortality’, European Heart Journal, vol. 32, no. 4, pp. 422–429, DOI: https://doi.org/10.1093/eurheartj/ehq431.

Lippi, M 2017, ‘Reasoning with deep learning: An open challenge’, CEUR Workshop Proceedings, pp. 38–43.

Long, S, Huang, X, Chen, Z, Pardhan, S & Zheng, D 2019, ‘Automatic Detection of Hard Exudates in Color Retinal Images Using Dynamic Threshold and SVM Classification: Algorithm Development and Evaluation’, BioMed Research International, vol. 2019, DOI: https://doi.org/10.1155/2019/3926930.

Macdonald, D 2014, ‘The ABCs of RVO: A review of retinal venous occlusion’, Clinical and Experimental Optometry, vol. 97, no. 4, pp. 311-323, DOI: https://doi.org/10.1111/cxo.12120.

Paranjpe, MJ & Kakatkar, MN 2014, ‘Diabetic Retinopathy Detection and Severity Classification’, International Journal of Research in Engineering and Technology, vol. 3, no. 3, pp. 619–624.

Madhusudhana, KC & Newsom, RSB 2007, ‘Central retinal vein occlusion: the therapeutic options’, Canadian Journal of Ophthalmology, vol. 42, no. 2, Elsevier, pp. 193–195, DOI: https://doi.org/10.3129/canjophthalmol.i07-011.

Mahmud, M, Kaiser, MS, Hussain, A & Vassanelli, S 2018, ‘Applications of Deep Learning and Reinforcement Learning to Biological Data’, IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 6, pp. 2063 - 2079, DOI: https://doi.org/10.1109/TNNLS.2018.2790388.

Mahmud, M & Vassanelli, S 2016, ‘Processing and Analysis of Multichannel Extracellular Neuronal Signals: State-of-the-Art and Challenges’, Frontiers in Neuroscience, vol. 10, no. 16, DOI: https://doi.org/10.3389/fnins.2016.00248.

Mahmudi, T, Kafieh, R, Rabbani, H, Mehri Dehnavi, A & Akhlagi, M 2014, ‘Comparison of macular OCTs in right and left eyes of normal people’, Progress in Biomedical Optics and Imaging - Proceedings of SPIE, vol. 9038, DOI: https://doi.org/10.1117/12.2044046.

Maji, D, Santara, A, Mitra, P & Sheet, D 2016, ‘Ensemble of Deep Convolutional Neural Networks for Learning to Detect Retinal Vessels in Fundus Images’, arXiv preprint.

Mandelbrot, BB & Wheeler, JA 1983, ‘Fractals and the Geometry of Nature’, American Journal of Physics, vol. 51, no. 3, DOI: https://doi.org/10.1119/1.13295.

Marwala, T 2018, ‘Support Vector Machines’, Handbook of Machine Learning, DOI: https://doi.org/10.1142/11013.

Marx, V 2013, ‘The big challenges of big data’, Nature, vol. 498, no. 7453, pp. 255–260, DOI: https://doi.org/10.1038/498255a.

McAllister, IL 2012, ‘Central retinal vein occlusion: a review’, Clinical & Experimental Ophthalmology, vol. 40, no. 1, pp. 48-58, DOI: https://doi.org/10.1111/j.1442-9071.2011.02713.x.

Mizutani, A, Muramatsu, C, Hatanaka, Y, Suemori, S, Hara, T & Fujita, H 2009, ‘Automated microaneurysm detection method based on double ring filter in retinal fundus images’, Proceedings of SPIE Medical Imaging 2009: Computer-Aided Diagnosis, vol. 7260, p. 72601N, DOI: https://doi.org/10.1117/12.813468.

Mookiah, Muthu Rama Krishnan, Acharya, UR, Chua, CK, Lim, CM, Ng, EYK & Laude, A 2013, ‘Computer-aided diagnosis of diabetic retinopathy: A review’, Computers in Biology and Medicine, vol. 43, no. 12, pp. 2136–2155, DOI: https://doi.org/10.1016/j.compbiomed.2013.10.007.

Mookiah, M. R.K., Acharya, UR, Martis, RJ, Chua, CK, Lim, CM, Ng, EYK & Laude, A 2013, ‘Evolutionary algorithm based classifier parameter tuning for automatic diabetic retinopathy grading: A hybrid feature extraction approach’, Knowledge-Based Systems, vol. 39, pp. 9–22, DOI: https://doi.org/10.1016/j.knosys.2012.09.008.

Nayak, J, Bhat, PS, Acharya U, R, Lim, CM & Kagathi, M 2008, ‘Automated identification of diabetic retinopathy stages using digital fundus images’, Journal of Medical Systems, vol. 32, no. 2, pp. 107–115, DOI: https://doi.org/10.1007/s10916-007-9113-9.

Nentwich, MM 2015, ‘Diabetic retinopathy - ocular complications of diabetes mellitus’, World Journal of Diabetes, vol. 6, no. 3, pp. 489-499, DOI: https://doi.org/10.4239/wjd.v6.i3.489.

Niemeijer, M, Van Ginneken, B, Russell, SR, Suttorp-Schulten, MSA & Abràmoff, MD 2007, ‘Automated detection and differentiation of drusen, exudates, and cotton-wool spots in digital color fundus photographs for diabetic retinopathy diagnosis’, Investigative Ophthalmology and Visual Science, vol. 48, no. 5, pp. 2260–2267, DOI: https://doi.org/10.1167/iovs.06-0996.

Niemeijer, M, Van Ginneken, B, Staal, J, Suttorp-Schulten, MSA & Abràmoff, MD 2005, ‘Automatic detection of red lesions in digital color fundus photographs’, IEEE Transactions on Medical Imaging, vol. 24, no. 5, pp. 584-592, DOI: https://doi.org/10.1109/TMI.2005.843738.

Noma, H 2013, ‘Clinical Diagnosis in Central Retinal Vein Occlusion’, Journal of Medical Diagnostic Methods, vol. 02, no. 02, pp. 2–5, DOI: https://doi.org/10.4172/2168-9784.1000119.

Oliveira, CM, Cristóvão, LM, Ribeiro, ML & Abreu, JRF 2011, ‘Improved automated screening of diabetic retinopathy’, Ophthalmologica, vol. 226, no. 4, pp. 191- 197, DOI: https://doi.org/10.1159/000330285.

Ortíz, D, Cubides, M, Suárez, A, Zequera, M, Quiroga, J, Gómez, J & Arroyo, N 2010, ‘Support system for the preventive diagnosis of hypertensive retinopathy’, 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC’10, pp. 5649–5652, DOI: https://doi.org/10.1109/IEMBS.2010.5628047.

Osareh, A, Shadgar, B & Markham, R 2009, ‘A computational-intelligence-based approach for detection of exudates in diabetic retinopathy images’, IEEE Transactions on Information Technology in Biomedicine, vol. 13, no. 4, pp. 535– 545, DOI: https://doi.org/10.1109/TITB.2008.2007493.

Pappuru, RKR, Ribeiro, L, Lobo, C, Alves, D & Cunha-Vaz, J 2019, ‘Microaneurysm turnover is a predictor of diabetic retinopathy progression’, British Journal of Ophthalmology, vol. 103, no. 2, pp. 222-226, DOI: http://dx.doi.org/10.1136/bjophthalmol-2018-311887.

Park, SJ, Choi, NK, Yang, BR, Park, KH & Woo, SJ 2015, ‘Risk of stroke in retinal vein occlusion’, Neurology, vol. 85, no. 18, pp. 1578-1584, DOI: https://doi.org/10.1212/WNL.0000000000002085.

Pascanu, R, Gülçehre, Ç, Cho, K, Bengio, Y, Gulcehre, C, Cho, K & Bengio, Y 2013, ‘How to Construct Deep Recurrent Neural Networks’, CoRR.

Peng, J-J, Xiong, S-Q, Ding, L-X, Peng, J & Xia, X-B 2019, ‘Diabetic retinopathy: Focus on NADPH oxidase and its potential as therapeutic target’, European Journal of Pharmacology, vol. 853, Elsevier, pp. 381–387, DOI: https://doi.org/10.1016/j.ejphar.2019.04.038.

Pesapane, F, Codari, M & Sardanelli, F 2018, ‘Artificial intelligence in medical imaging: threat or opportunity? Radiologists again at the forefront of innovation in medicine’, European Radiology Experimental, vol. 2, no. 35, DOI: https://doi.org/10.1186/s41747-018-0061-6.

Peterson, L 2009, ‘K-nearest neighbor’, Scholarpedia.

Philip, S, Fleming, AD, Goatman, KA, Fonseca, S, Mcnamee, P, Scotland, GS, Prescott, GJ, Sharp, PF & Olson, JA 2007, ‘The efficacy of automated “disease/no disease” grading for diabetic retinopathy in a systematic screening programme’, British Journal of Ophthalmology, vol. 91, no. 11, pp. 1512-1517, DOI: https://doi.org/10.1136/bjo.2007.119453.

Pielen, A, Junker, B & Feltgen, N 2016, ‘Retinal Vein Occlusion’, Anti-Angiogenic Therapy in Ophthalmology, DOI: https://doi.org/10.1007/978-3-319-24097-8_7.

Popović, ZB & Thomas, JD 2017, ‘Assessing observer variability: a user’s guide’, Cardiovascular Diagnosis and Therapy, vol. 7, no. 3, pp. 317–324, DOI: https://doi.org/10.21037/cdt.2017.03.12.

Prasad, DK, Vibha, L & Venugopal, KR 2016, ‘Early detection of diabetic retinopathy from digital retinal fundus images’, 2015 IEEE Recent Advances in Intelligent Computational Systems (RAICS), no. December, IEEE, pp. 240–245, DOI: https://doi.org/10.1109/RAICS.2015.7488421.

Quellec, G, Lamard, M, Josselin, PM, Cazuguel, G, Cochener, B & Roux, C 2008, ‘Optimal Wavelet Transform for the Detection of Microaneurysms in Retina Photographs’, IEEE Transactions on Medical Imaging, vol. 27, no. 9, pp. 1230– 1241, DOI: https://doi.org/10.1109/TMI.2008.920619.

Rabiner, L & Juang, B 1986, ‘An introduction to hidden Markov models’, IEEE ASSP Magazine.

Rehak, J & Rehak, M 2008, ‘Branch retinal vein occlusion: pathogenesis, visual prognosis, and treatment modalities.’, Current Eye Research, vol. 33, no. 2, pp. 111-131, DOI: https://doi.org/10.1080/02713680701851902.

Reza, AW & Eswaran, C 2011, ‘A decision support system for automatic screening of non-proliferative diabetic retinopathy’, Journal of Medical Systems, vol. 35, no. 1, pp. 17–24, DOI: https://doi.org/10.1007/s10916-009-9337-y.

Ribeiro, L, Oliveira, CM, Neves, C, Ramos, JD, Ferreira, H & Cunha-Vaz, J 2015, ‘Screening for diabetic retinopathy in the Central Region of Portugal. Added value of automated “disease/no disease” grading’, Ophthalmologica, vol. 233, pp. 96-103, DOI: https://doi.org/10.1159/000368426.

Riccardi, A, Siniscalchi, C & Lerza, R 2016, ‘Embolic Central Retinal Artery Occlusion Detected with Point-of-care Ultrasonography in the Emergency Department’, Journal of Emergency Medicine, vol. 50, no. 4, pp. e183-e185, DOI: https://doi.org/10.1016/j.jemermed.2015.12.022.

Rogers, S, McIntosh, RL, Cheung, N, Lim, L, Wang, JJ, Mitchell, P, Kowalski, JW, Nguyen, H, Wong, TY & International Eye Disease Consortium 2010, ‘The prevalence of retinal vein occlusion: pooled data from population studies from the United States, Europe, Asia, and Australia.’, Ophthalmology, vol. 117, no. 2, pp. 313–319.e1, DOI: https://doi.org/10.1016/j.ophtha.2009.07.017.

Rogers, SL, McIntosh, RL, Lim, L, Mitchell, P, Cheung, N, Kowalski, JW, Nguyen, HP, Wang, JJ & Wong, TY 2010, ‘Natural History of Branch Retinal Vein Occlusion: An Evidence-Based Systematic Review’, Ophthalmology, vol. 117, no. 6, pp. 1094-1101,e5, DOI: https://doi.org/10.1016/j.ophtha.2010.01.058.

Roychowdhury, S, Koozekanani, D & Parhi, K 2013, ‘DREAM: Diabetic Retinopathy Analysis using Machine Learning’, Biomedical and Health Informatics, IEEE Journal of, vol. 18, no. 5, pp. 1717 - 1728, DOI: https://doi.org/10.1109/JBHI.2013.2294635.

Ruba, T & Ramalakshmi, K 2015, ‘Identification and segmentation of exudates using SVM classifier’, ICIIECS 2015 - 2015 IEEE International Conference on Innovations in Information, Embedded and Communication Systems, IEEE, pp. 1–6.

Sabanayagam, C, Banu, R, Chee, ML, Lee, R, Wang, YX, Tan, G, Jonas, JB, Lamoureux, EL, Cheng, C-Y, Klein, BEK, Mitchell, P, Klein, R, Cheung, CMG & Wong, TY 2019, ‘Incidence and progression of diabetic retinopathy: a systematic review’, The Lancet Diabetes & Endocrinology, vol. 7, no. 2, Elsevier, pp. 140–149, viewed 27 July, 2019, DOI: https://doi.org/10.1016/S2213-8587(18)30128-1.

Sadek, I 2016, 'Automatic Discrimination of Color Retinal Images using the Bag of Words Approach', Proceedings Medical Imaging 2015: Computer-Aided Diagnosis, SPIE, vol. 9414, DOI: https://doi.org/10.1117/12.2075824.

Salakhutdinov, R & Hinton, G 2009, ‘Deep Boltzmann Machines’, Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.

Salamat, N, Missen, MMS & Rashid, A 2019, ‘Diabetic retinopathy techniques in retinal images: A review’, Artificial Intelligence in Medicine, vol. 97, Elsevier, pp. 168–188, DOI: https://doi.org/10.1016/j.artmed.2018.10.009.

Hijazi, S, Kumar, R & Rowen, C 2015, ‘Image Recognition Using Convolutional Neural Networks’, Cadence Whitepaper, pp. 1–12.

Sánchez, CI, Hornero, R, López, MI, Aboy, M, Poza, J & Abásolo, D 2008, ‘A novel automatic image processing algorithm for detection of hard exudates based on retinal image analysis’, Medical Engineering & Physics, vol. 30, no. 3, pp. 350- 357, DOI: https://doi.org/10.1016/j.medengphy.2007.04.010.

Sánchez, CI, Hornero, R, Mayo, A & García, M 2009, ‘Mixture model-based clustering and logistic regression for automatic detection of microaneurysms in retinal images’, Proceedings of Medical Imaging 2009: Computer-Aided Diagnosis; 72601M, vol. 7260, DOI: https://doi.org/10.1117/12.81208.

Sánchez, CI, Mayo, A, García, M, López, MI & Hornero, R 2006, ‘Automatic Image Processing Algorithm to Detect Hard Exudates based on Mixture Models’, Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, vol. 1, pp. 4453–4456, DOI: https://doi.org/10.1109/IEMBS.2006.260434.

Sánchez, CI, Niemeijer, M, Dumitrescu, A V, Suttorp-Schulten, MSA, Abràmoff, MD & van Ginneken, B 2011, ‘Evaluation of a computer-aided diagnosis system for diabetic retinopathy screening on public data.’, Investigative ophthalmology & visual science, vol. 52, no. 7, pp. 4866-4871, DOI: https://doi.org/10.1167/iovs.10-6633.

Saxe, AM, McClelland, JL & Ganguli, S 2013, ‘Exact solutions to the nonlinear dynamics of learning in deep linear neural networks’, CoRR.

Scanlon, PH 2019, ‘Diabetic retinopathy’, Medicine, vol. 47, no. 2, Elsevier, pp. 77–85, DOI: https://doi.org/10.1159/000499539.

Schlegl, T, Vogl, W-D, Langs, G, Waldstein, S, Gerendas, B & Schmidt-Erfurth, U 2018, ‘Computerized device and method for processing image data’, US Patent Application Publication, viewed 8 April, 2019.

Seiffert, C, Khoshgoftaar, TM, Van Hulse, J & Napolitano, A 2010, ‘RUSBoost: A hybrid approach to alleviating class imbalance’, IEEE Transactions on Systems, Man, and Cybernetics Part A:Systems and Humans, vol. 40, no. 1, IEEE, pp. 185–197, DOI: https://doi.org/10.1109/TSMCA.2009.2029559.

Semeraro, F, Cancarini, A, Dell’Omo, R, Rezzola, S, Romano, MR & Costagliola, C 2015, ‘Diabetic retinopathy: Vascular and inflammatory disease’, Journal of Diabetes Research, vol. 2015, DOI: https://doi.org/10.1155/2015/582060.

Shen, XF, Huang, P, Fox, DA, Lin, Y, Zhao, ZH, Wang, W, Wang, JY, Liu, XQ, Chen, JY & Luo, WJ 2016, ‘Adult lead exposure increases blood-retinal permeability: A risk factor for retinal vascular disease’, NeuroToxicology, vol. 57, pp. 145-152, DOI: https://doi.org/10.1016/j.neuro.2016.09.013.

Shiraishi, J, Li, Q, Appelbaum, D & Doi, K 2011, ‘Computer-aided diagnosis and artificial intelligence in clinical imaging’, Seminars in Nuclear Medicine, vol. 46, no. 1, pp. 449-462, DOI: https://doi.org/10.1053/j.semnuclmed.2011.06.004.

Sim, DA, Keane, PA, Tufail, A, Egan, CA, Aiello, LP & Silva, PS 2015, ‘Automated Retinal Image Analysis for Diabetic Retinopathy in Telemedicine’, Current Diabetes Reports, vol. 15, no. 3, pp. 14-15, DOI: https://doi.org/10.1007/s11892-015-0577-6.

Simonyan, K & Zisserman, A 2014, ‘Very Deep Convolutional Networks for Large-Scale Image Recognition’, ICLR, pp. 1–14.

‘CS231n Convolutional Neural Networks for Visual Recognition’ 2016, Stanford University course notes.

Singalavanija, A, Supokavej, J, Bamroongsuk, P, Sinthanayothin, C, Phoojaruenchanachai, S & Kongbunkiat, V 2006, ‘Feasibility study on computer-aided screening for diabetic retinopathy’, Japanese Journal of Ophthalmology, vol. 50, no. 4, pp. 361–366 , DOI: https://doi.org/10.1007/s10384-005-0328-3.

Sinthanayothin, C, Kongbunkiat, V, Phoojaruenchanachai, S & Singalavanija, A 2003, ‘Automated Screening System for Diabetic Retinopathy’, Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis, 2003, ISPA, DOI: https://doi.org/10.1109/ISPA.2003.1296409.

Sivaprasad, S, Amoaku, WM, Hykin, P, Williamson, T, Dodson, P, Talks, J, Talks, K & Bhan, K 2015, ‘The Royal College of Ophthalmologists: Clinical Guidelines’, Eye, vol. 29, no. 12, pp. 1633–1638, DOI: https://doi.org/10.1038/eye.2015.164.

Smith, SW 2003, ‘Convolution’, Digital Signal Processing, ISBN: 0-9660176-6-8.

Solomon, SD, Chew, E, Duh, EJ, Sobrin, L, Sun, JK, VanderBeek, BL, Wykoff, CC & Gardner, TW 2017, ‘Diabetic retinopathy: A position statement by the American Diabetes Association’, Diabetes Care, vol. 40, no. 3, pp. 412-418, DOI: https://doi.org/10.2337/dc16-2641.


Sopharak, A, Uyyanonvara, B, Barman, S & Williamson, TH 2008, ‘Automatic detection of diabetic retinopathy exudates from non-dilated retinal images using mathematical morphology methods’, Computerized Medical Imaging and Graphics, vol. 32, no. 8, pp. 720–727, DOI: https://doi.org/10.1016/j.compmedimag.2008.08.009.

Soto-Pedre, E, Navea, A, Millan, S, Hernaez-Ortega, MC, Morales, J, Desco, MC & Pérez, P 2015, ‘Evaluation of automated image analysis software for the detection of diabetic retinopathy to reduce the ophthalmologists’ workload’, Acta Ophthalmologica, vol. 93, no. 1, pp. e52-e56, DOI: https://doi.org/10.1111/aos.12481.

Spencer, T, Olson, JA, McHardy, KC, Sharp, PF & Forrester, JV 1996, ‘An image-processing strategy for the segmentation and quantification of microaneurysms in fluorescein angiograms of the ocular fundus’, Computers and Biomedical Research, vol. 29, no. 4, pp. 284–302.

Sreng, S, Maneerat, N, Hamamoto, K & Panjaphongse, R 2019, ‘Cotton wool spots detection in diabetic retinopathy based on adaptive thresholding and ant colony optimization coupling support vector machine’, IEEJ Transactions on Electrical and Electronic Engineering, vol. 14, no. 6, pp. 884-893, DOI: https://doi.org/10.1002/tee.22878.

Sreng, S, Maneerat, N, Isarakorn, D, Hamamoto, K & Panjaphongse, R 2018, ‘Primary screening of diabetic retinopathy based on integrating morphological operation and support vector machine’, ICIIBMS 2017 - 2nd International Conference on Intelligent Informatics and Biomedical Sciences, pp. 250–254, DOI: https://doi.org/10.1109/ICIIBMS.2017.8279750.

Staal, J, Abràmoff, MD, Niemeijer, M, Viergever, MA & Van Ginneken, B 2004, ‘Ridge-based vessel segmentation in color images of the retina’, IEEE Transactions on Medical Imaging, vol. 23, no. 4, pp. 501-509, DOI: https://doi.org/10.1109/TMI.2004.825627.

Stein, HA, Stein, RM & Freeman, MI 2006, The Ophthalmic Assistant, ISBN: 9780323394772


Stitt, AW, Curtis, TM, Chen, M, Medina, RJ, McKay, GJ, Jenkins, A, Gardiner, TA, Lyons, TJ, Hammes, HP, Simó, R & Lois, N 2016, ‘The progress in understanding and treatment of diabetic retinopathy’, Progress in Retinal and Eye Research, vol. 51, pp. 156-186, DOI: https://doi.org/10.1016/j.preteyeres.2015.08.001.

Kanade, SS 2015, ‘An Amalgamation-Based System for Micro Detection and Diabetic Retinopathy Grading’, IJMER, vol. 5, no. 1, pp. 70–77.

Syahputra, MF, Nurrahmadayeni, Aulia, I & Rahmat, RF 2017, ‘Hypertensive retinopathy identification from retinal fundus image using probabilistic neural network’, Proceedings - 2017 International Conference on Advanced Informatics: Concepts, Theory and Applications, ICAICTA 2017, pp. 2–7, DOI: https://doi.org/10.1109/ICAICTA.2017.8090989.


Szegedy, C, Liu, W, Jia, Y, Sermanet, P, Reed, S, Anguelov, D, Erhan, D, Vanhoucke, V & Rabinovich, A 2014, ‘Going Deeper with Convolutions’, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9, DOI: https://doi.org/10.1109/CVPR.2015.7298594.

Tah, V, Orlans, HO, Hyer, J, Casswell, E, Din, N, Sri Shanmuganathan, V, Ramskold, L & Pasu, S 2015, ‘Anti-VEGF therapy and the retina: An update’, Journal of Ophthalmology, vol. 2015, DOI: https://doi.org/10.1155/2015/627674.

Takahashi, R & Kajikawa, Y 2017, ‘Computer-aided diagnosis: A survey with bibliometric analysis’, International Journal of Medical Informatics, vol. 101, Elsevier, pp. 58–67, DOI: https://doi.org/10.1016/j.ijmedinf.2017.02.004.

Talu, S, Calugaru, DM & Lupascu, CA 2015, ‘Characterisation of human non-proliferative diabetic retinopathy using the fractal analysis’, International Journal of Ophthalmology, vol. 8, no. 4, pp. 770–776, DOI: https://doi.org/10.3980/j.issn.2222-3959.2015.04.23.

Tang, L, Niemeijer, M, Reinhardt, JM, Garvin, MK & Abramoff, MD 2013, ‘Splat feature classification with application to retinal hemorrhage detection in fundus images’, IEEE Transactions on Medical Imaging, vol. 32, no. 2, pp. 364–375, DOI: https://doi.org/10.1109/TMI.2012.2227119.

Tarca, AL, Carey, VJ, Chen, X, Romero, R & Drăghici, S 2007, ‘Machine Learning and Its Applications to Biology’, PLoS Computational Biology, vol. 3, no. 6, p. e116, DOI: https://doi.org/10.1371/journal.pcbi.0030116.

Usher, D, Dumskyj, M, Himaga, M, Williamson, TH, Nussey, S & Boyce, J 2003, ‘Automated detection of diabetic retinopathy in digital retinal images: a tool for diabetic retinopathy screening’, Diabetic Medicine, vol. 21, no. 1, pp. 84–90.

Usman Akram, M, Khalid, S, Tariq, A, Khan, SA & Azam, F 2014, ‘Detection and classification of retinal lesions for grading of diabetic retinopathy’, Computers in Biology and Medicine, Elsevier, vol. 45, no. 1, pp. 161–171, DOI: https://doi.org/10.1016/j.compbiomed.2013.11.014.


Vedaldi, A & Lenc, K 2014, ‘MatConvNet - Convolutional Neural Networks for MATLAB’, arXiv preprint arXiv:1412.4564.

Vincent, P, Larochelle, H, Lajoie, I, Bengio, Y & Manzagol, P-A 2010, ‘Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion’, Journal of Machine Learning Research, vol. 11, pp. 3371-3408.

Walter, T, Massin, P, Erginay, A, Ordonez, R, Jeulin, C & Klein, JC 2007, ‘Automatic detection of microaneurysms in color fundus images’, Medical Image Analysis, vol. 11, no. 6, pp. 555–566, DOI: https://doi.org/10.1016/j.media.2007.05.001.

Wan, TT, Li, XF, Sun, YM, Li, YB & Su, Y 2015, ‘Recent advances in understanding the biochemical and molecular mechanism of diabetic retinopathy’, Biomedicine and Pharmacotherapy, vol. 74, pp. 145-147, DOI: https://doi.org/10.1016/j.biopha.2015.08.002.

Wiatowski, T & Bölcskei, H 2018, ‘A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction’, IEEE Transactions on Information Theory, vol. 64, no. 3, pp. 1845–1866, DOI: https://doi.org/10.1109/TIT.2017.2776228.

Wickware, P 2000, ‘Next-generation biologists must straddle computation and biology’, Nature, vol. 404, no. 6778, pp. 683–684, DOI: https://doi.org/10.1038/35007262.

Wild, S, Roglic, G, Green, A, Sicree, R & King, H 2004, ‘Global Prevalence of Diabetes: Estimates for the Year 2000 and Projections for 2030’, Diabetes Care, vol. 27, no. 5, pp. 1047–1053, DOI: https://doi.org/10.2337/diacare.27.5.1047.

Willard, C 2019, ‘Chi-Square’, Statistical Methods.

Win, KY & Choomchuay, S 2017, ‘Automated detection of exudates using histogram analysis for Digital Retinal Images’, 2016 International Symposium on Intelligent Signal Processing and Communication Systems, ISPACS 2016, IEEE, pp. 1–6, DOI: https://doi.org/10.1109/ISPACS.2016.7824768.


Wolf, TMA 2019, ‘Cascaded convolutional neural network’, US Patent Application Publication, viewed 9 April 2019.

Wong, TY, Larsen, EKM, Klein, R, Mitchell, P, Couper, DJ, Klein, BEK, Hubbard, LD, Siscovick, DS & Sharrett, AR 2005, ‘Cardiovascular risk factors for retinal vein occlusion and arteriolar emboli: The atherosclerosis risk in communities & cardiovascular health studies’, Ophthalmology, vol. 112, no. 4, pp. 540-547, DOI: https://doi.org/10.1016/j.ophtha.2004.10.039.

Woo, SCY, Lip, GYH & Lip, PL 2016, ‘Associations of retinal artery occlusion and retinal vein occlusion with mortality, stroke, and myocardial infarction: A systematic review’, Eye (Basingstoke), vol. 30, no. 8, pp. 1031-1038, DOI: https://doi.org/10.1038/eye.2016.111.

Xiao, Z, Li, F, Zhang, F & Zhang, YJ 2015, ‘Hard Exudates Detection Method Based on Background-Estimation’, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9218, pp. 361–372, DOI: https://doi.org/10.1007/978-3-319-21963-9_33.

Yanase, J & Triantaphyllou, E 2019, ‘A Systematic Survey of Computer-Aided Diagnosis in Medicine: Past and Present Developments’, Expert Systems with Applications, Pergamon, vol. 138, article 112821, DOI: https://doi.org/10.1016/j.eswa.2019.112821.

Yannuzzi, LA, Ober, MD, Slakter, JS, Spaide, RF, Fisher, YL, Flower, RW & Rosen, R 2004, ‘Ophthalmic fundus imaging: Today and beyond’, American Journal of Ophthalmology, vol. 137, no. 3, pp. 511-524, DOI: https://doi.org/10.1016/j.ajo.2003.12.035.

Yen, GG & Leong, WF 2008, ‘A sorting system for hierarchical grading of diabetic fundus images: A preliminary study’, IEEE Transactions on Information Technology in Biomedicine, vol. 12, no. 1, pp. 118–130, DOI: https://doi.org/10.1109/TITB.2007.910453.

Yuan, GX, Ho, CH & Lin, CJ 2012, ‘Recent advances of large-scale linear classification’, Proceedings of the IEEE, vol. 100, no. 9, pp. 2584-2603, DOI: https://doi.org/10.1109/JPROC.2012.2188013.

Yun, WL, Rajendra Acharya, U, Venkatesh, YV, Chee, C, Min, LC & Ng, EYK 2008, ‘Identification of different stages of diabetic retinopathy using retinal optical images’, Information Sciences, vol. 178, no. 1, pp. 106–121, DOI: https://doi.org/10.1016/j.ins.2007.07.020.

Zeiler, MD & Fergus, R 2014, ‘Visualizing and understanding convolutional networks’, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8689, pp. 818-833, DOI: https://doi.org/10.1007/978-3-319-10590-1_53.

Zhang, B, Wu, X, You, J, Li, Q & Karray, F 2010, ‘Detection of microaneurysms using multi-scale correlation coefficients’, Pattern Recognition, vol. 43, no. 6, pp. 2237–2248, DOI: https://doi.org/10.1016/j.patcog.2009.12.017.

Zhang, H, Chen, Z, Chi, Z & Fu, H 2014, ‘Hierarchical local binary pattern for branch retinal vein occlusion recognition with fluorescein angiography images’, Electronics Letters, vol. 50, no. 25, pp. 1902-1904, DOI: https://doi.org/10.1049/el.2014.2854.

Zhang, X, Thibault, G, Decencière, E, Marcotegui, B, Laÿ, B, Danno, R, Cazuguel, G, Quellec, G, Lamard, M, Massin, P, Chabouis, A, Victor, Z & Erginay, A 2014, ‘Exudate detection in color retinal images for mass screening of diabetic retinopathy’, Medical Image Analysis, vol. 18, no. 7, Elsevier B.V., pp. 1026– 1043, DOI: https://doi.org/10.1016/j.media.2014.05.004.

Zhao, R, Chen, Z & Chi, Z 2015, ‘Convolutional Neural Networks for Branch Retinal Vein Occlusion recognition’, 2015 IEEE International Conference on Information and Automation, August 2015, pp. 1633–1636, DOI: https://doi.org/10.1109/ICInfA.2015.7279547.

Zhou, W, Wu, C & Yu, X 2018, ‘Computer Aided Diagnosis for Diabetic Retinopathy based on Fundus Image’, 2018 37th Chinese Control Conference (CCC), Technical Committee on Control Theory, Chinese Association of Automation, pp. 9214–9219, DOI: https://doi.org/10.1155/2019/6142839.


Zode, JJ 2017, ‘Detection of Branch Retinal Vein Occlusion using Fractal Analysis’, vol. 162, no. 8, pp. 28–32, DOI: https://doi.org/10.33130/asian%20journals.v3i3.228.

Zuiderveld, K 1994, ‘Contrast Limited Adaptive Histogram Equalization’, Graphics Gems, pp. 474–485, ISBN: 0-12-336155-9.

Zurada, J 1992, Introduction to artificial neural systems, West Publishing Company, ISBN: 0-314-93391-3.

STARE Database: http://cecas.clemson.edu/~ahoover/stare/

DRIVE Database: https://www.isi.uu.nl/Research/Databases/DRIVE/

MESSIDOR Database: http://www.adcis.net/en/third-party/messidor/

Kaggle Database: https://www.kaggle.com/c/diabetic-retinopathy-detection/data

Retina Image Bank Database: https://imagebank.asrs.org/

Dr. Hossein Rabbani Database: https://sites.google.com/site/hosseinrabbanikhorasgani/datasets-1
