INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 8, ISSUE 11, NOVEMBER 2019 ISSN 2277-8616

Deep Learning Pre-Trained Architectures Of AlexNet And GoogleNet For DICOM Image Classification

P. Haripriya, R. Porkodi

Abstract: Deep Learning is a subset of Machine Learning dedicated to the development of machines that learn from given inputs and eventually attain intelligence in a manner inspired by the human brain. Such learning models are used to extract complicated features from a query image and to increase classification performance. In the medical domain, the descriptiveness and discriminative power of the extracted features are critical for attaining good classification performance with traditional algorithms. Recently, Deep Learning has delivered significant gains in medical image classification through the use of the Deep Convolutional Neural Network (DCNN). In this paper, pre-trained DCNN architectures, namely AlexNet and GoogleNet, are implemented and their classification performance is analyzed. Pre-trained networks can easily be customized for one's own dataset, provide state-of-the-art performance and are easy to access. The experiments use four different ratios of training and testing data, namely 50:50, 60:40, 70:30 and 80:20, for both AlexNet and GoogleNet. The highest classification accuracy obtained is 97.02% with an error rate of 0.01, achieved by GoogleNet at the 70:30 ratio, when compared with the other ratios and with the AlexNet results.

Keywords: Deep Learning, DCNN, AlexNet, GoogleNet, Image Classification

I. INTRODUCTION
Recent advances in medical imaging devices have led to the production of massive amounts of image data every day [1]. Medical images are stored in the Digital Imaging and Communications in Medicine (DICOM) format, which is a very complex object because it stores image data together with metadata information [2]. Medical image classification is therefore very important for diagnosis and treatment purposes, since it is used to classify historical medical images drawn from huge data sets. An efficient medical image classification architecture is thus needed to classify medical images with a high accuracy rate [3]. Recent developments in Deep Learning have found applications in many fields, including Computer Vision, Natural Language Processing, Image Processing and Automatic Speech Recognition. Deep Learning predominantly uses both supervised and unsupervised learning with combinations of parametric and non-parametric models; supervised parametric models are tuned by trial and error. For image classification, the Deep Convolutional Neural Network (DCNN) is widely used in medical imaging and gives better results than traditional methods [4]. This architecture mainly involves convolutional layers, pooling layers and fully connected layers at the end. It is proficient at building a feature extractor capable of resolving the problems that occur when classification is done by conventional methods [5]. The feature extractor of the integrated model should be able to learn to extract the differentiating features from the training set of images accurately.

II. LITERATURE SURVEY
Paras Lakhani et al. [6] provided a tutorial for a larger audience interested in implementing deep learning for image analysis and processing. It gives an overview of the steps involved in building a deep neural network for medical classification. The implementation was done on TensorFlow and Python, along with Matplotlib for plotting and data visualization. The "hello world" implementation used a Jupyter Notebook to build a pre-trained Inception v3 network, which achieved accuracy greater than 94% with 65 training cases. The network was built to distinguish abdominal from chest radiographs using binary cross-entropy. This provided a foundation for experimenting with deep learning in medical imaging projects. Maruyama et al. [7] presented a comparative study of medical image classification based on clinical image inspection. The three methods compared were SVM, ANN and CNN. A single dataset in both DICOM and JPEG formats was used for evaluating the three machine learning methods; JPEG was understandably at a disadvantage, having less colour information than the DICOM format. CNN was found to be more accurate than SVM and ANN. These experiments were performed primarily to distinguish the influence of medical image quality on the machine learning methods and tools. Zhiyun Xue et al. [8] presented a method for automatically identifying the gender of a person from frontal chest X-ray images. The proposal adopts CNN-based deep learning and transfer learning to manage features with limited data; the research was motivated by datasets that do not carry gender information. The sequence of steps in the experiments was pre-processing, CNN feature extraction, feature selection and classification. The dataset was formed by combining data from different sources to bring variety. The feature extractors tested and compared were AlexNet, VGGNet, GoogLeNet and ResNet, and SVM and Random Forest were used for classification. The VGGNet feature extractor with SVM gave a classification accuracy of 86.6%. Maria Tzelepi et al. [9] developed a model retraining method for efficient convolutional representations for content-based image retrieval. They proposed three retraining approaches: fully unsupervised retraining, retraining with relevance information, and relevance feedback based retraining. CNN models were used to obtain the convolutional representation and to build the target approach for each retraining strategy. The Paris 6K, UKBench and UKBench-2 datasets were used for the experiments, which were implemented with the Caffe deep learning framework; CONV4 layers were used for feature extraction and ReLU was replaced by randomly initialized PReLU layers. The results were benchmarked on precision and recall and were found to be satisfactory except for the RRI and FU approaches. Cong Bai et al. [10] developed a novel method to address efficiency issues in large-scale image retrieval. They proposed a DCNN framework to improve its ability for feature extraction and its efficiency for similarity measurement. The focus of the research was to optimize AlexNet from three perspectives: the pooling layer, the fully connected layer and a hidden layer. It also maps the high-dimensional feature vectors to low-dimensional hash codes by adding a hidden layer to improve retrieval efficiency. The efficiency of the implementation was assessed through retrieval time analysis on three benchmark datasets, MNIST, CIFAR10 and SUN397; the extraction times were 0.79 s, 0.48 s and 5.50 s on the respective datasets, and performance was evaluated using precision and mAP.

III. Pre-trained Network
Pre-trained networks have already been trained on large benchmark datasets with many classes and are then repurposed to extract and learn features. The extracted features are transferred to a target network to be trained on the target dataset. For example, ImageNet, which works very well for natural images, can be used to provide initial weights for training. Even though there are differences between medical images and natural images, pre-training is still useful for uncovering generic representations that help the classification network and make training converge faster. This study focuses on two pre-trained DCNN architectures, AlexNet and GoogleNet, applied to a DICOM image dataset.

3.1 AlexNet
The architecture published by Alex Krizhevsky in 2012 is popularly called AlexNet [11]. It primarily solved the image classification problem in which the input image can belong to one of 1000 different classes and the output is a vector of 1000 numbers; the ith element of the output vector is the probability that the input image belongs to the ith class. The input image is in RGB format with a size of 256 x 256, and the network consists of 60 million parameters and 650,000 neurons. Figure 1 shows the high-level architecture of AlexNet.

Figure 1: High-level architecture of AlexNet
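To make the retraining idea concrete, the following minimal sketch (PyTorch/torchvision, not the code used in this study) loads an ImageNet pre-trained AlexNet, replaces its final 1000-way fully connected layer with a 22-way layer for the DICOM classes described later, and optionally freezes the convolutional feature extractor. The framework choice and the freezing strategy are assumptions for illustration only.

```python
# Minimal transfer-learning sketch (PyTorch/torchvision >= 0.13); not the paper's original code.
# Assumption: 22 target classes, taken from the experimental section of this paper.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 22

# Load AlexNet with ImageNet weights (1000-class output, roughly 61M parameters).
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Replace the last fully connected layer so the network predicts 22 classes.
alexnet.classifier[6] = nn.Linear(alexnet.classifier[6].in_features, num_classes)

# Optionally freeze the convolutional feature extractor and retrain only the classifier head.
for param in alexnet.features.parameters():
    param.requires_grad = False

# Report the parameter count, which illustrates the "60 million parameters" figure above.
total_params = sum(p.numel() for p in alexnet.parameters())
print(f"AlexNet parameters: {total_params / 1e6:.1f}M")
```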

The AlexNet model mainly contains eight layers: five convolutional layers and three fully connected layers. It uses the ReLU (Rectified Linear Unit) for the non-linear part, represented by the following equation:
f(x) = max(0, x)
The benefit of using ReLU over the sigmoid is that it helps the network train faster, because the gradient of the sigmoid tends to become small in the saturating region. This causes the updates to the weights to vanish, which is referred to as the vanishing gradient problem. In AlexNet, a ReLU is placed after every convolutional layer and replaces each negative value in the feature map with zero. AlexNet also addresses the problem of overfitting by applying a dropout layer after every fully connected layer. A dropout layer has a probability associated with it, which is applied to every neuron of the response map independently. This works in the spirit of ensembles: with the help of dropout, different sets of neurons are switched off, representing different architectures that are trained in parallel, with a weight given to each subset and the weights summing to one. For n neurons attached to dropout, the number of subset architectures formed would be 2^n. This tends to prevent the neurons from developing co-adaptations and encourages them to develop meaningful features independently of one another.
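As a brief illustration of the two operations described above, the sketch below (PyTorch, illustrative only) applies the ReLU f(x) = max(0, x), which zeroes negative activations, and a dropout layer that randomly switches neurons off during training and is disabled at inference time.

```python
# Illustrative sketch of ReLU and dropout as used in AlexNet; not the paper's code.
import torch
import torch.nn as nn

x = torch.tensor([[-2.0, -0.5, 0.0, 1.5, 3.0]])

relu = nn.ReLU()
print(relu(x))              # negative entries become 0: [[0.0, 0.0, 0.0, 1.5, 3.0]]

dropout = nn.Dropout(p=0.5) # probability of switching a neuron off
dropout.train()             # dropout is active only in training mode
print(dropout(x))           # roughly half the entries are zeroed, the rest scaled by 1/(1-p)

dropout.eval()              # at inference time dropout is disabled
print(dropout(x))           # input passes through unchanged
```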

3.2 GoogLeNet
GoogLeNet, alternatively called Inception V1, is a CNN model built by Google and was the first deep network, released in 2014, to focus on efficiency and faster processing. Prior to GoogLeNet there were AlexNet and OxfordNet, and GoogLeNet uses 12 times fewer parameters than AlexNet [12]. The network consumes less CPU and memory compared to AlexNet, and it is a pre-trained convolutional neural network that is 22 layers deep. The primary intent of GoogLeNet was to avoid uniformly increasing the network size and the proportionally increasing computational resources, because any uniform increase in the number of filters results in a quadratic increase in computation. Such a design unnecessarily wastes computational resources, and the larger the network, the more it also leads to overfitting.

The solution proposed by GoogLeNet is to move to sparsely connected network architectures which replace fully connected architectures inside the convolutional layers. This was implemented with the help of the inception module shown in figure 2. The inception module is mainly used to design a good local network topology, and these modules are stacked on top of one another.

Figure 2: Inception Module
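A minimal sketch of an inception-style block is given below (PyTorch); it runs 1x1, 3x3 and 5x5 convolutions and a pooling branch in parallel and concatenates their outputs along the channel dimension. The filter counts are illustrative assumptions and do not reproduce the exact GoogLeNet configuration.

```python
# Simplified inception-style block; filter counts are assumptions, not GoogLeNet's values.
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        # 1x1 convolution branch
        self.branch1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        # 1x1 reduction followed by a 3x3 convolution
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 24, kernel_size=3, padding=1),
        )
        # 1x1 reduction followed by a 5x5 convolution
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 8, kernel_size=1),
            nn.Conv2d(8, 16, kernel_size=5, padding=2),
        )
        # 3x3 max pooling followed by a 1x1 convolution
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 16, kernel_size=1),
        )

    def forward(self, x):
        # Every branch preserves the spatial size, so outputs concatenate on the channel axis.
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1,
        )

block = InceptionBlock(in_ch=32)
out = block(torch.randn(1, 32, 28, 28))
print(out.shape)  # torch.Size([1, 72, 28, 28]) -> 16 + 24 + 16 + 16 output channels
```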

The network was designed with computational efficiency and practicality in mind, so that inference can be run on individual devices, including those with limited computational resources and especially those with a low memory footprint. The architecture is 22 layers deep when pooling layers are not counted, and 27 layers when they are. The total number of layers (independent building blocks) used to build the network is about 100. Given the comparatively large depth of the network, the ability to propagate gradients back through all the layers in an effective manner was a concern. An interesting observation is that the strong performance of relatively shallower networks on this task suggests that the features produced by the layers in the middle of the network should be very discriminative. By adding auxiliary classifiers connected to these intermediate layers, discrimination is encouraged in the lower stages of the classifier, the gradient signal that gets propagated back is increased, and additional regularization is provided. During training, their losses are added to the total loss of the network with a reduction weight (the losses of the auxiliary classifiers were weighted by 0.3), and at inference time these auxiliary networks are discarded. Although only a CPU-based implementation was used here, a rough estimate suggests that the GoogLeNet network could be trained to convergence on a few high-end GPUs within a week, the main limitation being memory usage.

IV. Experimental Results
This study experimented on a DICOM image dataset consisting of 6600 images in 22 classes, with each class containing 300 images across three modalities: CT, USG and MRI. The 22 class names are Brain, Breast, Liver, Stomach, Leg, Kidney, Chest, Heart, Cervix, Uterus, Thyroid, Rectum, Prostate, Phantom, Pancreas, Ovary, Lung, Head neck, Head, Colon, Chest and Bladder. The DICOM images are stored in ".dcm" format and each image is 512 KB in size. In preprocessing, all images are resized to 227 x 227 and converted to grayscale. After preprocessing, the pre-trained networks are loaded and retrained on the DICOM image dataset. The dataset images are randomly divided into a given ratio for training and testing. In the network training process, the following parameters are initialized: a mini-batch size of 10, which determines how many images are used in each iteration; an epoch value of 5, where one epoch is a full training cycle over the entire training dataset; and the validation data and validation frequency, which are used to calculate the validation accuracy in each epoch. Finally, the training plot is turned on to visualize progress during training. Figure 3 shows the training progress for the different training and testing ratios with AlexNet, and figure 4 shows the corresponding training progress for GoogleNet. Table 1 presents the performance evaluation of AlexNet and GoogleNet, and figure 5 shows their classification performance for each training and testing ratio. The AlexNet and GoogleNet classification accuracies for the 50:50, 60:40, 70:30 and 80:20 ratios are 94.18%, 88.98%, 95.56%, 94.32% and 91.45%, 96.06%, 97.02%, 96.36% respectively. From this observation, GoogleNet obtained higher accuracy than the AlexNet architecture.
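The sketch below illustrates the training setup just described: a random split of the dataset at a chosen training:testing ratio, a mini-batch size of 10, 5 epochs and per-epoch validation. It is written with PyTorch for illustration only; the folder of exported image files, the optimizer and the learning rate are assumptions rather than the configuration used in this study.

```python
# Illustrative retraining loop for a pre-trained network on DICOM-derived images.
# Assumptions: the DICOM slices have already been exported as image files (e.g. PNG)
# into class sub-folders under the hypothetical folder "dicom_png/".
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, models, transforms

# Resize to 227 x 227 and replicate the single grey channel to three channels.
preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize((227, 227)),
    transforms.ToTensor(),
])

dataset = datasets.ImageFolder("dicom_png/", transform=preprocess)  # hypothetical folder

train_ratio = 0.7                                   # e.g. the 70:30 split
n_train = int(train_ratio * len(dataset))
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])

train_loader = DataLoader(train_set, batch_size=10, shuffle=True)   # mini-batch size 10
test_loader = DataLoader(test_set, batch_size=10)

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, len(dataset.classes))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)  # assumed settings

for epoch in range(5):                              # 5 epochs, one full pass each
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    # Validation at the end of each epoch (the paper's validation frequency may differ).
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    print(f"epoch {epoch + 1}: validation accuracy = {correct / total:.4f}")
```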


[Four training-progress panels, one each for the 50:50, 60:40, 70:30 and 80:20 ratios]

Figure 3. AlexNet training progress results with different ratios of the dataset


[Four training-progress panels, one each for the 50:50, 60:40, 70:30 and 80:20 ratios]

Figure 4. GoogleNet training progress results with different ratios of the dataset


Network     Training:Testing ratio    Accuracy    Execution time    Loss rate
AlexNet     50:50                     94.18%      42 min 39 sec     0.02
AlexNet     60:40                     88.98%      49 min 25 sec     0.03
AlexNet     70:30                     95.56%      43 min 33 sec     0.01
AlexNet     80:20                     94.32%      55 min 45 sec     0.01
GoogleNet   50:50                     91.45%      82 min 2 sec      0.01
GoogleNet   60:40                     96.06%      86 min 59 sec     0.01
GoogleNet   70:30                     97.02%      93 min 47 sec     0.01
GoogleNet   80:20                     96.36%      103 min 3 sec     0.00

Table 1. Performance Evaluation of AlexNet and GoogleNet

[Figure 5: grouped bar chart of classification accuracy (%) for AlexNet and GoogleNet at the 50:50, 60:40, 70:30 and 80:20 ratios]

Figure 5. Classification Accuracy of AlexNet and GoogleNet
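For reference, the following short matplotlib sketch re-plots the accuracy values of Table 1 as a grouped bar chart in the style of Figure 5; it is an illustration, not the plotting code used for the original figure.

```python
# Re-plot the Table 1 accuracies as a grouped bar chart (illustrative only).
import matplotlib.pyplot as plt
import numpy as np

ratios = ["50:50", "60:40", "70:30", "80:20"]
alexnet_acc = [94.18, 88.98, 95.56, 94.32]
googlenet_acc = [91.45, 96.06, 97.02, 96.36]

x = np.arange(len(ratios))
width = 0.35

plt.bar(x - width / 2, alexnet_acc, width, label="AlexNet")
plt.bar(x + width / 2, googlenet_acc, width, label="GoogleNet")
plt.xticks(x, ratios)
plt.xlabel("Training:testing ratio")
plt.ylabel("Classification accuracy (%)")
plt.title("Classification accuracy of AlexNet and GoogleNet")
plt.legend()
plt.show()
```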

V. CONCLUSION
Inspired by the recent success of DCNN architectures, this study analyzed DICOM image classification performance using the pre-trained networks AlexNet and GoogleNet. The networks were systematically examined with different significant ratios of training and testing data, and the AlexNet and GoogleNet results were compared. The experimental results demonstrate that the 70:30 training and testing ratio produced the best classification accuracy, 97.02% with a loss rate of 0.01, achieved by GoogleNet when compared with AlexNet on the DICOM image dataset. It is observed that the 70:30 ratio trains and validates the given inputs better than the other ratios. Further work will extend the study to other DCNN architecture models for DICOM image classification and retrieval.

REFERENCES
[1] Zehra Camlica, H. R. Tizhoosh and Farzad Khalvati, "Medical Image Classification via SVM using LBP Features from Saliency-Based Folded Data", International Conference on Machine Learning and Applications, IEEE, 2015.
[2] P. Haripriya and R. Porkodi, "A Survey Paper on Data Mining Techniques and Challenges in Distributed DICOM", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 5, Issue 3, March 2016.
[3] Qing Li, Weidong Cai, Xiaogang Wang, Yun Zhou, David Dagan Feng and Mei Chen, "Medical image classification with convolutional neural network", International Conference on Control Automation Robotics & Vision (ICARCV), 2014.
[4] Nima Tajbakhsh, Jae Y. Shin, Suryakanth R. Gurudu and R. Todd Hurst, "Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning?", IEEE Transactions on Medical Imaging, Vol. 35, No. 5, May 2016.
[5] Lakhani, P., Gray, D. L., Pett, C. R., Nagy, P. and Shih, G., "Hello World Deep Learning in Medical Imaging", Journal of Digital Imaging, 2018. https://doi.org/10.1007/s10278-018-0079-6
[6] M. Manoj Krishna, M. Neelima, M. Harshali and M. Venu Gopala Rao, "Image classification using Deep learning", International Journal of Engineering & Technology, Vol. 7 (2.7), pp. 614-617, 2018.
[7] Maruyama, T., Hayashi, N., Sato, Y., Hyuga, S., Wakayama, Y., Watanabe, H., … Ogura, T., "Comparison of medical image classification accuracy among three machine learning methods", Journal of X-Ray Science and Technology, 2018. https://doi.org/10.3233/XST-18386
[8] Xue, Z., Antani, S., Long, R. and Thoma, G. R., "Using deep learning for detecting gender in adult chest radiographs", 2018. https://doi.org/10.1117/12.2293027
[9] Tzelepi, M. and Tefas, A., "Deep convolutional learning for Content Based Image Retrieval", Neurocomputing, 2018. https://doi.org/10.1016/j.neucom.2017.11.022
[10] Bai, C., Huang, L., Pan, X., Zheng, J. and Chen, S., "Optimization of deep convolutional neural network for large scale image retrieval", Neurocomputing, 2018. https://doi.org/10.1016/j.neucom.2018.04.034

[11] Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", NIPS'12: Proceedings of the 25th International Conference on Neural Information Processing Systems, Volume 1, pp. 1097-1105, Lake Tahoe, Nevada, 2012.
[12] Ashutosh Singla, Lin Yuan and Touradj Ebrahimi, "Food/Non-food Image Classification and Food Categorization using Pre-Trained GoogLeNet Model", MADiMa '16: Proceedings of the 2nd International Workshop on Multimedia Assisted Dietary Management, pp. 3-11, 2016.
