
Article Convolutional Neural Networks with Transfer Learning for Recognition of COVID-19: A Comparative Study of Different Approaches

Tanmay Garg 1, Mamta Garg 2, Om Prakash Mahela 3 and Akhil Ranjan Garg 4,* 1 Department of Electrical Engineering, Punjab Engineering College (Deemed to be University), Chandigarh 160012, India; [email protected] 2 Department of Computer Science and Engineering, Jodhpur Institute of Engineering and Technology, Jodhpur 342001, India; [email protected] 3 Power System Planning Division, Rajasthan Rajya Vidyut Prasaran Nigam Ltd., Jaipur 302005, India; [email protected] 4 Department of Electrical Engineering, Jai Narain Vyas University, Jodhpur 342001, India * Correspondence: [email protected]

 Received: 2 November 2020; Accepted: 18 December 2020; Published: 21 December 2020 

Abstract: To judge the ability of convolutional neural networks (CNNs) to effectively and efficiently transfer image representations learned on the ImageNet dataset to the task of recognizing COVID-19, in this work we propose and analyze four approaches. For this purpose, we use VGG16, ResNetV2, InceptionResNetV2, DenseNet121, and MobileNetV2 CNN models pre-trained on the ImageNet dataset to extract features from X-ray images of COVID and non-COVID patients. The simulation studies we performed reveal that these pre-trained models have different levels of ability to transfer image representations. We find that, in the approaches we have proposed, the performance in detecting COVID-19 is better if either ResNetV2 or DenseNet121 is used to extract features. One of the important findings of our study is that the use of principal component analysis for feature selection improves efficiency. The approach using the fusion of features outperforms all the other approaches, and with this approach we could achieve an accuracy of 0.94 for a three-class classification problem. This work will not only be useful for COVID-19 detection but also for any domain with small datasets.

Keywords: convolutional neural networks; transfer learning; K-means clustering; principal component analysis

1. Introduction COVID-19, a global pandemic, has been spreading in many parts of the world since its identification in late December 2019. In these nine to ten months, this disease has become one of the most significant public health emergencies, requiring remedial measures and early diagnosis. In many countries, until recently, reverse transcription-polymerase chain reaction (RT-PCR) tests have been the most popular diagnostic method for detecting COVID-19. Although popular, this method suffers from limitations in its long wait time and low sensitivity. Therefore, for the early diagnosis of COVID-19, many have started using molecular tests to detect the coronavirus. For example, many existing machines, like Genmark's ePlex Respiratory Pathogen instrument or Abbott's ID, etc., have a COVID-19 feature for testing, which takes much less time [1,2]. The other advantage is that the sensitivity of these molecular tests is around 90%, better than the RT-PCR method, which has a sensitivity of about 70%. However, both the RT-PCR method and the molecular testing approach need expensive equipment and trained professionals. Further, the availability of these methods is limited in remote areas and in low- and middle-income countries.

This necessitates developing useful diagnostic tools suitable for use in remote areas and in low- and middle-income countries. Studies suggest an association of coronavirus with a dysregulation in blood investigations [3]. Apart from these methods, radiological imaging has utility in COVID-19 detection [3]. Further, Rubin et al. [4] report the recommendations of a panel of radiologists and pulmonologists for using chest X-rays (CXR) and computed tomography (CT) in the diagnosis, progression, and management of COVID-19. CT, in comparison to CXR, has superior diagnostic accuracy, but in terms of cost, ease of use, availability, and a less cumbersome sanitization process, CXR is a better option. Furthermore, it is easy to install and operate portable units having a CXR facility in COVID wards, intensive care units, and COVID screening areas [5]. There are subtle changes in CXR due to COVID-19. COVID creates septal thickening [6], peripheral airspace opacities, ground-glass opacity [7], peribronchial consolidation, multifocal airspace opacities, and diffuse mid and lower zone consolidation in CXR [5]. Moreover, CXR is abnormal in 9% of cases even for patients having negative initial RT-PCR results. For a detailed review of the role of CXR in COVID diagnosis and prognosis, see [5]. The use of CXR is not new; its usage includes disease detection, evaluation of prognosis, forecasting the response to treatment, and monitoring of disease status. It helps diagnose specific diseases such as tuberculosis, interstitial lung diseases, and pulmonary nodules [8]. Automatic CXR analysis acts as an assistive tool for radiologists and improves detection accuracy [8]. More recently, the inherent capability of deep neural networks to capture hidden representations and extract features from input images has led to their remarkable success in the automatic analysis of medical images. Deep learning algorithms help in identifying, classifying, and quantifying patterns in medical images [9]. Applications of deep neural networks in medical image analysis have been highlighted in [9–16]. Among the many different types of deep neural networks, convolutional neural networks (CNNs) are currently the most researched in medical image analysis [9,11,17–19]. CNNs show a more distinct resemblance to the biological system: the connectivity patterns in their convolutional layers resemble the connectivity pattern of the mammalian visual system. The most noteworthy characteristic of CNNs is that, like the mammalian visual system, these networks build the representation of the input image hierarchically without prior knowledge or human effort. A CNN directly processes two- or three-dimensional images; this keeps the structural and configural information of the image intact, making CNNs the most suitable networks for accomplishing different tasks in medical imaging [9]. CNNs have a large number of parameters, and optimizing them in general requires large-scale annotated datasets. In the medical field, the availability of such large-scale datasets is rare. To overcome this limitation, researchers use CNNs with transfer learning. In transfer learning, the image representations learned with CNNs on large-scale datasets are effectively and efficiently transferred to other tasks for which only small-scale datasets are available, for example, to medical image analysis [20–22]. Li et al.
[23] showed that for diabetic retinopathy fundus image classification, a CNN with a transfer learning approach outperformed other existing methods of classification. In another study, an InceptionV3-based pre-trained model with transfer learning was successfully used to classify pulmonary images [24]. Mormont et al. [25], in their work on digital pathology, compared VGG16, VGG19, InceptionV3, ResNet50, InceptionResNetV2, DenseNet, and MobileNet pre-trained models with transfer learning. They found that the performance of the VGG-based model was worst, while the performance of the models based on ResNet and DenseNet was much better than that of the models based on the other pre-trained models. Shin et al. [19] used CifarNet, AlexNet, and GoogleNet pre-trained ImageNet models to transfer knowledge for thoracoabdominal lymph node and interstitial lung disease detection. Their study shows that the GoogleNet pre-trained based model performs poorly compared to either the AlexNet or CifarNet pre-trained based models. Recently, many researchers have applied the approach of CNN with transfer learning to the detection of COVID-19. Training deep CNNs from scratch requires large-scale annotated datasets, but the availability of COVID-19 image datasets is limited. Therefore, the CNN models proposed so far for detecting COVID-19 make use of transfer learning. Some researchers made tailored CNNs to detect COVID-19 [26,27].

In these tailormade models, to overcome the limited availability of COVID image data, Wang et al. [26] first trained their proposed model on the ImageNet dataset and then on the COVIDx dataset. In contrast, Afshar et al. [27] pre-trained their proposed model on an external dataset consisting of 94,323 frontal-view chest X-ray images for common thorax diseases and then fine-tuned the model on a dataset containing COVID-19 images. Islam et al. [28] applied long short-term memory (LSTM) for COVID detection after extracting features using a CNN. Rahimzadeh et al. [29] proposed a concatenated CNN model made by using the Xception and ResNet50V2 models to detect COVID-19 cases using chest X-rays. Alqudah et al. [30] applied different approaches, such as support vector machine (SVM), CNN, and random forest (RF), for the detection of COVID-19. Ucar et al. [31] applied a probabilistic deep Bayes-SqueezeNet for the diagnosis of coronavirus. In another approach, Kumar et al. [32] used nine pre-trained models for feature extraction and then a support vector machine for classification. Jain et al. [33] applied a deep learning-based approach on PA-view chest X-ray scans of COVID-19-affected as well as healthy patients for the classification of COVID-19. In their work, after cleaning up the images, a comparison between different deep learning-based CNN models is made. They collected 6432 chest X-ray scan samples from the Kaggle repository, of which 5467 were used for training and 965 for validation. Abbas et al. [34] extracted features using a CNN and applied principal component analysis (PCA) for dimensionality reduction. On these feature vectors, they applied a clustering technique to decompose the original classes into multiple classes. They again used a pre-trained CNN to classify features into the original classes using the DeTraC deep neural network. Others have used the pre-trained CNN model-based transfer learning approach to detect COVID-19 with reasonable accuracy [35–42]. These studies show that either a pre-trained or a tailormade CNN model can successfully detect COVID-19. Important implicit conclusions of these studies are that the CNN is like a black box that does not require any domain knowledge, and that the features extracted by a trained CNN are generic. In other words, these studies implicitly reinforce the notion that transferability is possible despite the disparity between the training domain, consisting of large-scale annotated databases, and the classification task domain, having small-scale annotated datasets. Many of these studies perform comparisons to show that one of the pre-trained models with some conventional classifier outperforms the other combinations of pre-trained models and classifiers. These studies compare the performance of models based on classification accuracy, sensitivity, etc., and do not comment on the feature extraction capabilities of these models or provide justifications and reasons for the difference in the performance of the various models. In this work, we use CNNs with transfer learning to detect COVID-19 using chest X-ray images. Using a visualization approach, we comment on the feature extraction capabilities of different pre-trained CNN models. We show that after the extraction of features using pre-trained CNNs, even the K-means algorithm, a simple unsupervised learning mechanism, can be made to work as a COVID-19 detector.
Moreover, we compare the performance of the K-means detector with two integrated models, namely a simple integrated model (SIM) and a fused integrated model (FIM). Furthermore, we also compare the results of these models with a new CNN model using shallow tuning. The new model is built by copying the feature extracting blocks of pre-trained models and adding classifier layers on top of them. In this work, we also analyze the role of principal component analysis in selecting features from the extracted features to improve the classification performance of the models. Like other studies that use CNNs for COVID-19 detection, our work also has limitations: studies show that, since the COVID-19 chest X-rays available in the public domain include unconfirmed cases, samples of pediatric patients, etc., CNN-based approaches for COVID-19 detection can give biased results. Furthermore, in the absence of validation of results on external datasets and verification of results by clinicians, the reliability of results obtained using a CNN-based approach is questionable [43,44]. Interpretation of results requires a cautious approach, as the data we have used is in non-DICOM format, which may have resulted in a loss of image quality and a lack of consistency. Therefore, in this work, we did not apply any interpretation methods for the CNN models, as one of the main aims of our study is to compare the feature extraction capabilities of different pre-trained networks. Since the data used for comparison is the same for all the pre-trained models, the effect of the type of data or the absence of verification of results by clinicians does not alter our judgement on the comparative analysis of the feature extraction capabilities of different pre-trained models. Many studies using transfer learning have been proposed for COVID-19 detection, but there is no consensus about which pre-trained model is the best for transferring knowledge: some studies have shown that VGG19 is the best, while in other studies DenseNet or ResNet have been shown to outperform the other pre-trained models. However, in our study, we make use of PCA to show that the DenseNet pre-trained model has better feature extraction capability.

2. Preliminaries

2.1. Convolutional Neural Networks (CNNs) In the mammalian visual system, the representations of visual input, from simple to complex, are built up progressively and hierarchically [45,46]. CNNs, like the mammalian visual system, build the representation of the input image hierarchically. The term convolutional neural network was formally introduced for the first time by LeCun et al. in 1998 [47]. They called their CNN LeNet-5 and showed that LeNet-5 could outperform many other pattern recognition approaches for recognizing handwritten characters. Since then, different researchers have proposed many variants of CNNs that have applications in different areas.

2.2. Architecture of Basic CNNs Basic CNNs are composed of two main building blocks: a trainable feature extraction block (TFEB) and a trainable classification block (TCB). Several convolutional subblocks are stacked together to form the trainable feature extraction block (TFEB). Each convolutional subblock has two processing stages. In the first stage, a convolution operation is performed between the feature map of the previous layer (for the first subblock, this is the input image) and a kernel (filter). The output of the convolution operation is processed by nonlinear processing units such as the Rectified Linear Unit (ReLU). The use of nonlinear processing units helps to learn abstractions and to embed nonlinearity in the feature space. In the second stage, a downsampled feature map is obtained by a pooling operation. The pooling operation reduces spatial resolution and sensitivity to small shifts and distortions [47]. This downsampled feature map represents many different same-level representations obtained using many different learnable filters in each subblock. At the end of the feature extraction block (TFEB), one or more fully connected layers are connected, representing the trainable classifier block (TCB). The filter weights and the weights of all the connections in the fully connected layers represent the parameters of the given CNN. CNNs are specialized networks that can accomplish feature extraction, selection, and classification using a general-purpose learning algorithm, thus eliminating the requirement of a human expert. Furthermore, these networks are insensitive to variations in position, rotation, translation, and scaling of the input data [47]. There are many parameters to be optimized in CNNs, and therefore, in general, these networks require a large-scale annotated dataset to train them.
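The TFEB/TCB layout described above can be written compactly in TensorFlow/Keras. The following is a minimal sketch, not the architecture of any network discussed in this article; the number of subblocks, filter counts, and dense-layer sizes are illustrative assumptions.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_basic_cnn(input_shape=(224, 224, 3), num_classes=2):
    return models.Sequential([
        # TFEB: convolution + ReLU followed by pooling, stacked as subblocks
        layers.Conv2D(32, (3, 3), activation="relu", padding="same", input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        # TCB: fully connected layers acting as the trainable classifier
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_basic_cnn()
model.summary()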

2.3. Transfer Learning Using CNNs CNNs acquire knowledge consisting of feature representations of the input image in a hierarchical manner, decision boundaries, and regions. In transfer learning, the acquired knowledge is transferred from one domain to another in complete or partial form, followed by supplemental learning. In one approach to domain transfer, a tailormade model acquires knowledge using a domain having a sufficient training dataset. As part of supplemental learning, this model is fine-tuned end to end for the domain with insufficient datasets. In another approach to domain transfer, a new model is made by copying the first n layers of a pre-trained model (knowledge transfer) and adding new layers on top of it. After that, as part of supplemental learning, one of the following strategies is adopted: (a) Shallow tuning: the parameters of the copied layers are frozen, and the parameters of the newly added layers are randomly initialized and then optimized. We call this the shallow tuning approach. (b) Deep fine-tuning: deep fine-tuning is the process of fine-tuning the parameters of the new model in an end-to-end manner. In yet another approach, called the feature extractor approach to transfer learning, a pre-trained network acts as a feature extractor. The extracted features are then applied as input to a standard classifier, such as a support vector machine. This approach is called the off-the-shelf feature method [48]. In this work, we apply the shallow tuning and off-the-shelf feature methods.
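A minimal sketch of these two strategies in TensorFlow/Keras is given below, with VGG16 chosen only as an illustrative base network; the pooling layer, head sizes, and class count are assumptions and are not taken from this article.

from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Copied layers: the convolutional part of a network pre-trained on ImageNet
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the transferred knowledge

# (a) Shallow tuning: only the newly added classifier layers are optimized
shallow_tuned = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),  # e.g., COVID-19 / Normal / Pneumonia
])

# (b) Off-the-shelf features: the frozen network acts purely as a feature extractor,
# and the extracted vectors feed a standard external classifier (SVM, MLP, K-means, ...)
feature_extractor = models.Sequential([base, layers.GlobalAveragePooling2D()])
# features = feature_extractor.predict(preprocessed_images)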

2.4. Pre-Trained Deep Networks Several models have performed exceedingly well on the ImageNet data classification task [25]. In this study, we evaluate five such deep networks, namely VGG16, ResNet50V2, InceptionResNetV2, DenseNet121, and MobileNetV2, and compare their performance when used in transfer learning mode for the detection of COVID-19. Of the five models mentioned above, VGG16 belongs to the six VGG models developed by the Visual Geometry Group [49]. Models of this family secured first and second position in the classification plus localization category of the ILSVRC2014 competition. VGG is an example of a stacked module architecture, with a 3 × 3 filter size in all the layers. VGG16 has 13 convolutional and three fully connected layers. ResNet50 belongs to the family of deep residual learning frameworks [50]. Networks in this family also have a highly modular structure. One of the highlights of the models belonging to this family is that they can be very deep (needed to extract more complicated features of the image) and still do not face the problem of vanishing/exploding gradients or degeneration [51]. These models overcome the problems mentioned above by making use of shortcut connections. ResNet50, like its predecessors, is a two-branch network where one branch is the identity mapping performed by the shortcut connections, which skip one or more layers. ResNet50 is the first network to use 1 × 1 filters as a bottleneck to reduce the number of channels in the output feature map. This network has performed exceedingly well on classification tasks on the ImageNet dataset [50]. InceptionNet can address the problem of properly recognizing objects that cover differently sized areas in an image. For example, recognizing an object covering a large area in an image requires spatial information at a coarse level, whereas fine-level spatial information is required for recognizing an object covering a small area [52]. InceptionNet networks contain an inception module having parallel paths with different-sized kernels (1 × 1, 3 × 3, 5 × 5) to capture spatial information at different scales. InceptionResNetV2 is a network that embeds in it the properties of inception networks and deep residual networks [53]. MobileNetV2 is suited to work on devices having low computational capabilities and low memory. These networks, based on MobileNetV1, also use depth-wise separable convolution and pointwise convolution to reduce the computational complexity and model size. Additionally, they have inverted residual blocks for channel expansion to overcome the limitation of depthwise separable convolution with a limited fixed number of input channels [54]. Therefore, MobileNetV2 has better accuracy than MobileNetV1 in multiple image classification and detection tasks for mobile applications. The Dense Convolutional Network [55] contains short paths that form a direct connection between any layer and its preceding layers. Like InceptionNet, in this network the features are concatenated at each layer. Such a connection scheme helps the CNN to alleviate the vanishing gradient problem, strengthening the propagation of features and narrowing the net. DenseNet CNNs perform at par with other state-of-the-art CNNs while requiring substantially fewer parameters, thus making them computationally efficient. Table 1 compares the five models (used in our study) for classification tasks on the ImageNet dataset.

Table 1. ILSVRC Image Classification Results.

Model | Top-1 val. Error (%) | Top-5 val. Error (%) | Reference
MobileNetV2 | 25.3 | - | [54]
VGG16 | 25.6 | 8.1 | [49]
ResNet50V2 | 23.9 | - | [50]
InceptionResNetV2 | 19.9 | 4.9 | [53]
DenseNet121 | 23.61 | 6.66 | [55]
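For reference, all five backbones compared in Table 1 are available as pre-trained models in tf.keras.applications; the snippet below, a sketch assuming the default ImageNet weights, instantiates each one as a bare feature extraction block.

from tensorflow.keras import applications

BACKBONES = {
    "VGG16": applications.VGG16,
    "ResNet50V2": applications.ResNet50V2,
    "InceptionResNetV2": applications.InceptionResNetV2,
    "DenseNet121": applications.DenseNet121,
    "MobileNetV2": applications.MobileNetV2,
}

for name, constructor in BACKBONES.items():
    # include_top=False keeps only the convolutional feature extraction block
    feb = constructor(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    print(name, "FEB parameters:", feb.count_params())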

3. Material and Methods

3.1. X-ray Image Dataset In this study, we have used open public datasets developed in a project by Cohen J.P. [56]. This dataset contains chest X-ray images of MERS, SARS, ARDS, and other respiratory diseases. The X-ray images in this database are from various public sources, obtained through indirect collection from hospitals and physicians. We have used this dataset to collect chest X-rays of patients who are positive for or suspected of COVID-19. Chest X-rays of healthy patients and pneumonia-infected patients were taken from Kaggle [57]. The images from Kaggle had already been screened for quality control by removing all low-quality or unreadable scans. Moreover, in the Kaggle dataset, the diagnoses for the images were graded by two expert physicians before being cleared for uploading on the website. For further use, two datasets were created: dataset1, containing 142 images of COVID-19 and 300 images of normal chest X-rays, and dataset2, which included 142 images of COVID-19 and 300 images each of pneumonia and normal chest X-rays. All images were resized to 224 × 224 pixels and normalized before being used as input. We did not apply any other preprocessing steps, as many other studies have been done without preprocessing, and adopting a similar approach helped us in comparing our methods with other studies. The details of the datasets used are given in Table 2.

Table 2. Dataset details.

Images/Dataset | COVID | Normal | Pneumonia
Dataset1 | 142 | 300 | -
Dataset2 | 142 | 300 | 300
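The preprocessing just described (resizing to 224 × 224, normalization, and a 70/30 split) can be expressed as follows; the directory layout, file extension, and label encoding are hypothetical and only illustrate the procedure.

import numpy as np
from pathlib import Path
from PIL import Image
from sklearn.model_selection import train_test_split

def load_class(folder, label, size=(224, 224)):
    images, labels = [], []
    for path in Path(folder).glob("*.jpeg"):          # hypothetical file layout
        img = Image.open(path).convert("RGB").resize(size)
        images.append(np.asarray(img, dtype=np.float32) / 255.0)  # normalize to [0, 1]
        labels.append(label)
    return images, labels

x, y = [], []
for folder, label in [("data/covid", 0), ("data/normal", 1), ("data/pneumonia", 2)]:
    imgs, labs = load_class(folder, label)
    x.extend(imgs)
    y.extend(labs)

x, y = np.array(x), np.array(y)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.30, stratify=y, random_state=0)  # 70% training, 30% testing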

3.2. New CNN Model with Shallow Tuning and Its Training Procedure Rather than proposing our own architecture, we make use of the transfer learning approach. We build a CNN with a feature extraction block adapted from CNNs previously trained on the ImageNet database and add fully connected layers on top of it, as shown in Figure 1. Using the feature extracting block of VGG16, ResNetV2, InceptionResNetV2, DenseNet121, or MobileNetV2 as the pre-trained FEB, with a fully connected dense layer between the output of this FEB and the output layer, results in five CNN models. The feature extraction block of these deep networks, when inserted in the new CNN, retains both its original architecture and the parameters optimized during its training on the ImageNet dataset. After that, we apply supplemental learning to this new CNN. The training steps are as follows: Step 1: We choose one of the datasets (dataset1 or dataset2). Step 2: We split this dataset randomly into two independent data subsets with 70% and 30% for training and testing. Step 3: We decide on either the cross-entropy or the class-weighted cross-entropy loss function. Step 4: The parameters of the trainable classifier block are initialized to random values. Many adaptive and non-adaptive optimizers are well suited for optimizing neural networks [58]. These optimizers have been shown to find quite different solutions with very different generalization properties [59]. We utilize the ADAM optimizer for the supplemental training of the newly formed CNNs. The ADAM optimizer [60] uses the first and second moments of gradients to compute individual learning rates for different parameters.
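The training steps above can be sketched as follows, reusing the arrays from the preprocessing sketch in Section 3.1 and choosing DenseNet121 only as one of the five possible FEBs; the head size, dropout value, epoch count, and the way class weights approximate the class-weighted cross-entropy are assumptions for illustration.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet121

feb = DenseNet121(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
feb.trainable = False  # parameters of the pre-trained FEB stay frozen during training

model = models.Sequential([
    feb,
    layers.GlobalAveragePooling2D(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),  # 2 outputs for dataset1, 3 for dataset2
])
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Optional class weighting (Step 3): weight classes inversely to their frequency
counts = np.bincount(y_train)
class_weight = {i: len(y_train) / (len(counts) * c) for i, c in enumerate(counts)}

model.fit(x_train, y_train, epochs=10, batch_size=32,
          validation_data=(x_test, y_test), class_weight=class_weight)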

Figure 1. The new convolutional neural network (CNN) constructed by adapting the feature extracting block of any one of the VGG16, ResNetV2, InceptionResNetV2, MobileNetV2, and DenseNet121 CNNs trained on the ImageNet database consisting of 1000 classes. The trainable classifier block, consisting of fully connected layers, is recreated in the new CNN with initial randomized weights. After completing the supplemental learning process with chest X-rays as input, the new CNN can classify chest X-rays into different classes.

During the entire course of training, the parameters of the pre-trained FEB are kept frozen to their initial values. After completion of training, we test the performance of the trained model on the training and test datasets. To analyze the results, we obtain the confusion matrix for the training and test data subsets. The models trained on dataset1 classify the images into COVID-19 and normal, and those trained on dataset2 classify images into COVID-19, normal, and pneumonia. For comparison purposes, the training hyperparameters, the number of neurons in the trainable layers, and the value of dropout are kept the same for all models. After every training cycle, using the confusion matrix, the performance measures that include accuracy, sensitivity, positive predictive value (PPV), and specificity are determined separately for each model.

3.3. Semi-Supervised K-Means Detector The K-means detector is composed of a two-stage process. The first stage comprises the creation of samples. The sample set includes feature vectors extracted with the chest X-rays of dataset1 or dataset2 as input to any one of the CNNs, namely VGG16, ResNetV2, InceptionResNetV2, DenseNet121, and MobileNetV2, pre-trained on the ImageNet dataset. The block diagram of the K-means detector is as shown in Figure 2.

In the second stage, the sample set obtained from stage I is first labeled, and then we randomly split it into two independent sample subsets, namely A and B, with 70% and 30%, respectively, for clustering and testing. We now use the K-means algorithm [61] to cluster sample set A into two or three clusters for the two-class or three-class classification problem, respectively, in an unsupervised manner. The process of clustering results in a cluster center for each cluster and the samples belonging to that cluster. To assign class labels to each cluster and to detect the class of any sample from either sample subset A or B, we adopt the following steps: Step 1: We use the sample labels of subset A to label each cluster, and its center, with the label of the class having the maximum number of samples in that cluster. Step 2: For detection, we pick any sample from subset A or B and determine the Euclidean distance of this sample from each cluster center. We then assign this sample the label of the cluster center with the minimum Euclidean distance (ED) among the EDs from this sample. We now obtain the confusion matrix for both subsets A and B. To compare with other methods using this confusion matrix, we obtain the performance measures that include accuracy, sensitivity, PPV, and specificity.
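A compact sketch of this two-stage detector with scikit-learn is shown below; it assumes that a feature matrix has already been extracted by one of the frozen pre-trained FEBs, and the variable names features and labels are illustrative.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

# features: (N, D) vectors from a pre-trained FEB; labels: (N,) class indices
feat_A, feat_B, y_A, y_B = train_test_split(features, labels, test_size=0.30, random_state=0)

k = len(np.unique(labels))                               # two or three clusters
km = KMeans(n_clusters=k, random_state=0).fit(feat_A)    # unsupervised clustering of subset A

# Step 1: label each cluster (and its center) by the majority class of subset A
cluster_label = {c: np.bincount(y_A[km.labels_ == c]).argmax() for c in range(k)}

# Step 2: a sample gets the label of its nearest cluster center (Euclidean distance)
pred_B = np.array([cluster_label[c] for c in km.predict(feat_B)])
print("Accuracy on subset B:", (pred_B == y_B).mean())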

Figure 2. The Semi-supervised K-means Detector.

3.4. Simple Integrated Model (SIM) We built a simple integrated model by integrating an additional feature selection block between the feature extraction block and the classifier block of the conventional convolutional neural network. The block diagram of the SIM is as shown in Figure 3. SIM, in the first stage, extracts the feature vector using one of the pre-trained CNNs as already described in the previous sections. In the second stage, after selecting the features using principal component analysis (PCA), these selected features act as input to the multilayer perceptron classifier. PCA is a standard statistical technique that helps find patterns in high-dimensional data and has applications in fields such as face recognition and image processing [62]. Suitable selection of the principal components helps in dimensionality reduction, which improves the computational efficiency of the classifier. We use the technique described by Jolliffe [62] for determining the principal components. The final block of SIM is a multilayer perceptron (MLP) neural network. MLP is one of the prevalent artificial neural networks, with many applications that include solving classification and regression problems [63]. MLP is an extension of the perceptron and contains hidden layers not directly connected with the outside world. These networks are built by connecting many different processing units. In these networks, the weights of the interconnections are the adjustable parameters optimized by the error backpropagation algorithm [64]. MLPs perform better than or equivalently to many conventional classifiers such as K-NN, quadratic Gaussian, or Bayesian maximum likelihood classifiers [63].
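In scikit-learn terms, the SIM pipeline amounts to PCA-based feature selection followed by an MLP; the sketch below keeps the two most prominent principal components, mirroring the choice mentioned later in the simulation results, while the hidden-layer sizes are assumptions.

from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# feat_train/feat_test: feature vectors from one pre-trained FEB; y_train/y_test: labels
sim = make_pipeline(
    PCA(n_components=2),                        # stage II: feature selection
    MLPClassifier(hidden_layer_sizes=(64, 32),  # stage II: MLP classifier
                  max_iter=1000, random_state=0),
)
sim.fit(feat_train, y_train)
print("SIM test accuracy:", sim.score(feat_test, y_test))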

Figure 3. The Simple Integrated Model.

3.5. Fused Integrated Model (FIM) The fused integrated model is an extension of the simple integrated model. The steps involved in the processing of this model are as follows: Step 1: In the first stage, we extract two feature vectors in parallel using ResNetV2 and DenseNet121 from the chest X-ray image. Step 2: In the second stage, we apply these extracted features as input to PCA, which works as a feature selector. Step 3: Next, we concatenate the selected features to form a concatenated vector. Step 4: Finally, we apply the concatenated vector as input to the multilayer perceptron. Figure 4 represents the block diagram of the fused integrated model. For both the SIM and FIM, we first choose one of the datasets (dataset1 or dataset2) and split it randomly into two independent data subsets with 70% and 30% for training and testing. In this work, we use an MLP with two hidden layers, an input layer, and an output layer. We use the error backpropagation algorithm [64] to optimize the network parameters with a learning rate of 0.3 on the training dataset. After completing the training of the MLP, we obtain the confusion matrix for both the training and test datasets. To compare with other methods using this confusion matrix, we obtain the performance measures that include accuracy, sensitivity, PPV, and specificity. The next subsection provides the details about obtaining these performance measures.

Figure 4. The Fused Integrated Model.
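The four FIM steps translate directly into the following sketch, in which feature matrices from the ResNetV2 and DenseNet121 FEBs are assumed to be available for the same images; the number of principal components per branch and the MLP sizes are illustrative assumptions.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

# Step 1 (done beforehand): feat_resnet_* and feat_densenet_* hold the two extracted feature sets
pca_r = PCA(n_components=2).fit(feat_resnet_train)        # Step 2: PCA as feature selector
pca_d = PCA(n_components=2).fit(feat_densenet_train)

fused_train = np.hstack([pca_r.transform(feat_resnet_train),   # Step 3: concatenate
                         pca_d.transform(feat_densenet_train)])
fused_test = np.hstack([pca_r.transform(feat_resnet_test),
                        pca_d.transform(feat_densenet_test)])

mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0)
mlp.fit(fused_train, y_train)                             # Step 4: MLP on the fused vector
print("FIM test accuracy:", mlp.score(fused_test, y_test))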

3.6. Performance Measures The metrics used to analyze the performance of the different models are accuracy (percentage of correct predictions), sensitivity (percentage of accurately predicted unhealthy individuals), specificity (percentage of accurately predicted healthy individuals), and positive predictive value (percentage of correct positive predictions). The confusion matrix for dataset1 is as given in Table 3. We use Equations (1) to (4) to determine the values of these metrics for a two-class classification problem. The confusion matrix for dataset2 is as given in Table 4.

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)
Sensitivity (Recall) = TP / (TP + FN)    (2)
Specificity = TN / (TN + FP)    (3)
Positive Predictive Value (PPV) = TP / (TP + FP)    (4)

Table 3. Confusion Matrix for Dataset1.

Actual / Predicted | COVID-19 | Normal
COVID-19 | True Positive Values (TP) | False Negative Values (FN)
NORMAL | False Positive Values (FP) | True Negative Values (TN)

True positive if COVID-19 is detected as COVID-19, false negative if COVID-19 is detected as Normal, false positive if Normal is detected as COVID-19, and true negative if Normal is detected as Normal.

In the case of dataset2, we have three classes, A (COVID-19), B (Normal), and C (Pneumonia); therefore, the values of false negatives, false positives, and true negatives are calculated for each of these classes separately. Table 4 depicts the confusion matrix for dataset2.

Table 4. Confusion Matrix for Dataset2.

Actual / Predicted | CD-19 (A) | NR (B) | PN (C)
CD-19 (A) | TP_A | F_AB | F_AC
NR (B) | F_BA | TP_B | F_BC
PN (C) | F_CA | F_CB | TP_C

TP_A = number of COVID-19 (A) cases detected as COVID-19 (A), F_AB = number of COVID-19 (A) cases identified as Normal (B), and so on. CD-19, NR, and PN denote COVID-19, Normal, and Pneumonia.

As an example, the calculations for class A are as follows: The false negatives of class A are FN_A = (sum of all values in row A) - TP_A; therefore, FN_A = TP_A + F_AB + F_AC - TP_A = F_AB + F_AC. Similarly, the false positives of class A are FP_A = (sum of all values in column A) - TP_A; therefore, FP_A = TP_A + F_BA + F_CA - TP_A = F_BA + F_CA. The true negatives of class A are the sum of all entries outside row A and column A, i.e., TN_A = TP_B + F_BC + F_CB + TP_C. Equation (5) gives the accuracy of the models for dataset2.

Accuracy = (TP_A + TP_B + TP_C) / (TP_A + TP_B + TP_C + F_AB + F_AC + F_BA + F_BC + F_CA + F_CB)    (5)

The sensitivity, specificity, and positive predictive value for class A are as per Equations (6) to (8).

Sensitivity(A) = TP_A / (TP_A + FN_A)    (6)
Specificity(A) = TN_A / (TN_A + FP_A)    (7)
PPV(A) = TP_A / (TP_A + FP_A)    (8)

The performance metrics for class B and class C are obtained similarly.
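The per-class quantities and Equations (5) to (8) follow mechanically from a confusion matrix laid out as in Table 4 (rows = actual, columns = predicted); the counts below are hypothetical and serve only to demonstrate the arithmetic.

import numpy as np

cm = np.array([[40,  2,  1],    # actual COVID-19 (A)
               [ 3, 85,  2],    # actual Normal (B)
               [ 1,  4, 85]])   # actual Pneumonia (C); hypothetical counts

accuracy = np.trace(cm) / cm.sum()                      # Equation (5)
print("Accuracy:", accuracy)

for a, name in enumerate(["COVID-19", "Normal", "Pneumonia"]):
    tp = cm[a, a]
    fn = cm[a, :].sum() - tp        # e.g., F_AB + F_AC for class A
    fp = cm[:, a].sum() - tp        # e.g., F_BA + F_CA for class A
    tn = cm.sum() - tp - fn - fp    # all entries outside row A and column A
    print(name,
          "sensitivity:", tp / (tp + fn),   # Equation (6)
          "specificity:", tn / (tn + fp),   # Equation (7)
          "PPV:", tp / (tp + fp))           # Equation (8)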

4. Simulation Results We carried out the simulations on the Google Colab Linux server, which provides access to a high-end processor. The GPUs available in Colab often include Nvidia K80s, T4s, P4s, and P100s, and there is no facility for choosing any one of them; the type of GPU available varies over time and is assigned automatically. The simulations we carried out help us analyze the feature extraction capabilities of each of the pre-trained models. We obtain the performance metrics of all the pre-trained models using the different modeling methods, namely the new CNN model with shallow tuning, the semi-supervised K-means detector, and the simple integrated model. Once we identify the pre-trained model with the best feature extracting capability, we compare the different modeling strategies to determine the best among them. In the first part of the simulation results, we show and compare the performance of each pre-trained model for each modeling approach. For ease of representation, in the discussion from now onwards we use the short forms VN, RN, IN, DN, and MN to denote the VGG16, ResNetV2, InceptionResNetV2, DenseNet121, and MobileNetV2 pre-trained models, respectively.

4.1. Analysis of New CNN Models with Shallow Tuning on Dataset1 and Dataset2 As depicted in Figures 5 and 6, the performance of the new CNN model built using VGG16 is better in comparison to the models built with the other pre-trained models. This VGG16-based model with shallow tuning achieves an accuracy of about 0.98 and 0.87 for the two-class and three-class classification problems, respectively. As can be seen from the same figures, the performance of the MobileNetV2-based model is the worst, and this model achieves an accuracy of 0.65 and 0.37 for the two-class and three-class classification problems, respectively. It is observed that the performance of all the developed models deteriorates substantially for dataset2; see, for example, the value of sensitivity for the COVID class, which is less than or equal to 0.64 for all five developed models. The model based on ResNetV2 or DenseNet121 achieves an accuracy of about 0.85 and 0.7 for the two-class and three-class problems. The other performance measures for the models based on either ResNetV2 or DenseNet121 also have similar values. The performance of the InceptionResNetV2-based model is inferior to all the developed models except the model based on MobileNetV2.

Figure 5. Positive predictive value (PPV), Sensitivity, Specificity, and Accuracy bar charts for dataset1 with different pre-trained CNN models.

Figure 6. PPV, Sensitivity, Specificity, and Accuracy bar charts for dataset2 with different pre-trained CNN models. In this figure, CD, NR, and PN represent COVID, Normal, and Pneumonia, respectively.

4.2. Analysis of Semi-Supervised K-Means Detector As shown in Figure 7 for the two-class classification problem, the performance of the semi-supervised K-means approach is comparable for the features extracted using any of the five pre-trained CNN models. All the pre-trained models used in our study extract the requisite features to enable the semi-supervised K-means technique to classify the X-ray images into normal and COVID with an accuracy of almost 1. Close observations reveal that the performance of the VGG16-based pre-trained CNN model is slightly inferior in comparison to the other pre-trained CNN models. Figure 8 shows the performance measures of the semi-supervised K-means detector for the three-class classification problem. We obtain these performance measures individually after extracting feature vectors using the different pre-trained models. On comparing, we find that the performance of this semi-supervised K-means detector is superior when the extraction of features is done using either the DenseNet121 or the ResNetV2 pre-trained CNN model. For these two cases, the detector achieves an accuracy of 0.91; when the features are extracted using MobileNetV2, the detector achieves an accuracy of 0.89. The extraction of features using either the InceptionResNetV2 or the VGG16 pre-trained CNN models results in poor performance of the detector, with an accuracy of around 0.7. Other performance measures also follow similar trends for all three classes (see Figure 8).

Figure 7. PPV, Sensitivity, Specificity, and Accuracy bar charts for dataset1 with different pre-trained CNN models. Results for semi-supervised K-means detector.

Figure 8. PPV, Sensitivity, Specificity, and Accuracy bar charts for dataset2 with different pre-trained CNN models. In this figure, CD, NR, and PN represent COVID, Normal, and Pneumonia, respectively. Results for semi-supervised K-means detector.

4.3. Analysis of Simple Integrated Model (SIM) Simulation results shown in Figure 9 for the two-class classification problem show that the extraction of features using the DenseNet121 pre-trained CNN model makes the simple integrated model achieve an accuracy of 0.99. In comparison, when the SIM uses either the MobileNetV2 or ResNetV2 pre-trained CNN models for extracting features, it achieves an accuracy of 0.96. This model achieves an accuracy of 0.92 and 0.33 when it extracts the features using the InceptionResNetV2 and VGG16 pre-trained CNN models, respectively. Other performance measures follow similar trends. These results show that for two-class classification problems, SIM performs the best when it extracts features using the DenseNet pre-trained model, and it performs the worst when it extracts the features using the VGG16 pre-trained CNN model.

Figure 9. PPV, Sensitivity, Specificity, and Accuracy bar charts for dataset1 with different pre-trained CNN models. Results for SIM.

Figure 10 depicts the bar charts for the values of performance measures obtained for three-class classification problems using a simple integrated model. The accuracy bar chart in Figure 10, compared with that of Figure 9, reveals that the performance of the simple integrated model deteriorates for a three-class classification problem (TCCP). For a TCCP, this model achieves an accuracy of 0.85, 0.82, 0.81, 0.66, and 0.23 when it extracts features using the DenseNet121, ResNetV2, MobileNetV2, InceptionResNetV2, and VGG16 pre-trained models, respectively. Other performance measures follow similar trends. Like its performance for the two-class classification problem, for TCCP this model performs the best when extracting features using the DenseNet pre-trained CNN model. SIM performs the worst when it extracts the features using the VGG16 pre-trained CNN model. During simulations, the hyperparameters were kept the same for the different pre-trained CNN models. Further, during simulations, we applied the two most prominent principal components obtained using principal component analysis as input to the multi-layered perceptron.

Figure 10. PPV, Sensitivity, Specificity, and Accuracy bar charts for dataset2 with different pre-trained CNN models. In this figure, CD, NR, and PN represent COVID, Normal, and Pneumonia, respectively. Results for SIM.

4.4. Analysis of Knowledge Transfer Capability of Different Pre-Trained CNN Models

Simulation results for the different approaches discussed in the previous sections use pre-trained CNN models for knowledge transfer. In these approaches, performance largely depends upon the knowledge transfer capabilities of the pre-trained models. The feature vector, in abstract form, represents this knowledge. Therefore, to analyze the pre-trained models' knowledge transfer capabilities, we plot the extracted feature vectors in two-dimensional space.

To plot the feature vectors in two-dimensional space, we utilize principal component analysis. After extracting feature vectors using the different pre-trained models, we obtain their two most prominent principal components (PC) and then plot these prominent PCs in two-dimensional space. Figures 11–15 show representative extracted feature vectors for VGG16, ResNetV2, InceptionResNetV2, DenseNet121, and MobileNetV2, respectively.
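A short sketch of how such two-dimensional projections can be produced is given below. It assumes matplotlib and scikit-learn as stand-ins for the Matlab plotting used by the authors, and the label-to-class mapping shown is an assumption for illustration.

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_feature_clusters(features, labels, title):
    # Project the feature vectors onto their two most prominent principal components.
    pcs = PCA(n_components=2).fit_transform(features)
    class_names = {0: "CD (COVID)", 1: "NR (Normal)", 2: "PN (Pneumonia)"}  # assumed label coding
    for value, name in class_names.items():
        mask = labels == value
        plt.scatter(pcs[mask, 0], pcs[mask, 1], s=10, label=name)
    plt.xlabel("Principal component 1")
    plt.ylabel("Principal component 2")
    plt.title(title)
    plt.legend()
    plt.show()

plot_feature_clusters(features, labels, "DenseNet121 features")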

Figure 11. Clusters for features extracted using the VGG16 pre-trained CNN model. In this figure, CD, NR, and PN represent COVID, Normal, and Pneumonia, respectively.

Figure 12. Clusters for features extracted using the ResNetV2 pre-trained CNN model. In this figure, CD, NR, and PN represent COVID, Normal, and Pneumonia, respectively.

Figure 13. Clusters for features extracted using the InceptionResNetV2 pre-trained CNN model. In this figure, CD, NR, and PN represent COVID, Normal, and Pneumonia, respectively.

Figure 14. Clusters for features extracted using the DenseNet121 pre-trained CNN model. In this figure, CD, NR, and PN represent COVID, Normal, and Pneumonia, respectively.

Figure 15. Clusters for features extracted using the MobileNetV2 pre-trained CNN model. In this figure, CD, NR, and PN represent COVID, Normal, and Pneumonia, respectively.

As shown in these figures, the feature vectors for the different classes (COVID, Normal, and Pneumonia) occupy regions with minimal overlap when extracted using the DenseNet121 or ResNetV2 pre-trained CNNs (Figures 12 and 14).



In comparison, the amount of overlap among the regions representing different classes for features extracted using either the VGG16 or InceptionResNetV2 pre-trained CNNs is quite large (Figures 11 and 13). The overlap between the regions representing different classes for feature vectors extracted using the MobileNetV2 pre-trained CNN lies between the above two cases (Figure 15). These results suggest that the DenseNet121 and ResNetV2 pre-trained CNNs transfer adequate knowledge, whereas VGG16 and InceptionResNetV2 transfer coarse knowledge requiring further processing. It also supports the notion that VGG16 and InceptionResNetV2 extract low-level features, whereas DenseNet121 and ResNetV2 extract high-level features that make the images quite distinctive.

4.5. Analysis of Fused Integrated Model

To assess the impact of combining features extracted using different pre-trained CNN models, in this work we framed the fused integrated model (FIM), wherein the feature vectors extracted with the help of DenseNet121 and ResNetV2 were concatenated before being applied to the MLP. Figure 16 shows the bar chart for the different performance measures for all three classes. This model achieves an accuracy of 0.94 for the three-class classification problem (TCCP). This accuracy is better than the best that we could achieve with the other three approaches, signifying the usefulness of fusing the feature vectors extracted using different pre-trained models. The other performance measures also show the same trend, i.e., an improvement in values compared to all other approaches for all classes.
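A minimal sketch of the fusion step is shown below. The names densenet_features and resnet_features are illustrative placeholders for feature matrices extracted as in the earlier sketch, and the MLP settings are assumptions, not the authors' exact configuration.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

# Feature-level fusion: concatenate the per-image vectors from the two pre-trained CNNs.
fused = np.concatenate([densenet_features, resnet_features], axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    fused, labels, test_size=0.2, stratify=labels, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=0)
mlp.fit(X_train, y_train)
print(classification_report(y_test, mlp.predict(X_test),
                            target_names=["COVID", "Normal", "Pneumonia"]))

Because the two feature sets describe each image from different representational viewpoints, simple concatenation lets the classifier exploit complementary information without retraining either CNN.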

Figure 16. PPV, Sensitivity, Specificity, and Accuracy bar charts for dataset2 for a fused integrated model (FIM). In this figure, CD, NR, and PN represent COVID, Normal, and Pneumonia, respectively.

5. Comparison of Results with Other Modeling Studies

The comparison of our proposed models with other modeling studies reveals that we obtain similar results. The comparison is depicted in Table 5. As can be seen from this table, accuracy for the three-class classification problem is always less than accuracy for the two-class classification problem. Our results are consistent with the other similar studies.

Table 5. Comparison of accuracy for different modeling studies.

Study | Type | Model | Accuracy
Ioannis et al. [36] | Chest X-ray | VGG19 (3-Class) | 93.48
Ioannis et al. [36] | Chest X-ray | VGG19 (2-Class) | 98.75
Sethy and Behera [42] | Chest X-ray | ResNet+SVM (3-Class) | 95.38
Hemdan et al. [35] | Chest X-ray | COVIDX-Net (2-Class) | 90
Narin et al. [37] | Chest X-ray | Deep CNN ResNet-50 (2-Class) | 98
Afshar et al. [27] | Chest X-ray | COVID-CAPS (3-Class) | 95.7
Eduardo et al. [39] | Chest X-ray | EfficientNet (3-Class) | 91.4
Tulin et al. [38] | Chest X-ray | Darknet (3-Class) | 87.02
Tulin et al. [38] | Chest X-ray | Darknet (2-Class) | 98.08
Proposed: Shallow Tuning | Chest X-ray | VGG (3-Class) | 87
Proposed: Shallow Tuning | Chest X-ray | VGG (2-Class) | 98
Proposed: SIM | Chest X-ray | DenseNet (3-Class) | 85
Proposed: SIM | Chest X-ray | DenseNet (2-Class) | 99
Proposed: FIM | Chest X-ray | DenseNet+ResNet (3-Class) | 94
Proposed: K-Means | Chest X-ray | DenseNet (3-Class) | 91
Proposed: K-Means | Chest X-ray | DenseNet (2-Class) | 99

6. Conclusions

In this work, we proposed four different approaches for the detection of COVID-19. For the extraction of features using CNNs we used Python; for the K-means approach, principal component analysis, visualization, and plotting of figures we used Matlab. For the different approaches, training used 15 epochs with a batch size of 5. All these approaches are based on the concept of transfer learning using pre-trained convolutional neural networks. We used VGG16, ResNetV2, InceptionResNetV2, DenseNet121, and MobileNetV2 CNNs pre-trained on the ImageNet dataset individually in three of the approaches and used ResNetV2 together with DenseNet in the fused integrated model approach. It is interesting to find that even though there is little resemblance between the ImageNet dataset and the X-ray image dataset, these pre-trained models possess sufficient knowledge transfer capability to make them suitable and useful for classification purposes. We could show that even a straightforward unsupervised K-means clustering algorithm clusters the extracted features into three clusters. Another important conclusion is that applying feature selection not only reduces the computational time but also helps improve the performance of the classifiers. Surprisingly, we found that the performance of the VGG16-based model was better than that of the other pre-trained-model-based models in the case of the shallow tuning approach. This result indicates that if the pre-trained model extracts low-level features, then the chance of improved performance with shallow tuning is enhanced. Of all the approaches, the fused integrated model approach achieves better values for the different performance measures. These results suggest that different pre-trained CNN models have different feature extraction capabilities, and the fusion of these extracted features enhances classifier performance. Although we have used these approaches to classify X-ray images, our results indicate that these approaches will prove useful for classifying data in any domain with a limited dataset. Finally, it is essential to mention that since in this study we did not apply any interpretation method, we recommend that the present study and the results obtained in it should not be directly used in clinical cases.

Author Contributions: Data curation, T.G.; formal analysis, A.R.G.; investigation, A.R.G.; methodology, A.R.G.; resources, O.P.M.; software, T.G.; supervision, O.P.M.; writing—original draft, M.G.; writing—review and editing, A.R.G. All authors have read and agreed to the published version of the manuscript. Funding: This research received no external funding. Conflicts of Interest: The authors declare no conflict of interest.

References

1. Allam, M.; Cai, S.; Ganesh, S.; Venkatesan, M.; Doodhwala, S.; Song, Z.; Hu, T.; Kumar, A.; Heit, J.; COVID-Nineteen Study Group; et al. COVID-19 diagnostics, tools, and prevention. Diagnostics 2020, 10, 409. [CrossRef] 2. Abbott. Abbott Launches Molecular Point-of-Care Test to Detect Novel Coronavirus in as Little as Five Minutes. Available online: https://abbott.mediaroom.com/2020-03-27-Abbott-Launches-MolecularPoint-of-Care-Test-to-Detect-Novel-Coronavirus-in-as-Little-as-Five-Minutes (accessed on 25 October 2020). 3. Mitra, A.P.; Suri, S.; Goyal, T.; Misra, R.; Singh, K.; Garg, M.K.; Misra, S.; Sharma, P. Association of comorbidities with Coronavirus disease 2019: A review. Ann. Natl. Acad. Med. Sci. 2020, 56, 102–111. [CrossRef] 4. Rubin, G.D.; Ryerson, C.J.; Haramati, L.B.; Sverzellati, N.; Kanne, J.P.; Raoof, S.; Schluger, N.W.; Volpi, A.; Yim, J.-J.; Martin, I.B.; et al. The role of chest imaging in patient management during the COVID-19 pandemic. Chest 2020, 158, 106–116. [CrossRef][PubMed] 5. Bhalla, A.S.; Jana, M.; Naranje, P.; Manchanda, S. Role of chest radiographs during COVID-19 pandemic. Ann. Natl. Acad. Med. Sci. 2020, 56, 138–144. [CrossRef] 6. Li, Y.; Xia, L. Coronavirus disease 2019 (COVID-19): Role of chest CT in diagnosis and management. Am. J. Roentgenol. 2020, 214, 1280–1286. [CrossRef] 7. Zhao, W.; Zhong, Z.; Xie, X.; Yu, Q.; Liu, J. Relation between chest CT findings and clinical conditions of coronavirus disease (COVID-19) pneumonia: A multicenter study. Am. J. Roentgenol. 2020, 214, 1072–1077. [CrossRef] 8. Qin, C.; Yao, D.; Shi, Y.; Song, Z. Computer-aided detection in chest radiography based on artificial intelligence: A survey. Biomed. Eng. Online 2018, 17, 1–23. [CrossRef]

9. Shen, D.; Wu, G.; Suk, H. Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 2017, 19, 221–248. [CrossRef] 10. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; Van Der Laak, J.A.W.M.; Van Ginneken, B.; Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88. [CrossRef] 11. Ker, J.; Wang, L.; Rao, J.; Lim, T. Deep learning applications in medical image analysis. IEEE Access 2018, 6, 9375–9389. [CrossRef] 12. Faust, O.; Hagiwara, Y.; Hong, T.J.; Lih, O.S.; Acharya, U.R. Deep learning for healthcare applications based on physiological signals: A review. Comput. Methods Programs Biomed. 2018, 161, 1–13. [CrossRef][PubMed] 13. Murat, F.; Yildirim, O.; Talo, M.; Baloglu, U.B.; Demir, Y.; Acharya, U.R. Application of deep learning techniques for heartbeats detection using ECG signals-analysis and review. Comput. Biol. Med. 2020, 120, 103726. [CrossRef][PubMed] 14. Topol, E.J. High-performance medicine: The convergence of human and artificial intelligence. Nat. Med. 2019, 25, 44–56. [CrossRef][PubMed] 15. Esteva, A.; Robicquet, A.; Ramsundar, B.; Kuleshov, V.; Depristo, M.; Chou, K.; Cui, C.; Corrado, G.; Thrun, S.; Dean, J. A guide to deep learning in healthcare. Nat. Med. 2019, 25, 24–29. [CrossRef] 16. Kermany, D.S.; Goldbaum, M.; Cai, W.; Valentim, C.C.S.; Liang, H.; Baxter, S.L.; McKeown, A.; Yang, G.; Wu, X.; Yan, F.; et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 2018, 172, 1122–1131. [CrossRef][PubMed] 17. Lan, L.; Ye, C.; Wang, C.; Zhou, S. Deep convolutional neural networks for WCE abnormality detection: CNN architecture, region proposal and transfer learning. IEEE Access 2019, 7, 30017–30032. [CrossRef] 18. Brown, J.M.; Campbell, J.P.; Beers, A.; Chang, K.; Ostmo, S.; Chan, R.P.; Dy, J.; Erdogmus, D.; Ioannidis, S.; Kalpathy-Cramer, J.; et al. Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks. JAMA Ophthalmol. 2018, 136, 803–810. [CrossRef] 19. Shin, H.-C.; Roth, H.R.; Gao, M.; Lu, L.; Xu, Z.; Nogues, I.; Yao, J.; Mollura, D.; Summers, R.M. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 2016, 35, 1285–1298. [CrossRef] 20. Oquab, M.; Bottou, L.; Laptev, I.; Sivic, J. Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Institute of Electrical and Electronics Engineers (IEEE), Columbus, OH, USA, 23–28 June 2014; pp. 1717–1724. 21. Kaur, T.; Gandhi, T.K. Deep convolutional neural networks with transfer learning for automated brain image classification. Mach. Vis. Appl. 2020, 31, 1–16. [CrossRef] 22. Tajbakhsh, N.; Shin, J.Y.; Gurudu, S.R.; Hurst, R.T.; Kendall, C.B.; Gotway, M.B.; Liang, J. Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE Trans. Med. Imaging 2016, 35, 1299–1312. [CrossRef] 23. Li, F.; Liu, Z.; Chen, H.; Jiang, M.-S.; Zhang, X.; Wu, Z. Automatic detection of diabetic retinopathy in retinal fundus photographs based on deep learning algorithm. Transl. Vis. Sci. Technol. 2019, 8, 4. [CrossRef][PubMed] 24. Wang, C.; Chen, D.; Hao, L.; Liu, X.; Zeng, Y.; Chen, J.; Zhang, G.; Lin, H.; Liu, B.X.; Zeng, C.Y.; et al. Pulmonary image classification based on inception-v3 transfer learning model.
IEEE Access 2019, 7, 146533–146541. [CrossRef] 25. Mormont, R.; Geurts, P.; Maree, R. Comparison of Deep Transfer Learning Strategies for Digital Pathology. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Institute of Electrical and Electronics Engineers (IEEE), Salt Lake City, UT, USA, 18–22 June 2018. 26. Wang, L.; Wong, A. COVID-net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest radiography images. arXiv 2020, arXiv:2003.09871. [CrossRef][PubMed] 27. Afshar, P.; Heidarian, S.; Naderkhani, F.; Oikonomou, A.; Plataniotis, K.N.; Mohammadi, A. COVID-CAPS: A capsule network-based framework for identification of COVID-19 cases from X-ray images. Pattern Recognit. Lett. 2020, 138, 638–643. [CrossRef][PubMed] 28. Islam, Z.; Islam, M.; Asraf, A. A combined deep CNN-LSTM network for the detection of novel coronavirus (COVID-19) using X-ray images. Inform. Med. Unlocked 2020, 20, 100412. [CrossRef] 29. Rahimzadeh, M.; Attar, A. A new modified deep convolutional neural network for detecting COVID-19 from X-ray images. arXiv 2020, arXiv:2004.08052.

30. Alqudah, A.M.; Qazan, S.; Alquran, H.H.; Qasmieh, I.A.; Alqudah, A. Covid-2019 Detection Using X-ray Images and Artificial Intelligence Hybrid Systems. Available online: https://doi.org/10.13140/RG.2.2.16077. 59362/1 (accessed on 3 December 2020). 31. Ucar, F.; Korkmaz, D. COVIDiagnosis-Net: Deep Bayes-SqueezeNet based diagnosis of the coronavirus disease 2019 (COVID-19) from X-ray images. Med Hypotheses 2020, 140, 109761. [CrossRef] 32. Kumar, P.; Kumari, S. Detection of Coronavirus Disease (COVID-19) Based on Deep Features. Available online: https://Www.Preprints.Org/Manuscript/202003.0300/V1 (accessed on 3 December 2020). 33. Jain, R.; Gupta, M.; Taneja, S.; Hemanth, D.J. Deep learning based detection and analysis of COVID-19 on chest X-ray images. Appl. Intell. 2020, 1–11. [CrossRef] 34. Abbas, A.; Abdelsamea, M.M.; Gaber, M.M. Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network. Appl. Intell. 2020, 1–11. [CrossRef] 35. Hemdan, E.E.D.; Shouman, M.A.; Karar, M.E. COVIDX-net: A framework of deep learning classifiers to diagnose COVID-19 in X-ray images. arXiv 2020, arXiv:2003.11055. 36. Apostolopoulos, I.D.; Mpesiana, T.A. Covid-19: Automatic detection from X-ray images utilizing transfer learning with convolutional neural networks. Phys. Eng. Sci. Med. 2020, 43, 635–640. [CrossRef][PubMed] 37. Narin, A.; Kaya, C.; Pamuk, Z. Automatic detection of Coronavirus disease (COVID- 19) using X-ray images and deep convolutional neural networks. arXiv 2020, arXiv:2003.10849. 38. Ozturk, T.; Talo, M.; Yildirim, E.A.; Baloglu, U.B.; Yildirim, O.; Acharya, U.R. Automated detection of COVID-19 cases using deep neural networks with X-ray images. Comput. Biol. Med. 2020, 121, 103792. [CrossRef][PubMed] 39. Luz, E.; Lopes Silva, P.; Silva, R.; Moreira, G. Towards an efficient deep learning model for covid-19 patterns detection in x-ray images. arXiv 2020, arXiv:2004.05717. 40. Ghoshal, B.; Tucker, A. Estimating uncertainty and interpretability in deep learning for coronavirus (COVID-19) detection. arXiv 2020, arXiv:2003.10769. 41. Zhang, J.; Xie, Y.; Li, Y.; Shen, C.; Xia, Y. COVID-19 screening on Chest X-ray images using deep learning based anomaly detection. arXiv 2020, arXiv:2003.12338. 42. Sethy, P.K.; Behera, S.K.; Ratha, P.K.; Biswas, P. Detection of Coronavirus disease (COVID-19) based on deep features and support vector machine. Int. J. Math. Eng. Manag. Sci. 2020, 5, 643–651. [CrossRef] 43. Roberts, M. Machine learning for COVID-19 detection and prognostication using chest radiographs and CT scans: A systematic methodological review. arXiv 2020, arXiv:2008.06388. 44. Majeed, T.; Rashid, R.; Ali, D.; Asaad, A. Issues associated with deploying CNN transfer learning to detect COVID-19 from chest X-rays. Phys. Eng. Sci. Med. 2020, 1–15. [CrossRef] 45. Hubel, D.H.; Wiesel, T.N. Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex. J. Physiol. 1962, 160, 106–154. [CrossRef] 46. Van Essen, D.C.; Maunsell, J.H. Hierarchical organization and functional streams in the visual cortex. Trends Neurosci. 1983, 6, 370–375. [CrossRef] 47. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [CrossRef] 48. Razavian, A.S.; Azizpour, H.; Sullivan, J.; Carlsson, S. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition. 
In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Institute of Electrical and Electronics Engineers (IEEE), Columbus, OH, USA, 23–28 June 2014; pp. 512–519. 49. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. 50. Xie, S.; Girshick, R.; Dollar, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE), Honolulu, HI, USA, 21–26 July 2017; pp. 5987–5995. 51. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. arXiv 2014, arXiv:1406.4729. 52. Zewen, L.; Wenjie, Y.; Shouheng, P.; Fan, L. A survey of convolutional neural networks: Analysis, applications, and prospects. arXiv 2020, arXiv:2004.02806v1.

53. Szegedy, C.; Ioffe, S.; Vanhoucke, V. Inception-v4, Inception-Resnet and the Impact of Residual Connections on Learning. In Proceedings of the 2016 International Conference on Learning Representations, San Juan, PR, USA, 2–4 May 2016. 54. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. 55. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE), Salt Lake City, UT, USA, 21–26 July 2017; pp. 2261–2269. 56. Cohen, J.P. COVID-19 image data collection. arXiv 2020, arXiv:2003.11597. Available online: https: //github.com/ieee8023/covid-chestxray-dataset (accessed on 25 October 2020). 57. Kermany, D.; Zhang, K.; Goldbaum, M. Labeled Optical Coherence Tomography (OCT) and Chest X-ray Images for Classification Mendeley Data. Available online: http://dx.doi.org/10.17632/rscbjbr9sj.2 (accessed on 3 December 2020). 58. Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2016, arXiv:1609.04747. 59. Wilson, A.C.; Roelofs, R.; Stern, M.; Srebro, N.; Recht, B. The marginal value of adaptive gradient methods in machine learning. arXiv 2017, arXiv:1705.08292. 60. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. 61. Mehta, S.; Shete, D.; Lingayat, N.S.; Chouhan, V. K-means algorithm for the detection and delineation of QRS-complexes in Electrocardiogram. IRBM 2010, 31, 48–54. [CrossRef] 62. Jolliffe, T. Principal Component Analysis; Springer: New York, NY, USA, 1986. 63. Murtagh, F. Multilayer perceptrons for classification and regression. Neurocomputing 1991, 2, 183–197. [CrossRef] 64. Haykin, S. Neural Networks and Learning Machines; Pearson Education: Upper Saddle River, NJ, USA, 2009.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).