Integrating Enhanced Sparse Autoencoder-Based Artificial Neural

electronics Article Integrating Enhanced Sparse Autoencoder-Based Artificial Neural Network Technique and Softmax Regression for Medical Diagnosis Sarah A. Ebiaredoh-Mienye, Ebenezer Esenogho * and Theo G. Swart Center for Telecommunication, Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg 2006, South Africa; [email protected] (S.A.E.-M.); [email protected] (T.G.S.) * Correspondence: [email protected] Received: 28 September 2020; Accepted: 2 November 2020; Published: 20 November 2020 Abstract: In recent times, several machine learning models have been built to aid in the prediction of diverse diseases and to minimize diagnostic errors made by clinicians. However, since most medical datasets seem to be imbalanced, conventional machine learning algorithms tend to underperform when trained with such data, especially in the prediction of the minority class. To address this challenge and proffer a robust model for the prediction of diseases, this paper introduces an approach that comprises of feature learning and classification stages that integrate an enhanced sparse autoencoder (SAE) and Softmax regression, respectively. In the SAE network, sparsity is achieved by penalizing the weights of the network, unlike conventional SAEs that penalize the activations within the hidden layers. For the classification task, the Softmax classifier is further optimized to achieve excellent performance. Hence, the proposed approach has the advantage of effective feature learning and robust classification performance. When employed for the prediction of three diseases, the proposed method obtained test accuracies of 98%, 97%, and 91% for chronic kidney disease, cervical cancer, and heart disease, respectively, which shows superior performance compared to other machine learning algorithms. The proposed approach also achieves comparable performance with other methods available in the recent literature. Keywords: sparse autoencoder; unsupervised learning; Softmax regression; medical diagnosis; machine learning; artificial neural network; e-health 1. Introduction Medical diagnosis is the process of deducing the disease affecting an individual [1]. This is usually done by clinicians, who analyze the patient’s medical record, conduct laboratory tests, and physical examinations, etc. Accurate diagnosis is essential and quite challenging, as certain diseases have similar symptoms. A good diagnosis should meet some requirements: it should be accurate, communicated, and timely. Misdiagnosis occurs regularly and can be life-threatening; in fact, over 12 million people get misdiagnosed every year in the United States alone [2]. Machine learning (ML) is progressively being applied in medical diagnosis and has achieved significant success so far. In contrast to the shortfall of clinicians in most countries and expensive manual diagnosis, ML-based diagnosis can significantly improve the healthcare system and reduce misdiagnosis caused by clinicians, which can be due to stress, fatigue, and inexperience, etc. Machine learning models can also ensure that patient data are examined in more detail and results are obtained quickly [3]. Hence, several researchers and industry experts have developed numerous medical diagnosis models using machine learning [4]. However, some factors are hindering the growth of ML in the medical Electronics 2020, 9, 1963; doi:10.3390/electronics9111963 www.mdpi.com/journal/electronics Electronics 2020, 9, x FOR PEER REVIEW 2 of 13 ML in the medical domain, i.e., the imbalanced nature of medical data and the high cost of labeling data. Imbalanced data are a classification problem in which the number of instances per class is not uniformly distributed. Recently, unsupervised feature learning methods have received massive Electronics 2020, 9, 1963 2 of 13 attention since they do not entirely rely on labeled data [5], and are suitable for training models when the data are imbalanced. domain, i.e., the imbalanced nature of medical data and the high cost of labeling data. Imbalanced data There are various methods used to achieve feature learning, including supervised learning are a classification problem in which the number of instances per class is not uniformly distributed. techniquesRecently, such unsupervisedas dictionary feature learning learning and methods multilayer have perceptron received massive (MLP), attention and sinceunsupervised they do not learning techniquesentirely which rely oninclude labeled dataindependent [5], and are suitablecomponent for training analysis, models whenmatrix the datafactorization, are imbalanced. clustering, unsupervisedThere dictionary are various learning, methods and used autoencoders. to achieve feature An learning,autoencoder including is a supervisedneural network learning used for unsupervisedtechniques feature such learning. as dictionary It is learningcomposed and of multilayer input, hidden, perceptron and (MLP), output and layers unsupervised [6]. The basic learning techniques which include independent component analysis, matrix factorization, clustering, architecture of a three-layer autoencoder (AE) is shown in Figure 1. When given an input data, unsupervised dictionary learning, and autoencoders. An autoencoder is a neural network used for autoencodersunsupervised (AEs) featureare helpful learning. to It isautomatically composed of input, discover hidden, the and outputfeatures layers that [6 ].lead The basicto optimal classificationarchitecture [7]. There of a three-layerare diverse autoencoder forms of (AE)autoencoders, is shown in including Figure1. Whenvariational given anand input regularized autoencoders.data, autoencoders The regularized (AEs) are autoencoders helpful to automatically have been discover mostly the used features in thatsolving lead toproblems optimal where optimal classificationfeature learning [7]. There is needed are diverse for formssubsequent of autoencoders, classification, including which variational is the focus and regularized of this research. Examplesautoencoders. of regularized The regularized autoencoders autoencoders include have deno beenising, mostly contractive, used in solving and problems sparse where autoencoders. optimal We feature learning is needed for subsequent classification, which is the focus of this research. Examples aim to implementof regularized a autoencoderssparse autoencoder include denoising, (SAE) to contractive, learn representations and sparse autoencoders. more efficiently We aim tofrom raw data in orderimplement to ease a sparse the classification autoencoder (SAE) process to learn an representationsd ultimately, more improve efficiently the fromprediction raw data performance in order of the classifier.to ease the classification process and ultimately, improve the prediction performance of the classifier. Figure 1. The structure of an autoencoder. Figure 1. The structure of an autoencoder. Usually, the sparsity penalty in the sparse autoencoder network is achieved using either of these Usually,two methods: the sparsity L1 regularization penalty in or Kullback–Leiblerthe sparse autoencoder (KL) divergence. network It is noteworthy is achieved that using the SAE either of does not regularize the weights of the network; rather, the regularization is imposed on the activations. these two methods: L1 regularization or Kullback–Leibler (KL) divergence. It is noteworthy that the Consequently, suboptimal performances are obtained with this type of structure where the sparsity SAE doesmakes not itregularize challenging the for theweights network of to the approximate network; a near-zerorather, the cost regularization function [8]. Therefore, is imposed in this on the activations.paper, Consequently, we integrate an suboptim improved SAEal performances and a Softmax classifierare obtained for application with this in medicaltype of diagnosis.structure where the sparsityThe SAEmakes imposes it challenging regularization for on thethe weights, network instead to approximate of the activations a asnear-zero in conventional cost function SAE, [8]. Therefore,and in the this Softmax paper, classifier we integrate is used for an performing improved the classificationSAE and a task.Softmax classifier for application in To demonstrate the effectiveness of the approach, three publicly available medical datasets are medical diagnosis. The SAE imposes regularization on the weights, instead of the activations as in used, i.e., the chronic kidney disease (CKD) dataset [9], cervical cancer risk factors dataset [10], conventionaland Framingham SAE, and heartthe Softmax study dataset classifier [11]. We is also used aim for to useperforming diverse performance the classification evaluation task. metrics To demonstrateto assess the performance the effectiveness of the proposed of the method approach and compare, three it publicly with some available techniques availablemedical in datasets the are used, i.e.,recent the literaturechronic andkidney other machinedisease learning(CKD) algorithms dataset [9 such], cervical as logistic cancer regression risk (LR), factors classification dataset and [10], and Framinghamregression heart tree study (CART), dataset support [11]. vector We machine also aim (SVM), to use k-nearest diverse neighbor performance (KNN), linear evaluation discriminant metrics to analysis (LDA), and conventional Softmax classifier. The rest of the paper is structured as follows: assess the performance of the proposed method and compare it with some techniques available in Section2 reviews some

Integrating Enhanced Sparse Autoencoder-Based Artificial Neural

Lecture 4 Feedforward Neural Networks, Backpropagation

Revisiting the Softmax Bellman Operator: New Beneﬁts and New Perspective

Arxiv:2006.04059V1 [Cs.LG] 7 Jun 2020

On the Learning Property of Logistic and Softmax Losses for Deep Neural Networks

CS281B/Stat241b. Statistical Learning Theory. Lecture 7. Peter Bartlett

Loss Function Search for Face Recognition

Deep Neural Networks for Choice Analysis: Architecture Design with Alternative-Specific Utility Functions Shenhao Wang Baichuan

Pseudo-Learning Effects in Reinforcement Learning Model-Based Analysis: a Problem Of

Lecture 18: Wrapping up Classification Mark Hasegawa-Johnson, 3/9/2019

Categorical Data

Xgboost: a Scalable Tree Boosting System

Mixed Pattern Recognition Methodology on Wafer Maps with Pre-Trained Convolutional Neural Networks