Convolutional Extreme Learning Machines: a Systematic Review
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 28 April 2021 | doi:10.20944/preprints202104.0753.v1

Article

Iago Richard Rodrigues 1, Sebastião Rogério 2, Judith Kelner 1, Djamel Sadok 1 and Patricia Takako Endo 2

1 Centro de Informática, Universidade Federal de Pernambuco (UFPE); [email protected], [email protected], [email protected]
2 Universidade de Pernambuco (UPE); [email protected], [email protected]
* Correspondence: [email protected] (I. R. Rodrigues); [email protected] (P. T. Endo).

Copyright: © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: Many works have recently identified the need to combine deep learning with extreme learning to strike a balance between computational performance and accuracy, especially in the domain of multimedia applications. Considering this new paradigm, namely the convolutional extreme learning machine (CELM), we present a systematic review that investigates alternative deep learning architectures that use the extreme learning machine (ELM) for faster training to solve problems based on image analysis. We detail each of the architectures found in the literature, their application scenarios, benchmark datasets, main results, and advantages, and present the open challenges for CELM. We follow a well-structured methodology and establish relevant research questions that guide our findings. We hope that the observation and classification of such works can benefit the CELM research area by providing a good starting point to cope with some of the current problems in image-based computer vision analysis.

Keywords: Convolutional extreme learning machine; Deep learning; Multimedia analysis

1. Introduction

Due to the growth of image analysis-based applications, researchers have adopted deep learning to develop intelligent systems that perform learning tasks in computer vision, image processing, text recognition, and other signal processing problems. Deep learning architectures are generally a good solution for learning on large-scale data, surpassing classic models that were once state of the art in multimedia problems [1].

Unlike classic approaches to pattern recognition tasks, convolutional neural networks (CNNs), a type of deep learning, can extract features and recognize them at the same time. CNNs can process data stored as multi-dimensional arrays (1D, 2D, and so on). They extract meaningful abstract representations from raw data [1], such as images, audio, text, video, and so on. CNNs have also received attention in the last decade due to their success in fields such as image classification [2], object detection [3], semantic segmentation [4], and medical applications that support a diagnosis by signals or images [5].

Despite their benefits, CNNs also face some challenges: they incur a high computational cost, which directly affects training and inference times. Classification time is an issue for real-time applications that tolerate only a minimal loss of information. Another challenge is the long training and testing times on computers with limited hardware resources. Other problems are local minima, intensive human intervention, and vanishing gradients [6]. It is, therefore, necessary to investigate alternative approaches that may extract deep feature representations and, at the same time, reduce the computational cost.
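As a rough illustration of what such a gradient-free alternative can look like, the sketch below trains a single-hidden-layer network in the style of the extreme learning machine (ELM) named in the abstract: the hidden weights are drawn at random and left fixed, and only the output weights are computed, in one step, by least squares. The function names, the hidden size, the tanh activation, and the toy two-class data are assumptions made for illustration; they are not taken from any of the reviewed works.

```python
import numpy as np


def train_elm(X, T, n_hidden=50, seed=0):
    """Fit an ELM-style model: fixed random hidden layer, least-squares output."""
    rng = np.random.default_rng(seed)
    # Hidden-layer weights and biases are drawn once and never updated.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)  # random feature mapping of the inputs
    # Output weights: least-squares solution of H @ beta ~= T (no gradients).
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)
    return W, b, beta


def predict_elm(X, W, b, beta):
    """Apply the same random mapping, then the learned output weights."""
    return np.tanh(X @ W + b) @ beta


# Toy two-class problem (assumed data, for illustration only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
               rng.normal(+2.0, 1.0, size=(100, 2))])
labels = np.repeat([0, 1], 100)
T = np.eye(2)[labels]  # one-hot targets

W, b, beta = train_elm(X, T)
pred = predict_elm(X, W, b, beta).argmax(axis=1)
accuracy = (pred == labels).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Training here consists of a single call to `np.linalg.lstsq`, which is where the speed advantage over iterative backpropagation comes from; in a CELM, the raw inputs `X` would be replaced by features produced by convolutional filters.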
Extreme learning machine (ELM) is a type of single-hidden-layer feed-forward neural network (SLFN) [7] that provides a faster-converging training process and does not require a series of iterations to adjust the weights of the hidden layer. According to [8], it “seems that ELM performs better than other conventional learning algorithms in applications with higher noise”, with similar or better generalization in regression and classification tasks. Unlike other models, an ELM uses a single hidden layer of neurons with random feature mapping, which makes learning faster. Its low computational complexity has attracted a great deal of attention from the research community, especially for high-dimensional and large-data applications [9].

Based on the strengths of CNNs and ELMs, a new neural network paradigm was proposed, the convolutional extreme learning machine (CELM) [10]. CELMs are quick-training CNNs that avoid gradient calculations for updating the network weights. Filters are efficiently defined for the feature extraction step, and the weights of the output layer in the classification stage are obtained by least squares through an ELM network architecture. In most cases, the accuracy achieved by CELMs is not the best one [10]. However, the results are very competitive compared to those obtained by convolutional networks, not only in terms of accuracy but also in training and inference time.

Several works in the literature have surveyed ELM from different perspectives. Huang et al. [11] present a survey of ELM and its variants. They focus on describing the fundamental design principles and learning theories. The main ELM variants presented by the authors are (i) batch learning mode of ELM, (ii) fully complex ELM, (iii) online sequential ELM, (iv) incremental ELM, and (v) ensemble of ELM. Cao et al. [8] present a survey on ELM that mainly considers high-dimensional and large-data applications. The works in the literature are classified into image processing, video processing, and medical signal processing. Huang et al. [12] present trends in ELM, including ensembles, semi-supervised learning, imbalanced data, and applications such as computer vision and image processing. Salaken et al. [13] explore ELM in conjunction with transfer learning algorithms. Zhang et al. [14] present current approaches based on multilayer ELM (ML-ELM) and its variants compared to classical deep learning.

However, despite these ELM surveys, to the best of the authors' knowledge no work focuses specifically on CELM studies. Therefore, differently from the existing literature, we present a systematic review that concentrates on CELM applied in the context of image analysis. The works considered in this systematic review must address two main concepts: (i) usage of deep feature representation through convolution operations and (ii) usage of ELM aiming to achieve fast feature learning in or after the convolution stage. We discuss the proposed architectures, the application scenarios, the benchmark datasets, the principal results and advantages, and the open challenges in the CELM field.

The rest of this work is organized as follows: Section 2 presents the methodology adopted to conduct this systematic review. An overview of the primary studies of this systematic review is presented in Section 3. The answers to each research question defined in the systematic review protocol are described in Sections 4, 5, 6, and 7. Finally, we conclude this work in Section 8.

2. Methodology

To perform the systematic review, we adopted the methodology previously used by Endo et al. [15] and Coutinho et al. [16]. This systematic review protocol was originally inspired by the classic protocol proposed by Kitchenham [17]. Fig. 1 illustrates the methodology adopted in this work. Next, we explain each of these steps.

Figure 1. Methodology to select papers in this systematic review.

2.1. Identify need of review

Due to the growth of image and big data applications, both academia and industry have used deep learning to analyze data and extract relevant information. For large networks, deep learning architectures suffer from drawbacks such as high computational cost, slow convergence, vanishing gradients, and hardware limitations for training.

In this systematic review, we mainly investigate the use of CELM as a viable alternative to deep learning architectures that guarantees quick training and avoids the gradient calculations necessary for updating the network's weights. In recent years, CELMs have solved some of the leading deep learning issues while maintaining a reasonable quality of the solutions in many applications.

2.2. Define research questions

We begin by defining four research questions (RQ) related to our topic of study. Our objective is to answer these research questions to raise a discussion of the current state of the art in the usage of CELM in the specific domain of image analysis. The research questions are:

• RQ 1: What are the most common problems based on image analysis and datasets analyzed under CELM?
• RQ 2: How are the CELM architectures defined in the analyzed works?
• RQ 3: Which are the