Review of Deep Convolution Neural Network in Image Classification
Total Page:16
File Type:pdf, Size:1020Kb
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by UMP Institutional Repository 2017 International Conference on Radar, Antenna, Microwave, Electronics, and Telecommunications Review of Deep Convolution Neural Network in Image Classification Ahmed Ali Mohammed Al-Saffar, Hai Tao, Mohammed Ahmed Talab Faculty of Computer Systems and Software Engineering Universiti Malaysia Pahang Pahang, Malaysia [email protected] Abstract—With the development of large data age, recognition system, feature extraction methods such as Scale- Convolutional neural networks (CNNs) with more hidden layers invariant feature transform (SIFT [1]) and histogram of have more complex network structure and more powerful feature oriented gradients (HOG [2]) were used, and then the extracted learning and feature expression abilities than traditional machine Feature input classifier for classification and recognition. These learning methods. The convolution neural network model trained features are essentially a feature of manual design. For by the deep learning algorithm has made remarkable different identification problems, the extracted features have a achievements in many large-scale identification tasks in the field direct impact on the performance of the system, so the of computer vision since its introduction. This paper first researchers need to study the problem areas to be studied in introduces the rise and development of deep learning and order to design Adaptability to better features, thereby convolution neural network, and summarizes the basic model improving system performance. This period of image structure, convolution feature extraction and pooling operation of convolution neural network. Then, the research status and recognition system is generally for a specific identification task, development trend of convolution neural network model based and the size of the data is not large, generalization ability is on deep learning in image classification are reviewed, which is poor, it is difficult in the practical application of the problem to mainly introduced from the aspects of typical network structure achieve accurate identification effect. construction, training method and performance. Finally, some problems in the current research are briefly summarized and II. CONVOLUTION NEURAL NETWORK discussed, and the new direction of future development is forecasted. Deep learning is a branch of machine learning, which is one of the major breakthroughs and research hotspots in Keywords—Deep learning; convolution neural network; image machine learning in recent years. In 2006, Geoffery Hinton, a recognition; Image Classification. professor of computer science at the University of Toronto, and his student, Ruslan Salakhutdinov, published an article in the international top academic journal Science, [3], for the first I. INTRODUCTION time in the depth of learning. This paper mainly points out two Computer vision (CV) is a study of how to use computer points: (1) Artificial neural network with multiple hidden simulation of human visual science, its main task is through the layers has a very powerful feature learning ability. The collection of images (or video) analysis and understanding, to characteristics extracted by the training model have more make judgments or decisions. In the past few decades, CV has abstract and more basic expression of the original input data, (2) made great progress and development. The Image recognition By using the unsupervised learning algorithm to achieve a is a kind of technology that uses computer to process, analyze method called "layer initialization" to achieve the input data and understand the image to identify the target and object of information hierarchical expression, which can effectively different modes. It is a major research direction in the field of reduce the depth of the neural network Training difficulty. computer vision. In the image-based intelligent data acquisition Subsequently, the depth of learning in academia and industry and Processing has a very important role and impact. The use continues to heat up, in the speech recognition, image of image recognition technology can effectively deal with the recognition and natural language processing and other fields to detection and identification of specific target objects (such as obtain a breakthrough. Since 2011, the researchers first in the face, handwritten characters or goods), image classification and voice recognition problem on the application of in-deep subjective image quality assessment and other issues. learning technology, the accuracy rate increased by 20% to At present, image recognition technology has great 30%, made more than a decade the biggest breakthrough. Only commercial market and good application prospect in Internet a year later, the deep learning model based on convolution applications such as image search, commodity neural network has achieved great performance improvement recommendation, user behavior analysis and face recognition. in large-scale image classification tasks, and set off the upsurge At the same time, high-tech such as intelligent robot, of deep learning. In the literature [4], two kinds of acoustic unmanned driving and unmanned aerial vehicle Industry and modeling methods based on deep neural network are proposed, biology, medicine and geology and many other disciplines which are more effective than the traditional modeling method, have broad application prospects. In the early image and have been made larger in Uyghur's large vocabulary 978-1-5386-3849-1/17/$31.00 ©2017 IEEE 26 continuous speech recognition of the performance of the similar to a biological neural network, and the capacity of the upgrade. At present, Google, Microsoft and Facebook and model can be adjusted by changing the depth and breadth of many other Internet technology companies competed to invest the network, and has a strong assumption for natural images a lot of resources, research and development layout of large- (statistical smoothness and local Correlation) . Therefore, scale depth of learning system. CNNs can effectively reduce the learning complexity of the network model, have fewer network connections and weight In the early 1960s, Hubel and Wiesel, through the study of parameters, and are more likely to be trained than the fully cat's visual cortical system of cat, proposed the concept of connected network with a considerable size. receptive field [5] and found the hierarchical processing mechanism of information in the visual cortical pathway, Nobel Prize in Physiology or Medicine. By the mid-1980s, B. Network Structure Fukushima et al. [6], which was based on the concept of A simple convolution neural network model structure receptive field, could be seen as the first realization of diagram shown in Fig. 1, the network model consists of two Convolution neural networks (CNNs) and the first neuron- convolution layers (C1, C2) and two sub-sampling layer (S1, based Between the local connectivity and the hierarchical S2) alternately. First, the original input image is convoluted by structure of the artificial neural network. The neural cognition three trained filters (called convolution kernel) and addable machine decomposes a visual pattern into many subpatterns, bias vectors. Three feature maps are generated in the C1 layer, and these subpattern features are processed by hierarchical and then, for each feature map The localized regions are cascaded feature planes so that the model is very good even in weighted and averaged, and three new feature maps are the case of small targets of the target object Recognition ability. obtained in the S1 layer through a nonlinear activation function. After that, the researchers began experimenting with the use of These feature maps are then convoluted with the three trained an artificial neural network (actually a shallow model with only filters of the C2 layer, and three feature maps are output one hidden layer node) called a multi-layer sensor [7] instead through the S2 layer. The final output of the S2 layer is of manually extracting features and using A simple stochastic vectorized and then input into the traditional neural network for gradient descent method to train the model, and further training. proposed a back propagation algorithm for calculating the error gradient, which was subsequently proved to be very effective C. Convolution Feature Extraction [8]. In 1990, LeCun et al. [9] studied the handwritten digital Natural images have its inherent characteristics, that is, for identification problem, first proposed the use of gradient back a part of the image, its statistical characteristics and other parts propagation algorithm training convolution neural network of the same. This means that the features learned in this section model, and in MNIST [10] handwritten digital data set to show can also be used on another part, so the same learning feature relative to the time Other methods for better performance. The can be used for all positions on the image. In other words, for success of the gradient back propagation algorithm and the large-size image recognition problems, a small piece of local convolution neural network brings new hope to the machine data is randomly selected from the image as a training sample, learning field. It opens up the wave of machine learning based some features are learned from the small sample, and then on the statistical learning model, and also brings the artificial these features are used as filters, with the original