2017 2nd International Conference on Computer Engineering, Information Science and Internet Technology (CII 2017) ISBN: 978-1-60595-504-9

Understanding the Convolutional Neural Network Family

XINGQUN QI

ABSTRACT

Along with the development of computing speed, more and more jobs are being done by computers. The Convolutional Neural Network (CNN) is one of the most popular algorithms in the field of Deep Learning. It is mainly used in computer recognition tasks, especially in applications such as voice and text recognition. CNNs have been developed for decades, and in theory the Convolutional Neural Network has gradually matured. In practical applications, however, it is still limited by various conditions. This paper introduces the course of the Convolutional Neural Network's development to date, as well as its currently mature and popular architectures and related applications, to help readers gain a more macroscopic and comprehensive understanding of Convolutional Neural Networks.

KEYWORDS Convolutional Neural Network, Architecture, CNN family, Application

INTRODUCTION

With the continuous improvement of technology, more and more multidisciplinary applications are being achieved with computer technology, and Machine Learning, especially Deep Learning, is one of the most indispensable and important parts of that technology. Deep Learning has the great advantage of acquiring features and building models from data [1]. The Convolutional Neural Network (CNN) algorithm is one of the foremost algorithms in Deep Learning. It is mainly used for image recognition and computer vision, where it holds an important position, and its impact and significance in daily life are becoming more and more important. David H. Hubel and Torsten N. Wiesel found that the visual cortex of cats and monkeys contains neurons that respond individually to specific regions of the visual field [2]. At the 2012 ImageNet Large Scale Visual Recognition Challenge (LSVRC), Alex Krizhevsky first used a CNN to achieve a breakthrough in object recognition [3]. Since then, CNNs have become deeply rooted in people's daily lives and have attracted more and more attention. KDnuggets' analysis of Google Trends [4] suggests that, compared to other computing terms such as Big Data, Machine Learning has ranked first in recent years. Figure 1 shows the increasing trend in search volume for Deep Learning from 2012 to 2017.

Xingqun Qi, Beijing University of Posts and Telecommunications, Beijing, China


Figure 1. The popularity of the search keyword [4].

Additionally, at Apple's eye-catching 2017 iPhone X launch event, the facial-recognition unlock feature reflected the current popularity of CNNs as well. This paper takes a holistic perspective on the CNN algorithm to give a comprehensive introduction and overview. CNN will be introduced from the perspectives of its development, its extensions, its infrastructure, and its future prospects.

LITERATURE REVIEW

The Convolutional Neural Network has developed over a long period of evolution. Learning about this development process helps us understand the characteristics and important role of convolutional networks more profoundly. The main development of the Convolutional Neural Network can be divided into three stages.

Convolutional Neural Network's Start

The first effective Deep Learning network was presented by Alexey Ivakhnenko and V. G. Lapa in 1965 [5]. They gave a new definition of the small-filter application and proposed three corresponding prediction scenarios. At that time, prediction was a concrete realization of Deep Learning, and this was the first time the theory and ideas had been applied in practice. In 1971, despite the limited computing conditions of the era, Ivakhnenko created an eight-layer deep learning network [6], successfully demonstrating the learning process of the computer identification system Alpha; this meant the first generation of Alpha was available. After that, Ivakhnenko proposed the famous Group Method of Data Handling (GMDH) algorithm [7], built mainly on the idea of processing data in groups. The algorithm can automatically build models and optimize parameters, and can be applied to neural networks.

Convolutional Neural Network's Development

The application of CNNs in computer vision began in the 1970s. Kunihiko Fukushima proposed a new algorithm for an effective multilayer neural network in 1975 [8]: a multi-tiered organization called the "Cognitron", built to extract specific patterns and features. Then, in 1980, he proposed a new neural network model called the "Neocognitron" [9], in which the concepts of convolution and pooling first appeared. The Neocognitron was mainly used for the identification of handwritten characters, and its ideas were strongly reflected in subsequent CNNs; it can be said that the Neocognitron was the source of inspiration for CNNs. The approach was later extended to temporal signals in 1988 [10]. With Fukushima's contributions, CNNs gained a qualitative improvement in computer vision, and he also contributed to the underlying architecture of CNNs.

Convolutional Neural Network's Formation

The first fully formed Convolutional Neural Network was presented by LeCun in 1989 [11]. In that work, LeCun applied backpropagation, which remains essential to the CNN algorithm to this day, and a CNN was put into practice for the first time: it was successfully used in the US Post Office's handwritten postal code recognition task. In 1998, the LeNet-5 model was proposed, pushing the CNN algorithm to a small climax [12]; the importance of backpropagation for CNNs was verified once again. LeNet-5 became the first seven-layer convolutional network, and it was used by several banks to recognize digits on checks. From then on, the hierarchical architecture of the convolutional network was defined more clearly, and the CNN algorithm was formally applied in people's lives. Afterwards, others continued to optimize and refine the CNN algorithm; in 2006, the weight updates in CNNs were formally documented [13]. For a period of time, the rise of the Support Vector Machine (SVM) had a certain impact on the Convolutional Neural Network's development. At the 2012 ImageNet LSVRC contest, however, CNN made a breakthrough [3], with an error rate of only 18.9%. Since then, CNN has become one of the hottest deep learning algorithms.

CNN FAMILY

Since Alex Krizhevsky used the CNN algorithm at the ImageNet LSVRC-2012, the CNN algorithm has developed rapidly. Especially in object detection, CNN has made a qualitative leap. In 2013, Ross Girshick significantly improved the original CNN algorithm and proposed the Region-based Convolutional Neural Network (R-CNN) algorithm [14]. The traditional CNN algorithm mainly used a sliding-window method for object detection, so its efficiency was relatively low and its computational cost high. Girshick replaced the sliding window with a Selective Search method that extracts about 2,000 region proposals, on which the CNN convolution operation is then performed, as shown in Figure 2 [14]. Compared to the previous best result on VOC 2012, this change increased detection accuracy (mAP) by more than 30%, reaching a staggering 53.3%. The CNN algorithm was also making progress in architectural optimization. In 2014, Christian Szegedy proposed a new CNN architecture called "Inception" [15]; the 22-layer network built from Inception modules was named "GoogLeNet". At the same time, Szegedy also proposed a filter-level sparsity framework to improve the Convolutional Neural Network's efficiency. This method of using a sparsity framework to improve the efficiency of the CNN algorithm is still worth learning from and emulating today.
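As a concrete illustration of the R-CNN workflow just described, here is a minimal schematic sketch in Python. The stage functions (selective_search, cnn_features, svm_classify) are simplified stand-ins invented for this sketch, not the authors' actual code.

# Hypothetical stand-ins for the three R-CNN stages (illustration only).
def selective_search(image):
    # Would return roughly 2,000 candidate (x, y, w, h) region proposals.
    return [(0, 0, 64, 64), (16, 8, 96, 48)]

def cnn_features(patch):
    # Would run one forward pass of the CNN and return a feature vector.
    return [0.0] * 4096

def svm_classify(features):
    # Would score the feature vector with one SVM per object class.
    return ("vehicle", 0.87)

def rcnn_detect(image):
    # Region proposals replace the exhaustive sliding window.
    detections = []
    for (x, y, w, h) in selective_search(image):
        patch = image  # real R-CNN crops and warps the region to a fixed size
        label, score = svm_classify(cnn_features(patch))
        detections.append(((x, y, w, h), label, score))
    return detections  # real R-CNN also applies non-maximum suppression

print(rcnn_detect(image=None))

The efficiency gain comes from scoring only the ~2,000 proposed regions instead of every possible window position and scale.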

Figure 2. The architecture of R-CNN [14].

ARCHITECTURE

In this section, we introduce the basic structure of the CNN algorithm in more detail. A CNN is a hierarchical network with inheritance relationships between levels, and each level plays a unique role in CNN operations. The elements in each layer of a CNN are linearly independent of each other. Convolution determines only how values are computed; the neural network structure determines how they are transmitted.

Convolution Layer

This part introduces the first level of the CNN structure, the Convolution Layer, including the relevant concepts and operations involved in this level. The effect of the convolution layer is mainly to filter the input image signal and output the parts that match certain features. The convolution layer's input is a picture, i.e., a set of pixel values. Here we use Michael Nielsen's method of interpretation [16], as shown in Figure 3 [16]. For example, suppose we input a 32 * 32 * 3 array of pixel values. A delineated 5 * 5 area in the upper left corner is called the "Receptive Field". The object used to delineate it is called a "Filter" (or "Neuron" or "Kernel") in Deep Learning. A filter is an array whose values are called the filter weights or parameters. As the filter slides over the input image, the values of the filter and the values of the pixels beneath it carry out a dot-product operation; this process is called "Convolution", and the size of each slide is called the "Stride". The filter can also be seen as a feature identifier [17]: such features include elements like right angles, curves, and so on, and convolution is the filter sliding over the image to find the regions that match the feature. The process of optimizing the filter weights is called "Training".
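To make the operation concrete, here is a minimal sketch in Python/NumPy of a single filter sliding over an input. It is an illustration under the assumptions above (a 32 * 32 * 3 input, one 5 * 5 * 3 filter, stride 1), with random values standing in for a real image and trained weights.

import numpy as np

# A single-filter convolution pass: the filter slides over the input and
# takes a dot product with the receptive field at each position.
def convolve(image, kernel, stride=1):
    h, w, _ = image.shape
    kh, kw, _ = kernel.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The receptive field: the patch of the input under the filter.
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw, :]
            out[i, j] = np.sum(patch * kernel)  # dot product
    return out

image = np.random.rand(32, 32, 3)     # stand-in for input pixel values
kernel = np.random.rand(5, 5, 3)      # stand-in for learned filter weights
print(convolve(image, kernel).shape)  # (28, 28): one activation per position

It is the kernel values, not the code structure, that training optimizes.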


Figure 3. An example of convolution [16].

Rectified Linear Units (ReLU) Layer

The Rectified Linear Units (ReLU) layer typically follows the convolution layer. Its main role is to introduce non-linearity into the system, and it can also be used to reduce the vanishing gradient problem [18]. Introducing non-linearity means applying a non-linear activation function, such as f(x) = max(0, x) or other non-linear functions; which function to use is generally decided case by case. The specific choice and role of the activation function in the ReLU layer are explained and applied in Alex Krizhevsky's work [3]. Krizhevsky compared the saturating functions f(x) = tanh(x) and f(x) = 1 / (1 + e^(-x)) with the non-saturating ReLU f(x) = max(0, x), and used extensive experimental data to show that the ReLU allows deep CNNs to train several times faster.
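As a small illustration, the sketch below (Python/NumPy, with arbitrary sample points) contrasts the non-saturating ReLU with the two saturating functions named above.

import numpy as np

def relu(x):
    return np.maximum(0.0, x)        # f(x) = max(0, x): non-saturating

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # f(x) = 1 / (1 + e^(-x)): saturating

x = np.linspace(-5.0, 5.0, 5)
print(relu(x))       # negative inputs are clipped to zero
print(np.tanh(x))    # flattens toward -1 and 1, so gradients shrink
print(sigmoid(x))    # flattens toward 0 and 1, so gradients shrink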

Pooling Layer

The pooling layer is typically applied after several convolution or ReLU layers. It generally serves two roles: reducing the number of weights to compute and preventing overfitting. Overfitting is the problem of a model matching its training samples so closely that it performs poorly in actual detection [19]. There are a variety of pooling methods, such as max-pooling, L2-norm pooling, average-pooling and so on. Here is a brief introduction to the max-pooling method. As shown in Figure 4 [19], max-pooling generally uses a 2 * 2 filter with a stride of 2. When max-pooling is applied to the input, it outputs the maximum value of each region it covers [20]. In addition, the pooling layer has been optimized for specific applications; Kaiming He, for example, proposed a new pooling layer, spatial pyramid pooling, for CNN-based detection [21].
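The following minimal Python/NumPy sketch implements the 2 * 2, stride-2 max-pooling just described; the 4 * 4 input values are arbitrary.

import numpy as np

def max_pool(feature_map, size=2, stride=2):
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = feature_map[i*stride:i*stride+size,
                                 j*stride:j*stride+size]
            out[i, j] = region.max()  # keep only the strongest activation
    return out

fm = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(fm))  # 2 x 2 output; each entry is the max of one region

Halving each spatial dimension this way quarters the number of activations the next layer must process, which is where the reduction in computation comes from.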


Figure 4. The process of max-pooling [19].

Fully Connected Layer

The fully connected layer comes toward the end of the CNN. Its main function is to take the extracted feature vectors and classify them. There are two main ways to categorize: the Support Vector Machine (SVM) and Softmax. Each classifier has its own characteristics: the SVM is generally applied when training samples are few, and it uses the IoU (intersection over union) to assign a fixed score to each sample, while Softmax gives a probability for each categorized sample [22]. In addition, some further techniques are used in CNNs to improve accuracy. Dropout is mainly used to prevent overfitting: it randomly loses, or drops, some of the parameters during training [23]. If overfitting occurs in the fully connected layer, the rest of the target detection process becomes very slow. Transfer learning is another way to improve CNN efficiency: it refers to taking a pre-trained model (a neural network whose weights and parameters have already been trained by others on a larger data set) and "fine-tuning" the model on one's own data set [24].
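Here is a minimal Python/NumPy sketch of two of the ideas above: a fully connected layer scored by Softmax, with dropout applied to the feature vector during training. The sizes (128 features, 10 classes) and the dropout rate of 0.5 are illustrative assumptions, not values from the cited papers.

import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()                    # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()                 # probabilities summing to 1

def dropout(x, p=0.5, training=True):
    if not training:
        return x                       # at test time, keep all units
    mask = rng.random(x.shape) >= p    # randomly drop a fraction p of units
    return x * mask / (1.0 - p)        # rescale to keep expectations equal

features = rng.standard_normal(128)        # flattened feature vector
weights = rng.standard_normal((10, 128))   # fully connected layer, 10 classes
bias = np.zeros(10)
scores = weights @ dropout(features) + bias
print(softmax(scores))                     # one probability per class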

DISCUSSION OF CNN APPLICATIONS

This section lists and discusses some examples of applications of the Convolutional Neural Network.

Image Recognition System

CNNs are mainly used in the field of image recognition. In practical applications, CNNs can be used in basements, parking lots, urban streets, and on highways for the identification of vehicles committing violations, and so on. This technique has matured in theory; in practice, however, it is subject to many restrictions and its effect can be poor. The intelligent identification of illegal vehicles on city streets is often limited by environmental conditions: in haze, rain, or snow, the efficiency and accuracy of identification are often relatively low. The license plate is a critical factor in vehicle identification, and under poor conditions the CNN algorithm, applied to images with fairly low resolution, is often ineffective. There are also problems in identifying vehicle violations on expressways: since the vehicles move very fast, the photographed license plates are often seriously distorted. Under such conditions, there is still a long way to go before the CNN algorithm can be applied properly to achieve highly efficient and accurate image recognition.

Video Analysis

Compared to image recognition, CNNs have fewer applications in video analysis, which introduces the additional problem of the time dimension. Moez Baccouche proposed a fully automated deep model in 2011 [25]; this model can automatically learn temporal and spatial features and can then be trained into a practical neural network.

CONCLUSION

This paper summarizes the CNN from three angles: its historical origin, its structure, and its applications. In addition, it introduces developments of the CNN algorithm such as R-CNN and the filter-level sparsity framework. CNN can be said to be one of the most popular algorithms in Deep Learning. At present, the CNN algorithm has gradually matured in theory, but it is often limited by various conditions in practical applications. Our recommendation for the future development of CNN is to strengthen its practical application and provide more stable application scenarios.

REFERENCES

1. Zhou, F. Y., Jin, L. P., & Dong, J. (2017). Review of convolutional neural network. Chinese Journal of Computers.
2. Hubel, D. H., & Wiesel, T. N. (1998). Early exploration of the visual cortex. Neuron, 20(3), 401-412.
3. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
4. Piatetsky, G. (n.d.). Machine Learning overtaking Big Data? Retrieved October 02, 2017, from http://www.kdnuggets.com/2017/05/machine-learning-overtaking-big-data.html
5. Ivakhnenko, A. G., & Lapa, V. G. (1966). Cybernetic predicting devices (No. TR-EE66-5). Purdue Univ Lafayette Ind School of Electrical Engineering.
6. Ivakhnenko, A. G. (1971). Polynomial theory of complex systems. IEEE Transactions on Systems, Man, and Cybernetics, 1(4), 364-378.
7. Ivakhnenko, A. G., & Ivakhnenko, G. A. (1995). The review of problems solvable by algorithms of the group method of data handling (GMDH). Pattern Recognition and Image Analysis, 5, 527-535.
8. Fukushima, K. (1975). Cognitron: A self-organizing multilayered neural network. Biological Cybernetics, 20(3-4), 121-136.

9. Fukushima, K., & Miyake, S. (1982). Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition. In Competition and Cooperation in Neural Nets (pp. 267-285). Springer, Berlin, Heidelberg.
10. Atlas, L. E., Homma, T., & Marks II, R. J. (1988). An artificial neural network for spatio-temporal bipolar patterns: Application to phoneme classification. In Neural Information Processing Systems (pp. 31-40).
11. LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541-551.
12. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
13. Bouvrie, J. (2006). Notes on convolutional neural networks.
14. Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 580-587).
15. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1-9).
16. Nielsen, M. A. (2015). Neural Networks and Deep Learning.
17. Karpathy, A. (2016). CS231n: Convolutional neural networks for visual recognition. Neural Networks, 1.
18. Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 807-814).
19. Ng, A. (2017, March 31). The Problem of Overfitting. Retrieved October 02, 2017, from https://www.youtube.com/watch?v=OSd30QGMl88
20. Giusti, A., Ciresan, D. C., Masci, J., Gambardella, L. M., & Schmidhuber, J. (2013, September). Fast image scanning with deep max-pooling convolutional neural networks. In Image Processing (ICIP), 2013 20th IEEE International Conference on (pp. 4034-4038). IEEE.
21. He, K., Zhang, X., Ren, S., & Sun, J. (2014, September). Spatial pyramid pooling in deep convolutional networks for visual recognition. In European Conference on Computer Vision (pp. 346-361). Springer, Cham.
22. Li, F. F., Karpathy, A., & Johnson, J. (2015). CS231n: Convolutional neural networks for visual recognition. University Lecture.
23. Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929-1958.
24. Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems (pp. 3320-3328).
25. Baccouche, M., Mamalet, F., Wolf, C., Garcia, C., & Baskurt, A. (2011, November). Sequential deep learning for human action recognition. In International Workshop on Human Behavior Understanding (pp. 29-39). Springer, Berlin, Heidelberg.
