Fast Image Recognition with Gabor Filter and Pseudoinverse Learning Autoencoders

Xiaodan Deng, Sibo Feng, Ping Guo*, and Qian Yin*

Image Processing and Pattern Recognition Laboratory, Beijing Normal University, Beijing, China
[email protected], [email protected], [email protected], [email protected]

Abstract. Deep neural networks have been used successfully in various fields and have achieved significant results on some typical tasks, especially in computer vision. However, deep neural networks are usually trained with gradient descent based algorithms, which suffer from the vanishing and exploding gradient problems. Moreover, designing the structure of a deep neural network and finding the optimal hyper-parameters for a given task require expert-level professional knowledge. Consequently, training a deep neural network becomes a very time-consuming task. To overcome these shortcomings, we present a model that combines the Gabor filter with pseudoinverse learning autoencoders. The model is optimized with a non-gradient-descent algorithm. In addition, we present empirical formulas for setting the number of hidden neurons and the number of hidden layers throughout the training process. The experimental results show that our model is faster than existing benchmark methods while achieving comparable recognition accuracy.

Keywords: Pseudoinverse learning autoencoder, Gabor filter, Image recognition, Handcrafted feature.

1 Introduction

Neural networks have attracted many researchers and have been applied successfully in many fields. Currently the most widely used model for image recognition is the convolutional neural network (CNN). In 1998, Yann LeCun and Yoshua Bengio published a paper on the application of neural networks to handwriting recognition and their optimization with the back propagation algorithm; the LeNet-5 model presented in [1] is considered the beginning of CNN. Its network structure includes the convolutional layer, the pooling layer and the fully connected layer, which are the basic components of modern CNNs. In 2012, Alex Krizhevsky used AlexNet [2] in the ImageNet contest to refresh the record of image classification and establish the position of deep learning in computer vision. AlexNet uses five convolutional layers and three fully connected layers for classification. Subsequently, many other successful CNN models appeared, becoming deeper and more complex. In 2015, He et al. proposed the ResNet [3] model, which reached 152 layers.

Recent successful CNN models usually have complex structures and require many hyper-parameters to be set. These parameters determine the performance of the CNN models and are difficult to tune. Many research groups have presented their research results, yet the results are difficult to reproduce. On the other hand, because there are so many hyper-parameters, training a CNN model is a time-consuming process. Moreover, most deep neural networks are trained with gradient descent (GD) based algorithms and their variants [1,4]. It has also been found that gradient descent based training of deep neural networks is inherently unstable; this instability blocks the learning of the earlier or later layers. In addition, gradient descent methods easily suffer from the vanishing gradient problem. Although CNNs perform well, they require much professional knowledge to use and take a long time to train.
In order to reduce the training time and improve the generalization ability of the neural network, we present a model that combines the Gabor filter [5] and pseudoinverse learning autoencoders (PILAE) [6] to deal with the image recognition problem. The Gabor transformation belongs to the windowed Fourier transformation, and the Gabor function can extract relevant characteristics at different scales and orientations in the frequency domain. In addition, the Gabor function resembles the biological response of the human visual system, so it is often used for texture recognition and has achieved good results. The main advantage of CNN-based deep networks is that the features are learnt from the images, while the advantage of traditional handcrafted features is the speed of feature extraction. Therefore, extracting features with a Gabor filter (GF) is much easier than with a CNN. In Feng et al.'s work [7], the histogram of oriented gradients (HOG) is used to extract features. However, the generation of the HOG descriptor is tedious, resulting in slow speed and poor real-time performance. Besides, due to the nature of the gradient, the descriptor is quite sensitive to noise. Hence, we choose the Gabor filter to extract features first. Then, a PILAE-based feed-forward neural network is adopted to extract independent feature vectors and perform image recognition.

The optimization of our proposed GF+PILAE model does not need gradient descent based algorithms. The learning procedure of our model is forward propagation, and the whole structure of the network, including the depth of the network and the number of neurons in the hidden layers, is determined by a given strategy during propagation. It is a quasi-automatic learning procedure, so even users without professional knowledge can use it easily. This is part of our effort to promote the development of democratized artificial intelligence.

2 Related Work

2.1 Gabor Filter

The class of Gabor functions was presented by Gabor [8]. The basic idea of the Gabor function is to apply a small window to the signal; the Fourier transform of the signal is mainly concentrated in this small window, so it can reflect the local characteristics of the signal. Daugman [9] extended the Gabor function to the two-dimensional case. The Gabor wavelet function is regarded as the best model for simulating the visual sensory cells of the cerebral cortex [10]. Each visual cell can be viewed as a Gabor filter with a certain orientation and scale. When an external stimulus such as an image signal reaches the visual cells, their output response is the convolution of the image with the Gabor filter, and this output signal is further processed by the brain to form the final cognitive impression. This model explains well the tolerance of human vision to changes of scale and orientation. The two-dimensional Gabor kernel function is defined as follows [11]:

G_{\lambda,\theta,\varphi,\sigma,\gamma}(x, y) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \cos\left(2\pi \frac{x'}{\lambda} + \varphi\right),   (1)

x' = (x - x_0)\cos\theta + (y - y_0)\sin\theta,
y' = -(x - x_0)\sin\theta + (y - y_0)\cos\theta.

Eq. (1) is the product of a Gaussian function and a cosine function. The arguments x and y specify the position of a light impulse, where (x_0, y_0) is the center of the receptive field in the spatial domain. \theta is the orientation of the parallel bands in the Gabor filter kernel, with valid values from 0 to 360 degrees. \varphi is the phase parameter of the cosine function in the Gabor kernel, with valid values from -180 to 180 degrees.
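Before turning to the remaining parameters, the following minimal Python sketch samples the kernel of Eq. (1) on a square grid and convolves a grayscale image with it, as described above. The function name gabor_kernel, the grid size, the parameter values, and the use of SciPy's fftconvolve are illustrative assumptions for this sketch, not details from the paper.

    import numpy as np
    from scipy.signal import fftconvolve

    def gabor_kernel(size, lam, theta, phi, sigma, gamma, x0=0.0, y0=0.0):
        """Sample the 2-D Gabor kernel of Eq. (1) on a size-by-size grid.

        theta and phi are given in radians here; the paper states their
        valid ranges in degrees.
        """
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
        # rotated coordinates x', y' centred on the receptive field (x0, y0)
        xr = (x - x0) * np.cos(theta) + (y - y0) * np.sin(theta)
        yr = -(x - x0) * np.sin(theta) + (y - y0) * np.cos(theta)
        # Gaussian envelope multiplied by a cosine carrier, as in Eq. (1)
        envelope = np.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2.0 * sigma ** 2))
        carrier = np.cos(2.0 * np.pi * xr / lam + phi)
        return envelope * carrier

    # Example: response of a grayscale image to one filter (illustrative values).
    image = np.random.rand(64, 64)   # stand-in for a real grayscale image
    g = gabor_kernel(size=31, lam=8.0, theta=np.pi / 4, phi=0.0, sigma=4.0, gamma=0.5)
    response = fftconvolve(image, g, mode="same")

A bank of such kernels at several scales and orientations, as described below, can be built simply by looping over \lambda and \theta.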
\gamma is the spatial aspect ratio, which represents the ellipticity of the Gabor filter. \lambda is the wavelength parameter of the cosine function in the Gabor kernel. \sigma is the standard deviation of the Gaussian function in the Gabor kernel; this parameter determines the size of the acceptance region of the Gabor filter, and its value is related to the bandwidth b and the value of \lambda. The bandwidth b indicates the difference between the high and low frequencies. Eq. (2) presents the relationship between b, \sigma and \lambda:

b = \log_2 \frac{\frac{\sigma}{\lambda}\pi + \sqrt{\frac{\ln 2}{2}}}{\frac{\sigma}{\lambda}\pi - \sqrt{\frac{\ln 2}{2}}}, \qquad \frac{\sigma}{\lambda} = \frac{1}{\pi}\sqrt{\frac{\ln 2}{2}} \cdot \frac{2^b + 1}{2^b - 1}.   (2)

Fig. 1. Gabor filter bank. These filters are at different scales and orientations [12].

Usually we use Gabor filters in 8 orientations and 5 scales, and these parameters can be adjusted. Fig. 1 shows a sample Gabor filter bank with forty different Gabor filters. Feature extraction is performed with the Gabor filter, as shown in Eq. (3):

\mathbf{I}_G = \mathbf{I} \oplus \mathbf{G},   (3)

where \mathbf{I} is the grayscale distribution of the image, \mathbf{I}_G is the feature extracted from \mathbf{I}, "\oplus" stands for the 2D convolution operator, and \mathbf{G} is the defined Gabor filter. Eq. (3) can be computed efficiently with the fast Fourier transform, \mathbf{I}_G = F^{-1}(F(\mathbf{I})F(\mathbf{G})), where F^{-1} is the inverse Fourier transform.

Gabor filters are sensitive to the edge information of images and are able to adapt to environments with markedly different illumination. Studies have found that the Gabor wavelet transformation is very suitable for texture representation and separation. Compared with other methods, the Gabor filter needs less data and can meet real-time requirements. On the other hand, it can tolerate a certain degree of image rotation and deformation.

2.2 Autoencoders

The autoencoder [13] was first proposed by Rumelhart et al. in 1986. The autoencoding neural network is an unsupervised learning scheme which uses the back propagation algorithm and tries to encode input vectors into hidden vectors and decode the hidden vectors back into the input vectors. Autoencoders are usually used for dimensionality reduction and feature learning tasks. An autoencoder consists of two parts: an encoder and a decoder. The encoder compresses the input into a latent space representation, and the decoder reconstructs the input from that latent representation. The loss function of the autoencoder can be defined as the reconstruction error in Eq. (4),

E = \sum_{i=1}^{N} \left\| \left( W_d f(W_e x^i + b_e) + b_d \right) - x^i \right\|^2,   (4)

where W_e and W_d are the weights of the encoder and decoder respectively, and b_e and b_d are the biases of the encoder and decoder respectively. A stacked autoencoder is a feed forward neural network in which the outputs of each encoder layer are the inputs of the successive layer.
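As a concrete illustration of Eq. (4), the following Python sketch computes the reconstruction error of a single-hidden-layer autoencoder. The function names, the choice of sigmoid for f, and the row-wise data layout are assumptions made for this example, not specifics from the paper.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def reconstruction_error(X, We, be, Wd, bd, f=sigmoid):
        """Reconstruction error of Eq. (4).

        X is an (N, d) matrix whose rows are the input vectors x^i,
        We is (h, d), be is (h,), Wd is (d, h), bd is (d,).
        """
        H = f(X @ We.T + be)      # encoder output: hidden representation, (N, h)
        X_hat = H @ Wd.T + bd     # decoder output: reconstruction, (N, d)
        return np.sum((X_hat - X) ** 2)

For a fixed encoding H, a least-squares decoder (with the bias dropped for simplicity) can be obtained with the Moore-Penrose pseudoinverse, e.g. Wd = (np.linalg.pinv(H) @ X).T, which conveys the spirit of pseudoinverse learning; the actual PILAE training procedure is the one described in [6].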
