Orientation Detection Using a CNN Designed by Transfer Learning of Alexnet
Total Page:16
File Type:pdf, Size:1020Kb
Proceedings of the 8th IIAE International Conference on Industrial Application Engineering 2020 Orientation Detection Using a CNN Designed by Transfer Learning of AlexNet Fusaomi Nagataa,*, Kohei Mikia, Yudai Imahashia, Kento Nakashimaa Kenta Tokunoa, Akimasa Otsukaa, Keigo Watanabeb and Maki K. Habibc aDepartment of Mechanical Engineering, Faculty of Engineering, Sanyo-Onoda City University Sanyo-Onoda 756-0884, Japan bGraduate School of Natural Science and Technology, Okayama University Okayama 700-8530, Japan cMechanical Engineering Department, School of Sciences and Engineering, The American University in Cairo AUC Avenue, P.O.Box 74, New Cairo 11835, Egypt *Corresponding Author: [email protected] Abstract classification in spite of only two layers. Nagi et al. designed max-pooling convolutional neural networks (MPCNN) for Artificial neural network (ANN) which has four or vision-based hand gesture recognition(1). The MPCNN could more layers structure is called deep NN (DNN) and is classify six kinds of gestures with 96% accuracy and allow recognized as a promising machine learning technique. mobile robots to perform real-time gesture recognition. Convolutional neural network (CNN) has the most used and Weimer et al. also designed a deep CNN architectures for powerful structure for image recognition. It is also known automated feature extraction in industrial inspection that support vector machine (SVM) has a superior ability for process(2). The CNN automatically generates features from binary classification in spite of only two layers. We have massive amount of training image data and demonstrates developed a CNN&SVM design and training tool for defect excellent defect detection results with low false alarm rates. detection of resin molded articles, and the effectiveness and Faghih-Roohi et al. presented a different type of deep CNN validity have been proved through several CNNs design, for automatic detection of rail surface defects(3). It was training and evaluation. The tool further enables to easily concluded that the large CNN model performed a better design a CNN model based on transfer learning concept. In classification result than the small and medium CNN, this paper, a CNN acquired by transfer learning of AlexNet, although the training required a longer time. Zhou et al. used which is the winner of ImageNet LSVRC2012, is designed a CNN to classify the surface defects of steel sheets(4). The to recognize the orientation of objects. The effectiveness of CNN could directly learn better representative features from the transfer learning based CNN is evaluated using test data labeled images of surface defects. Further, Ferguson et al. set. presented a system to identify casting defects in X-ray images based on the Mask Region-based CNN Keywords: Deep learning, Convolutional neural network, architecture(5,6). It is reported that the proposed system Defect detection, Transfer learning, Orientation detection, simultaneously performed defect detection and segmentation robotic pick and place. on input images making it suitable for a range of defect detection tasks. 1. Introduction We have developed a CNN&SVM design and training tool for defect detection of resin molded articles and the Artificial neural network (ANN) which has four or more effectiveness and validity have been proved through several layers structure is called deep NN (DNN) and is recognized CNNs design, training and evaluation(7,8,9). The tool further as a promising machine learning technique. Convolutional enables to easily design a CNN model based on transfer neural network (CNN) has the most used and powerful learning concept. When industrial robots are applied to pick structure for image recognition. It is also known that support and place tasks of resin molded articles, information of each vector machine (SVM) has a superior ability for binary DOI: 10.12792/iciae2020.051 295 © 2020 The Institute of Industrial Applications Engineers, Japan. object’s position and orientation is essential. Recognition Parallel Computing Toolbox for GPU, Deep Learning and extraction of object position in an image are not so Toolbox, Statistics and Machine Learning Toolbox. difficult if using image processing technique, however, that of orientation is not easy due to the variety in shape. In this 3. Images for Training and Test paper, a CNN acquired by transfer learning of AlexNet, which is the winner of ImageNet LSVRC2012, is introduced Training image generator was already proposed to to recognize the orientation of objects in images. The efficiently augment limited number of training images(7). By effectiveness of the CNN is evaluated using test image data using the generator, images for training are prepared set of thin resin mold articles. considering typical twelve orientations, i.e., 0, 15, 30, 45, 60, 75, 90, 105, 120, 135, 150 and 165 degrees. Figures 2 and 3 2. Design Tool for CNN and SVM show examples of the training images for the categories of Figure 1 shows the main dialogue of the developed 45 and 165 [deg], respectively. The resolution and channel CNN&SVM design tool. In training of CNN, pre-training are 200×200 and 1, respectively. using randomly initialize weights and additional (successive) 90° training with once trained weights can be selected. As for 0° 180° SVM, one-class unsupervised learning and two class Image6.jpg image7.jpg image8.jpg image9.jpg image10.jpg image11.jpg image12.jpg supervised learning can be selectively executed. Also, favorite CNN, which is used for a feature extractor, and Kernel function are selected. Image13.jpg image14.jpg image15.jpg image16.jpg image17.jpg image18.jpg image19.jpg The tool has another promising function to design original CNNs based on transfer learning. For example, the Image111.jpg image112.jpg image113.jpg image114.jpg image115.jpg image116.jpg image117.jpg following main items can be set for the operation of transfer learning through the dialogue. (1) Folders for training and test images. Image118.jpg image119.jpg image120.jpg image121.jpg image121.jpg image122.jpg image123.jpg (2) Base CNNs used for transfer learning such as AlexNet, VGG16, VGG19, GoogleNet and Inception-V3. Image370.jpg image371.jpg image372.jpg image373.jpg image374.jpg image375.jpg image376.jpg (3) Learning parameters such as max epochs, mini batch size, desired accuracy and loss, learning rates for convolution layers and fully connected layers. Image377.jpg image378.jpg image379.jpg image380.jpg image381.jpg image382.jpg image383.jpg The software shown in Fig. 1 is developed on MATLAB Fig. 2. Example of training images for 45 [deg]. system optionally installed with Neural Network Toolbox, 90° 0° 180° Image169.jpg image170.jpg image171.jpg image172.jpg image173.jpg image174.jpg image175.jpg Image176.jpg image177.jpg image178.jpg image179.jpg image180.jpg image181.jpg image182.jpg Image111.jpg image112.jpg image113.jpg image114.jpg image115.jpg image116.jpg image117.jpg Image118.jpg image119.jpg image120.jpg image121.jpg image122.jpg image123.jpg image124.jpg Image209.jpg image210.jpg image211.jpg image212.jpg image213.jpg image214.jpg image215.jpg Fig. 1. Main dailogue developed for CNN and SVM Image216.jpg image217.jpg image218.jpg image219.jpg image220.jpg image221.jpg image222.jpg design and training. Fig. 3. Example of training images for 165 [deg]. 296 ) 3 nit nit nit nit nit nit nit x 48) x U U U U U U U 5 element element 3 3 3 o 227 x 227 x 227 x 3 o Normalized Exponential Normalized 3 x layer 1000 FC 384 filters (3 x 3 x 3 x (3 256) x 384 filters 50% dropout 50% 256 filters (5 (5 256 x filters Rectified Linear Rectified Cross Entropy Function Function Entropy Cross Rectified Linear Rectified 3 x 4096 FC layer 4096 FC Rectified Linear Rectified / 5 channels 96 filters (11 x 11 x 11 x (11 x 96 filters 384 filters (3 x 3 x 3 x (3 192) x 384 filters 227 x 3 227 x 227 x Rectified Linear Rectified 5 channels / 5 channels 50% dropout 50% Rectified Linear Rectified 3 x Rectified Linear Rectified 4096 FC layer 4096 FC 256 filters (3 x 3 x 3 x (3 192) x 256 filters t Rectified Linear Rectified 1 2 ReLU ReLU ReLU ReLU ReLU ReLU ReLU Softmax Dropout Dropout Convolution Convolution Convolution Convolution Convolution Max Pooling Max Pooling Max Max Pooling Max Input Images Input Fully Connected Fully Connected Fully Connected 1000 Cross Cross channel norm. Cross channel norm. Classification Output Classification Resize of Resize input image Fig. 4. Network structure of well-known CNN named AlexNet. explained above, an original CNN model acquired by 4. Transfer Learning Based CNN transfer learning of AlexNet, which is the winner of ImageNet LSVRC2012, is presented to recognize the In this section, a transfer learning based CNN is designed to learn the feature of orientation included in Original AlexNet images as shown in Figs. 2 and 3. Figure 4 illustrates the 1 2 structure of the original AleXnet. In order to make the CNN have an ability to classify input images into 12 categories as ReLU ReLU 0, 15, 30, 45, 60, 75, 90, 105, 120, 135, 150 and 165 [deg], Dropout Dropout Softmax Fully Connected Fully Connected Fully Connected 1000 the fully connected layers are replaced as shown in Fig. 5 Output Classification before executing transfer learning. 6,889 images consisting Replaced of 12 categories are used for the transfer learning. As for Transferred CNN training parameters, mini batch size is given 50. Iteration is 0 15 the number of mini batches needed to complete one epoch, 30 ReLU so that one epoch in this transfer learning is composed of ReLU Softmax Dropout Dropout Fully Connected Fully Connected 6,889/50≒137 iterations. Desired accuracy and loss are set Fully Connected Classification Output Classification 165 to 1 and 0, respectively. Besides, learning rates of convolutional layers and fully connected ones are set to Fig.