A Study on the Clothes Classification using AlexNet

Hye Yeon Son, Dept of Computer Eng., CHOSUN University, Korea, [email protected]
Min Seon Lee, Dept of Computer Eng., CHOSUN University, Korea, [email protected]
Sun-Kuk Noh, National Program of Excellence in Software Center, CHOSUN University, Korea, [email protected]

ABSTRACT
Artificial intelligence (AI), which is driven by machine learning and deep learning together with new technologies that will lead the Fourth Industrial Revolution, will soon be applied in the real world through various research and development. At present, object recognition technology using deep learning is already being applied in various fields.
In this study, we used transfer learning with AlexNet, chosen among various models, to classify different types of clothing. First, we collected 900 to 1,400 clean images without damage and 900 loss images with damage. These images (total: 2,300) were used as inputs to AlexNet. The clothing images were categorized into nine groups: cardigans, jackets, shirts, T-shirts, knits, jeans, cotton pants, short pants, and skirts. The classification results using AlexNet show a classification accuracy of approximately 69.28% with a learning rate of 0.001 and 10 epochs. When only the clean images were tested, the classification accuracy increased to 76.67%. This finding confirms that the classification accuracy increases when many clean clothing images are given to AlexNet as inputs.
Thus, we confirm the possibility of a further segmentation of clothing and of applications to other objects in the future, where GoogLeNet, which has better performance than AlexNet, can also be used.

CCS CONCEPTS
• Computer systems organization • Smart Information • Artificial Intelligence

KEYWORDS
Artificial Intelligence, Deep learning, AlexNet, Clothing classification

1 INTRODUCTION
Artificial intelligence (AI), which is driven by machine learning and deep learning together with new technologies that will lead the Fourth Industrial Revolution, will soon be applied in the real world through various research and development [1–5]. In particular, object recognition technology using deep learning is currently being used in various fields, such as disease identification in the medical field, inspection and robot vision in industrial fields, and recognition of stop signals in unmanned vehicles.
Computer vision is a research field that examines how computers process images to perform tasks quickly and efficiently. Images collected from cameras are mostly large and carry data equivalent to human vision. Computer vision is constrained by CPU processing speed and memory capacity, but it continues to develop through ongoing research on computer performance and machine learning algorithms.
In this study, we apply AI using object recognition technology to clothing classification, using transfer learning with AlexNet, chosen among various models, to classify different types of clothing. The clothing images (total: 2,300) were categorized into nine groups: cardigans, jackets, shirts, T-shirts, knits, jeans, cotton pants, short pants, and skirts.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
SMA 2020, September 17-19, 2020, Jeju, Republic of Korea
© 2020 Copyright held by the owner/author(s).

2 Artificial Intelligence and Object Recognition

2.1 Artificial Intelligence
AI is generally divided into two types: machine learning and deep learning. Deep learning is a set of machine learning algorithms that attempt high-level abstraction by combining several nonlinear transformations; it is a field of machine learning that teaches computers to reason in a human-like way. Deep learning solves problems by stacking neural network layers and depends heavily on the amount of data. Because it makes fewer assumptions about the problem than other machine learning techniques, it yields a structure that responds flexibly to varied patterns and cases. Many studies have examined how to represent arbitrary data in a form that a computer can understand and how to apply that representation to learning.

2.2 Object recognition
Object recognition is a computer vision technology that identifies objects in images. People easily recognize characters, objects, scenes, and visual details when they see pictures or videos. The goal of object recognition is to have a computer learn to do what people do naturally. To solve this problem, deep learning and machine learning algorithms, as AI technologies, are widely used [6,7].
Object recognition using machine learning combines a feature extraction method for image recognition with a classifier such as a convolutional neural network (CNN). This approach requires a large amount of training data (learning dataset), and the layers and weights of the CNN must be set. Figure 1 shows object recognition technology using a CNN.

Figure 1: General Neural Network (CNN)

2.3 AlexNet
AlexNet has 60 million parameters and 650,000 neurons. It consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax [8,9]. To make training faster, non-saturating neurons and a highly efficient Graphics Processing Unit (GPU) implementation of the convolution operation were used. To reduce overfitting in the fully connected layers, a regularization method called “dropout” was used. A variant of this model entered in the ILSVRC 2012 competition achieved a top-5 test error rate of 15.3%, compared with 26.2% for the second-best entry.
The structure of AlexNet is shown in Figure 2, which explicitly shows the delineation of responsibilities between the two GPUs. One GPU runs the layer parts at the top of the figure, whereas the other runs the layer parts at the bottom. The GPUs communicate only in certain layers. The network’s input is 150,528 dimensional, and the number of neurons in the remaining layers of the network is 290,400–186,624–64,896–64,896–43,264–4096–4096–1000.

Figure 2: The structure of AlexNet

2.4 Transfer learning
An assumption of traditional machine learning methodologies is that training and testing data are taken from the same domain, such that the input feature space and data distribution characteristics are the same. However, in some real-world machine learning scenarios, this assumption does not hold. There are cases where the training data are expensive or difficult to collect. Therefore, there is a need to create high-performance learners trained with more easily obtained data from different domains. This methodology is referred to as transfer learning [10,11]. Transfer learning reduces training time and produces results quickly, because a model can be trained with a comparatively small number of images.

3 Experimental environment and measurement

3.1 Experimental environment
The composition of the experimental environment is shown in Table 1. To classify the collected clothing images, we used AlexNet as the CNN, implemented in MATLAB, with transfer learning. The AlexNet used in the experiment is the pre-trained CNN used in the MATLAB example that identifies surrounding objects with a webcam and deep learning; it is trained on more than 1 million images and can classify images in real time into 1,000 categories, including keyboards, coffee mugs, pencils, and various animals [12].

Table 1: Experimental equipment
CNN               AlexNet
GPU               NVIDIA GeForce RTX 2080 Ti
Clothes Dataset   900 ~ 2300
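The layer sizes quoted for AlexNet in Section 2.3 can be checked with simple arithmetic. The sketch below assumes the per-layer output shapes commonly cited for AlexNet (a 224 × 224 × 3 input, 55 × 55 × 96 after the first convolution, and so on); these shapes are not stated in this paper and are used here only as an illustration.

```python
import math

# Assumed output shapes (height, width, channels); not given in the paper.
INPUT_SHAPE = (224, 224, 3)
LAYER_SHAPES = [
    ("conv1", (55, 55, 96)),
    ("conv2", (27, 27, 256)),
    ("conv3", (13, 13, 384)),
    ("conv4", (13, 13, 384)),
    ("conv5", (13, 13, 256)),
    ("fc6", (4096,)),
    ("fc7", (4096,)),
    ("fc8", (1000,)),
]

# Dimensionality of the input: 224 * 224 * 3
input_dim = math.prod(INPUT_SHAPE)  # 150528

# Number of neurons (output units) in each remaining layer
neuron_counts = [math.prod(shape) for _, shape in LAYER_SHAPES]
# [290400, 186624, 64896, 64896, 43264, 4096, 4096, 1000]

total_neurons = sum(neuron_counts)  # 659272

print(input_dim, neuron_counts, total_neurons)
```

Under these assumed shapes the counts reproduce the sequence given in the text, and their sum (659,272) is consistent with the figure of roughly 650,000 neurons.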


3.2 Clothes classification and measurement
The process of classifying the clothing image dataset using AlexNet is shown in Figure 3 and Figure 4. With AlexNet, images in nine clothing categories (cardigans, knits, T-shirts, shirts, jackets, cotton pants, jeans, short pants, and skirts) were classified through CNN learning. For the clothing classification, we used 900–1,400 clean images without damage and 900 loss images with damage, for a total of 2,300 images.

Figure 5: The clean image and loss image of Cardigan. (a) Clean clothes image; (b) Loss clothes image
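The idea behind the transfer-learning setup in Section 3.2 can be illustrated with a toy sketch. The authors worked in MATLAB with a pre-trained AlexNet; the Python code below is not their implementation. It assumes that the pre-trained layers are frozen and that only a new nine-way softmax layer is trained, replaces AlexNet with a small fixed random feature extractor, uses synthetic data, and uses plain full-batch gradient descent. All function and variable names are hypothetical; only the nine classes, the learning rate of 0.001, and the 10 epochs come from the paper.

```python
import math
import random

CLASSES = ["cardigan", "jacket", "shirt", "t-shirt", "knit",
           "jeans", "cotton pants", "short pants", "skirt"]
LEARNING_RATE = 0.001  # value reported in the paper
EPOCHS = 10            # value reported in the paper


def frozen_features(image, backbone):
    """Stand-in for the pre-trained layers: fixed weights plus ReLU, never updated."""
    return [max(0.0, sum(w * x for w, x in zip(row, image))) for row in backbone]


def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_head(features, labels, n_classes, lr=LEARNING_RATE, epochs=EPOCHS):
    """Train only a new n_classes-way softmax layer on the frozen features."""
    dim, n = len(features[0]), len(features)
    head = [[0.0] * dim for _ in range(n_classes)]
    losses = []
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(n_classes)]
        loss = 0.0
        for feat, label in zip(features, labels):
            probs = softmax([sum(w * x for w, x in zip(row, feat)) for row in head])
            loss -= math.log(probs[label])  # cross-entropy for this sample
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == label else 0.0)
                for j in range(dim):
                    grad[c][j] += err * feat[j]
        for c in range(n_classes):
            for j in range(dim):
                head[c][j] -= lr * grad[c][j] / n  # one gradient step per epoch
        losses.append(loss / n)  # mean loss measured before this epoch's update
    return head, losses


# Synthetic stand-ins: 90 random 16-value "images", 10 per class.
rng = random.Random(0)
backbone = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(8)]
images = [[rng.random() for _ in range(16)] for _ in range(90)]
labels = [i % len(CLASSES) for i in range(90)]

features = [frozen_features(img, backbone) for img in images]
head, losses = train_head(features, labels, len(CLASSES))
print(f"mean loss, first epoch: {losses[0]:.5f}, last epoch: {losses[-1]:.5f}")
```

Because the feature extractor is never updated, only the 9 × 8 weights of the new layer are learned, which is why this style of training needs far less data and time than training a full network from scratch.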

Figure 3: Clothes classification using CNN

Table 2: Measurement results of classification experiments
Image    Clothes Dataset    Accuracy (%)
Clean    900                63.70
Clean    1400               76.67
Loss     900                54.81
Total    2300               69.28
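The figures in Table 2 can be put side by side with a little arithmetic. The sketch below computes the gain from 900 to 1,400 clean images and, as a rough reference point only, the size-weighted mean of the clean (1,400) and loss (900) accuracies. The paper does not state how the test images were selected for each run, so the weighted mean is an assumption-laden comparison and is not expected to equal the reported total.

```python
# Rows of Table 2: (image type, number of images, accuracy in %)
TABLE_2 = [
    ("Clean", 900, 63.70),
    ("Clean", 1400, 76.67),
    ("Loss", 900, 54.81),
    ("Total", 2300, 69.28),
]


def weighted_accuracy(parts):
    """Size-weighted mean accuracy of (count, accuracy) pairs."""
    total = sum(count for count, _ in parts)
    return sum(count * acc for count, acc in parts) / total


# Gain in percentage points when the clean set grows from 900 to 1,400 images
clean_gain = TABLE_2[1][2] - TABLE_2[0][2]

# Reference value: clean (1,400) and loss (900) results weighted by set size
combined = weighted_accuracy([(1400, 76.67), (900, 54.81)])

print(f"clean gain: {clean_gain:.2f} points")
print(f"weighted mean: {combined:.2f}% vs reported total: {TABLE_2[3][2]:.2f}%")
```

The gain of about 12.97 points between the two clean runs is the basis for the observation that more clean images improve accuracy; the weighted mean (about 68.12%) lies close to, but not exactly at, the reported 69.28% for the combined dataset.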

Figure 4: Measurement of Clothes classification using AlexNet

3.3 Measurement result
The classification results using AlexNet are shown in Figure 6 and Table 2. With a learning rate of 0.001 and 10 epochs, the accuracy on the total image set was approximately 69.28%. However, this accuracy was judged somewhat insufficient for the image data used. The clean and loss images are shown in Figure 5. As Table 2 shows, the classification accuracy was 63.70%, 76.67%, and 54.81% for the 900 clean images, the 1,400 clean images, and the 900 loss images, respectively, and 69.28% for the combined clothing image dataset (total: 2,300). This finding confirms that the classification accuracy increases when many clean clothing images are given to AlexNet as inputs.

Figure 6: Results of clothes classification using AlexNet. (a) Clean clothes image (1400); (b) Total clothes image (2300)

4 CONCLUSIONS
AI, which is driven by machine learning and deep learning together with new technologies that will lead the Fourth Industrial Revolution, will soon be applied in the real world through various research and development. At present, object recognition technology using deep learning is applied in various fields. In this paper, we propose a novel algorithm for the classification of clothing using object recognition technology. For implementation and application, between 900 and 2,300 collected images were used to train AlexNet through transfer learning. The clothing data were classified into nine types: cardigans, jackets, shirts, T-shirts, knits, jeans, cotton pants, short pants, and skirts. The classification results using AlexNet show a classification accuracy on the total clothing images (1,400 clean + 900 loss = 2,300) of approximately 69.28% with a learning rate of 0.001 and 10 epochs. When only the clean clothing images were tested, the classification accuracy increased to 76.67%. This finding confirms that the classification accuracy increases when many clean clothing images are given to AlexNet as inputs. Therefore, our results confirm the possibility of a further segmentation of clothing and of applications to other objects in the future, where GoogLeNet, which has better performance than AlexNet, can also be used.

ACKNOWLEDGMENTS
This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the National Program for Excellence in SW (2017-0-00137) supervised by the IITP (Institute of Information & communications Technology Planning & Evaluation).

REFERENCES

[1] Z. Liu, P. Luo, S. Qiu, X. Wang, and X. Tang, “DeepFashion: Powering robust clothes recognition and retrieval with rich annotations,” in Proc. IEEE Conf. CVPR, pp. 1096-1104, Jun. 2016.
[2] Y. Ge, R. Zhang, X. Wang, X. Tang, and P. Luo, “DeepFashion2: A versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images,” in Proc. IEEE Conf. CVPR, pp. 5337-5345, Jun. 2019.
[3] D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484-489, 2016.
[4] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, “Show and tell: A neural image caption generator,” in Proc. IEEE Conf. CVPR, pp. 3156-3164, 2015.
[5] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proc. IEEE Conf. CVPR, pp. 1-9, 2015.
[6] D. G. Lowe, “Three-dimensional object recognition from single two-dimensional images,” Artificial Intelligence, 1987.
[7] I. Arel, D. C. Rose, and T. P. Karnowski, “Research frontier: Deep machine learning - a new frontier in artificial intelligence research,” IEEE Computational Intelligence Magazine, Nov. 2010.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, Jun. 2017.
[9] M. Laavanya and V. Vijayaraghavan, “Residual learning of transfer learned AlexNet for image denoising,” IEIE Transactions on Smart Processing and Computing, vol. 9, no. 2, Apr. 2020.
[10] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345-1359, 2010.
[11] K. Weiss, T. M. Khoshgoftaar, and D. Wang, “A survey of transfer learning,” Journal of Big Data, May 2016.
[12] https://kr.mathworks.com/help/deeplearning/examples/classify-images-from-webcam-using-deep-learning.html