CNN Model for Image Classification on MNIST and Fashion-MNIST Dataset
Journal of Scientific Research, Volume 64, Issue 2, 2020
Institute of Science, Banaras Hindu University, Varanasi, India.

Shivam S. Kadam, Amol C. Adamuthe*, and Ashwini B. Patil
Dept. of CS & IT, RIT, Rajaramnagar, Sangli, MS, India.
* Corresponding Author
DOI: 10.37398/JSR.2020.640251

Abstract: This paper presents an application of a convolutional neural network (CNN) to the image classification problem. The MNIST and Fashion-MNIST datasets are used to test the performance of the CNN model. The paper presents five different architectures with varying convolutional layers, filter sizes, and fully connected layers. Experiments were conducted with varying hyper-parameters, namely activation function, optimizer, learning rate, dropout rate, and batch size. Results show that the selection of activation function, optimizer, and dropout rate has an impact on the accuracy of the results. All architectures give an accuracy of more than 99% on the MNIST dataset. The Fashion-MNIST dataset is more complex than MNIST. For the Fashion-MNIST dataset, architecture 3 gives better results. A review of the obtained results and of the literature shows that CNN is suitable for image classification on the MNIST and Fashion-MNIST datasets.

Index Terms: Convolutional neural network, Fashion-MNIST, image classification, MNIST.

I. INTRODUCTION

Albawi et al. (2017) state that the convolutional neural network (CNN) is one of the most popular deep neural networks. O'Shea & Nash (2015) describe the CNN as similar to an artificial neural network (ANN), being comprised of neurons that self-optimize through learning. A CNN has multiple layers, namely convolutional layers, pooling layers, and fully connected layers. The main objective of the convolutional layer is to obtain the features of an image by sliding a smaller matrix (kernel or filter) over the entire image to generate feature maps. The pooling layer retains the most important aspects while reducing the size of the feature maps. The fully connected layer connects every neuron in the layer to the neurons of the previous and next layers; it takes the matrix inputs from the previous layers and flattens them to pass on to the output layer, which makes the prediction. Owing to this architecture, a CNN has fewer parameters to learn and requires less data to train the model. CNNs were inspired by time-delay neural networks, in which the weights are shared along the temporal dimension to reduce computation. Successful training of the hierarchical layers made the CNN the first successful deep learning architecture, as reported by Liu et al. (2017). CNNs have shown excellent performance in machine learning applications. Owing to these practical benefits, CNNs give better accuracy and improve system performance in applications such as image classification, computer vision, natural language processing (Albawi et al., 2017), age and gender classification (Levi & Hassner, 2015), text classification (Lai et al., 2015), facial expression recognition (Mollahosseini et al., 2016), speech recognition (Abdel-Hamid et al., 2014), and visual document analysis (Simard et al., 2003).

Image classification means assigning a label from a fixed set of categories to an input image. It has a variety of applications, such as robot design, object identification, autonomous cars, and traffic signal processing. Feature extraction is an important task for image representation in the classification problem. CNNs are applied to large-scale datasets to learn image representations and reuse them for classification. Wang & Xi (2015) showed that a CNN gives better accuracy and performance than other neural networks when classifying the CIFAR-10 dataset.

The objective of this paper is to apply a convolutional neural network to the image classification problem. Five different architectures with varying convolutional layers, filter sizes, and fully connected layers are proposed. To test the performance of the CNN, we used the MNIST and Fashion-MNIST datasets.

The rest of this paper is organized as follows. Section II is a literature review of different machine learning algorithms for Fashion-MNIST and MNIST. Section III describes CNN image classification. Experimental details, datasets, results, and discussion are presented in Section IV. Section V presents conclusions.

II. LITERATURE REVIEW

In the literature, many techniques have been tried for image classification of Fashion-MNIST and MNIST.

Levi & Hassner (2015) proposed a CNN architecture to overcome the overfitting problem caused by a small number of images. Their results show that a CNN provides improved gender and age classification even given the smaller size of contemporary unconstrained image sets. For large-scale image classification, Karpathy et al. (2014) experimented with a CNN on a dataset of 1 million YouTube videos belonging to 487 classes. The authors found that CNNs learn powerful features even from weakly labeled data. The performance of the proposed model was compared using the UCF-101 Action Recognition dataset. A significant performance improvement, up to 63.3%, is reported.

Jmour et al. (2018) trained a CNN on the ImageNet dataset for a traffic sign classification system. The CNN is used to learn features and classify RGB-D images. Various parameters affect the accuracy of the training results. The authors presented the effect of mini-batch size on model training. The model gave 93.33% accuracy on the test set with a mini-batch size of 10. Kang et al. (2014) presented a CNN for document image classification. The authors used a CNN with rectified linear units, trained with dropout. The results obtained using the CNN are better than those of traditional methods. The Tobacco litigation and NIST tax-form datasets were used for experimentation. For the Tobacco dataset, 80% of accuracy is achieved for training and 20% for validation for 10 classes of images. A median accuracy of 65.37% is achieved for 100 samples. A median accuracy of 100% is achieved over 100 partitions of training and test data on the NIST tax-form dataset. Sermanet et al. (2012) used a CNN for house number digit classification. They improved the traditional CNN with multi-stage features and the use of Lp pooling. They obtained an accuracy of 94.85% on the SVHN dataset, a 45.2% error improvement.

Hu et al. (2015) proposed a five-layer CNN architecture and experimented on several hyperspectral image data sets to classify hyperspectral images directly in the spectral domain, which gave better performance. A CNN framework such as Caffe was used to reduce training and testing time, and it achieved 90% accuracy on the MNIST dataset. Manessi & Rozza (2018) introduced two approaches to learn different combinations of base activation functions, namely the identity function, ReLU, and tanh. [...] of (Fashion-MNIST, CIFAR-10 and ILSVRC-2012). Results show substantial improvements in overall performance, such as an increase in the top-1 accuracy for AlexNet on ILSVRC-2012 of 3.01 percentage points.

Tang (2013) showed that replacing the softmax layer with SVMs is useful for classification tasks. The experiments were carried out on the MNIST and CIFAR-10 datasets. Using an SVM, cross-validation accuracy in the testing phase increased to 68.9%, from 67.6% using softmax. Results of an RNN for classification of the Fashion-MNIST dataset were presented by Zhang. This model was developed using the Long Short-Term Memory (LSTM) technique to reduce the risk of the vanishing gradients that traditional RNNs face. Cross-validation detects and prevents overfitting and the decrease in score that overfitting causes. The proposed model achieved an accuracy of more than 89%, which is comparatively better than other models. Xiao et al. (2017) reported accuracies of 89.70% and 97.30% for the Fashion-MNIST and MNIST datasets, respectively, using an SVC classifier.

Bhatnagar et al. (2017) proposed three different CNN architectures for classifying the fashion article images of the Fashion-MNIST dataset. The authors obtained an accuracy of 92.54% using a two-layer CNN along with batch normalization and skip connections. The architecture proposed by Tang (2013) was emulated by Agarap (2017), who combined a CNN and a linear SVM for image classification on the MNIST dataset. The results show that the test accuracies of the CNN-SVM and CNN-Softmax models on MNIST are approximately 99.04% and 99.23%, respectively. On Fashion-MNIST, CNN-SVM and CNN-Softmax gave approximately 90.72% and 91.86%, respectively. Zhong et al. (2017) presented the random erasing method for training CNNs, which randomly selects a rectangular region in an image and erases its pixels with random values. In this process, training images with various levels of occlusion are generated, which reduces the risk of over-fitting and makes the model robust to occlusion. Experiments were conducted on the CIFAR10, CIFAR100, and Fashion-MNIST datasets with various architectures, with good performance on various recognition tasks. Agarap (2018) used ReLU as the classification function in a deep neural network instead of as an activation function. He implemented a feed-forward neural network (FFNN) and convolutional models with two different classification functions, i.e. (1) softmax and (2) ReLU. The predictive performance of DL-ReLU models is compared with DL-Softmax models on MNIST, Fashion-MNIST and Wisconsin Diagnostic Breast