Arxiv:1810.03505V1 [Cs.CV] 2 Oct 2018

I V N E R U S E I T H Y T O H F G E R D I N B U School of Informatics, University of Edinburgh Institute for Adaptive and Neural Computation CINIC-10 Is Not ImageNet or CIFAR-10 by Luke N. Darlow Elliot J. Crowley School of Informatics School of Informatics University of Edinburgh University of Edinburgh [email protected] [email protected] Antreas Antoniou Amos J. Storkey School of Informatics School of Informatics University of Edinburgh University of Edinburgh [email protected] [email protected] arXiv:1810.03505v1 [cs.CV] 2 Oct 2018 Informatics Research Report EDI-INF-ANC-1802 School of Informatics September 2018 http://www.informatics.ed.ac.uk/ CINIC-10 Is Not ImageNet or CIFAR-10 Luke N. Darlow Elliot J. Crowley School of Informatics School of Informatics University of Edinburgh University of Edinburgh [email protected] [email protected] Antreas Antoniou Amos J. Storkey School of Informatics School of Informatics University of Edinburgh University of Edinburgh [email protected] [email protected] Abstract In this brief technical report we introduce the CINIC-10 dataset as a plug-in extended alternative for CIFAR-10. It was compiled by combining CIFAR-10 with images selected and downsampled from the ImageNet database. We present the approach to compiling the dataset, illustrate the example images for different classes, give pixel distributions for each part of the repository, and give some standard benchmarks for well known models. Details for download, usage, and compilation can be found in the associated github repository. 1. 1 Motivation Recent years have seen tremendous advances in the field of deep learning (LeCun et al., 2015). – Anonymous Author(s) Some derivation of the quote above may be familiar to many readers. Something similar appears at the beginning of numerous papers on deep learning. How might we assess statements like this? It is through benchmarking. AlexNet (Krizhevsky et al., 2012) outperformed traditional computer vision methods on ImageNet (Russakovsky et al., 2015), which was in turn outperformed by VGG nets (Simonyan & Zisserman, 2015), then ResNets (He et al., 2016) etc. ImageNet has its flaws however. It is an unwieldy dataset. The images are large, at least in neural network terms, and there are over a million of them. A single training run can take several days without abundant computational resources (Goyal et al., 2017). Perhaps for this reason, CIFAR-10 and CIFAR- 100 (Krizhevsky, 2009) have become the datasets of choice for many when initially benchmarking neural networks in the context of realistic images. Indeed, this is where several popular architectures have demonstrated their potency (Huang et al., 2017; Gastaldi, 2017). In CIFAR-10, each of the 10 classes has 6,000 examples. The 100 classes of CIFAR-100 only have 600 examples each. This leads to a large gap in difficulty between these tasks; CIFAR-100 is arguably more difficult than even ImageNet. A dataset that provides another milestone with respect to task difficulty would be useful. ImageNet-32 (Chrabaszcz et al., 2017) already exists as a CIFAR alternative; however, this actually poses a more challenging problem than ImageNet as the down-sampled images have substantially less capacity for information. Moreover, most benchmark datasets have uneven train/validation/test splits (validation being non-existent for CIFAR). Equally sized splits are desirable, as they give a more principled perspective of generalisation performance. 1https://github.com/BayesWatch/cinic-10 1 To combat the shortcomings of existing benchmarking datasets, we present CINIC-10: CINIC-10 Is Not ImageNet or CIFAR-10. It is an extension of CIFAR-10 via the addition of downsampled ImageNet images. CINIC-10 has the following desirable properties: • It has 270,000 images, 4:5× that of CIFAR. • The images are the same size as in CIFAR, meaning that CINIC-10 can be used as a drop-in alternative to CIFAR-10. • It has equally sized train, validation, and test splits. In some experimental setups it may be that more than one training dataset is required. Nonetheless, a fair assessment of generalisation performance is enabled through equal dataset split sizes. • The train and validation subsets can be combined to make a larger training set. • CINIC-10 consists of images from both CIFAR and ImageNet. The images from these are not necessarily identically distributed, presenting a new challenge: distribution shift. In other words, we can find out how well models trained on CIFAR images perform on ImageNet images for the same classes. 2 Construction In this section we outline our method of constructing the CINIC-10 dataset. This process is detailed with accompanying notebooks in the github repository (Darlow et al.). 1 Stage 1: Reformatting CIFAR-10 The original CIFAR-10 is processed into image format (.png) and stored as set/classname/cifar-10-origin-index where set is either train, validation or test, classname refers to the corresponding CIFAR-10 class (airplane, automobile etc. ), origin is the set from which the image was taken (train or test), and index is the original index of the image in the set it came from. This is an equal split of the CIFAR-10 data: 20,000 images per set; 2,000 images per class within set; and an equal distribution of CIFAR-10 data among all three sets. The CINIC-10 test set contains the entirety of the original CIFAR-10 test set, as well as randomly selected examples from the CIFAR-10 train set. CINIC-10’s train and validation contain a random split of the remaining CIFAR images. CIFAR-10 can be fully recovered from CINIC-10 by the filename. Stage 2: Finding relevant ImageNet images The relevant synonym sets (synsets) within the Fall 2011 release of the ImageNet Database were identified and collected. These synset-groups are listed in synsets- to-cifar-10-classes.txt in the repository. The mapping from sysnsets to CINIC-10 is listed in imagenet- contributors.csv in the repository. These synsets were downloaded using Imagenet Utils.2 Note that some .tar downloads failed (with a 0 Byte download) even after repeated retries. This is not exceedingly detrimental as a subset of the downloaded images was taken. Stage 3: Adding ImageNet The .tar files were extracted, the .JPEG images were read using the Pillow Python library3, and converted to 32×32 colour pixel images with the Box algorithm from the Pillow library (in the same manner as ImageNet32x32 (Chrabaszcz et al., 2017), for consistency). The lowest number of CIFAR10 class-relevant samples from these ImageNnet synset-groups samples was observed to be 21939 in the ‘truck’ class. Therefore, 21000 samples were randomly selected from each synset-group to compile CINIC-10 by augmenting the CIFAR-10 data. Finally, these 21000 samples were randomly distributed (but can be recovered using the filename) within the new train, validation, and test sets, storing as follows: set/classname/synsetnumber.png, where set is either train, valid or test. classname refers to the CIFAR-10 classes (airplane, automobile, etc.). synset indicates which ImageNet synset this image came from and number is the image number directly associated with the original downloaded .jpeg images. This process resulted is a dataset that consists of 270000 images (60000 from the original CIFAR-10 data and the remaining from ImageNet), split into three equal-sized train, validation, and test subsets. Thus, each class within these subsets contains 9000 images. 2https://github.com/tzutalin/ImageNet_Utils 3https://python-pillow.org/ 2 3 Download and Usage Details for download can be found at the associated repository.1 The dataset is hosted on the University of Edinburgh digital repository of research data, DataShare. 4 The simplest way to use CINIC-10 is with a PyTorch5 data loader: import torchvision import torchvision.transforms as transforms cinic_directory =’/path/to/cinic/directory’ cinic_mean = [0.47889522, 0.47227842, 0.43047404] cinic_std = [0.24205776, 0.23828046, 0.25874835] cinic_train = torch.utils.data.DataLoader( torchvision.datasets.ImageFolder(cinic_directory +’/train’, transform=transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean=cinic_mean,std=cinic_std)])), batch_size=128, shuffle=True) 4 Analysis This section shows the difference in distribution (Section 4.1) and gives examples for each class in CINIC-10 (Section 4.2). 4.1 Distribution The distribution of colour intensities for CIFAR-10 and ImageNet contributors is given in Figure 1. 0.010 CIFAR-10 ImageNet contributors 0.008 0.006 Frequency 0.004 0.002 0.000 0 50 100 150 200 250 Colour intensity Figure 1: Histograms of the intensities (all red, green, and blue channels inclusive) of CIFAR-10 (blue) and ImageNet contributions (red). The mean is shown as the dashed lines. The colour distribution is very similar, although not identical. 4https://datashare.is.ed.ac.uk/ 5https://pytorch.org/ 3 4.2 Examples Figures 2 through 11 show randomly selected samples from CINIC-10. Readers familiar with CIFAR-10 images will note that these are more noisy, not necessarily explicitly cropped and centred, and may contain elements that are less class-distinct as CIFAR-10 (cows and goats in the deer class, for instance). Figure 2: CINIC-10, automobile Figure 3: CINIC-10, airplane 5 Benchmarks Table 1 gives benchmark results on CINIC-10. Model definitions that were applied to CIFAR-106 were copied and tested. 6https://github.com/kuangliu/pytorch-cifar/ 4 Figure 4: CINIC-10, bird Figure 5: CINIC-10, cat Table 1: CINIC-10 benchmarks. Model No. Parameters Test Error VGG-16 14.7M 12.23 ± 0.16 ResNet-18 11.2M 9.73 ± 0.05 GoogLeNet 6.2M 8.83 ± 0.12 ResNeXt29_2x64d 9.2M 8.55 ± 0.15 DenseNet-121 7.0M 8.74 ± 0.16 MobileNet 3.2M 18.00 ± 0.16 5 Figure 6: CINIC-10, deer Figure 7: CINIC-10, dog 6 Figure 8: CINIC-10, frog Figure 9: CINIC-10, horse 7 Figure 10: CINIC-10, ship Figure 11: CINIC-10, truck 8 6 Conclusion We presented CINIC-10 in this technical report.

Arxiv:1810.03505V1 [Cs.CV] 2 Oct 2018

The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design

Lecture 1: Introduction

Lecture Notes Geoffrey Hinton

An Effective and Improved CNN-ELM Classifier for Handwritten Digits

Deep Learning for Computer Vision

Biologically Inspired Semantic Lateral Connectivity for Convolutional Neural Networks

Computer Vision and Its Implications Based on the Paper Imagenet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky, Ilya Sutskever, Geoffrey E

A Timeline of Artificial Intelligence

Training Recurrent Neural Networks

Deep Convolutional Networks

Classification of Image Using Convolutional Neural Network (CNN) by Md

Working Hard to Know Your Neighbor's Margins: Local Descriptor Learning