I V N E R U S E I T H Y

T

O

H F G E R D I N B U School of Informatics, University of Edinburgh

Institute for Adaptive and Neural Computation

CINIC-10 Is Not ImageNet or CIFAR-10

by

Luke N. Darlow Elliot J. Crowley School of Informatics School of Informatics University of Edinburgh University of Edinburgh [email protected] [email protected]

Antreas Antoniou Amos J. Storkey School of Informatics School of Informatics University of Edinburgh University of Edinburgh [email protected] [email protected] arXiv:1810.03505v1 [cs.CV] 2 Oct 2018

Informatics Research Report EDI-INF-ANC-1802

School of Informatics September 2018 http://www.informatics.ed.ac.uk/ CINIC-10 Is Not ImageNet or CIFAR-10

Luke N. Darlow Elliot J. Crowley School of Informatics School of Informatics University of Edinburgh University of Edinburgh [email protected] [email protected]

Antreas Antoniou Amos J. Storkey School of Informatics School of Informatics University of Edinburgh University of Edinburgh [email protected] [email protected]

Abstract

In this brief technical report we introduce the CINIC-10 dataset as a plug-in extended alternative for CIFAR-10. It was compiled by combining CIFAR-10 with images selected and downsampled from the ImageNet database. We present the approach to compiling the dataset, illustrate the example images for different classes, give pixel distributions for each part of the repository, and give some standard benchmarks for well known models. Details for download, usage, and compilation can be found in the associated github repository. 1.

1 Motivation

Recent years have seen tremendous advances in the field of (LeCun et al., 2015). – Anonymous Author(s)

Some derivation of the quote above may be familiar to many readers. Something similar appears at the beginning of numerous papers on deep learning. How might we assess statements like this? It is through benchmarking. AlexNet (Krizhevsky et al., 2012) outperformed traditional methods on ImageNet (Russakovsky et al., 2015), which was in turn outperformed by VGG nets (Simonyan & Zisserman, 2015), then ResNets (He et al., 2016) etc. ImageNet has its flaws however. It is an unwieldy dataset. The images are large, at least in neural network terms, and there are over a million of them. A single training run can take several days without abundant computational resources (Goyal et al., 2017). Perhaps for this reason, CIFAR-10 and CIFAR- 100 (Krizhevsky, 2009) have become the datasets of choice for many when initially benchmarking neural networks in the context of realistic images. Indeed, this is where several popular architectures have demonstrated their potency (Huang et al., 2017; Gastaldi, 2017). In CIFAR-10, each of the 10 classes has 6,000 examples. The 100 classes of CIFAR-100 only have 600 examples each. This leads to a large gap in difficulty between these tasks; CIFAR-100 is arguably more difficult than even ImageNet. A dataset that provides another milestone with respect to task difficulty would be useful. ImageNet-32 (Chrabaszcz et al., 2017) already exists as a CIFAR alternative; however, this actually poses a more challenging problem than ImageNet as the down-sampled images have substantially less capacity for information. Moreover, most benchmark datasets have uneven train/validation/test splits (validation being non-existent for CIFAR). Equally sized splits are desirable, as they give a more principled perspective of generalisation performance.

1https://github.com/BayesWatch/cinic-10

1 To combat the shortcomings of existing benchmarking datasets, we present CINIC-10: CINIC-10 Is Not ImageNet or CIFAR-10. It is an extension of CIFAR-10 via the addition of downsampled ImageNet images. CINIC-10 has the following desirable properties:

• It has 270,000 images, 4.5× that of CIFAR. • The images are the same size as in CIFAR, meaning that CINIC-10 can be used as a drop-in alternative to CIFAR-10. • It has equally sized train, validation, and test splits. In some experimental setups it may be that more than one training dataset is required. Nonetheless, a fair assessment of generalisation performance is enabled through equal dataset split sizes. • The train and validation subsets can be combined to make a larger training set. • CINIC-10 consists of images from both CIFAR and ImageNet. The images from these are not necessarily identically distributed, presenting a new challenge: distribution shift. In other words, we can find out how well models trained on CIFAR images perform on ImageNet images for the same classes.

2 Construction

In this section we outline our method of constructing the CINIC-10 dataset. This process is detailed with accompanying notebooks in the github repository (Darlow et al.). 1

Stage 1: Reformatting CIFAR-10 The original CIFAR-10 is processed into image format (.png) and stored as set/classname/cifar-10-origin-index where set is either train, validation or test, classname refers to the corresponding CIFAR-10 class (airplane, automobile etc. ), origin is the set from which the image was taken (train or test), and index is the original index of the image in the set it came from. This is an equal split of the CIFAR-10 data: 20,000 images per set; 2,000 images per class within set; and an equal distribution of CIFAR-10 data among all three sets. The CINIC-10 test set contains the entirety of the original CIFAR-10 test set, as well as randomly selected examples from the CIFAR-10 train set. CINIC-10’s train and validation contain a random split of the remaining CIFAR images. CIFAR-10 can be fully recovered from CINIC-10 by the filename.

Stage 2: Finding relevant ImageNet images The relevant synonym sets (synsets) within the Fall 2011 release of the ImageNet Database were identified and collected. These synset-groups are listed in synsets- to-cifar-10-classes.txt in the repository. The mapping from sysnsets to CINIC-10 is listed in imagenet- contributors.csv in the repository. These synsets were downloaded using Imagenet Utils.2 Note that some .tar downloads failed (with a 0 Byte download) even after repeated retries. This is not exceedingly detrimental as a subset of the downloaded images was taken.

Stage 3: Adding ImageNet The .tar files were extracted, the .JPEG images were read using the Pillow Python library3, and converted to 32×32 colour pixel images with the Box algorithm from the Pillow library (in the same manner as ImageNet32x32 (Chrabaszcz et al., 2017), for consistency). The lowest number of CIFAR10 class-relevant samples from these ImageNnet synset-groups samples was observed to be 21939 in the ‘truck’ class. Therefore, 21000 samples were randomly selected from each synset-group to compile CINIC-10 by augmenting the CIFAR-10 data. Finally, these 21000 samples were randomly distributed (but can be recovered using the filename) within the new train, validation, and test sets, storing as follows: set/classname/synsetnumber.png, where set is either train, valid or test. classname refers to the CIFAR-10 classes (airplane, automobile, etc.). synset indicates which ImageNet synset this image came from and number is the image number directly associated with the original downloaded .jpeg images. This process resulted is a dataset that consists of 270000 images (60000 from the original CIFAR-10 data and the remaining from ImageNet), split into three equal-sized train, validation, and test subsets. Thus, each class within these subsets contains 9000 images.

2https://github.com/tzutalin/ImageNet_Utils 3https://python-pillow.org/

2 3 Download and Usage

Details for download can be found at the associated repository.1 The dataset is hosted on the University of Edinburgh digital repository of research data, DataShare. 4 The simplest way to use CINIC-10 is with a PyTorch5 data loader: import torchvision import torchvision.transforms as transforms cinic_directory =’/path/to/cinic/directory’ cinic_mean = [0.47889522, 0.47227842, 0.43047404] cinic_std = [0.24205776, 0.23828046, 0.25874835] cinic_train = torch.utils.data.DataLoader( torchvision.datasets.ImageFolder(cinic_directory +’/train’, transform=transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean=cinic_mean,std=cinic_std)])), batch_size=128, shuffle=True)

4 Analysis

This section shows the difference in distribution (Section 4.1) and gives examples for each class in CINIC-10 (Section 4.2).

4.1 Distribution

The distribution of colour intensities for CIFAR-10 and ImageNet contributors is given in Figure 1.

0.010 CIFAR-10 ImageNet contributors 0.008

0.006

Frequency 0.004

0.002

0.000 0 50 100 150 200 250 Colour intensity Figure 1: Histograms of the intensities (all red, green, and blue channels inclusive) of CIFAR-10 (blue) and ImageNet contributions (red). The mean is shown as the dashed lines. The colour distribution is very similar, although not identical.

4https://datashare.is.ed.ac.uk/ 5https://pytorch.org/

3 4.2 Examples

Figures 2 through 11 show randomly selected samples from CINIC-10. Readers familiar with CIFAR-10 images will note that these are more noisy, not necessarily explicitly cropped and centred, and may contain elements that are less class-distinct as CIFAR-10 (cows and goats in the deer class, for instance).

Figure 2: CINIC-10, automobile

Figure 3: CINIC-10, airplane

5 Benchmarks

Table 1 gives benchmark results on CINIC-10. Model definitions that were applied to CIFAR-106 were copied and tested.

6https://github.com/kuangliu/pytorch-cifar/

4 Figure 4: CINIC-10, bird

Figure 5: CINIC-10, cat

Table 1: CINIC-10 benchmarks. Model No. Parameters Test Error VGG-16 14.7M 12.23 ± 0.16 ResNet-18 11.2M 9.73 ± 0.05 GoogLeNet 6.2M 8.83 ± 0.12 ResNeXt29_2x64d 9.2M 8.55 ± 0.15 DenseNet-121 7.0M 8.74 ± 0.16 MobileNet 3.2M 18.00 ± 0.16

5 Figure 6: CINIC-10, deer

Figure 7: CINIC-10, dog

6 Figure 8: CINIC-10, frog

Figure 9: CINIC-10, horse

7 Figure 10: CINIC-10, ship

Figure 11: CINIC-10, truck

8 6 Conclusion

We presented CINIC-10 in this technical report. It was compiled by augmenting CIFAR-10 with downsam- pled images sourced from ImageNet. We proposed that benchmarking is paramount in and that there is a blind-spot regarding dataset size and task difficulty. To this end we offer a new benchmarking dataset that is larger than CIFAR-10 (and more challenging) but not as difficult as ImageNet.

References Patryk Chrabaszcz, Ilya Loshchilov, and Hutter Frank. A downsampled variant of ImageNet as an alternative to the CIFAR datasets. arXiv preprint arXiv:1707.08819, 2017. Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, and Amos J. Storkey. CINIC-10 Github repository. URL https://github.com/BayesWatch/cinic-10. Xavier Gastaldi. Shake-shake regularization. arXiv preprint arXiv:1705.07485, 2017. P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. 2017. Alex Krizhevsky. Learning multiple layers of features from tiny images. Master’s thesis, Toronto University, 2009. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012. Yann LeCun, Yoshua Bengio, and . Deep learning. Nature, 521(7553):436–444, 2015. Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015. doi: 10.1007/s11263-015-0816-y. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recogni- tion. In International Conference on Learning Representations, 2015.

9