
Study of the Impact of Standard Image Compression Techniques on Performance of Image Classification with a Convolutional Neural Network

Internship Report, IETR-Vaader 2017

Mathieu Dejean-Servières (1), Karol Desnos (1), Kamel Abdelouahab (2), Wassim Hamidouche (1), Luce Morin (1), and Maxime Pelcat (1,2)

(1) IETR, INSA Rennes, CNRS UMR 6164, UBL, Rennes, France
(2) Institut Pascal, Clermont-Ferrand, France

December 20, 2017, Version v1.0
Research Report, HAL Id: hal-01725126, https://hal.archives-ouvertes.fr/hal-01725126 (submitted on 7 Mar 2018)

Abstract

In this study, we have measured the impact of image compression on the classification performance of Convolutional Neural Networks (CNNs). By using a pre-trained CNN to classify compressed images, we have shown that, on average, an image can be compressed by a factor of 7, 16, and 40 for a JPEG, JPEG 2000, and HEVC encoder, respectively, while still maintaining a correct classification by the CNN. This study also showed that the pre-trained AlexNet CNN was making use of JPEG artifacts learned during the training phase to perform classification. To further study the impact of compression on CNN-based classification, a large set of encoding parameters was explored: color space, resolution, and Quantization Parameter (QP). The main conclusions of this study are that color is essential for classification with the AlexNet CNN, and that classification is resilient to image downscaling. Finally, we have studied the correlation between the classification performance of a CNN and image quality measured with two objective metrics, namely the Peak Signal to Noise Ratio (PSNR) and the Structural SIMilarity (SSIM). We have found that the SSIM metric was more appropriate for measuring the degradation of an image with regard to CNN performance.

1 Introduction

The recent victory of the AlphaGo program over a professional human Go player has thrown light on the tremendous capabilities of machine learning techniques. In the past few years, machine learning techniques based on artificial neural networks have been used to address many technological challenges of our society, such as autonomous driving vehicles [12], text recognition [9], and medical diagnosis [10]. Convolutional Neural Networks (CNNs), studied in this report, are a kind of artificial neural network inspired by the human visual cortex. The main purpose of CNNs is to recognize and classify shapes (objects, characters, animals, ...) in processed 2D images.
In parallel, the era of the Internet-of-Things (IoT) has accelerated the trend towards a world where a network of distributed objects and smart sensors communicate to provide new kinds of services. In particular, combining IoT objects with machine learning techniques paves the way towards futuristic applications like smart cities, home automation, and e-health. Because of the distributed and low-power nature of IoT sensors, processing of acquired data is likely to be offloaded to a remote server where more computational power is available. Hence, acquired data should be compressed before transmission in order to save bandwidth and power. Smart cameras will thus rely on standard image and video compression algorithms (e.g. Joint Photographic Experts Group (JPEG), High Efficiency Video Coding (HEVC), ...) to transmit images to be processed by CNNs. One of the basic principles of these compression algorithms is to compress images by introducing an acceptable loss of image quality in order to transmit less information.

Figure 1: Classification results of an image with different compression ratios.

The objective of this report is to study the impact of quality degradation induced by compression of images on the performance of their classification with a CNN. Figure 1 illustrates this problem with the example of an image compressed with three different parameters, leading to different losses of image quality. As can be seen in this example, classification of the image as a war plane is successful for the first two qualities, but fails for the most degraded image. The steps followed in this study are:

1. Create a database of degraded images by applying standard image compression algorithms to a database of images.
2. Perform classification of the degraded images with pre-trained state-of-the-art CNNs.
3. Study the influence of compression algorithms on classification performance and model the impact of compression parameters on the efficiency of the classification.

The main objective of this work is to identify which parameters can be used to compress images as much as possible without losing the efficiency of the classification with a pre-trained CNN.

The report is organized as follows. Section 2 introduces the working principle of CNNs and image compression techniques. Section 3 explains our approach to build the database of compressed images and Section 4 analyses the classification of the compressed images with a CNN. Finally, Section 5 concludes this report.

2 Background

2.1 Convolutional Neural Network (CNN)

Deep learning [8] is a machine learning technique whose principle is to teach a task to computers by feeding the algorithm with many input examples and associated "correct" outputs. To be concise, the Artificial Intelligence (AI) will adapt its behavior depending on the data it is fed with, and will thus "learn by itself". In particular, it is important to note that no human intervention is needed to develop and adjust this algorithm. Deep learning models are trained by using large sets of labeled data and neural network architectures that learn features directly from the data without the need for manual feature extraction. The deep learning computational model is inspired by how biological neurons work. As illustrated in Figure 2, a neural network is a directed graph where vertices represent the neurons and edges represent channels for transmitting signals between neurons. A neuron usually implements a simple function applied to its input signals to produce an output.
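The simple function implemented by a neuron is typically a weighted sum of its inputs followed by a non-linear activation. The short Python sketch below illustrates this generic neuron model only; the weights, bias, and ReLU activation are illustrative assumptions and are not taken from the networks studied in this report.

import numpy as np

def neuron(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    # Weighted sum of the input signals, followed by a ReLU non-linearity.
    pre_activation = float(np.dot(weights, inputs) + bias)
    return max(0.0, pre_activation)

# Example: a neuron with three inputs and hypothetical weights.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
print(neuron(x, w, bias=0.2))

In a trained network, such weights and biases are precisely the parameters adjusted during the learning phase.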
Neural networks are usually organized into layers of neurons, where all neurons belonging to layer n only receive signals from neurons of layer n - 1, and produce signals processed by neurons of layer n + 1. The first and the last layers of a neural network are called the input and output layers, respectively. Layers between the input and output layers are called the hidden layers of the network. The term "deep" usually refers to the large number of hidden layers in a deep learning neural network.

Figure 2: Neural networks are organized in layers consisting of a set of interconnected nodes. Networks can have tens or hundreds of hidden layers.

An image classification algorithm is an algorithm whose purpose is to automatically identify the content of an image among a predefined set of image classes. For example, in Figure 1, the first two images were identified as belonging to the class war plane, and the third image was associated with the dust cover class.

A Convolutional Neural Network (CNN) [7] is a specialized neural network model inspired by the human visual cortex, specially fitted for image classification. Intuitively, a CNN alternates convolutional and pooling layers, each processing and producing 2D images. In convolutional layers, neurons produce 2D images by convolving learned 2D features with input 2D images. In pooling layers, 2D images produced by multiple neurons from the previous layer are mixed, usually using a non-linear function, and the resulting 2D image may be downsampled. Connections between layers can be sparse, typically near input layers, or fully connected, typically near output layers. The convolution features are not pre-defined: they are learned while the network trains on a collection of images. This automated feature extraction makes CNNs highly accurate for computer vision tasks such as image classification [6, 8].

As shown in [15], some perturbations that are invisible to human observers can prevent an image from being correctly classified by a CNN. A special technique, presented in [11], was specifically developed to produce the smallest perturbations that fool such networks. Hence, the question of the resilience of CNNs to artifacts introduced during the compression of the classified image is not trivial. Indeed, although image compression algorithms aim at preserving the quality of images for human observers, they may still introduce imperceptible artifacts that would undermine the efficiency of CNNs.

2.2 Image compression

2.2.1 Used Encoders

In this report, the impact and performance of three different encoders are studied: JPEG, JPEG 2000, and Better Portable Graphics (BPG).

JPEG
JPEG [16] is one of the most commonly used algorithms for lossy image compression. In this study, JPEG will therefore be the reference against which the performance of the other encoders is compared. JPEG can typically achieve a compression ratio (see Section 2.2.2) of 10 with small perceptible loss in image quality [3]. The algorithm is particularly efficient on images with smooth variations of color, like photographs. JPEG uses a lossy form of compression based on a decomposition of the encoded image into 8x8 pixel blocks, and the application of a Discrete Cosine Transform (DCT) to each block of pixels.
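To make this block transform concrete, the sketch below applies an orthonormal 2-D DCT to every non-overlapping 8x8 block of a grayscale image. It is a minimal illustration built on SciPy's dctn routine, not the reference JPEG implementation; quantization of the coefficients and entropy coding, which produce the actual bit savings (and the blocking artifacts), are omitted.

import numpy as np
from scipy.fft import dctn

def blockwise_dct(image: np.ndarray, block: int = 8) -> np.ndarray:
    # Apply an orthonormal 2-D type-II DCT to each non-overlapping block.
    height, width = image.shape
    coefficients = np.zeros((height, width), dtype=np.float64)
    for y in range(0, height - height % block, block):
        for x in range(0, width - width % block, block):
            coefficients[y:y + block, x:x + block] = dctn(
                image[y:y + block, x:x + block].astype(np.float64), norm="ortho")
    return coefficients

In the actual JPEG pipeline, each block of DCT coefficients is then divided by a quantization matrix and entropy coded; coarser quantization yields higher compression ratios and stronger artifacts.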