
Color-to-Grayscale: Does the Method Matter in Image Recognition?

Christopher Kanan*, Garrison W. Cottrell
Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, United States of America

Abstract

In image recognition it is often assumed the method used to convert images to grayscale has little impact on recognition performance. We compare thirteen different grayscale algorithms with four types of image descriptors and demonstrate that this assumption is wrong: not all color-to-grayscale algorithms work equally well, even when using descriptors that are robust to changes in illumination. These methods are tested using a modern descriptor-based image recognition framework, on face, object, and texture datasets, with relatively few training instances. We identify a simple method that generally works best for face and object recognition, and two that work well for recognizing textures.

Citation: Kanan C, Cottrell GW (2012) Color-to-Grayscale: Does the Method Matter in Image Recognition? PLoS ONE 7(1): e29740. doi:10.1371/journal.pone.0029740
Editor: Eshel Ben-Jacob, Tel Aviv University, Israel
Received July 13, 2011; Accepted December 4, 2011; Published January 10, 2012
Copyright: © 2012 Kanan, Cottrell. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by the James S. McDonnell Foundation (Perceptual Expertise Network, I. Gauthier, PI), and the National Science Foundation (NSF) (grant #SBE-0542013 to the Temporal Dynamics of Learning Center, G.W. Cottrell, PI). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: [email protected]

Introduction

Modern descriptor-based image recognition systems often operate on grayscale images, with little being said of the mechanism used to convert from color-to-grayscale. This is because most researchers assume that the color-to-grayscale method is of little consequence when using robust descriptors. However, since many methods for converting to grayscale have been employed in computer vision, we believe it is prudent to assess whether this assumption is warranted. The most common techniques are based on weighted means of the red, green, and blue image channels (e.g., Intensity and Luminance), but some methods adopt alternative strategies to generate a more perceptually accurate representation [1] or to preserve subjectively appealing color information in grayscale images (e.g., Decolorize [2]). A priori, none of these criteria suggest superior recognition performance.

The main reason why grayscale representations are often used for extracting descriptors instead of operating on color images directly is that grayscale simplifies the algorithm and reduces computational requirements. Indeed, color may be of limited benefit in many applications, and introducing unnecessary information could increase the amount of training data required to achieve good performance.

In this paper we compare thirteen different methods for converting from color-to-grayscale. While we do not evaluate every method that has been developed, we evaluate all of the widely used methods, as well as some less well known techniques (e.g., Decolorize). All of the methods are computationally inexpensive, i.e., they all have linear time complexity in the number of pixels. This comparison is performed using the Naive Bayes Nearest Neighbor (NBNN) [3] image recognition framework and four different types of image descriptors. Our objective is to determine if the grayscale representation used significantly influences performance and, if so, to identify which method is preferred regardless of the dataset or descriptor type.

Our experiments are conducted with relatively few training instances, since classifier performance is much more sensitive to the quality of the descriptors in this setting [4]. One reason for this phenomenon is that an image recognition system can obtain invariance properties simply by training it with more data, as long as the additional data exhibits the same variation as the test set [5]. For many applications this is infeasible (e.g., automated surveillance systems for detecting suspected criminals), and it could reduce execution speed for some non-parametric classification algorithms, e.g., nearest neighbor. If a descriptor is not suitably robust when the size of the training set is small, the classifier may inappropriately separate the categories. We believe this is especially likely with large changes in illumination.

Related work has shown that illumination conditions and camera parameters can greatly influence the properties of several recent image descriptor types [6]. This suggests grayscale algorithms that are less sensitive to illumination conditions may exhibit superior performance when illumination is variable. To our knowledge, this is the first time color-to-grayscale algorithms have been evaluated in a modern descriptor-based image recognition framework on established benchmark datasets.

Methods

Color-to-Grayscale Algorithms

In this section we briefly describe thirteen methods with linear time complexity for converting from color-to-grayscale, i.e., functions G that take an R^(n×m×3) color image and convert it to an R^(n×m) representation. All image values are assumed to be between 0 and 1.
Let R, G, and B represent linear (i.e., not gamma corrected) red, green, and blue channels. The output of each grayscale algorithm is between 0 and 1. Since some methods have names like Luster and Luminance, we denote all grayscale algorithms by capitalizing the first letter and italicizing them in the text. All transformations are applied component-wise, i.e., applied independently to each pixel. Several of the methods use the standard gamma correction function C(t) = t' = t^(1/2.2) [7]. We denote gamma corrected channels as R', G', and B'. The output of the grayscale algorithms on several images is shown in Fig. 1.

Perhaps the simplest color-to-grayscale algorithm is Intensity [1]. It is the mean of the RGB channels:

G_Intensity ∝ (1/3)(R + G + B).   (1)

Although Intensity is calculated using linear channels, in practice gamma correction is often left intact when using datasets containing gamma corrected images. We call this method Gleam:

G_Gleam = (1/3)(R' + G' + B').   (2)

In terms of pixel values, Intensity and Gleam produce very different results. Since C(t) is a concave function, Jensen's inequality [7] implies that Gleam will never produce a representation with values greater than gamma corrected Intensity, and it follows that

G_Intensity ≤ G_Gleam ≤ C(G_Intensity).

When gamma corrected Intensity and Gleam are both applied to natural images, we found that Gleam produces pixel values around 20-25% smaller on average.

Unlike Intensity and Gleam, Luminance [8] is designed to match human brightness perception by using a weighted combination of the RGB channels:

G_Luminance ∝ 0.3R + 0.59G + 0.11B.   (3)

Luminance does not try to match the logarithmic nature of human brightness perception, but this is achieved to an extent with subsequent gamma correction. Luminance is the standard algorithm used by image processing software (e.g., GIMP). It is implemented by MATLAB's "rgb2gray" function, and it is frequently used in computer vision (e.g., [9]). Luma is a similar gamma corrected form used in high-definition televisions (HDTVs) [1]:

G_Luma ∝ 0.2126R' + 0.7152G' + 0.0722B'.   (4)
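To make Eqs. (1)-(4) concrete, here is a minimal NumPy sketch (ours, not the authors' implementation). It assumes img is an H×W×3 RGB array with linear values in [0, 1] and uses the gamma correction C(t) = t^(1/2.2) defined above.

```python
import numpy as np

def gamma_correct(t, gamma=2.2):
    """Standard gamma correction C(t) = t^(1/2.2) for values in [0, 1]."""
    return np.power(t, 1.0 / gamma)

def intensity(img):
    """Eq. (1): mean of the linear R, G, B channels."""
    return img.mean(axis=2)

def gleam(img):
    """Eq. (2): mean of the gamma corrected channels."""
    return gamma_correct(img).mean(axis=2)

def luminance(img):
    """Eq. (3): weighted combination of the linear channels."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 0.3 * r + 0.59 * g + 0.11 * b

def luma(img):
    """Eq. (4): HDTV (Rec. 709) weights applied to gamma corrected channels."""
    rp, gp, bp = (gamma_correct(img[..., i]) for i in range(3))
    return 0.2126 * rp + 0.7152 * gp + 0.0722 * bp
```

For example, gray = gleam(img) returns an H×W grayscale array, again with values in [0, 1].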

Lightness is a perceptually uniform grayscale representation used in the CIELAB and CIELUV color spaces [10]. This means an increment in Lightness should more closely correspond to human perception, which is achieved via a nonlinear transformation of the RGB channels [10],

G_Lightness ∝ (1/100)(116 f(Y) - 16),   (5)

where Y = 0.2126R + 0.7152G + 0.0722B, and

f(t) = t^(1/3)                    if t > (6/29)^3,
f(t) = (1/3)(29/6)^2 t + 4/29     otherwise.   (6)

We have normalized Lightness to range from 0 to 1, instead of the usual range of 0 to 100. The Lightness nonlinearity f(t) implements a form of gamma correction.

Value is the achromatic channel in the Hue, Saturation, and Value (HSV) color space, and it provides absolute brightness information. It is computed by taking the maximum of the RGB channels [10]:

G_Value = max(R, G, B).   (7)

Since gamma correction is a monotonically increasing function, it follows that

C(max(R, G, B)) = max(R', G', B').

HSV is occasionally used in image recognition (e.g., [9,11,12]), but Value is equally sensitive to changes in the brightness of one color channel as it is to changes to all color channels, so we expect it to perform poorly when significant brightness variation is present.

Luster is the L channel in the HLS (Hue, Lightness, and Saturation) color space [1]. We changed its name from lightness to Luster so it is not confused with CIELAB's Lightness channel. Luster is the mean of the minimum and maximum RGB values, i.e.,

G_Luster ∝ (1/2)(max(R, G, B) + min(R, G, B)).   (8)

Luster is less sensitive to changes in brightness than Value, since any fully saturated color will maximize Value, but all three channels must be fully saturated to maximize Luster. Both HLS and HSV were designed to be more easily manipulated than the RGB color space when designing computer graphics, by decoupling color and brightness, rather than attempting to mimic human perception or to achieve brightness invariance.
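The remaining pixel-wise definitions are equally direct. The sketch below, under the same assumptions as before (an H×W×3 array img with values in [0, 1]), is an illustrative rendering of Eqs. (5)-(8), not the authors' code.

```python
import numpy as np

def lightness(img):
    """Eqs. (5)-(6): CIELAB-style Lightness, normalized to [0, 1]."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    f = np.where(y > (6.0 / 29.0) ** 3,
                 np.cbrt(y),
                 (1.0 / 3.0) * (29.0 / 6.0) ** 2 * y + 4.0 / 29.0)
    return (116.0 * f - 16.0) / 100.0

def value(img):
    """Eq. (7): the V channel of HSV, i.e., the per-pixel channel maximum."""
    return img.max(axis=2)

def luster(img):
    """Eq. (8): the L channel of HLS, the mean of the channel extrema."""
    return 0.5 * (img.max(axis=2) + img.min(axis=2))
```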


Decolorize [2] is designed to preserve and enhance color contrast when converting to grayscale. There are a few algorithms designed with the same intent, but unlike the others, Decolorize has linear time complexity in the number of pixels. Čadík [13] had 119 subjects subjectively evaluate images processed using Decolorize, and the subjects gave it the highest overall score compared to six other methods. Qualitatively, Decolorize preserves color contrast in natural images moderately well; however, it does not discriminate between classification-relevant and irrelevant details. The algorithm begins by converting to the YPQ color space, where the Y channel is almost identical to Luminance, and then it expresses the grayscale image as a piecewise linear mapping of these channels and their saturation. The algorithm is somewhat complex, so we do not provide implementation details.

We also evaluate gamma corrected forms of Intensity, Luminance, Value, Luster, and Decolorize, which are denoted Intensity', Luminance', Value', Luster', and Decolorize', respectively. In all cases the standard gamma correction function C(·) is used. This is not performed for Gleam, Luma, and Lightness, since they have forms of gamma correction built into them.

Figure 1. Qualitative comparison of color-to-grayscale algorithms. The four images shown are: (1) a panel of fully saturated colors; (2) Ishihara plate 3, in which a person with normal vision will see the number 29, while a person with red-green deficient vision may see the number 70; (3) a green shrub laden with red berries; and (4) a picture of the Pacific Ocean. All images are shown gamma corrected so that the details are not excessively dark, except for Gleam, Luma, and Lightness. The color panel contains fully saturated colors, which Value, Intensity, and Luster convert to the same shade of gray; however, humans do not perceive these colors as having equivalent brightness, which is a trait captured by Lightness and Luminance. Gleam, Intensity, Luminance, Lightness, and Decolorize all lose most of the chromatic contrast present in the Ishihara plate, while Luster and Value preserve it. The same pattern of chromatic contrast degradation is present in the fruit image, with the fruit becoming much more difficult to distinguish from the leaves for some of the methods. doi:10.1371/journal.pone.0029740.g001

Image Descriptors

Our experiments are performed using four descriptor types: SIFT [14], SURF [15], Geometric Blur [16], and Local Binary Patterns (LBP) [17]. Our objective is not to determine which descriptor works best, but to see if the best method of converting from color-to-grayscale is consistent across descriptor types. Each of these local descriptors is extracted from multiple spatial locations in the image, and this spatial information is used by the image recognition framework, as described in the next section. Before extracting descriptors, each image is resized to make its smallest dimension 128 pixels, with the other dimension resized accordingly to preserve the image's aspect ratio. We choose standard settings for each descriptor type.

SIFT is a popular feature descriptor that is robust to changes in illumination [14]. SIFT descriptors are computed from gradient orientation histograms weighted by the gradient magnitude computed over local neighborhoods. We densely extract 128-dimensional descriptors using 9×9 pixel spatial bins with a sampling density of 5 pixels. We use the dense SIFT implementation provided in the VLFeat toolbox [18]. About 500 descriptors are produced per image.

SURF [15] is a rotation invariant descriptor inspired by SIFT, but it uses Haar wavelets instead of the image's gradient to quickly identify interest points and generate features. The features at an interest point are the sum of the Haar wavelet responses. We use the OpenSURF implementation [19], with five octaves, a Hessian threshold of 10^-6, and an "extended" 128-dimensional representation. SURF produces about 100 descriptors per image.

Geometric Blur (GB) [16] descriptors are extracted by applying a spatially varying blur to oriented edge channels, with the amount of blur increasing from the center of each descriptor. Like SIFT, GB descriptors contain neighborhood information. We use standard parameters, i.e., the descriptors are computed at 300 randomly sampled points with a = 0.5 and b = 1. The algorithm produces 300 204-dimensional descriptors per image. See [16] for additional details.

LBP [17] descriptors have been used for texture and face recognition. Unlike the other descriptors we use, they do not directly operate on an image's gradient or edge-like features. Instead, the image's pixels are used to create a local histogram of "binary patterns," which are quantized into a 58-dimensional feature vector. We use the VLFeat [18] implementation of LBP with a cell size of 12 pixels, and we compute LBP descriptors at 3 image scales (1, 1/2, and 1/4), which are concatenated together to form a 174-dimensional representation. About 150 descriptors are produced per image. LBP is locally invariant to monotonically increasing changes in brightness.

Image Recognition Framework

The Naive Bayes Nearest Neighbor (NBNN) framework [3] relies solely on the discriminative ability of the individual descriptors, making it an excellent choice for evaluating color-to-grayscale algorithms. NBNN assumes each descriptor is statistically independent (i.e., the Naive Bayes assumption). Given a new image Q with descriptors d_1, ..., d_n, the distance to each descriptor's nearest neighbor is computed for each category C. These distances are summed for each category and the one with the smallest total is chosen. Assuming that all training images have their descriptors extracted and stored, NBNN is summarized as:

1. Compute descriptors d_1, ..., d_n for an image Q.
2. For each C, compute the nearest neighbor of every d_i in C: NN_C(i).
3. Classify: Ĉ = arg min_C Σ_{i=1}^{n} Dist(i, NN_C(i)).

As in [3], Dist(x, y) = ||d_x - d_y||^2 + α||ℓ_x - ℓ_y||^2, where ℓ_x is the normalized location of descriptor d_x, ℓ_y is the normalized location of descriptor d_y, and α modulates the influence of descriptor location. We use α = 0 for the Barnard et al. dataset, described below, since it exhibits substantial rotation variation. We use α = 1 for the other datasets.

Using 15 training instances per category for Caltech 101 with SIFT descriptors, Boiman et al. [3] reported achieving 65.0 ± 1.14% accuracy, while our Intensity' results with SIFT are 68.87 ± 0.93%. Boiman et al. did not report which grayscale method they used.
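To make the decision rule concrete, here is a hedged sketch of NBNN as summarized above. It assumes the descriptors and their normalized (x, y) locations have already been extracted and post-processed; building one kd-tree per category with SciPy's cKDTree is our implementation convenience rather than part of the original formulation, and the function names are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def augment(descriptors, locations, alpha):
    """Append sqrt(alpha)-scaled locations so that squared Euclidean distance
    equals ||d_x - d_y||^2 + alpha * ||l_x - l_y||^2 (the Dist used by NBNN)."""
    return np.hstack([descriptors, np.sqrt(alpha) * locations])

def build_category_trees(train_desc, train_loc, alpha):
    """train_desc / train_loc: dicts mapping category -> pooled training
    descriptors (n_C x d) and normalized locations (n_C x 2)."""
    return {c: cKDTree(augment(train_desc[c], train_loc[c], alpha))
            for c in train_desc}

def nbnn_classify(query_desc, query_loc, trees, alpha):
    """Steps 1-3 above: sum each descriptor's squared nearest-neighbor distance
    per category and return the category with the smallest total."""
    q = augment(query_desc, query_loc, alpha)
    totals = {c: np.sum(tree.query(q)[0] ** 2) for c, tree in trees.items()}
    return min(totals, key=totals.get)
```

Appending the sqrt(α)-scaled locations to the descriptors lets a single Euclidean nearest-neighbor query return exactly the Dist defined above, including the α = 0 case in which location is ignored.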
Results

We perform recognition experiments in three domains: (1) faces (the AR Face Dataset [20]), (2) textures (CUReT [21]), and (3) objects (the Barnard et al. [22] and Caltech 101 [23] datasets). Example images from all datasets except AR are shown in Fig. 2. Images from AR are not shown to comply with PLoS privacy guidelines.

After computing training descriptors, we subtract their mean. This is followed by applying zero-phase whitening (ZCA) for the AR, CUReT, and Barnard et al. datasets to induce isotropic covariance [24]. For Caltech 101, principal component analysis whitening is used instead to reduce the dimensionality to 80, retaining at least 85% of the variance for all descriptor types. This number was chosen based on the memory and speed limitations imposed by NBNN, and when this was done for the other datasets it had negligible impact on the relative performance of the best algorithms compared to using ZCA whitening. Finally, the descriptors are normalized to unit length. These same steps are applied to descriptors during testing.
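A minimal sketch of this descriptor post-processing, assuming the pooled training descriptors form an n×d NumPy array; the small eps regularizer is our addition and is not specified in the text.

```python
import numpy as np

def fit_zca(train_desc, eps=1e-5):
    """Estimate the mean and ZCA whitening transform from training descriptors."""
    mean = train_desc.mean(axis=0)
    X = train_desc - mean
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    # ZCA: scale in the eigenbasis, then rotate back to the original axes.
    W = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T
    return mean, W

def apply_preprocessing(desc, mean, W):
    """Mean-subtract, whiten, then normalize each descriptor to unit length."""
    X = (desc - mean) @ W
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)
```

For the Caltech 101 variant described above, one would instead keep only the top 80 eigenvectors and project onto them (PCA whitening) rather than rotating back to the original axes.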


Figure 2. Example dataset images. (A): Images from four CUReT categories: Felt, Straw, Lettuce, and Salt Crystals. (B): The ‘‘Crucheroos’’ object from Barnard et al. observed in all illumination conditions. (C): Six sample images for three Caltech 101 categories. Images from AR are not shown to comply with PLoS privacy guidelines. doi:10.1371/journal.pone.0029740.g002

We perform 30-fold cross-validation per dataset for each color-to-grayscale method, with the same train and test partitions used in each combination. For each cross-validation run we calculate the mean per-class accuracy, the standard method for Caltech 101 [23]. For each dataset and descriptor combination we provide dot plots of the mean per-class accuracy. The dot plots are sorted by the mean performance across descriptors. Dot plots provide a more compact representation than bar charts and allow for easy comparison of the methods.

We use multiple comparison tests (Wilcoxon signed-rank tests) to determine which color-to-grayscale algorithms are significantly different from the method with the greatest mean performance for each descriptor type. Holm-Bonferroni correction is used to ensure that the overall Type I (false positive) error rate is α = 0.01. In the dot plots we indicate which methods are not statistically different from the best method for each descriptor type with a triangle.
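A sketch of this comparison procedure for a single dataset and descriptor type, assuming acc maps each grayscale method to its 30 per-fold mean per-class accuracies (computed on the same folds for every method). The paper does not state which software performed the tests, so the SciPy and statsmodels calls here are simply one way to reproduce the procedure.

```python
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.multitest import multipletests

def compare_to_best(acc, alpha=0.01):
    """acc: dict mapping method name -> array of per-fold accuracies.
    Tests every method against the one with the greatest mean accuracy,
    applying Holm-Bonferroni correction across the comparisons."""
    best = max(acc, key=lambda m: np.mean(acc[m]))
    others = [m for m in acc if m != best]
    pvals = [wilcoxon(acc[best], acc[m]).pvalue for m in others]
    reject, _, _, _ = multipletests(pvals, alpha=alpha, method='holm')
    # Methods whose null hypothesis is not rejected are reported as
    # "not statistically different from the best" (the dot-plot triangles).
    not_different = [best] + [m for m, r in zip(others, reject) if not r]
    return best, not_different
```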
Results for all four datasets are shown in Fig. 3, and they are described in detail below. We focus our analysis on the methods that are top performers for multiple descriptor types.

AR Face Dataset

The Aleix and Robert (AR) dataset [20] is a large face dataset containing over 4,000 color face images under varying illumination, expression, and disguise conditions. In our experiments we omit images with disguises and changes in expression, leaving eight neutral facial expression images per person (see [20] for example images). In each cross-validation run, we randomly choose one training image per person and six testing images. Because there are large changes in brightness, methods that are not robust to these changes could dramatically impair performance. Chance performance is 1/120.

Our results on the AR dataset are provided in Fig. 3A. For SIFT, SURF, and GB, there is a large performance gap between the best and worst methods, consistent with our hypothesis that the choice of grayscale algorithm is especially important when the number of training instances is small and there is a large amount of brightness variation. Gleam performs well for all four descriptors. Value performs poorly for all descriptors. SIFT performs best compared to the other descriptors.

CUReT

The Columbia-Utrecht Reflectance and Texture (CUReT) dataset [21] contains 61 texture types, such as rabbit fur, styrofoam, pebbles, and moss. It exhibits large uniform changes in illumination conditions. We use only the predominantly front-facing images. We use 3 training images and 7 test images per category, chosen randomly. Example images are shown in Fig. 2A. Chance performance on CUReT is 1/61.

Our CUReT results are given in Fig. 3B. Performance of the grayscale algorithms on CUReT is more variable than on AR, with several methods performing well for each descriptor type, but members of the Luminance family (Luminance, Luminance', Luma, and Decolorize) tend to be better than the alternatives, with Luminance' the top performer for SURF and LBP and Luminance a top performer for SIFT, GB, and LBP. SIFT achieves the best performance, followed by LBP.

Barnard et al. Dataset

The Barnard et al. dataset [22] contains images of 20 distinct objects. Each object is photographed in 11 different illumination conditions while the pose of the object is simultaneously varied (see Fig. 2B). We chose this dataset because it is the only object dataset that exhibits a variety of systematic changes in lighting color, which we hypothesized would influence many of the grayscale representations. We train on 2 images per object and test on the remaining 9. Chance accuracy is 1/20.

Two or more methods work well for each descriptor type, but some of them are consistent across descriptors. Intensity' and Gleam work well for GB, LBP, and SURF. Intensity and Luminance perform best for SIFT. Because it is rotation invariant, SURF achieves greater accuracy compared to the other descriptors.


Figure 3. Results for each dataset. Methods that are not statistically different from the method with the greatest mean performance within each descriptor type are indicated with a triangle. The x-axis is the mean per-class accuracy. Each dot plot is sorted by the mean accuracy across descriptors, so that the best grayscale method across methods will be near the top of each dot plot. See text for detailed analysis. (A): Performance of each descriptor type on the AR Face dataset. (B): Performance of each descriptor type on CUReT. (C): Performance of each descriptor type on the Barnard et al. dataset. (D): Performance of each descriptor type on Caltech 101. doi:10.1371/journal.pone.0029740.g003

Caltech 101

The popular Caltech 101 dataset [23] consists of images found using Google image search for 101 object categories, with at least 31 images in each. As shown in Fig. 2C, Caltech 101 has a large amount of interclass variability. We adopt the standard Caltech 101 evaluation scheme. We train on 15 randomly chosen images per category and test on 30 other randomly chosen images per category, unless there are fewer than 30 images available, in which case all of the remaining images are used. Chance performance on Caltech 101 is 1/101.

Our Caltech 101 results are provided in Fig. 3D. Several methods work well, but only Gleam performs well for all four descriptors. Intensity' also works well for SIFT, SURF, and LBP. While the choice of grayscale algorithm is significant for Caltech 101, it has a less dramatic effect compared to the other datasets. This is likely because Caltech 101 exhibits less brightness variation and we use a comparatively larger training set. SIFT substantially exceeds the performance of the other descriptors.

The greatest mean per-class accuracy on Caltech 101 is obtained with Luster', which achieved 68.91 ± 0.90%. For comparison, [25] achieved 67.0 ± 0.5% with grayscale SIFT descriptors that had been sparse coded in a hierarchical spatial pyramid matching system.

Combined Analysis

The mean rank performance of each grayscale algorithm, marginalized over the datasets and descriptor types, is shown in Fig. 4. The simplest methods perform best, with Gleam achieving the greatest rank, although it is not significantly different from Intensity'. Value performs most poorly. Methods incorporating gamma correction are generally superior to their counterparts that omit it, e.g., Intensity' compared to Intensity (recall that Gleam, Luma, and Lightness have forms of gamma correction built into them).

Our results indicate that each descriptor type is sensitive to the choice of grayscale algorithm. To analyze the magnitude of this effect, we computed the coefficient of variation (CV) of each descriptor type's accuracy across the grayscale algorithms. These results are shown in Fig. 5. In general, LBP is the least sensitive to the choice of grayscale algorithm, with the only exception being CUReT. For all of the descriptors, the choice of grayscale algorithm mattered the least for Caltech 101, probably because of the greater number of training instances and the lack of illumination variability.
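Both summary statistics used in this section can be computed from a single accuracy table with one row per grayscale method and one column per dataset/descriptor combination. The sketch below shows one plausible way to do so; the higher-rank-is-better convention and the array layout are our assumptions, not a description of the authors' analysis code.

```python
import numpy as np
from scipy.stats import rankdata

def mean_ranks(acc_table):
    """acc_table: (n_methods, n_conditions) mean accuracies.
    Rank within each column (higher accuracy -> higher rank) and then
    average the ranks across columns for each grayscale method."""
    ranks = np.apply_along_axis(rankdata, 0, acc_table)
    return ranks.mean(axis=1)

def coefficient_of_variation(acc_table):
    """CV of each condition's accuracy across the grayscale methods
    (standard deviation divided by mean, down the method axis)."""
    return acc_table.std(axis=0) / acc_table.mean(axis=0)
```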


Figure 4. Mean rank results across datasets and descriptor types. The x-axis is the mean rank for a particular grayscale method when the results are combined across the datasets and descriptor types. Gleam and Intensity' exhibit the greatest rank and most robust performance. doi:10.1371/journal.pone.0029740.g004

Discussion

Our objective was to determine if the method used to convert from color-to-grayscale matters, and we can definitively say that it does influence performance. For all datasets there was a significant gap between the top performing and worst performing methods. Our results indicate that the method used to convert to grayscale should be clearly described in all publications, which is not always the case in image recognition.

For object and face recognition, Gleam is almost always the top performer. For texture recognition, Luminance and Luminance' are good choices. Although color descriptors are sometimes extracted, e.g., in the HSV colorspace, our results suggest replacing Value with Gleam is advisable.

In general, we observed little benefit from using a method based on human brightness perception. The only potential exception was textures. Emulating the way humans perceive certain colors as brighter than others appears to be of limited benefit for grayscale image recognition. However, methods that incorporate a form of gamma correction (e.g., Lightness, Gleam, Luma, Luster', etc.) usually perform better than purely linear methods such as Intensity and Luminance.

Developing a pre-processing algorithm specifically designed for edge-based and gradient-based descriptors is an interesting future direction. One way to achieve this is to learn a transformation from color-to-grayscale that is robust to changes in brightness, perhaps by allowing the gamma value to vary per color channel, e.g.,

G ∝ (1/3)(R^(1/a) + G^(1/b) + B^(1/c)),   (9)

where a, b, c > 0 are learned parameters.
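A sketch of the per-channel gamma transformation in Eq. (9). The default exponents shown are placeholders; as the text goes on to discuss, a, b, and c would have to be learned, for example by cross-validated recognition accuracy or a simple meta-heuristic.

```python
import numpy as np

def per_channel_gamma_gray(img, a=2.2, b=2.2, c=2.2):
    """Eq. (9): average of per-channel gamma corrected values, with a separate
    exponent for each channel (a, b, c > 0). With a = b = c = 2.2 this
    reduces to Gleam."""
    r, g, bl = img[..., 0], img[..., 1], img[..., 2]
    return (r ** (1.0 / a) + g ** (1.0 / b) + bl ** (1.0 / c)) / 3.0
```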

Figure 5. Coefficient of variation for each descriptor type and dataset. The y-axis is the coefficient of variation for the accuracy of each descriptor type computed across all of the grayscale methods. All of the methods are sensitive to the choice of grayscale algorithm, but LBP is the least sensitive in general. The choice of grayscale algorithm mattered the least for Caltech 101 and the most for the AR Face Dataset. doi:10.1371/journal.pone.0029740.g005

There is no reason to assume that the single value used in the standard gamma correction function is ideal for recognition. Alternatively, it may be advisable for the transformation weights to vary depending on the global or local statistics of each particular image. In both cases it is challenging to optimize the weights explicitly for recognition, since doing so would require re-extracting descriptors. As long as the number of parameters remains relatively small, they could feasibly be optimized per dataset using cross-validation or a meta-heuristic, e.g., genetic algorithms or hill climbing. An alternative is to learn a mapping from color images to descriptors directly. There has been some success with this approach [26,27], but it has not been widely adopted because these learned transformations tend to be considerably slower than engineered methods (e.g., SIFT) when a comparable descriptor dimensionality is used.

In this paper we asked the question, "Does the method used to convert to grayscale matter in image recognition?" and we have shown that it does significantly influence performance, even when using robust descriptors. The choice made the largest impact for datasets in which only a limited amount of training data was used and illumination conditions were highly variable. We were successful in identifying a method that was consistently superior for face and object recognition. Similarly, for the problem of texture recognition, a pair of top performers emerged. It is now incumbent upon researchers in the computer vision community to report the conversion method they use in each paper, as this seemingly innocuous choice can significantly influence results.

Author Contributions

Conceived and designed the experiments: CK. Performed the experiments: CK. Analyzed the data: CK GWC. Contributed reagents/materials/analysis tools: CK. Wrote the paper: CK GWC.

References

1. Jack K (2007) Video demystified, 5th edition. Newnes.
2. Grundland M, Dodgson N (2007) Decolorize: Fast, contrast enhancing, color to grayscale conversion. Pattern Recognition 40: 2891–2896.
3. Boiman O, Shechtman E, Irani M (2008) In defense of nearest-neighbor based image classification. In: Proc Computer Vision Pattern Recognition (CVPR-2008). doi:10.1109/CVPR.2008.4587598.
4. Coates A, Ng A (2011) The importance of encoding versus training with sparse coding and vector quantization. In: Proc Inter Conf Machine Learning (ICML-2011). pp 921–928.
5. Simard P, Steinkraus D, Platt J (2003) Best practices for convolutional neural networks applied to visual document analysis. In: Proc International Conf on Document Analysis and Recognition (ICDAR-2003). doi:10.1109/ICDAR.2003.1227801.
6. Andrepoulos A, Tsotsos JK (2011) On sensor bias in experimental methods for comparing interest point, saliency and recognition algorithms. IEEE Trans Pattern Analysis and Machine Intelligence. doi:10.1109/TPAMI.2011.91.
7. Hoeffding W (1963) Probability inequalities for sums of bounded random variables. Jour American Statistical Association 58: 13–30.
8. Pratt W (2007) Digital image processing. Wiley-Interscience.
9. Bosch A, Zisserman A, Munoz X (2007) Image classification using random forests and ferns. In: International Conf on Computer Vision (ICCV-2007). doi:10.1109/ICCV.2007.4409066.
10. Acharya T, Ray A (2005) Image processing: principles and applications. Wiley-Interscience.
11. Ohba K, Sato Y, Ikeuchi K (2000) Appearance-based visual learning and object recognition with illumination invariance. Machine Vision and Applications 12: 189–196.
12. Yoo S, Park R, Sim D, Wolgye-dong N (2007) Investigation of color spaces for face recognition. In: Proc IAPR Conf on Machine Vision Applications 2007. doi:10.1.1.140.7931.
13. Čadík M (2008) Perceptual evaluation of color-to-grayscale image conversions. Computer Graphics Forum 27: 1745–1754.
14. Lowe D (2004) Distinctive image features from scale-invariant keypoints. International Jour Computer Vision 60: 91–110.
15. Bay H, Ess A, Tuytelaars T, Van Gool L (2008) SURF: Speeded Up Robust Features. Comp Vis Image Understanding (CVIU) 110(3): 346–359.
16. Berg AC, Berg TL, Malik J (2005) Shape matching and object recognition using low distortion correspondence. In: Proc Computer Vision and Pattern Recognition (CVPR) 2005. doi:10.1109/CVPR.2005.320.
17. Ojala T, Pietikäinen M, Harwood D (1996) A comparative study of texture measures with classification based on feature distributions. Pattern Recognition 19(3): 51–59.
18. Vedaldi A, Fulkerson B (2008) VLFeat: An Open and Portable Library of Computer Vision Algorithms. Available: http://www.vlfeat.org. Accessed: 2011 Dec 5.
19. Evans C (2009) Notes on the OpenSURF library. Univ Bristol Tech Rep CSTR-09-001.
20. Martinez A, Benavente R (1998) The AR Face Database. CVC Technical Report #24.
21. Dana K, Van-Ginneken B, Nayar S, Koenderink J (1999) Reflectance and Texture of Real World Surfaces. ACM Trans on Graphics 18: 1–34.
22. Barnard K, Martin L, Funt B, Coath A (2000) Data for colour research. Color Research and Application 27: 148–152.
23. Fei-fei L, Fergus R, Perona P (2004) Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In: Proc Computer Vision and Pattern Recognition (CVPR-2004). doi:10.1109/CVPR.2004.109.
24. Bell AJ, Sejnowski TJ (1997) The "independent components" of natural scenes are edge filters. Vision Res 37(23): 3327–3338.
25. Yang J, Yu K, Gong Y, Huang T (2009) Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification. In: Proc Computer Vision and Pattern Recognition (CVPR-2009). doi:10.1109/CVPR.2009.5206757.
26. Kanan C, Cottrell G (2010) Robust classification of objects, faces, and flowers using natural image statistics. In: Proc Computer Vision Pattern Recognition (CVPR-2010). doi:10.1109/CVPR.2010.5539947.
27. Raina R, Battle A, Lee H, Packer B, Ng A (2007) Self-taught learning: Transfer learning from unlabeled data. In: International Conf Machine Learning (ICML-2007). doi:10.1145/1273496.1273592.
