
J Real-Time Image Proc (2015) 10:329–344
DOI 10.1007/s11554-012-0291-4

SPECIAL ISSUE

An effective real-time color quantization method based on divisive hierarchical clustering

M. Emre Celebi · Quan Wen · Sae Hwang

Received: 1 August 2012 / Accepted: 17 October 2012 / Published online: 6 November 2012
© Springer-Verlag Berlin Heidelberg 2012

M. E. Celebi (✉)
Department of Computer Science, Louisiana State University, Shreveport, LA, USA
e-mail: [email protected]

Q. Wen
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, People's Republic of China
e-mail: [email protected]

S. Hwang
Department of Computer Science, University of Illinois, Springfield, IL, USA
e-mail: [email protected]

Abstract  Color quantization (CQ) is an important operation with many applications in graphics and image processing. Clustering algorithms have been extensively applied to this problem. In this paper, we propose a simple yet effective CQ method based on divisive hierarchical clustering. Our method utilizes the commonly used binary splitting strategy along with several carefully selected heuristics that ensure a good balance between effectiveness and efficiency. We also propose a slightly more computationally expensive variant of this method that employs local optimization using the Lloyd–Max algorithm. Experiments on a diverse set of publicly available images demonstrate that the proposed method outperforms some of the most popular quantizers in the literature.

Keywords  Color quantization · Clustering · Divisive hierarchical clustering

1 Introduction

True-color images typically contain thousands of colors, which makes their display, storage, transmission, and processing problematic. For this reason, CQ is commonly used as a preprocessing step for various graphics and image processing tasks. In the past, CQ was a necessity due to the limitations of the display hardware, which could not handle over 16 million possible colors in 24-bit images. Although 24-bit display hardware has become more common, CQ still maintains its practical value [6]. Modern applications of CQ in graphics and image processing include: (1) compression [63], (2) segmentation [17], (3) text localization/detection [48], (4) color-texture analysis [47], (5) watermarking [35], (6) non-photorealistic rendering [55], and (7) content-based retrieval [18].

The process of CQ is mainly comprised of two phases: palette design (the selection of a small set of colors that represents the original image colors) and pixel mapping (the assignment of each input pixel to one of the palette colors). The primary objective is to reduce the number of unique colors, N′, in an image to K (K ≪ N′) with minimal distortion. In most applications, 24-bit pixels in the original image are reduced to 8 bits or fewer. Since natural images often contain a large number of colors, faithful representation of these images with a limited size palette is a difficult problem.

CQ methods can be broadly classified into two categories [60]: image-independent methods that determine a universal (fixed) palette without regard to any specific image [21, 39], and image-dependent methods that determine a custom (adaptive) palette based on the color distribution of the images. Despite being very fast, image-independent methods usually give poor results since they do not take into account the image contents. Therefore, most of the studies in the literature consider only image-dependent methods, which strive to achieve a better balance between computational efficiency and visual quality of the quantization output.

Numerous image-dependent CQ methods have been developed over the past three decades. These can be categorized into two families: preclustering (hierarchical clustering) methods and postclustering (partitional clustering) methods [6]. The former methods recursively find nested clusters either in a top-down (divisive) or bottom-up (agglomerative) fashion. In contrast, the latter ones find all the clusters simultaneously as a partition of the data and do not impose a hierarchical structure [30].

Preclustering methods are mostly based on the statistical analysis of the color distribution of the images. Divisive preclustering methods start with a single cluster that contains all N′ image colors. This initial cluster is recursively subdivided until K clusters are obtained. Well-known divisive methods include median-cut [24], octree [22], the variance-based method [54], binary splitting [40], greedy orthogonal bipartitioning [58], center-cut [31], and rwm-cut [64]. More recent methods can be found in [13, 25, 32, 37, 49]. On the other hand, agglomerative preclustering methods [1, 5, 19, 51, 61] start with N′ singleton clusters, each of which contains one image color. These clusters are repeatedly merged until K clusters remain. In contrast to preclustering methods that compute the palette only once, postclustering methods first determine an initial palette and then improve it iteratively. Since these methods involve iterative or stochastic optimization, they can obtain higher quality results when compared to preclustering methods at the expense of increased computational time. Clustering algorithms adapted to CQ include maxmin [23, 59], k-means [9, 26, 27, 29, 33], k-harmonic means [20], competitive learning [8, 10, 46, 52], fuzzy c-means [7, 34, 41, 45, 57], rough c-means [44], BIRCH [3], and self-organizing maps [12, 14, 16, 42, 43, 62].

In this paper, we present an effective divisive preclustering method for CQ. The rest of the paper is organized as follows. Section 2 describes the anatomy of a divisive hierarchical clustering algorithm and the proposed CQ method. Section 3 presents the experimental setup and compares the proposed method to other CQ methods. Finally, Section 4 gives the conclusions.

2 Divisive hierarchical clustering for CQ

2.1 Anatomy of a divisive hierarchical clustering algorithm

As described in the previous section, preclustering methods can be divided into two categories: divisive and agglomerative. Since agglomerative methods typically have at least quadratic time complexity, most of the existing preclustering methods are of divisive type. A divisive algorithm partitions the 3-dimensional color space of the input image into K subspaces using K − 1 planes, each of which is uniquely defined by a normal vector and a point. The main heuristics used by divisive algorithms are the following [6]:

1. Selection of a splitting strategy: Following tree-structured vector quantizers, most divisive algorithms employ binary splitting. In other words, the color space of the input image is partitioned into K subspaces by a sequence of K − 1 split operations. Note that the number of binary splits that can be performed to obtain K subpartitions equals the number of full binary trees having exactly K leaves, $\frac{1}{K}\binom{2K-2}{K-1}$, which is typically too large to permit exhaustive enumeration.

2. Selection of the next partition to be split: In each iteration, the algorithm selects a partition and splits it into two subpartitions. Possible choices for the partition to be split include the most populated partition [24], the partition with the greatest range on any coordinate axis [31], the partition with the greatest dominant eigenvalue [40], and the partition with the greatest sum of squared error (SSE) [13, 49, 54, 58, 64]. Among these criteria, the last one is the most sensible, as the partition with the greatest SSE is the one that contributes the most to the total distortion.

3. Selection of the partitioning plane normal vector: The partitioning plane may be orthogonal to the coordinate axis with the greatest range [24, 31], the coordinate axis with the greatest variance [49], the major axis [40], or some other specially chosen axis [13, 54, 58, 64]. Among these choices, the major axis is the most sensible, as this is the axis along which the data spread is the greatest. However, determination of the major axis requires the computation of the cluster covariance matrix, which is expensive. Therefore, the coordinate axis with the greatest variance can be used as a computationally efficient alternative to the major axis.

4. Selection of the partitioning plane position: The partitioning plane may pass through the mean [13, 31, 40], the median [24], the radius-weighted mean [64], or some other specially chosen point [49, 54, 58, 64] on the partitioning axis. The rationale behind the choice of the median point, which is adapted from the original kd-tree construction algorithm [2], is that the resulting subpartitions will contain approximately the same number of colors. However, there is no sound justification to require that each cluster contain a nearly equal number of colors while ignoring the distribution of these colors [53]. In contrast, for hyperspherical clusters, it can be shown that the mean point is the optimal choice [40].

2.2 Proposed CQ method

Motivated by computational efficiency considerations, we propose a new divisive CQ method called variance-cut (VC) that employs the binary splitting strategy. Following the majority of divisive algorithms [13, 24, 49, 54, 58, 64], VC starts by building a 32 × 32 × 32 color histogram using 5 bits/channel uniform quantization. In each iteration, the method splits the partition with the greatest SSE along the coordinate axis with the greatest variance at the mean point. After K − 1 iterations (splits), the centroids of the resulting K subpartitions are taken as the color palette.

The statistics of the two subpartitions produced by a split can be computed efficiently: once the weight $w_a$, mean $m_a$, and variance $v_a$ of one subpartition, $C_a$, are calculated, those of the other follow from the parent. Let each color be represented by a vector $x = (x_1, x_2, x_3)$, and let $w$, $m$, and $v$ denote the weight, mean, and variance of the parent partition $C$, respectively. The weight, mean, and variance of the other subpartition, $C_b$, are then given by $w_b = w - w_a$, $m_b = (wm - w_a m_a)/w_b$, and $v_b = \left[wv - w_a\left(v_a + (m - m_a)^2\right)\right]/w_b - (m - m_b)^2$, respectively.
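The complement-statistics identities for a subpartition given above can be checked numerically against direct computation on random data. The snippet below is our own sanity check, not part of the paper: the split predicate, the random weights, and the helper name `stats` are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((100, 3))                            # colors of the parent partition C
wts = rng.integers(1, 10, 100).astype(np.float64)   # per-color weights (histogram counts)

def stats(c, w):
    """Weight, mean vector, and scalar variance (mean squared distance to the mean)."""
    total = w.sum()
    m = np.average(c, axis=0, weights=w)
    v = float(np.sum(w[:, None] * (c - m) ** 2) / total)
    return total, m, v

# An arbitrary binary split C = Ca U Cb; the predicate is irrelevant to the identity.
mask = x[:, 0] <= 0.5
w, m, v = stats(x, wts)
wa, ma, va = stats(x[mask], wts[mask])

# Complement statistics derived from the parent and Ca alone:
wb = w - wa
mb = (w * m - wa * ma) / wb
vb = (w * v - wa * (va + np.sum((m - ma) ** 2))) / wb - np.sum((m - mb) ** 2)

# Direct computation on Cb for comparison:
wb_direct, mb_direct, vb_direct = stats(x[~mask], wts[~mask])
```

The practical payoff is that only the smaller subpartition needs a pass over its colors; the larger one's statistics come for free from the parent.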
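The variance-cut procedure described above (histogram construction, greatest-SSE partition selection, greatest-variance axis, mean-point split, centroid palette) can be sketched in a few dozen lines of NumPy. This is a minimal reconstruction under our own assumptions, not the authors' implementation: the function names, the tie-breaking on equal variances, and the mapping of bin indices back to 8-bit values via bin centers ($8b + 4$) are ours.

```python
import numpy as np

def _stats(colors, weights):
    """Weighted mean, per-axis variance, and SSE of a partition."""
    mean = np.average(colors, axis=0, weights=weights)
    diff2 = (colors - mean) ** 2
    var = np.average(diff2, axis=0, weights=weights)
    sse = float(np.sum(weights[:, None] * diff2))
    return mean, var, sse

def variance_cut(image, K):
    """Return a K-color palette (rows of 8-bit RGB) for an HxWx3 uint8 image."""
    pixels = image.reshape(-1, 3)
    # 5 bits/channel uniform quantization -> occupied cells of a 32x32x32 histogram.
    colors, counts = np.unique(pixels >> 3, axis=0, return_counts=True)
    parts = [(colors.astype(np.float64), counts.astype(np.float64))]
    for _ in range(K - 1):
        # Split the partition with the greatest SSE ...
        stats = [_stats(c, w) for c, w in parts]
        idx = max(range(len(parts)), key=lambda i: stats[i][2])
        mean, var, sse = stats[idx]
        if sse == 0.0:  # every partition holds a single color; cannot split further
            break
        c, w = parts.pop(idx)
        # ... along the coordinate axis with the greatest variance, at the mean point.
        axis = int(np.argmax(var))
        mask = c[:, axis] <= mean[axis]
        parts.append((c[mask], w[mask]))
        parts.append((c[~mask], w[~mask]))
    # The palette consists of the subpartition centroids, mapped back to
    # 8-bit values via the (assumed) bin centers 8*b + 4.
    centroids = np.array([np.average(c, axis=0, weights=w) for c, w in parts])
    return np.rint(centroids * 8 + 4).astype(np.uint8)
```

Pixel mapping, i.e. assigning each input pixel to its nearest palette color, is a separate phase and is not shown here.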
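The enumeration argument in heuristic 1 of Sect. 2.1, that the number of ways to reach K subpartitions by binary splits equals the number of full binary trees with K leaves, $\frac{1}{K}\binom{2K-2}{K-1}$ (the (K−1)-st Catalan number), can be verified against a brute-force recurrence. The helper names below are our own.

```python
from functools import lru_cache
from math import comb

def closed_form(K):
    """(1/K) * C(2K-2, K-1): full binary trees with exactly K leaves."""
    return comb(2 * K - 2, K - 1) // K

@lru_cache(maxsize=None)
def brute_force(K):
    """Recurrence: the root's two subtrees hold i and K - i leaves, respectively."""
    if K == 1:
        return 1
    return sum(brute_force(i) * brute_force(K - i) for i in range(1, K))
```

Even a modest K = 8 palette admits 429 distinct split hierarchies, and the count grows roughly like $4^K$, which is why divisive methods commit greedily to one split per iteration instead of enumerating.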