
A Colorization Algorithm Based on Local MAP Estimation

Hideki Noda a,∗, Jin Korekuni b, Michiharu Niimi a

a Department of Systems Innovation and , Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, 820-8502 Japan

b Department of Electrical, Electronic and Engineering, Kyushu Institute of Technology, 1-1 Sensui-cho, Tobata-ku, Kitakyushu, 804-8550 Japan

Abstract

This paper presents a colorization algorithm which adds color to monochrome images. The colorization problem is formulated as the maximum a posteriori (MAP) estimation of a color image given a monochrome image. A Markov random field (MRF) is used to model the color image and serves as the prior for the MAP estimation. The MAP estimation problem for a whole image is decomposed into local MAP estimation problems for each pixel. Using 0.6% of all pixels as references, the proposed method produced high-quality color images with PSNR values from 25.7 dB to 32.5 dB for eight images.

Key words: Colorization, Monochrome image, MAP estimation, MRF

1 Introduction

Colorization is the process, usually computer-aided, of adding color to monochrome images or movies, and there is considerable demand for it. Colorization is now generally carried out manually using drawing tools. A user typically segments a monochrome image by drawing region boundaries by hand and then assigns a color to each region. Such manual work is obviously very expensive and time-consuming.

∗ Corresponding author. Tel.: +81-948-29-7714; fax: +81-948-29-7709 Email address: [email protected] (Hideki Noda).

Preprint submitted to Elsevier Science, 28 March 2006

Recently several colorization methods [1–3] have been proposed which do not require intensive manual effort. Welsh et al. proposed a semi-automatic method to colorize a monochrome image by transferring color from a reference color image [1]. The entire color "mood" of the reference image is transferred to the target monochrome image by matching luminance and texture information between the two images. This method requires an appropriate reference color image, which must be prepared by the user, and works well only for images where differently colored regions have distinct luminance values or distinct textures. Levin et al. proposed an interactive method which does not require precise manual segmentation [2]. In their method, instead of segmenting the image manually, the user gives some color scribbles, and the colors are automatically propagated to produce a fully colorized image. Horiuchi [3] proposed a method where the user gives colors for some pixels and the colors of all other pixels are determined automatically by probabilistic relaxation. A serious problem with his method is that it is computationally very expensive; it takes almost one day to colorize one image.

Unlike previously proposed colorization methods, this paper formulates the colorization problem as Bayesian inference, i.e., the maximum a posteriori (MAP) estimation of a color image given a monochrome image. A Markov random field (MRF) [4] is used to model the color image and serves as the prior for the MAP estimation. In this paper, the global MAP estimation problem for a whole image is approximately decomposed into local MAP estimation problems for each pixel, and the local MAP estimation is reduced to a simple quadratic programming problem with constraints.

2 Color Image Modeling by Markov Random Field

2.1 Markov Random Field

Let $L = \{(i, j);\ 1 \le i \le N_1,\ 1 \le j \le N_2\}$ denote a finite set of sites of an $N_1 \times N_2$ rectangular lattice. Let $\eta^X_{ij} \subset L$ denote the $(i, j)$ pixel's neighborhood of a random field $X_L$ (see footnote 1) defined on $L$. Let $C^X_{ij}$ denote the set of cliques associated with $\eta^X_{ij}$ which contain the $(i, j)$ pixel, i.e., $(i, j) \in C$ for every $C \in C^X_{ij}$. For example, in the first-order neighborhood, $\eta^X_{ij} = \{(i, j+1), (i, j-1), (i+1, j), (i-1, j)\}$ and $C^X_{ij} = \{\{(i, j)\}, \{(i, j), (i, j+1)\}, \{(i, j), (i, j-1)\}, \{(i, j), (i+1, j)\}, \{(i, j), (i-1, j)\}\}$, which consists of one singleton and four doubleton cliques. Let the random field $X_L = \{X_{ij}; (i, j) \in L\}$ be a Markov random field (MRF) defined on $L$ with each $X_{ij}$ taking values from a common local state space $Q_X$. It is well known that an MRF is completely described by a Gibbs distribution

$$p(x_L) = \frac{1}{Z_X} \exp\{-U(x_L)\}, \qquad (1)$$

where $x_L$ is a realization of $X_L$ from the configuration space $\Omega_X = Q_X^{N_1 \times N_2}$ and

$$U(x_L) = \sum_{(i,j) \in L} \sum_{C \in C^X_{ij}} U(x_C) \qquad (2)$$

is the global energy function, whereas $U(x_C)$ is the clique energy function and

$$Z_X = \sum_{x_L \in \Omega_X} \exp\{-U(x_L)\} \qquad (3)$$

is the partition function. For details on MRFs and related concepts such as neighborhoods and cliques, see Ref. [4].

1 In this paper, $x_A$ and $f(x_A)$ denote the set $\{x_{a_1}, \ldots, x_{a_l}\}$ and the multivariable function $f(x_{a_1}, \ldots, x_{a_l})$, respectively, where $A = \{a_1, \ldots, a_l\}$.
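The first-order example above can be sketched in a few lines of Python. The function names are hypothetical, and lattice boundaries are ignored for brevity, so this is only an illustration of the neighborhood and clique sets, not part of the method itself.

```python
def first_order_neighborhood(i, j):
    """Sites in the first-order neighborhood of pixel (i, j)."""
    return [(i, j + 1), (i, j - 1), (i + 1, j), (i - 1, j)]


def first_order_cliques(i, j):
    """Cliques containing (i, j): one singleton and four doubletons."""
    singleton = [frozenset({(i, j)})]
    doubletons = [frozenset({(i, j), site})
                  for site in first_order_neighborhood(i, j)]
    return singleton + doubletons


# Illustration for an interior pixel (boundary handling is not modeled here).
cliques = first_order_cliques(5, 5)
```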

2.2 A Color Image Model Using Gaussian MRF

A color image can be considered as a realization $x_L = \{x_{ij}; (i, j) \in L\}$ of a random field $X_L = \{X_{ij}; (i, j) \in L\}$, where $x_{ij} = (r_{ij}, g_{ij}, b_{ij})^T$ is a color vector at the $(i, j)$ pixel composed of red $r_{ij}$, green $g_{ij}$ and blue $b_{ij}$ components. Color images are modeled by a Gaussian MRF (GMRF) characterized by the following local conditional probability density function (pdf) (see footnote 2):

$$p(x_{ij} \mid x_{\eta^X_{ij}}) = \frac{1}{(2\pi)^{3/2} |\Sigma_X|^{1/2}} \exp\left\{-\frac{1}{2} (x_{ij} - m_{ij})^T \Sigma_X^{-1} (x_{ij} - m_{ij})\right\}, \qquad (4)$$

$$m_{ij} = \frac{1}{|N|} \sum_{\tau \in N} x_{ij+\tau}. \qquad (5)$$

Here $m_{ij}$ is the mean of the neighboring pixels' color vectors $x_{\eta^X_{ij}} = \{x_{ij+\tau}, \tau \in N\}$, where $N$ denotes the neighborhood of the $(0, 0)$ pixel. For example, $N = \{(0, 1), (0, -1), (1, 0), (-1, 0)\}$ for the first-order neighborhood, and if $\tau = (0, 1)$, $x_{ij+\tau} = x_{i,j+1}$. $\Sigma_X$ is the covariance of $x_{ij} - m_{ij}$.
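As an illustration of Eqs. (4) and (5), the following sketch evaluates the neighbor mean and the logarithm of the local conditional density for one pixel. It uses plain Python lists; the helper names (`neighbor_mean`, `det3`, `inv3`, `gmrf_log_density`) are hypothetical, and the pixel is assumed to lie in the interior of the lattice.

```python
import math

FIRST_ORDER = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def neighbor_mean(image, i, j, offsets=FIRST_ORDER):
    """Eq. (5): mean color vector over the neighbors of an interior pixel."""
    vecs = [image[i + di][j + dj] for di, dj in offsets]
    return [sum(v[c] for v in vecs) / len(vecs) for c in range(3)]


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate (transpose of cofactors)."""
    d = det3(m)
    cof = [[0.0] * 3 for _ in range(3)]
    for r in range(3):
        for c in range(3):
            rows = [k for k in range(3) if k != r]
            cols = [k for k in range(3) if k != c]
            minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
                     - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
            cof[r][c] = (-1) ** (r + c) * minor
    return [[cof[c][r] / d for c in range(3)] for r in range(3)]


def gmrf_log_density(x, m, sigma):
    """Logarithm of Eq. (4) for color vector x, mean m and covariance sigma."""
    d = [x[c] - m[c] for c in range(3)]
    s_inv = inv3(sigma)
    quad = sum(d[r] * s_inv[r][c] * d[c] for r in range(3) for c in range(3))
    return (-1.5 * math.log(2 * math.pi)
            - 0.5 * math.log(det3(sigma)) - 0.5 * quad)
```

Working with the logarithm avoids underflow when the quadratic form is large; maximizing it is equivalent to maximizing Eq. (4).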

2 The GMRF used here is one of the simplest GMRFs and can model only nontextured, smooth images. We use it as a first step, although more complicated GMRFs applicable to textured images exist.

3 Color Image Estimation

3.1 Derivation of Estimation Algorithm

We assume that a monochrome image yL = {yij;(i, j) ∈ L} is associated with a color image xL = {xij;(i, j) ∈ L} under the following relation:

$$y_{ij} = a^T x_{ij} = 0.299 r_{ij} + 0.587 g_{ij} + 0.114 b_{ij}, \quad 0 \le y_{ij}, r_{ij}, g_{ij}, b_{ij} \le 255. \qquad (6)$$
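A minimal sketch of Eq. (6) in Python follows. The names `A`, `luminance` and `to_monochrome` are hypothetical, and an image is assumed to be a nested list of (r, g, b) tuples.

```python
# Weights of Eq. (6); they sum to 1, so white (255, 255, 255) maps to 255.
A = (0.299, 0.587, 0.114)


def luminance(rgb):
    """Eq. (6): y = a^T x for one color vector x = (r, g, b)."""
    r, g, b = rgb
    return A[0] * r + A[1] * g + A[2] * b


def to_monochrome(color_image):
    """Apply Eq. (6) to every pixel of a nested-list color image."""
    return [[luminance(px) for px in row] for row in color_image]
```

Because Eq. (6) maps three components to one value, many different colors share the same luminance, which is why a prior on the color image is needed to invert it.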

Given yL, xL can be estimated by maximizing the a posteriori probability p(xL | yL), i.e., by MAP estimation. The MAP estimate xˆL is written as

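$$\hat{x}_L = \arg\max_{x_L \in \Omega_X} p(x_L \mid y_L), \qquad (7)$$

where the a posteriori probability $p(x_L \mid y_L)$ is described as

$$p(x_L \mid y_L) = \frac{p(y_L \mid x_L)\, p(x_L)}{\sum_{x_L \in \Omega_X} p(y_L \mid x_L)\, p(x_L)}. \qquad (8)$$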

Note that it is practically impossible to find the MAP estimate $\hat{x}_L$ since the search space over all possible configurations of $x_L$ is huge, i.e., $|\Omega_X| = 256^{3|L|}$. To overcome this problem, hereinafter we consider a mean-field-based decomposition of the a posteriori probability.
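To give a feeling for the size of this search space, the short calculation below evaluates the base-10 logarithm of $256^{3|L|}$ for a 256 × 256 image, the size used in the experiments of this paper. The variable names are illustrative only.

```python
import math

n_pixels = 256 * 256                       # |L| for a 256 x 256 image
log10_size = 3 * n_pixels * math.log10(256)  # log10 of 256^(3|L|)

# The number of configurations has several hundred thousand decimal digits.
print(f"|Omega_X| is roughly 10^{log10_size:.0f}")
```

Exhaustive search is therefore out of the question, which motivates the per-pixel decomposition that follows.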

Considering (6), p(yL | xL) is described as

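$$p(y_L \mid x_L) = \mathbf{1}(\{y_{ij} = a^T x_{ij}, (i, j) \in L\}) = \prod_{(i,j) \in L} \mathbf{1}(y_{ij} = a^T x_{ij}), \qquad (9)$$

where

$$\mathbf{1}(y_{ij} = a^T x_{ij}) = \begin{cases} 1 & \text{if } y_{ij} = a^T x_{ij} \\ 0 & \text{otherwise.} \end{cases} \qquad (10)$$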

Using the mean field approximation, p(xL) can be decomposed as [5]

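$$p(x_L) \approx \prod_{(i,j) \in L} p(x_{ij} \mid \langle x_{\eta^X_{ij}} \rangle), \qquad (11)$$

where $\langle x_{\eta^X_{ij}} \rangle$ denotes the mean fields for $x_{\eta^X_{ij}}$. Substituting (9) and (11) into (8) and replacing $\sum_{x_L \in \Omega_X} \prod_{(i,j) \in L}$ by $\prod_{(i,j) \in L} \sum_{x_{ij} \in Q_X}$, we obtain the following decomposition for $p(x_L \mid y_L)$:

$$p(x_L \mid y_L) \approx \prod_{(i,j) \in L} p(x_{ij} \mid y_{ij}, \langle x_{\eta^X_{ij}} \rangle), \qquad (12)$$

where

$$p(x_{ij} \mid y_{ij}, \langle x_{\eta^X_{ij}} \rangle) = \frac{\mathbf{1}(y_{ij} = a^T x_{ij})\, p(x_{ij} \mid \langle x_{\eta^X_{ij}} \rangle)}{\sum_{x_{ij} \in Q_X} \mathbf{1}(y_{ij} = a^T x_{ij})\, p(x_{ij} \mid \langle x_{\eta^X_{ij}} \rangle)}. \qquad (13)$$

In the following, $x_{\eta^X_{ij}}$ is simply used for $\langle x_{\eta^X_{ij}} \rangle$. Then $p(x_{ij} \mid y_{ij}, \langle x_{\eta^X_{ij}} \rangle) = p(x_{ij} \mid y_{ij}, x_{\eta^X_{ij}})$ is considered as a local a posteriori probability (LAP). Using these LAPs, the problem shown by Eq. (7) is approximately decomposed into the local optimization problems

$$\hat{x}_{ij} = \arg\max_{x_{ij} \in Q_X} p(x_{ij} \mid y_{ij}, x_{\eta^X_{ij}}). \qquad (14)$$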

In order to solve (14) for all $(i, j)$ pixels, their neighboring color vectors $x_{\eta^X_{ij}}$ should be given. Since a problem such as (14) can be solved iteratively, as is common in numerical analysis, we rewrite Eq. (14) as

$$x^{(p+1)}_{ij} = \arg\max_{x_{ij} \in Q_X} p(x_{ij} \mid y_{ij}, x^{(p)}_{\eta^X_{ij}}), \qquad (15)$$

where $p$ represents the $p$th iteration. Considering (4), (5), (6) and (13), the local MAP estimation (15) is rewritten as the following constrained quadratic programming problem:

$$\text{minimize} \quad (x_{ij} - m_{ij})^T \Sigma_X^{-1} (x_{ij} - m_{ij}) \quad \text{with} \quad m_{ij} = \frac{1}{|N|} \sum_{\tau \in N} x^{(p)}_{ij+\tau} \qquad (16)$$

$$\text{subject to} \quad a^T x_{ij} = y_{ij}, \quad 0 \le r_{ij}, g_{ij}, b_{ij} \le 255. \qquad (17)$$
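For intuition, the problem in (16) and (17) has a closed-form solution when the box constraints happen to be inactive: setting the gradient of the Lagrangian to zero gives $x_{ij} = m_{ij} + \Sigma_X a\,(y_{ij} - a^T m_{ij}) / (a^T \Sigma_X a)$. The sketch below implements only this equality-constrained case and checks whether the result lies in the color cube; it is an assumption-laden simplification, not the general constrained solver, and the function names are hypothetical.

```python
A = (0.299, 0.587, 0.114)


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


def local_map_equality_only(m, sigma, y):
    """Minimize (x-m)^T Sigma^{-1} (x-m) subject to a^T x = y.

    Box constraints 0 <= r, g, b <= 255 are NOT enforced here (assumption:
    they are inactive); a general QP solver is needed otherwise.
    """
    sa = mat_vec(sigma, A)                # Sigma a
    step = (y - dot(A, m)) / dot(A, sa)   # Lagrange multiplier term
    return [m[c] + step * sa[c] for c in range(3)]


def in_color_cube(x, tol=1e-9):
    """True if all components lie in [0, 255] up to a small tolerance."""
    return all(-tol <= v <= 255 + tol for v in x)


# Illustrative values only: a neighbor mean, a covariance and a luminance.
m_ij = [120.0, 100.0, 80.0]
sigma_x = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
x_ij = local_map_equality_only(m_ij, sigma_x, 110.0)
```

If `in_color_cube(x_ij)` is false, the box constraints are active and the closed form no longer applies.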

3.2 Initial Color Estimation

Since the color estimation shown by Eq. (15) is carried out iteratively, an initial color image is needed to start the iterative procedure. Initial color image estimation using some reference colors is described here. Assuming that color vectors for $K$ pixels, $c_{i_k j_k}, k = 1, \ldots, K$, are given, consider how to derive an initial color image. We consider an initial color estimation procedure which consists of two steps.

(1) Selection of a reference color vector

In order to estimate an initial color image $x^{(0)}_L = \{x^{(0)}_{ij}; (i, j) \in L\}$, a reference color vector for each pixel is selected from the $K$ given references, $c_{i_k j_k}, k = 1, \ldots, K$. The measure used to select a reference for the $(i, j)$ pixel is

$$F_{ij}(k) = w \frac{\{(i - i_k)^2 + (j - j_k)^2\}^{1/2}}{(N_1 + N_2)/2} + (1 - w) \frac{|y_{ij} - a^T c_{i_k j_k}|}{255}, \qquad (18)$$

where $w$ is a weighting factor, the first term measures the spatial distance from a reference $c_{i_k j_k}$, and the second term measures the difference between the $(i, j)$ pixel's brightness $y_{ij}$ and that of $c_{i_k j_k}$. The reference $c_{i_k j_k}$ which minimizes $F_{ij}(k)$ is selected for the $(i, j)$ pixel. An appropriate value of $w$ is determined experimentally.

(2) Color estimation using a reference

Once a reference $c_{i_k j_k}$ is selected for the $(i, j)$ pixel, an initial estimate $x^{(0)}_{ij}$ can be determined as the closest point to $c_{i_k j_k}$ within the plane $a^T x_{ij} = y_{ij}$. Considering that $c_{i_k j_k} - x_{ij}$ for such $x_{ij}$ should be orthogonal to the plane, i.e., $x_{ij}$ should be the projection of $c_{i_k j_k}$ onto the plane, $x^{(0)}_{ij}$ is derived as

$$x^{(0)}_{ij} = c_{i_k j_k} + \frac{y_{ij} - a^T c_{i_k j_k}}{a^T a}\, a. \qquad (19)$$

However, the derived projection point $x^{(0)}_{ij} = (r^{(0)}_{ij}, g^{(0)}_{ij}, b^{(0)}_{ij})^T$ is sometimes out of the range $0 \le r_{ij}, g_{ij}, b_{ij} \le 255$ (the color cube). Among the four images, such cases occurred for about 0.5% of the total 256 × 256 = 65536 pixels for Lena up to 6% for Milkdrop. In such cases, the closest point to $c_{i_k j_k}$ within the color cube should be on the sides of the planar polygon which is the cross section of the color cube cut by the given brightness plane $a^T x_{ij} = y_{ij}$. The closest point on the sides can be determined as follows.

(i) Find the closest vertex $x_1$ of the planar polygon to the reference $c_{i_k j_k}$.

(ii) Find the two vertices ($x_2$ and $x_3$) adjacent to $x_1$. The closest point should be on one of two sides, i.e., the side $x_1 x_2$ or the side $x_1 x_3$.

(iii) If the closest point is on the side $x_1 x_2$, it can be derived as follows. Let $x = t x_1 + (1 - t) x_2$, $0 \le t \le 1$, be a point on the side. The distance $D_{12}(t)$ between $x$ on the side $x_1 x_2$ and $c_{i_k j_k}$ is written as

$$D_{12}(t) = (x - c_{i_k j_k})^T (x - c_{i_k j_k}) = t^2 \|x_1 - x_2\|^2 + 2t (x_2 - c_{i_k j_k})^T (x_1 - x_2) + \|x_2 - c_{i_k j_k}\|^2. \qquad (20)$$

The closest point is derived as $t x_1 + (1 - t) x_2$ with $t = (c_{i_k j_k} - x_2)^T (x_1 - x_2) / \|x_1 - x_2\|^2$, which minimizes $D_{12}(t)$.

(iv) Let $t_{12}$ and $t_{13}$ be the values of $t$ which minimize $D_{12}(t)$ and $D_{13}(t)$, respectively, for $-\infty$ [...] $0.5$ since $x_1$ is the closest vertex to $c_{i_k j_k}$, there are four cases: case 1 is $0.5$ [...]
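The two-step initial estimation can be sketched as follows: reference selection by the measure of Eq. (18), projection onto the brightness plane by Eq. (19), and the minimizer of Eq. (20) along one polygon side. All function names are hypothetical, references are assumed to be given as `(i_k, j_k, color)` tuples, and clamping $t$ to $[0, 1]$ is an assumption made here to keep the point on the side; it is not taken from the case analysis of step (iv).

```python
import math

A = (0.299, 0.587, 0.114)


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def select_reference(i, j, y_ij, refs, n1, n2, w):
    """Eq. (18): return the reference (i_k, j_k, c_k) that minimizes F_ij(k)."""
    def measure(ref):
        ik, jk, c = ref
        spatial = math.hypot(i - ik, j - jk) / ((n1 + n2) / 2)
        brightness = abs(y_ij - dot(A, c)) / 255
        return w * spatial + (1 - w) * brightness
    return min(refs, key=measure)


def project_to_plane(c, y_ij):
    """Eq. (19): orthogonal projection of c onto the plane a^T x = y_ij."""
    step = (y_ij - dot(A, c)) / dot(A, A)
    return [ck + step * ak for ck, ak in zip(c, A)]


def closest_on_segment(c, x1, x2):
    """Minimizer of Eq. (20) for x = t*x1 + (1-t)*x2, with t clamped to [0, 1]."""
    d = [p - q for p, q in zip(x1, x2)]
    t = dot([ck - q for ck, q in zip(c, x2)], d) / dot(d, d)
    t = max(0.0, min(1.0, t))  # assumption: stay on the side itself
    return [t * p + (1 - t) * q for p, q in zip(x1, x2)]


# Illustrative data: two references on a 256 x 256 lattice, w chosen arbitrarily.
refs = [(10, 10, (200.0, 30.0, 30.0)), (200, 200, (30.0, 30.0, 200.0))]
_, _, c_sel = select_reference(12, 15, 80.0, refs, 256, 256, w=0.5)
x0 = project_to_plane(c_sel, 80.0)
```

In a full implementation, `closest_on_segment` would be called for the sides $x_1 x_2$ and $x_1 x_3$ only when `x0` falls outside the color cube.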

4 Experimental Results

In order to evaluate the performance of the proposed colorization method, experiments were carried out using eight color images from a database (http://sampl.eng.ohio-state.edu/~sampl/database.htm). Four images (Lena, Milkdrop, Peppers, Mandrill) were used as training images to derive parameters such as the aforementioned weighting factor w, and another four images (Girl, Im8, Aerial, 384010) were used as test images. These images are 256 × 256 pixels in size, in full color at 24 bits per pixel (bpp). Their monochrome versions were produced from the original color images by the transform shown in (6) and used for the colorization experiments.

For initial color estimation, several numbers of reference color vectors were taken from each original image at positions evenly spaced on the image lattice, for the sake of simplicity. Evenly spaced references make for a reasonably fair test, because positions that are particularly favorable for colorization are not necessarily selected. The weight value w in (18) was set as follows. The optimal weight values, which depend on the number of references as well as on the image, were first determined experimentally, and the average of the optimal values over the four training images was then used as w.
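One possible way to place evenly spaced references is sketched below. The exact positions used in the experiments are not specified, so placing each reference at the center of an equal-sized cell is purely an assumption, and `evenly_spaced_sites` is a hypothetical helper.

```python
def evenly_spaced_sites(n1, n2, k1, k2):
    """Centers of a k1 x k2 grid of equal cells on an n1 x n2 lattice."""
    rows = [int((a + 0.5) * n1 / k1) for a in range(k1)]
    cols = [int((b + 0.5) * n2 / k2) for b in range(k2)]
    return [(i, j) for i in rows for j in cols]


sites = evenly_spaced_sites(256, 256, 20, 20)
# Fraction of pixels used as references for a 20 x 20 grid on 256 x 256 pixels.
fraction = len(sites) / (256 * 256)
```

With 400 references out of 65536 pixels, the fraction is about 0.6%, the figure quoted for the 20 × 20 case.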

The local MAP estimation problem, i.e., the constrained quadratic programming problem in (16) and (17), was solved directly using a quadratic programming solver (http://plato.asu.edu/guide.html). In the calculation of $m_{ij}$ in (16), the third-order neighborhood (see footnote 3) was used, and any $x^{(p)}_{ij+\tau}$ whose luminance value $y_{ij+\tau}$ is far from $y_{ij}$ was excluded from the calculation. In the following experiments, if $|y_{ij+\tau} - y_{ij}| > 0.5s$, where $s$ is the standard deviation of luminance values averaged over four images, $x^{(p)}_{ij+\tau}$ was excluded from the calculation of $m_{ij}$. For the covariance matrix $\Sigma_X$ in (16), the average of normalized covariance matrices (normalized by their maximum components) for four images was used.

3 For the third-order neighborhood, $N = \{(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (-1, -1), (1, -1), (-1, 1), (0, 2), (0, -2), (2, 0), (-2, 0)\}$.
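The luminance-based exclusion rule can be sketched as follows. Two details are assumptions not stated in the text: neighbors that fall outside the lattice are skipped, and if every neighbor is excluded the pixel's current estimate is returned. The function name is hypothetical.

```python
THIRD_ORDER = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (-1, -1),
               (1, -1), (-1, 1), (0, 2), (0, -2), (2, 0), (-2, 0)]


def filtered_neighbor_mean(colors, lum, i, j, s, offsets=THIRD_ORDER):
    """Mean of neighbor colors whose luminance differs from y_ij by at most 0.5*s."""
    n1, n2 = len(lum), len(lum[0])
    kept = []
    for di, dj in offsets:
        ni, nj = i + di, j + dj
        if not (0 <= ni < n1 and 0 <= nj < n2):
            continue  # assumption: neighbors outside the lattice are skipped
        if abs(lum[ni][nj] - lum[i][j]) > 0.5 * s:
            continue  # excluded: luminance too far from y_ij
        kept.append(colors[ni][nj])
    if not kept:
        return list(colors[i][j])  # assumption: fall back to current estimate
    return [sum(v[c] for v in kept) / len(kept) for c in range(3)]
```

The exclusion keeps colors from bleeding across strong luminance edges, where neighboring pixels are likely to belong to differently colored regions.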

Colorization performance for the training images for several numbers of references is shown in Fig. 1 and Table 1, and that for the test images is shown in Fig. 2 and Table 2. For each case, the upper row shows the performance of the initial color estimation and the lower row shows the final result after the iterative MAP estimation. Iterations were stopped when the difference between the estimated color components at the current and the previous iteration, averaged over all pixels, became less than 0.5. Although the improvement in PSNR by MAP estimation is modest, ranging from 0.3 dB to 2.8 dB in Tables 1 and 2, it is still effective to a certain extent. The computational time to colorize one image was six seconds at most. Horiuchi's colorization method [3] (see footnote 4) achieved about 20 dB and 28 dB for Lena with 1% and 7% of all pixels, respectively, used as reference pixels, and about 20 dB and 27 dB for Milkdrop with 1% and 7%, respectively. The proposed method clearly outperforms it: with 20 × 20 references, which amount to only 0.6% of all pixels, it reaches 32.5 dB for Lena and 27.7 dB for Milkdrop.
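The evaluation measure and the stopping rule can be sketched as below. The paper does not spell out its PSNR definition, so computing the mean squared error over all pixels and all three channels with a peak value of 255 is an assumption; likewise, the use of the absolute difference in the stopping rule is an assumption. The function names are hypothetical.

```python
import math


def psnr(original, estimate):
    """PSNR in dB; assumes MSE over all pixels and channels, peak value 255."""
    squared_error, count = 0.0, 0
    for row_o, row_e in zip(original, estimate):
        for px_o, px_e in zip(row_o, row_e):
            for c_o, c_e in zip(px_o, px_e):
                squared_error += (c_o - c_e) ** 2
                count += 1
    mse = squared_error / count
    return math.inf if mse == 0 else 10 * math.log10(255 ** 2 / mse)


def mean_abs_change(prev, curr):
    """Average absolute change of color components between two iterations."""
    total, count = 0.0, 0
    for row_p, row_c in zip(prev, curr):
        for px_p, px_c in zip(row_p, row_c):
            for c_p, c_c in zip(px_p, px_c):
                total += abs(c_p - c_c)
                count += 1
    return total / count


def converged(prev, curr, threshold=0.5):
    """Stopping rule: average change below the threshold (0.5 in the text)."""
    return mean_abs_change(prev, curr) < threshold
```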

5 Conclusions

This paper presented a colorization algorithm that takes the colors of some pixels as references. The proposed algorithm is based on the MAP estimation of a color image given a monochrome image, where the color image is modeled by a Gaussian MRF. The global MAP estimation problem for a whole image is decomposed into local MAP estimation problems for each pixel, and the local MAP estimation is reduced to a simple quadratic programming problem with constraints. Using 0.6% of all pixels as references, the proposed method produced high-quality color images with PSNR values from 25.7 dB to 32.5 dB for eight images.

In this paper, the colorization problem was formulated as MAP estimation in RGB color space. We plan to consider it in other color spaces such as YUV. In order to realize a practical and convincing colorization method which does not require many references, we are going to improve the way the spatial positions of references are selected, which could be guided by region information derived by automatic segmentation of monochrome images.

4 We could compare PSNR values only with Ref. [3] since no PSNR values were given in the other papers [1,2]. Subjective comparison of image quality remains for future investigation.

References

[1] T. Welsh, M. Ashikhmin, K. Mueller, Transferring color to greyscale images, ACM Transactions on Graphics 21 (2002) 277-280.

[2] A. Levin, D. Lischinski, Y. Weiss, Colorization using optimization, ACM Transactions on Graphics 23 (2004) 689-694.

[3] T. Horiuchi, Colorization algorithm using probabilistic relaxation, Image and Vision Computing 22 (2004) 197-202.

[4] S. Geman, D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. & Machine Intell., PAMI-6 (1984) 721-741.

[5] H. Noda, M.N. Shirazi, E. Kawaguchi, MRF-based texture segmentation using wavelet decomposed images, Pattern Recognition, 35 (2002) 771-782.


Fig. 1. Experimental results for training images: (a) original color images, (b) monochrome images, (c) estimated color images using evenly spaced 20 × 20 references.

Fig. 2. Experimental results for test images: (a) original color images, (b) estimated color images using evenly spaced 5 × 5 references, (c) estimated ones using evenly spaced 20 × 20 references.

Table 1 Colorization performance (PSNR (dB)) for training images

references         estimate              Lena        Milkdrop    Peppers     Mandrill
(evenly spaced)
3 × 3              initial               24.0        21.7        19.8        15.1
                   final (iterations)    25.6 (5)    22.0 (4)    21.1 (7)    16.9 (11)
5 × 5              initial               25.4        24.5        22.1        17.1
                   final (iterations)    26.9 (4)    24.8 (5)    23.8 (7)    19.9 (9)
10 × 10            initial               27.8        24.7        23.1        20.7
                   final (iterations)    29.5 (4)    25.0 (4)    25.2 (6)    23.0 (8)
20 × 20            initial               30.4        27.1        25.5        23.1
                   final (iterations)    32.5 (3)    27.7 (5)    27.4 (5)    25.7 (7)

Table 2 Colorization performance (PSNR (dB)) for test images

references         estimate              Girl        Im8         Aerial      384010
(evenly spaced)
5 × 5              initial               24.3        27.0        21.9        25.1
                   final (iterations)    25.5 (4)    29.1 (4)    23.4 (5)    26.6 (4)
20 × 20            initial               28.6        28.9        24.5        29.1
                   final (iterations)    30.2 (4)    31.5 (3)    26.5 (5)    31.0 (3)
