Cycle-CNN for Colorization Towards Real Monochrome-Color Camera Systems
Total Page:16
File Type:pdf, Size:1020Kb
The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) Cycle-CNN for Colorization towards Real Monochrome-Color Camera Systems Xuan Dong,1 Weixin Li,2∗ Xiaojie Wang,1 Yunhong Wang2 1School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, 100876 2Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing, 100191 {dongxuan8811, xjwang}@bupt.edu.cn, {weixinli, yhwang}@buaa.edu.cn Abstract Colorization in monochrome-color camera systems aims to colorize the gray image IG from the monochrome camera us- ing the color image RC from the color camera as reference. Since monochrome cameras have better imaging quality than color cameras, the colorization can help obtain higher quality color images. Related learning based methods usually sim- ulate the monochrome-color camera systems to generate the synthesized data for training, due to the lack of ground-truth color information of the gray image in the real data. However, (a) Input pair of gray image IG and color image RC. (b) the methods that are trained relying on the synthesized data ∗ may get poor results when colorizing real data, because the Our colorization result IC. synthesized data may deviate from the real data. We present a new CNN model, named cycle CNN, which can directly Figure 1: The input pair of gray image IG and color image use the real data from monochrome-color camera systems for RC are shot by the monochrome and color cameras, respec- training. In detail, we use the colorization CNN model to do tively. By directly using these real data for training, our al- the colorization twice. First, we colorize IG using RC as gorithm learns to colorize IG using RC as reference. reference to obtain the first-time colorization result IC. Sec- ond, we colorize the de-colored map of RC, i.e. RG, us- ing the first-time colorization result IC as reference to obtain the monochrome camera using the color image RC from the second-time colorization result RC. In this way, for the the color camera as reference. Between the monochrome second-time colorization result RC, we use the original color and color cameras, there exist different hardwares, e.g. the map RC as ground-truth and introduce the cycle consistency color filter array, and different software modules, e.g. white ≈ loss to push RC RC. Also, for the first-time colorization balance, demosaic, etc. As a result, on the one hand, the result IC, we propose a structure similarity loss to encour- monochrome camera has better light efficiency (Jeon et al. age the luminance maps between IG and IC to have simi- 2016; Dong et al. 2019) than the color camera, and thus the lar structures. In addition, we introduce a spatial smoothness gray image has higher quality, i.e. signal-noise ratio, than loss within the colorization CNN model to encourage spa- tial smoothness of the colorization result. Combining all these the color image. This motivates researchers to do the col- losses, we could train the colorization CNN model using the orization so as to get higher quality color images using the real data in the absence of the ground-truth color information monochrome-color camera systems. On the other hand, the of IG. Experimental results show that we can outperform re- pair of gray and color images have different luminance, blur, lated methods largely for colorizing real data. noises, etc. These cause the difficulty for the colorization. Among existing methods for colorization within the Introduction monochrome-color camera system, some are traditional hand-crafted methods, e.g. (Jeon et al. 2016). With the suc- With the increasing use of monochrome-color camera sys- cessful use of deep learning in various computer vision prob- tems in high-end smart phones, e.g. Huawei P30, Mate30, lems, some deep learning based methods, e.g. (Dong et al. etc., the colorization problem within these systems is attract- 2019) are proposed recently, which have shown to be able ing more and more attentions from the academic and indus- to obtain higher accuracy than the traditional ones. How- trial communities. ever, in the deep learning methods, e.g. (Dong et al. 2019), As shown in Fig. 1, colorization in monochrome-color the models usually need ground-truth color information of camera systems aims to colorize the gray image IG from the input gray images as annotations for training. Due to ∗Corresponding Author. the lack of ground-truth color information in the real data, Copyright c 2020, Association for the Advancement of Artificial as shown in Fig. 2, current methods, e.g. (Jeon et al. 2016; Intelligence (www.aaai.org). All rights reserved. Dong et al. 2019), usually synthesize data to simulate the 10721 Synthesized Data Real Data Structure Similarity Loss Dual Monochrome Color & Color Y Cb Cr Concat (IG ,IC ,IC ) Cameras Cameras & Flip Add Decolor & Add Input gray image I First-time Flipped first-time G colorization result distortions distortions colorization result IC IC Colorization CNN , The input pair of color and Ground-truth The input pair of color and gray images gray images Flip and de-color Figure 2: How the synthesized data and the real data are ob- Input color image R R ’s de-colored C C tained. The real data are the pair of gray and color images and flipped map RG shot from the monochrome-color camera system. The syn- Flip thesized data are the pair of gray and color images that are Cycle synthesized using a pair of color images from the dual-color Consistency Loss camera system. Second-time ' ' R G ’s colorization result RC colorization result RC (I ,R ) → I G C C Figure 4: The overall structure of our Cycle CNN model. real data from the monochrome-color system. In detail, as Input gray image I Colorization result I G C shown in Fig. 4, we use the colorization CNN to do the col- orization twice, i.e. firstly colorizing IG using RC as ref- erence and secondly colorizing RG using IC as reference. Our objective have three terms, i.e. structure similarity loss, → (RG ,IC ) RC cycle consistency loss, and spatial smoothness loss. For the Input color image RC RC’s de-colored map RG first-time colorization result IC, we introduce the structure similarity loss to encourage the structure similarity of the luminance maps between IG and IC. For the second col- Figure 3: The insight of cycle colorization consistency. When doing the colorization twice, i.e. firstly (IG, RC) → orization result RC, we introduce the cycle consistency loss → to encourage the similarity of the color maps between RC IC and secondly (RG, IC) RC, the second-time col- and R . In addition, we introduce the spatial smoothness orization result R should arrive back at RC. C C loss to encourage the spatial smoothness of the colorization result. We train the colorization CNN model by combining real data from the monochrome-color camera system. How- all these loses. We also use a refinement CNN to refine IC I I∗ ever, the degradation models for synthesizing the data may with G as guidance to get the final result C. deviate from the ones in real imaging systems within the Experimental results show that we can outperform related monochrome and color cameras. Thus, the synthesized data methods largely for the real data from the monochrome- could hardly simulate the real data perfectly. As a result, the color camera system. deep learning methods, which are trained relying on the syn- Our contributions include 1) the cycle CNN structure thesized data, may have very poor results when colorizing that enables to train the model using real data from the the real data. monochrome-color camera systems, 2) the cycle consistency To overcome this limitation, in this paper, we propose a loss for the second-time colorization result, 3) the structure new convolutional neural network (CNN) model and aim to similarity loss for the first-time colorization result, and 4) directly use the real data from the monochrome-color cam- the spatial smoothness loss for spatial smoothness of the col- era system for training. orization result. Our insight is based on the property of cycle colorization consistency. As shown in Fig. 3, when we do the coloriza- Related Works tion twice, i.e. firstly colorizing IG using RC as reference The existing colorization tasks can be divided into four and secondly colorizing the gray map of RC, i.e. RG, using kinds, i.e. automatic colorization, scribble-based coloriza- I the obtained first-time colorization result C as reference, tion, reference-based colorization, and monochrome-color the second-time colorization result RC should arrive back dual-lens colorization. at RC. In automatic colorization, the input is only a single gray Based on this insight, we propose a new CNN model, image and the algorithms need to automatically colorize it named Cycle CNN, that can be learned directly using the without any reference. Recent deep learning based methods, 10722 Y e.g. (Zhang, Isola, and Efros 2016) and (Iizuka, Simo-Serra, IG ResNet and Ishikawa 2016), make great progress to solve this prob- Y IC ResNet lem. However, these methods are not proper for our problem Y Y SSM l Concat ResNet (IG ,IC ) (IC ,IG ) because they fail to make use of the color image from the c Y Y (IC ,IG ) color camera, which contains much useful color information s(IY ,IY ) for colorizing the gray image from the monochrome camera. C G In the scribble-based colorization task, the input includes Figure 5: The structure of our structure similarity CNN a single gray image and several color scribbles which are model.