
2018 International Conference on Communication, Network and Artificial Intelligence (CNAI 2018) ISBN: 978-1-60595-065-5

Research on Underwater Image Enhancement Technology Based on Generative Adversarial Networks

Geng-ren, Bo YIN*, Xiao WANG and Ze-

Ocean University of China, No.238, Songling Road, 266100, Qingdao, Shandong Province, P.R. China
*Corresponding author

Keywords: Underwater image enhancement, GANs, Adversarial training, Color restoration.

Abstract. Under the influence of light refraction and particle scattering, underwater images suffer from low contrast and color attenuation. This paper proposes an underwater image enhancement method based on Generative Adversarial Networks (GANs). The main principle is to apply GANs to learn the pixel transformation between images through adversarial training, so as to achieve image enhancement. We select real underwater images as the data set and degrade them before feeding them into the neural network. Each degraded image is then input together with its original counterpart, so that the neural network can learn the mapping from input image to real image and realize color restoration and enhancement of underwater images. Experiments show that this method is robust and can be applied to underwater photography and underwater navigation.

Introduction

The underwater environment is complex and changeable. When light propagates in water, some wavelengths are absorbed, resulting in color attenuation and color cast in images. Scattering by the various suspended particles in water results in low contrast and blur. In addition, the density of the water itself and the underwater light source can also degrade underwater images. Traditional image processing methods improve the visual effect of an image by adjusting its pixel values, or by establishing a corresponding degradation model and inverting the degradation process to obtain a clear image. However, because of the complexity of the underwater environment and the influence of various interference sources, simply changing pixel values cannot restore the original appearance of underwater objects, and inverting a degradation model alone cannot eliminate all types of noise. With the development of deep learning and the application of neural networks in image processing, new solutions to this problem have emerged. A deep neural network can learn the distribution of images, adjusting its weights and thresholds through training; in this way, a model of the same distribution as the original images can be built. This approach can effectively achieve image restoration and enhancement, thereby improving the quality of underwater images.

Related Work

There are many specialized methods for underwater images. [2] proposed a feature-based color constancy algorithm to correct the color deviation of underwater images. [3] put forward a method based on the fusion principle to improve the visual quality of underwater images and videos. [4] used Particle Swarm Optimization (PSO) to reduce the influence of light absorption and scattering on underwater images. However, these methods are limited by the quality of the image or by their mathematical models and cannot be fully applied to complex underwater environments. At present, deep learning is widely used in image processing and has made great progress. [5] proposed a natural image enhancement and denoising method using convolutional networks. [6] proposed a multi-layer stacked auto-encoding method for image processing. Lore et al. [7] proposed a method that simulates low-illumination environments with nonlinear darkening and added Gaussian noise in order to enhance image contrast. On the adversarial side, Goodfellow et al. proposed Generative Adversarial Networks (GANs) in 2014 [1]. Subsequently, many researchers have applied GANs to various image processing tasks and proposed improvements. For example, Radford et al. proposed the deep convolutional GAN (DCGAN), which uses fractionally-strided convolutions for image sampling and generation [8]; the working process and principle of fractionally-strided convolution are visualized in [9]. GANs do not need to presuppose a data distribution, so they are free from the limitations of a data distribution model. However, because the model is so unconstrained, results are not ideal when the data set is too large or the images have too many pixels. Therefore, Mirza et al. proposed conditional GANs (CGANs) [10], which constrain the model by introducing a conditional variable and prevent GANs from being so free that they generate undesirable results.

Generative Adversarial Networks

A Generative Adversarial Network consists of a Generator and a Discriminator. The Generator takes noise from a latent space as input, tries to capture the distribution of the real samples, and outputs a forged sample, while the Discriminator takes both real samples and samples produced by the Generator as input and outputs the probability that the input is a real sample. After adversarial training, the Generator can estimate the data distribution of the samples. The basic structure of the network is shown in Figure 1.

Figure 1. The basic structure of GANs (noise from a latent space is fed to the Generator G, which produces fake samples; the Discriminator D receives real and generated samples and decides whether each is real; the result is fed back to fine-tune both networks).
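The paper does not give the concrete network architectures. As a minimal sketch of the two roles in Figure 1, the PyTorch code below defines a hypothetical convolutional Generator and Discriminator for three-channel images; the layer sizes and activations are illustrative assumptions, not the authors' configuration.

import torch
import torch.nn as nn

class Generator(nn.Module):
    # Maps a degraded 3-channel image to an enhanced 3-channel image.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1),           # encode / downsample
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),  # decode / upsample
            nn.Tanh(),                                          # pixels in [-1, 1]
        )

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    # Outputs the probability that the input image is a real sample.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 1, 4, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1),  # collapse the score map to one value
            nn.Flatten(),
            nn.Sigmoid(),             # probability in (0, 1)
        )

    def forward(self, x):
        return self.net(x)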

The objective function of the network can be expressed by formula (1):

$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]. \quad (1)$$

In formula (1), G represents the Generator and D the Discriminator; x is a real sample and G(z) is a sample generated from the noise z, with probability density functions $p_{data}(x)$ and $p_z(z)$ respectively. D(x) represents the probability that x belongs to the real data distribution. When optimizing the Discriminator, V(D,G) is expected to be as large as possible, so as to ensure that the Discriminator can identify the fake samples to the maximum extent; when optimizing the Generator, we hope that V(D,G) is as small as possible, so that the generated images cannot be detected by the Discriminator. This objective can be disassembled into the following two steps:

$$\max_D V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]. \quad (2)$$

$$\min_G V(D,G) = \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]. \quad (3)$$

For the Generator network G, we assume that the input image lies in the space $x \in \mathbb{R}^{H \times W \times C}$ and that the convolution kernel is a free matrix $z \in Z$; the image region where the two matrices are multiplied is the feature region. Our purpose is to make the similarity between the generated image and the real image as high as possible. In order to learn more features, the input image is first binarized and converted into a three-channel monochrome image, so that the neural network learns the transformation from monochrome image to real image. When the Generator produces a forged sample, it is given to the Discriminator for discrimination; an output of D close to 0 or 1 indicates that the input is judged to be a fake or a real sample, respectively. Defining the probability density function of G(z) as $p_g(x)$, when G is fixed, the optimal output of D can be expressed as formula (4):

$$D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}. \quad (4)$$

After training, the optimal result is that D outputs 1/2 for every input, and this optimum is reached exactly when $p_g = p_{data}$. Compared with traditional neural networks, GANs can learn the data distribution of a real sample set without pre-setting a data distribution model. The Generator is trained through adversarial competition with the Discriminator, and it can eventually generate images close to the real samples. In underwater image processing, GANs can be used to learn the data distribution of various topography and underwater organisms; when the images we shoot are disturbed by various factors, they can be restored through the neural network to achieve image restoration and enhancement. The original paper of Goodfellow et al. introduces the learning process of GANs in detail [1]: through continuous iterative optimization, the discriminative boundary is fitted to the real image distribution.
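To make the two-step optimization of formulas (2) and (3) concrete, the sketch below implements one adversarial training step in PyTorch, literally following the two formulas; it assumes the hypothetical Generator and Discriminator defined earlier and prepared tensors real_batch and degraded_batch.

import torch

g = Generator()
d = Discriminator()
opt_d = torch.optim.Adam(d.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(g.parameters(), lr=2e-4)
EPS = 1e-8  # numerical safety inside the logarithms

def train_step(real_batch, degraded_batch):
    # Formula (2): maximize log D(x) + log(1 - D(G(z))) over D,
    # implemented by minimizing the negated sum.
    fake = g(degraded_batch).detach()  # freeze G while updating D
    loss_d = -(torch.log(d(real_batch) + EPS).mean()
               + torch.log(1 - d(fake) + EPS).mean())
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Formula (3): minimize log(1 - D(G(z))) over G.
    loss_g = torch.log(1 - d(g(degraded_batch)) + EPS).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()

In practice the non-saturating variant, maximizing log D(G(z)), is often substituted for the literal formula (3), since formula (3) gives weak gradients early in training; the sketch keeps the paper's original form.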

Experiment

In this paper, we use Generative Adversarial Networks to process underwater images, and the image generation effect is judged from both subjective and objective aspects. The data set consists of underwater video screenshots and underwater images selected from the ImageNet dataset, with 1760 training images and 440 test images. Our purpose is to learn the pixel transformation from the input image to the output image. In order to produce input images corresponding to the original images, each image is first binarized and converted to YIQ format to highlight its features; the resulting input image looks very different from the original. The model is then trained against the original images. During training, the Generator learns the distribution of underwater images through continuous adversarial competition with the Discriminator; finally, the Generator can generate color images with high similarity to the original image from the input images. Figure 2 shows our processing flow, and a sketch of the preprocessing step appears below.
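The paper does not report the exact binarization parameters or the precise order of operations; the following sketch shows one plausible version of the preprocessing described above (binarize, expand to three channels, convert to YIQ), where the mean threshold is an assumption.

import numpy as np

# Standard NTSC RGB -> YIQ conversion matrix.
RGB2YIQ = np.array([[0.299,  0.587,  0.114],
                    [0.596, -0.274, -0.322],
                    [0.211, -0.523,  0.312]])

def preprocess(rgb):
    # rgb: float array in [0, 1] with shape (H, W, 3).
    gray = rgb @ RGB2YIQ[0]                            # luminance channel
    binary = (gray > gray.mean()).astype(np.float32)   # assumed mean threshold
    mono = np.stack([binary] * 3, axis=-1)             # three-channel monochrome image
    return mono @ RGB2YIQ.T                            # express the result in YIQ space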

Figure 2. Our basic processing flow (training: binarize the training images and convert them to YIQ format, then train the model; testing: preprocess the input image the same way and feed it through the trained model to obtain the output image).

During the training, we recorded the losses of the Generator and the Discriminator and plotted them as line graphs to reflect how well the two network models fit the data. Since the two networks are trained adversarially, their losses are negatively correlated. In order to make the generated images more similar to the real images and to enhance the generalization ability of the model, an L1 penalty on the difference between the generated image and the real image is added to the generator objective; the L1 loss also partly reflects the subjective quality of the output (a minimal sketch of this combined objective is given below). The training process iterated 300 times. The following figures show the loss of the Discriminator (Figure 3), the Generator (Figure 4) and the L1 term (Figure 5).
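The weight on the L1 term is not reported in the paper; the sketch below shows how such a penalty is typically combined with the adversarial term of formula (3), using a hypothetical weight.

import torch

LAMBDA_L1 = 100.0  # hypothetical trade-off weight; not reported in the paper

def generator_loss(d, fake, real):
    adv = torch.log(1 - d(fake) + 1e-8).mean()  # adversarial term, formula (3)
    l1 = torch.abs(fake - real).mean()          # L1 distance to the real image
    return adv + LAMBDA_L1 * l1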

Figure 3. The loss of the Discriminator (loss value, roughly 0 to 1.2, over about 12,000 training iterations).

Figure 4. The loss of the Generator (loss value, roughly 0 to 6, over about 12,000 training iterations).

Figure 5. The L1 loss (loss value, roughly 0 to 0.35, over about 12,000 training iterations).

We can see from Figure 5 that the L1 loss gradually converges to around 0.1. This is because binarization cannot completely preserve all the features of the original image, and some noise in the original image is also filtered out during the preprocessing of the input image, so a certain error remains between the generated image and the real image. After the model training is completed, we use the test set to examine the training effect of the model. First, we evaluate the generated images subjectively. Figure 6 shows several images extracted at random. To the human eye, underwater organisms and topography can be distinguished in the generated images. The generated images and the ground truths are basically the same in color and brightness, and the color reproduction is good, but the details are blurry.

Figure 6. Partial input images, generated images and ground truths.

Then we evaluate the images objectively. First, we compare the structural similarity between the generated image and the real image. The structural similarity index (SSIM) is a relatively new indicator for evaluating image quality; in its calculation the luminance and contrast information of the images is removed (by subtracting the image mean and normalizing the variance) and only the structural information of the images is compared. Table 1 shows the SSIM values of the generated images above against the real images. The SSIM values between the input images and the ground truths are very small, so their structural information is essentially different; these values are much smaller than the SSIM values between the output images and the ground truths, so we can conclude that the output images have good structural similarity to the real images.

Table 1. The SSIM of Input image, Output image and Ground truth

No.  Input vs. Ground truth  Output vs. Ground truth
1    0.0575                  0.9370
2    0.0661                  0.8945
3    0.2906                  0.8943
4    0.1094                  0.7756

Peak signal-to-noise ratio (PSNR) is an objective criterion for measuring image distortion and can be used to evaluate the degree of deterioration of an image. Table 2 shows, for part of the input images, the PSNR between the input image and the real image and between the corresponding generated image and the real image. It can be seen that after preprocessing the input images are seriously degraded and their PSNR is very low. The PSNR of the generated images remains between 20 dB and 30 dB, which indicates that the generated images are of better quality than the input images, but still not ideal, so the image quality needs to be further improved.

Table 2. The PSNR (dB) of Input image, Output image and Ground truth

No.  Input vs. Ground truth  Output vs. Ground truth
1    6.3424                  22.2287
2    7.5897                  24.7275
3    9.2916                  29.6081
4    9.2389                  21.9271

From the experimental results, this method can produce color and brightness consistent with the real image, and the structural similarity is also high, so it can be applied to underwater image enhancement, color deviation correction and brightness enhancement. However, the images are blurry in detail and the PSNR values are low. The main reason is that the input image and the real image differ in detail, so the image preprocessing method needs to be improved so that the input image retains more of the detailed features of the original image.
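As a reference for how the two objective metrics above can be obtained, the sketch below computes PSNR and SSIM with scikit-image, assuming 8-bit RGB arrays of equal size (the channel_axis argument requires scikit-image >= 0.19; older versions use multichannel=True).

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(output, ground_truth):
    # Both inputs: uint8 arrays of shape (H, W, 3).
    psnr = peak_signal_noise_ratio(ground_truth, output, data_range=255)
    ssim = structural_similarity(ground_truth, output,
                                 channel_axis=-1, data_range=255)
    return psnr, ssim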

Conclusion

In this paper, we use a currently popular deep learning method to enhance underwater images, as an application of deep learning in the field of underwater image processing. The generative adversarial network used in this paper does not need to presuppose a data distribution model, and this property of GANs is exploited to learn the data distribution of underwater images. As long as the model is well trained, it can in theory generate images identical to the original images, which is our goal. Next, we will continue to study the application of generative adversarial networks in underwater image processing. By combining existing methods to improve the input-image preprocessing, we will better preserve the characteristics of the input images for the neural network to learn. We will also collect more underwater images for training and improve the image resolution, in the hope of obtaining clearer output images.

Acknowledgement

This work was supported in part by the Qingdao National Laboratory for Marine Science and Technology Aoshan science and technology innovation project (2016ASKJ07-4), the China International Scientific and Technological Cooperation Special project (2015DFR10490), and the Qingdao innovation and entrepreneurship leading talent project (13-cx-2).

References

[1] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, et al. "Generative Adversarial Nets." In NIPS, 2014.
[2] Kan L Y, et al. "Color correction of underwater images using spectral data." Proceedings of the SPIE 9273, Optoelectronic Imaging and Multimedia Technology.
[3] Ancuti C, Ancuti C O, Haber T, et al. "Enhancing underwater images and videos by fusion." Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, 2012: 81-88.
[4] AbuNaser A, Doush I A, Mansour N, et al. "Underwater image enhancement using particle swarm optimization." Journal of Intelligent Systems, 2015, 24(1): 99-115.
[5] HuiMing. "Deep Learning for Image Denoising." International Journal of Signal Processing, Image Processing and Pattern Recognition 7.3 (2014): 171-180.
[6] Yangwei, Haohua Zhao, and Liqing. "Image Denoising with Rectified Linear Units." Neural Information Processing. Springer International Publishing, 2014.
[7] Kin Gwn Lore, Adedotun Akintayo, Soumik Sarkar. "LLNet: A Deep Autoencoder Approach to Natural Low-light Image Enhancement." Pattern Recognition, 2017, 61: 650-662.
[8] Alec Radford, Luke Metz, Soumith Chintala. "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks." ICLR 2016. arXiv:1511.06434 (2016).
[9] Vincent Dumoulin, Francesco Visin. "A guide to convolution arithmetic for deep learning." arXiv preprint arXiv:1603.07285 (2016).
[10] Mehdi Mirza, Simon Osindero. "Conditional Generative Adversarial Nets." arXiv preprint arXiv:1411.1784 (2014).
[11] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, et al. "Image-to-Image Translation with Conditional Adversarial Networks." Conference on Computer Vision and Pattern Recognition, 2017: 5967-5976.