International Journal of Advanced Science and Technology Vol. 29, No. 5, (2020), pp. 9441-9446

Applying New Deep Learning Method to Multi-Spectral Image Fusion

A. Ashrith1, K. Anusri2, B. Reshwik3, K. Pradeep4, Trauma Sri5
1, 2, 3, 4 U.G. Student, Department of Computer Science, GITAM Deemed to be University, Hyderabad
5 Assistant Professor, Department of Computer Science, GITAM Deemed to be University, Hyderabad
[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract

A CNN model plays an important role in extracting the features of an image, and in this work a new deep neural network is applied to efficient infrared (IR) and visible (VIS) image fusion. In our system, a Siamese convolutional neural network (CNN) is used to automatically construct a weight map that represents the saliency of every pixel for a pair of source images. The CNN serves to automatically encode an image into a feature domain for classification. The key issues of image fusion, namely activity level measurement and fusion rule design, can be solved in one shot by applying the proposed procedure. The fusion is carried out through multi-scale decomposition of the images based on a wavelet transform, so the result is more perceptible to the human visual system. In addition, the visual qualitative efficiency of the proposed fusion approach is assessed by comparing pedestrian detection outcomes with other approaches using the YOLOv3 object detector on a public benchmark dataset. The experimental results indicate that applying the ReLU activation function to the fused image achieves the best results compared with other activation functions, and the approach shows competitive outcomes in both quantitative evaluation and visual quality.

Keywords: image fusion; visible; infrared; convolutional neural network; Siamese network

1. INTRODUCTION

Infrared (IR) and visible (VIS) image fusion technology is used to produce a composite image that combines the same scene information from multiple spectral source images. The input source images are obtained from various imaging modalities with different parameter settings. It is assumed that the fused image is more suitable for human viewing than any of the individual input images. Because of this benefit, image fusion techniques have wide application in image processing and computer vision to enhance human and machine visual ability [1]. The general image fusion process extracts representative salient features from source pictures of the same scene, and then merges the salient features into a single image by a proper fusion method. VIS images are highly affected by external conditions such as sun, fog, and smog, whereas IR images remain usable in areas where low-light conditions render the VIS image invisible. As mentioned above, due to variations in the imaging process, the intensities of IR and VIS images at the same pixel position also vary significantly.

One of the most critical problems in image fusion is computing a weight map that integrates the saliency of pixels from different source images [2]. The goal in most current image fusion methods is twofold: activity level measurement and weight assignment. In a conventional transform-domain fusion process, the sum of the absolute values of the decomposed coefficients is used to calculate the activity level, and a "choose-max" or "weighted-average" rule is then applied across the sources, depending on the measurement obtained. Clearly, this way of measuring activity and allocating weights is susceptible to many factors.
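For concreteness, the conventional transform-domain rule described above can be sketched in a few lines. The snippet below is purely illustrative (the array names and the use of NumPy are assumptions, not part of the original system): activity is measured as the absolute value of each decomposed coefficient, and the fused coefficient is taken either by a choose-max rule or as an activity-weighted average.

```python
import numpy as np

def fuse_coefficients(c_ir, c_vis, rule="choose-max", eps=1e-12):
    """Fuse two sub-band coefficient arrays from IR and VIS decompositions.

    Activity level is measured as the absolute value of each coefficient;
    the fusion rule is either 'choose-max' or an activity-weighted average.
    """
    a_ir, a_vis = np.abs(c_ir), np.abs(c_vis)   # activity level per coefficient
    if rule == "choose-max":
        return np.where(a_ir >= a_vis, c_ir, c_vis)
    # weighted average: weights proportional to each source's activity
    w_ir = a_ir / (a_ir + a_vis + eps)
    return w_ir * c_ir + (1.0 - w_ir) * c_vis

# Example: fuse two toy 2x2 coefficient blocks
print(fuse_coefficients(np.array([[3., -1.], [0., 2.]]),
                        np.array([[1.,  4.], [5., 1.]])))
```

Because the same absolute-value activity measure and fixed rule are applied everywhere, such hand-crafted schemes are sensitive to noise and misregistration, which motivates the learned weight map used in this work.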



2. LITERATURE REVIEW

A new effective infrared (IR) and visible (VIS) image fusion method. This paper presents a new effective infrared (IR) and visible (VIS) image fusion method using a deep neural network. In the method, a Siamese convolutional neural network (CNN) is applied to automatically generate a weight map which represents the saliency of each pixel for a pair of source images. The CNN serves to automatically encode an image into a feature domain for classification. By applying the proposed method, the key problems in image fusion, which are activity level measurement and fusion rule design, can be figured out in one shot [3]. The fusion is carried out through multi-scale image decomposition based on the wavelet transform, and the reconstructed result is more perceptible to the human visual system. In addition, the visual qualitative effectiveness of the proposed fusion method is evaluated by comparing pedestrian detection results with other methods, using the YOLOv3 object detector on a public benchmark dataset. The experimental results show that the proposed method achieves competitive results in terms of both quantitative assessment and visual quality.

Convolutional Autoencoder-Based Multispectral Image Fusion. This paper presents a deep learning-based pan-sharpening method for the fusion of panchromatic and multispectral images. The method can be categorized as a component-substitution method in which a convolutional autoencoder network is trained to generate original panchromatic images from their spatially degraded versions [7]. Low-resolution multispectral images are then fed into the trained convolutional autoencoder network to generate estimated high-resolution multispectral images. The fusion is achieved by injecting the detail map of each spectral band into the corresponding estimated high-resolution multispectral band [8-12]. Full-reference and no-reference metrics are computed for images from three satellite datasets. These measures are compared with existing fusion methods whose codes are publicly available. The results obtained indicate the effectiveness of the developed deep learning-based method for multispectral image fusion.
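To illustrate the autoencoder idea reviewed above, the following is a minimal sketch of a convolutional autoencoder that learns to restore a panchromatic patch from a spatially degraded version. It is not the cited authors' implementation; the framework choice (Keras), layer sizes, and names are assumptions made for illustration only.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_conv_autoencoder(patch_size=64):
    """Convolutional autoencoder mapping a degraded panchromatic patch
    back to its original high-resolution version (illustrative sketch)."""
    inp = layers.Input(shape=(patch_size, patch_size, 1))
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = layers.MaxPooling2D(2)(x)                      # encoder: 64 -> 32
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D(2)(x)                      # decoder: 32 -> 64
    out = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)
    return Model(inp, out)

model = build_conv_autoencoder()
model.compile(optimizer="adam", loss="mse")
# model.fit(degraded_pan_patches, original_pan_patches, ...)  # hypothetical training data
```

Once trained, the network is applied to low-resolution multispectral bands, and the detail map is injected into the estimated high-resolution bands, as described in the reviewed paper.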

3. METHODOLOGY

The main contributions of this paper can be summarized as follows:
1. We modeled a CNN-based learning scheme that measures the activity level and automatically generates the weight map according to the saliency of each pixel in the source image pair.
2. Using a 3-level wavelet transform, the source image pairs were decomposed into low- and high-frequency sub-bands, and the fused image was obtained by reconstructing the wavelet images with the scaled weight maps. This produces fewer unwanted artifacts and better consistency with human visual perception.
3. We systematically evaluated the experimental findings from both a quantitative and a qualitative standpoint. Twelve comparison data sets were used for quantitative evaluation, and the findings were compared with those of eighteen conventional prior-art methods. Furthermore, the visual qualitative efficacy of the proposed fusion approach was assessed by analyzing pedestrian detection results after fusion, using the YOLOv3 object detector on a public benchmark dataset.

To implement this paper, the following modules are used:
1) Upload Image: using this module we upload the IR and VIS images to the application.
2) Siamese CNN Model: using the Siamese CNN we extract a feature weight map which covers the high- and low-frequency bands of both images.
3) Reconstruct Fusion Image: using this weight map we replace the low-frequency bands of the VIS image with the high-frequency bands from the IR image to reconstruct a new image called the fusion image.
4) YOLOv3 Object Detection from Fusion Image: in this module we apply the YOLOv3 object detection algorithm on the fusion image to detect all visible objects in the image.

The IR and VIS images are captured by different imaging modalities, while the transform-domain fusion process is well suited to generating fewer unintended artifacts and good compatibility with human visual perception.



To tackle this issue, we decomposed the IR and VIS images using a 3-level 2-D Haar wavelet transform, so each pair of input images was decomposed into low- and high-frequency sub-bands. Since the original image is down-sampled at each step of the wavelet transformation, the weight map is scaled to match the size of the down-sampled images [16]. Finally, the fused image is obtained by reconstructing the 3-level wavelet images, with the scaled weight maps applied through a weighted average. The number of decomposition levels depends on the size of the image; most of the images in this work were roughly 350-400 by 400-450 pixels. Images are down-sampled and low-pass filtered at each stage, so if the number of levels is too high the reconstruction may be blurred due to the loss of the high-frequency content; the number of levels is chosen with this in mind. A sketch of the wavelet-based decomposition and reconstruction is given after Figure 1, and the wavelet-transform-based fusion scheme is illustrated in the flow diagram in Figure 1.

Figure 1: 3-level wavelet transform-based image fusion procedure
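Following the flow in Figure 1, the sketch below illustrates the 3-level Haar wavelet fusion step. The use of the PyWavelets and OpenCV libraries, and the exact weighting of the sub-bands, are assumptions for illustration; the paper does not specify the original implementation. A per-pixel weight map drives a weighted average of the approximation (low-frequency) coefficients, while the detail (high-frequency) sub-bands are fused with a choose-max rule.

```python
import numpy as np
import pywt
import cv2  # used only to resize the weight map to each sub-band size

def wavelet_fuse(ir, vis, weight_map, levels=3, wavelet="haar"):
    """Fuse IR and VIS images in the Haar wavelet domain.

    ir, vis     : 2-D float arrays of the same shape
    weight_map  : per-pixel saliency of the IR image, values in [0, 1]
    """
    c_ir = pywt.wavedec2(ir, wavelet, level=levels)
    c_vis = pywt.wavedec2(vis, wavelet, level=levels)

    # Approximation band: weighted average driven by the weight map,
    # rescaled to the size of the coarsest sub-band.
    w = cv2.resize(weight_map.astype(np.float32), c_ir[0].shape[::-1])
    fused = [w * c_ir[0] + (1.0 - w) * c_vis[0]]

    # Detail bands: keep the coefficient with larger magnitude (choose-max).
    for ir_bands, vis_bands in zip(c_ir[1:], c_vis[1:]):
        fused.append(tuple(np.where(np.abs(a) >= np.abs(b), a, b)
                           for a, b in zip(ir_bands, vis_bands)))

    return pywt.waverec2(fused, wavelet)
```

The function returns an image of (approximately) the original size; the weight map itself is produced by the Siamese CNN described in the next sections.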

DATA SETS: We have used sample IR and VIS images from the TNO multi-spectral IR and VIS image dataset, as shown in Figure 2.


Figure 2: The VIS and IR images considered in the data set

In the above images, the first image is the VIS image and the second is the IR image. In the IR image we can see two persons, but they are missing in the VIS image due to low light; by adding high-intensity pixels from the IR image to the VIS image, we can reconstruct the fusion image, and then both persons can be detected by the YOLOv3 algorithm.



4. RESULT ANALYSIS

The CNN generated a total of 15 layers from the uploaded images and extracted features from all 15 layers. The information for each layer of the CNN model is shown in Figure 3.

Figure 3: Showing the layers of CNN.

In the above screen we can see that the Conv2D convolutions generated 15 layers, and the features of each layer were extracted at a different image size: the first layer's feature-map size is 128 x 128, the next is 64 x 64, and so on. After extracting the features, clicking on 'Reconstruct Fusion Image' constructs the new fusion image, as shown in Figure 4.
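For illustration, a minimal sketch of a Siamese feature-extraction branch of the kind described above is given below. The framework (Keras), the number of stages, and the filter sizes are assumptions; only the idea of shared weights producing a per-location saliency/weight map for the IR and VIS inputs comes from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_branch(input_shape=(128, 128, 1)):
    """One branch of the Siamese network: stacked Conv2D layers whose
    feature maps shrink from 128x128 to 64x64 and so on."""
    inp = layers.Input(shape=input_shape)
    x = inp
    for filters in (16, 32, 64, 128):          # four conv/pool stages as an example
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)          # halves the spatial size each stage
    return Model(inp, x, name="siamese_branch")

branch = build_branch()                        # the SAME branch (shared weights)
ir_in = layers.Input(shape=(128, 128, 1), name="ir_patch")
vis_in = layers.Input(shape=(128, 128, 1), name="vis_patch")
merged = layers.Concatenate()([branch(ir_in), branch(vis_in)])
score = layers.Conv2D(1, 1, activation="sigmoid")(merged)   # per-location saliency
weight_model = Model([ir_in, vis_in], score)
weight_model.summary()
```

The coarse saliency output is up-scaled to form the weight map used in the wavelet fusion step.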

Figure 4: After image fusion (reconstruction of the fusion image)

The fusion image, which is the combination of the VIS and IR images, now shows both persons. These persons can be detected using the YOLOv3 object detector; object detection with bounding boxes is shown in Figure 5.
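A minimal sketch of running a pre-trained YOLOv3 detector on the fused image with OpenCV's DNN module is shown below. The file names yolov3.cfg / yolov3.weights and the thresholds are assumptions, and this is not the exact pipeline used in the experiments; it only illustrates the detection step.

```python
import cv2
import numpy as np

def detect_objects(fused_bgr, cfg="yolov3.cfg", weights="yolov3.weights",
                   conf_thresh=0.5, nms_thresh=0.4):
    """Run a pre-trained YOLOv3 model on the fused image and return boxes."""
    net = cv2.dnn.readNetFromDarknet(cfg, weights)
    blob = cv2.dnn.blobFromImage(fused_bgr, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    h, w = fused_bgr.shape[:2]
    boxes, scores = [], []
    for out in outputs:
        for det in out:                      # det = [cx, cy, bw, bh, obj, class scores...]
            conf = float(det[4]) * float(np.max(det[5:]))
            if conf > conf_thresh:
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                scores.append(conf)
    keep = cv2.dnn.NMSBoxes(boxes, scores, conf_thresh, nms_thresh)
    return [boxes[i] for i in np.array(keep).flatten()]
```

The returned boxes can then be drawn on the fusion image to obtain the result shown in Figure 5.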



Figure 5: Object detection with Bounding Box

We applied three different activation functions, ReLU, Tanh, and Sigmoid, to the fusion image and performed classification. The best accuracy of 86% was achieved with the ReLU activation function, compared with 72% for the Tanh activation function and 61% for the Sigmoid activation function. The classification accuracy results are shown in Figure 6.

ACTIVATION FUNCTION   ACCURACY (%)   PREDICTION LOSS
ReLU                  86             0.147
Tanh                  72             0.227
Sigmoid               61             1.338

Figure 6: Classification accuracy results

ReLU stands for rectified linear unit and is a type of activation function. Mathematically, it is defined as y = max(0, x). ReLU is the most commonly used activation function in neural networks, especially in CNNs, and it is the function with which we achieved 86% accuracy in this paper. ReLU is linear (identity) for all positive values and zero for all negative values. This means that:
- It is cheap to compute, as there is no complicated math, so the model can take less time to train or run.
- It converges faster. Linearity means that the slope does not plateau, or "saturate," when x gets large, so it does not have the vanishing gradient problem suffered by other activation functions such as sigmoid or tanh.
- It is sparsely activated. Since ReLU is zero for all negative inputs, any given unit is likely not to activate at all.
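For reference, the three activation functions compared above can be written in a few lines of NumPy. This is a plain illustrative sketch, not the training code that produced the 86%/72%/61% results:

```python
import numpy as np

def relu(x):
    """y = max(0, x): identity for positive inputs, zero otherwise."""
    return np.maximum(0.0, x)

def tanh(x):
    """Saturates towards -1/+1 for large |x|, which can slow learning."""
    return np.tanh(x)

def sigmoid(x):
    """Squashes inputs into (0, 1); also saturates for large |x|."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
for name, fn in (("ReLU", relu), ("Tanh", tanh), ("Sigmoid", sigmoid)):
    print(name, fn(x))
```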

5. CONCLUSION

In this paper, we proposed an IR and VIS image fusion approach based on deep learning. To produce a weight map that indicates the saliency of each source pixel for fusion from a pair of source images, a CNN-based image encoding and classification function was applied in our process. The main problems in image fusion, namely the calculation of the activity level and the design of the fusion rule, could be solved at once by applying the proposed process. The visual quality of the system was demonstrated by analyzing outputs on a public benchmark dataset using an object detector. The results of the quantitative evaluation show that the CNN-based fusion method was more effective in terms



of noise, distortion, and intensity difference than manually constructed methods. We believe our method of fusing pre-registered multi-spectral images is very effective and robust. As future work, by implementing the algorithm with parallel computing units, we aim to create new deep neural networks for image fusion and increase the efficacy of the fusion process.

REFERENCES

[1]. Zhang, B.; Lu, X.; Pei, H.; Zhao, Y. A fusion algorithm for infrared and visible images based on saliency analysis and non-subsampled Shearlet transform. Infrared Phys. Technol. 2015, 73, 286-297.
[2]. Jin, H.; Wang, Y. A fusion method for visible and infrared images based on contrast pyramid with teaching learning based optimization. Infrared Phys. Technol. 2014, 64, 134-142.
[3]. Cui, G.; Feng, H.; Xu, Z.; Li, Q.; Chen, Y. Detail preserved fusion of visible and infrared images using regional saliency extraction and multi-scale image decomposition. Opt. Commun. 2015, 341, 199-209.
[4]. Fan, X.; Shi, P.; Ni, J.; Li, M. A thermal infrared and visible images fusion based approach for multi-target detection under complex environment. Math. Probl. Eng. 2015.
[5]. Du, J.; Li, W.; Xiao, B.; Nawaz, Q. Union Laplacian pyramid with multiple features for medical image fusion. Neurocomputing 2016, 194, 326-339.
[6]. Toet, A. A morphological pyramidal image decomposition. Pattern Recognit. Lett. 1989, 9, 255-261.
[7]. Singh, R.; Khare, A. Fusion of multimodal medical images using Daubechies complex wavelet transform - a multiresolution approach. Inf. Fusion 2014, 19, 49-60.
[8]. Li, H.; Manjunath, B.; Mitra, S. Multi-sensor image fusion using the wavelet transform. Graph. Models Image Process. 1995, 57, 235-245.
[9]. Lewis, J.; O'Callaghan, R.; Nikolov, S.; Bull, D.; Canagarajah, N. Pixel- and region-based image fusion with complex wavelets. Inf. Fusion 2007, 8, 119-130.
[10]. Yang, L.; Guo, B.; Ni, W. Multimodality medical image fusion based on multiscale geometric analysis of contourlet transform. Neurocomputing 2008, 72, 203-211.
[11]. Zheng, L.; Bhatnagar, G.; Wu, Q. Directive contrast based multimodal medical image fusion in NSCT domain. IEEE Trans. Multimedia 2013, 15, 1014-1024.
[12]. Wang, L.; Li, B.; Tan, L. Multimodal medical volumetric data fusion using 3-D discrete shearlet transform and global-to-local rule. IEEE Trans. Biomed. Eng. 2014, 61, 197-206.
[13]. Yang, B.; Li, S. Pixel-level image fusion with simultaneous orthogonal matching pursuit. Inf. Fusion 2012, 13, 10-19.
[14]. Li, S.; Yin, H.; Fang, L. Group-sparse representation with dictionary learning for medical image denoising and fusion. IEEE Trans. Biomed. Eng. 2012, 59, 3450-3459.
[15]. Liu, Y.; Wang, Z. Simultaneous image fusion and denoising with adaptive sparse representation. IET Image Process. 2015, 9, 347-357.
[16]. Ma, J.; Yu, W.; Liang, P.; Li, C.; Jiang, J. FusionGAN: A generative adversarial network for infrared and visible image fusion. Inf. Fusion 2019, 48, 11-26.
