Transforming Thermal Images to Visible Spectrum Images Using Deep Learning
Master of Science Thesis in Electrical Engineering
Department of Electrical Engineering, Linköping University, 2018

Transforming Thermal Images to Visible Spectrum Images using Deep Learning
Adam Nyberg

LiTH-ISY-EX–18/5167–SE

Supervisor: Abdelrahman Eldesokey, ISY, Linköpings universitet
David Gustafsson, FOI
Examiner: Per-Erik Forssén, ISY, Linköpings universitet

Computer Vision Laboratory
Department of Electrical Engineering
Linköping University
SE-581 83 Linköping, Sweden

Copyright © 2018 Adam Nyberg

Abstract

Thermal spectrum cameras are gaining interest in many applications because the long wavelengths they sense allow them to operate under low light and harsh weather conditions. One disadvantage of thermal cameras is their limited visual interpretability for humans, which limits the scope of their applications. In this thesis, we address this problem by investigating the possibility of transforming thermal infrared (TIR) images to perceptually realistic visible spectrum (VIS) images using Convolutional Neural Networks (CNNs). Existing state-of-the-art colorization CNNs fail to provide the desired output, as they were trained to map grayscale VIS images to color VIS images. Instead, we utilize an auto-encoder architecture to perform cross-spectral transformation between TIR and VIS images. This architecture is shown to perform very well quantitatively on the problem while producing perceptually realistic images. We show that the quantitative differences are insignificant when training this architecture using different color spaces, while there are clear qualitative differences depending on the choice of color space. Finally, we found that a CNN trained on daytime examples generalizes well to test images from night time.
Acknowledgments

First I would like to thank FOI for providing me with the opportunity to do my master's thesis at FOI. A big thanks to my supervisors David Gustafsson at FOI and Abdelrahman Eldesokey at Linköping University for supporting me with both technical details and the thesis writing. I would also like to thank my examiner, who was responsive, supportive and believed in the thesis. An extra thanks goes to Henrik Petersson, David Bergström and Jonas Nordlöf at FOI for being supportive and helping me with the hardware and other issues. A final thanks is addressed to my fellow master's thesis students at FOI who helped make the time at FOI wonderful.

Linköping, June 2018
Adam Nyberg

Contents

1 Introduction
  1.1 Motivation
  1.2 Problem formulation
  1.3 Research questions
  1.4 Related work
  1.5 Limitations
  1.6 Thesis outline
2 Background
  2.1 Camera calibration
  2.2 Image registration
  2.3 Image representation
  2.4 Artificial Neural Network (ANN)
  2.5 Convolutional Neural Network (CNN)
    2.5.1 Convolutional layer
    2.5.2 Pooling layer
    2.5.3 Objective function
    2.5.4 Training a CNN
    2.5.5 Dropout layer
    2.5.6 Batch normalization
3 Method
  3.1 Data acquisition system
    3.1.1 Camera calibration
    3.1.2 Image registration
    3.1.3 Data collection
    3.1.4 Data split
  3.2 CNN models
    3.2.1 Gray to color model
    3.2.2 Thermal to color model
  3.3 Pansharpening
4 Experiments
  4.1 Evaluation metrics
  4.2 Experimental setup
  4.3 Experiments
    4.3.1 Pretrained colorization model on TIR images
    4.3.2 Pretrained TIR to VIS transformation
    4.3.3 Color space comparison
    4.3.4 Night time generalization
  4.4 Quantitative results
  4.5 Qualitative analysis
5 Discussion
  5.1 Method
  5.2 Results
6 Conclusions
  6.1 Future work
Bibliography

1 Introduction

Thermal infrared (TIR) cameras are becoming increasingly popular because the long wavelengths they sense allow them to work under low-light conditions. TIR cameras require no active illumination, as they sense the heat emitted by objects. This opens up many applications, such as driving in complete darkness or bad weather. However, one disadvantage of TIR cameras is their limited visual interpretability for humans. In this thesis, we investigate the problem of transforming TIR images, specifically long-wavelength infrared (LWIR) images, to visible spectrum (VIS) images. Figure 1.1 shows an example of a TIR image and the corresponding VIS image.

Figure 1.1: A pair of thermal and visible spectrum images from the KAIST-MS dataset [24]. (a) A LWIR image. (b) A visible spectrum image.

The problem of transforming TIR images to VIS images is inherently challenging, as the two do not contain the same information in the electromagnetic spectrum. Therefore, one TIR image can correspond to multiple different VIS images; e.g., blue and green balls of the same material and temperature would look almost identical in a LWIR image but very different in a VIS image.

1.1 Motivation

Transforming TIR images to VIS images can open up a wide variety of applications, for example enhanced night vision while driving in the dark. Another possible application is detecting objects in the transformed TIR images that are difficult to see in regular TIR images. This thesis was conducted at the Swedish Defence Research Agency (FOI) and has three main applications for the transformation of TIR images:

– Enhanced night vision, where users should be able to detect and classify targets by looking at the transformed TIR images.
– Detection of Improvised Explosive Devices (IEDs) in natural environments such as fields, forests or along roads. The transformed TIR images can potentially improve the user's ability to detect IEDs.
– Improved ability to drive at night, by making it easier to find stones and other obstacles that are hard to see using only VIS images. The intended use may be driving off-road or on roads with little traffic.

1.2 Problem formulation

FOI has developed a co-axial imaging system that makes it possible to capture both TIR and VIS images with pixel-to-pixel correspondence. Figure 1.2 shows the principle of this co-axial imaging system. The dichroic filter is a very accurate color filter that selectively transmits one range of the electromagnetic spectrum while reflecting other ranges.

This thesis investigates different supervised learning methods for transforming TIR images to VIS images. To achieve this, we investigate the process of geometric calibration and image registration between the thermal and visible spectrum cameras, so that a pixel-to-pixel correspondence is possible between the image pairs. The robustness of calibration and registration is crucial for the learning process; therefore, we evaluate them using a special calibration board that is visible in both the visible and infrared spectra. Furthermore, we develop a GUI application that can be used when collecting image pairs and when transforming a stream of TIR images into VIS images. Finally, since collecting VIS images for training under low-light conditions is not feasible, we investigate the generalization capabilities of the proposed methods by training them on image pairs from daytime and evaluating them on images from night time.

Figure 1.2: A co-axial imaging system with one LWIR camera and one visible spectrum camera sharing the same optical axis. The thermal radiation is reflected by, and the visible light passes through, a hot mirror, also called a dichroic filter. A hot mirror is designed to transmit visible wavelengths and reflect thermal spectrum wavelengths. (a) Top-down view of the system. (b) View from behind with the system mounted in a car.
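
The train-on-day, evaluate-at-night setup from Section 1.2 can be sketched in a few lines. This is only an illustration under stated assumptions: the `ImagePair` record, its field names, the example file names and the fixed 06:00 to 18:00 daytime window are hypothetical and are not taken from the thesis, which may split its data differently.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class ImagePair:
    """One co-registered TIR/VIS capture (hypothetical record layout)."""
    tir_path: str
    vis_path: str
    captured_at: datetime


# Assumed daytime window; a real system might instead use sunrise/sunset
# times or a measured light level.
DAY_START = time(6, 0)
DAY_END = time(18, 0)


def is_daytime(pair: ImagePair) -> bool:
    """Return True if the pair was captured inside the assumed daytime window."""
    return DAY_START <= pair.captured_at.time() < DAY_END


def split_day_night(pairs):
    """Split pairs into (daytime training set, night-time evaluation set)."""
    train = [p for p in pairs if is_daytime(p)]
    evaluate = [p for p in pairs if not is_daytime(p)]
    return train, evaluate


# Made-up example captures, for illustration only.
pairs = [
    ImagePair("tir_0001.png", "vis_0001.png", datetime(2018, 4, 10, 13, 30)),
    ImagePair("tir_0002.png", "vis_0002.png", datetime(2018, 4, 10, 23, 5)),
    ImagePair("tir_0003.png", "vis_0003.png", datetime(2018, 4, 11, 9, 15)),
]
train_pairs, night_pairs = split_day_night(pairs)
print(len(train_pairs), len(night_pairs))
```

Keeping the TIR and VIS paths in one record reflects the pixel-to-pixel pairing that supervised training relies on; the split itself only decides which pairs the model is allowed to see during training.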

1.3 Research questions

In this thesis, we investigate the following research questions:

1. How would an existing pretrained CNN model for colorizing grayscale images perform on the task of transforming TIR images to VIS images?
2. Which color space is most suitable for the task of TIR to VIS transformation?
3. Does a model trained on thermal images from daytime generalize well to images from night time?

1.4 Related work

Colorizing grayscale images has been extensively investigated in the literature. Scribbles [22] require the user to manually apply strokes of color to different regions of a grayscale image, and the algorithm makes the assumption that neighboring pixels with the same luminance should have the same color. Transfer techniques [33] use the color palette from a reference image and apply that palette to a grayscale image by matching luminance and texture. For example, when colorizing a grayscale image of a forest, the user would typically have to provide another image of a forest with an appropriate color palette. Both scribbles and transfer techniques require manual input for colorization. Other colorization methods are based on automatic transformation, i.e. the only input to the method is the grayscale image, without explicit information about colors. Recently, promising results have been published in the area of automatic transformation using Convolutional Neural Networks (CNNs), because of the ability of CNNs to model semantic representations in images [5, 13, 19, 34].

Figure 1.3: Illustration of the relative positions of visible spectrum, NIR, short-wavelength infrared (SWIR) and LWIR in the electromagnetic spectrum.

In the infrared spectrum, less research has been done specifically on transforming TIR images to VIS images. Limmer and Lensch [23] proposed a CNN to transform near-infrared (NIR) images into VIS images by training on a dataset that contains images of highway roads and surroundings.
Their method was shown to perform well as the NIR and VIS images are highly correlated.
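
To make the notion of "highly correlated" concrete, one simple and admittedly crude measure is the Pearson correlation between the pixel intensities of two co-registered images. The sketch below is not taken from Limmer and Lensch [23] or from this thesis; the function name `pearson_correlation` and the tiny 2x3 example images are assumptions made purely for illustration.

```python
import math


def pearson_correlation(image_a, image_b):
    """Pearson correlation between two equally sized grayscale images.

    Images are given as lists of rows of numeric pixel intensities and are
    assumed to be co-registered (pixel-to-pixel aligned).
    """
    a = [v for row in image_a for v in row]
    b = [v for row in image_b for v in row]
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and have the same size")

    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant image")
    return cov / math.sqrt(var_a * var_b)


# Made-up intensities: `brightened` is a linear function of `reference`,
# while `shuffled` holds the same values in a different spatial order.
reference = [[10, 50, 90], [20, 60, 100]]
brightened = [[30, 110, 190], [50, 130, 210]]  # 2 * reference + 10
shuffled = [[90, 10, 50], [60, 100, 20]]

print(pearson_correlation(reference, brightened))
print(pearson_correlation(reference, shuffled))
```

A magnitude close to 1 indicates a near-linear relation between the two images, which is the kind of relation a network can exploit most directly. How far LWIR/VIS pairs depart from that is not measured by this sketch.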