Visualize Yolo
MSc Artificial Intelligence
Master Thesis

Open the black box: Visualize Yolo

by Peter Heemskerk
11988797
August 18, 2020
36 EC credits
autumn 2019 till summer 2020

Supervisor: Dr. Jan-Mark Geusebroek
Assessor: Prof. Dr. Theo Gevers
University of Amsterdam

Contents

1 Introduction
1.1 Open the neural network black box
1.2 Project Background
1.2.1 Wet Cooling Towers
1.2.2 The risk of Legionellosis
1.2.3 The project
2 Related work
2.1 Object Detection
2.2 Circle detection using Hough transform
2.3 Convolutional Neural Networks (ConvNet)
2.4 You Only Look Once (Yolo)
2.4.1 Yolo version 3 - bounding box and class prediction
2.4.2 Yolo version 3 - object recognition at different scales
2.4.3 Yolo version 3 - network architecture
2.4.4 Feature Pyramid Networks
2.5 The Black Box Explanation problem
2.6 Network Visualization
3 Approach
3.1 Aerial imagery dataset
3.2 Yolo version 3
3.2.1 Pytorch Implementation
3.2.2 Tuning approach
3.3 Evaluation
3.3.1 Training, test and validation sets
3.3.2 Evaluation Metrics
3.4 Network Visualization
3.4.1 Introduction
3.4.2 Grad-CAM
3.4.3 Feature maps
4 Experiment
4.1 Results
4.1.1 Hough Transform prediction illustration
4.1.2 Yolo Prediction illustration
4.1.3 Yolo Tuning
4.1.4 Yolo Validation
4.1.5 DCMR Wet Cooling Tower prediction
4.1.6 Yolo Visualization - Grad-CAM
4.1.7 Yolo Visualization - Feature maps
4.2 Discussion
5 Conclusion
6 Acknowledgement
7 Attachment

1 Introduction

This thesis is based on the project work to automatically detect Wet Cooling Towers on aerial imagery using a deep neural network. The theme of the thesis is to open the neural network black box.
1.1 Open the neural network black box

Neural networks, and more specifically deep convolutional networks, have shown impressive results in image classification and object detection. Deep neural networks break a problem such as object detection down into millions of pieces and combine them to generate predictions. The human brain does not work that way, and we may therefore have problems interpreting how an algorithm has reached its conclusion. We tend to regard the internal behaviour of the neural network as a black box [22].

Algorithms based on neural networks have taken on an important role in real-life decision making. Medical doctors accept computer-based advice derived from radiographic or MRI image patterns, and autonomous vehicles make continuous critical decisions based on what is captured by video and other sensors. But sometimes a neural network makes prediction errors that humans would not make, and there are numerous examples where neural networks trained on real-life data show a bias in their decisions. For automated neural networks it is therefore important that we can give insight into the way the network comes to its conclusion. By visualizing what is happening in the neural network we aim to give users of the network some evidence that decisions are made on correct assumptions.

This thesis aims to demonstrate that Yolo version 3, a modern deep convolutional network architecture, is capable of the task of Wet Cooling Tower object detection. We aim to optimize results through the use of different imagery types and the use of transfer learning. The results from Yolo version 3 will be compared with a classical and simpler object detection method, the circle Hough Transform. The reasoning behind a prediction of the Yolo deep neural network is complex and extensive. By visualizing the inner workings of the deep convolutional network, evidence is provided that the network's decisions are based on the right pieces of information.
Opening the neural network black box is the aim. This thesis describes the Grad-CAM and feature map techniques for visualisation. To our knowledge, the use of these visualisation techniques on a Yolo version 3 architecture is novel.

1.2 Project Background

1.2.1 Wet Cooling Towers

Cooling towers serve to cool down a large building or a part of an industrial process with a sizeable cooling need. The first versions originated in the 19th century around steam engines, and from the early 20th century two main types of cooling methods have emerged. The Wet Cooling Tower operates on the principle of evaporative cooling and is an open circuit cooler. When liquid is converted into vapor it consumes thermal energy, and therefore the temperature of the surrounding air drops. The heat transfer principle is much like sweating. It differs from a common refrigerator, where the converted vapor is collected in a sealed system and compressed to liquid. Dry Cooling Towers, on the other hand, are closed circuit towers in which the working fluid is separated from the ambient air and is cooled by convection. Wet Cooling Towers are more efficient than Dry Cooling Towers given the higher heat transfer of water compared to air [47]. Hybrid types of cooling systems also exist.

Both of these methods require air to be drawn past the point of heat transfer. Well known are the Dutch-invented hyperboloid towers [25], which use a natural draft of warm air rising as in a normal chimney. We see these massive towers as part of power plants. For this report a different and more frequently used type of cooling tower is considered, one that uses a fan to induce a draft mechanically. See figure 1.

1.2.2 The risk of Legionellosis

The use of water evaporation in open cooling systems carries the risk of Legionella bacteria growing, which may cause the disease Legionellosis [28]. Legionellosis symptoms include cough, shortness of breath, fever, muscle pains and headaches.
Treatment is with antibiotics, and hospitalisation is often required. Approximately 10% of infected people die. It is known as Legionnaires' disease because the first known outbreak was at a convention of the American Legion, an organisation of US military veterans, in 1976, where over 200 people fell ill and 34 died. The cause appeared to be Legionella bacteria, which live in nature in low concentrations but can grow in man-made equipment under specific conditions, including stagnant areas and a temperature between 20 and 45 degrees Celsius. When water containing Legionella is distributed, infection occurs when people breathe air containing aerosols, small drops of water carrying the bacteria.

Figure 1: Cooling Towers. (a) This power plant's cooling tower is a typical example of natural air draft by a hyperboloid tower. Source: Paharpur. (b) Cooling Tower with a fan for mechanical draft. This type is used for this report. Source: SPX Cooling Technologies.

Legionella occurs in swimming pools, spas and showers, but the most commonly described source of Legionella is cooling towers. A 2003 Legionellosis outbreak in Pas-de-Calais, France, which resulted in 18 deaths, was investigated and found to be caused by a cooling tower at a distance of 6 km. In Amsterdam, the Netherlands, a large Legionella outbreak occurred during the summer of 2006, with 29 people sick and 2 deceased. The source of the outbreak appeared to be the wet cooling tower of the Post CS building. Since 2010, Dutch law requires owners of Wet Cooling Towers to understand and reduce the risks of cooling towers [27]. Since 2017, under the 'Besluit Omgevingsrecht', the Dutch environmental agencies have the obligation to map Wet Cooling Towers and their operating companies [57]. There are an estimated 4000 Wet Cooling Towers in the Netherlands, but for most towers the exact location or holding company is unknown [59].
1.2.3 The project

This project aims to automatically identify and map Wet Cooling Towers in a specific area using computer vision and machine learning techniques on publicly available aerial imagery. The project is a cooperation between the Utrecht-based data science company Ynformed [62] and one of the 29 Dutch environmental agencies, DCMR Milieudienst Rijnmond. See figure 2 for the DCMR working area.

Figure 2: DCMR working area is the larger Rotterdam and Rijnmond area.

2 Related work

2.1 Object Detection

Humans are well able to quickly detect and identify the objects in an image. With the current development of autonomous cars and robots, there is also a need for fast and accurate algorithms that let computers identify objects. Already in 1959, Paul Hough wrote: 'Many people have suggested that a modern digital computer should be able to recognize a fairly complex pattern of tracks in a bubble chamber photograph' [43]. He was right. Using automated algorithms and large amounts of data, object detection has proven to deliver very successful results.

Object detection is the task of classifying and localizing objects in an image or in a video, and is now a core problem in computer vision. Due to large variations in viewpoints, poses, occlusions and lighting conditions, image object detection has been difficult to solve. Traditionally, the task of object detection has been divided into the following main subtasks.

1. Feature extraction. Extracting a set of features from the image is an important step in detection pipelines. In 1972 the Hough Transform method for detecting line and circle features in images was proposed [42]. During the 1990s and 2000s several methods were developed for representing local key points in an image, the most commonly known being HAAR [55] [38], SIFT [13], and HOG [7].
More recently it has become clear that one can also rely on learned features, and that moderately deep unsupervised models outperform the state-of-the-art gradient-based features [16]. With the use of backpropagation [36], deep convolutional networks could learn features relevant for object recognition [53]. 2.