
Real-time Detection of Pedestrians in Night-time Conditions Using a Vehicle Mounted Infrared Camera

A dissertation presented by

Patrick Joseph Hurney

to Electrical and Electronic Engineering College of Engineering and Informatics National University of Ireland, Galway

in fulfillment of the requirements for the degree of Doctor of Philosophy

in the subject of Electronic Engineering

Supervisors Dr. Martin Glavin, Dr. Edward Jones and Dr. Fearghal Morgan

Research Director Prof. W. G. Hurley

September 2016

© 2016 Patrick Joseph Hurney

All rights reserved.

Abstract

Current statistics show that a significant number of road fatalities occur during night-time hours despite a smaller number of vehicles on the road. This number could be significantly reduced with the use of systems that automatically detect Vulnerable Road Users (VRUs) and alert the driver to their presence. This thesis presents an efficient embedded Advanced Driver Assistance System (ADAS) to detect pedestrians with a far-infrared sensor in real-time. The ADAS proposed in this thesis is implemented on a low power Intel Atom Central Processing Unit (CPU) and an Altera Arria GX II Field Programmable Gate Array (FPGA). The CPU and FPGA communicate over a PCI-express 2.0 bus using a Direct Memory Access engine on the FPGA. The pedestrian detection algorithm was partitioned between the FPGA and CPU to ensure real-time performance. Far-infrared images are first acquired with a microbolometer at a rate of 25fps, and a morphological closing operation is applied to remove distortion on the pedestrian's torso. The frames are sent to the FPGA to be processed using dedicated hardware. Regions of Interest (ROI) that could potentially contain pedestrians are isolated from the background using hardware accelerated Seeded Region Growing (SRG). The isolated ROI are classified using a Support Vector Machine (SVM) on the CPU. Histogram of Oriented Gradient (HOG) features and Local Binary Pattern (LBP) features are extracted from the ROI. These features are then concatenated to form a HOG-LBP feature vector and passed to a classifier that determines if the ROI contains a pedestrian or non-pedestrian object. Successfully classified pedestrians are tracked between frames using a Kalman filter. The system runs in real-time at a rate of 25fps, the frame rate of the microbolometer. The hardware accelerated SRG method achieved a 97.93% reduction in execution time compared to the software implementation on the CPU alone. Detection rates of 98% have been achieved with HOG-LBP features on a database of 2,000 individual frames and video containing 15,000 frames. The total power consumption of the system is 3.12W. This results in a highly accurate, low power, low cost system suitable for the automotive sector.

Declaration of Originality

I hereby declare that the work contained in this thesis has not been submitted by me in pursuance of any other degree.

Signed:

Date:

Contents

1 Introduction
  1.1 Motivation and Background
  1.2 Contributions of this Thesis
  1.3 Chapter by Chapter Overview

2 Literature Review
  2.1 Introduction
  2.2 Infrared Technology
    2.2.1 Infrared Radiation
    2.2.2 Infrared Sensors
  2.3 Overview of Pedestrian Detection Systems
  2.4 Region of Interest Isolation
  2.5 Region of Interest Classification
  2.6 Pedestrian Tracking
  2.7 Summary and Conclusions

3 Embedded System Architecture
  3.1 Introduction
  3.2 Processing Platforms
    3.2.1 Central Processing Unit (CPU)
    3.2.2 Digital Signal Processor (DSP)
    3.2.3 Graphics Processing Units (GPU)
    3.2.4 Field Programmable Gate Arrays (FPGA)
  3.3 Embedded ADAS
    3.3.1 CPU
    3.3.2 FPGA
    3.3.3 DMA Engine Device Driver
  3.4 System Testing
  3.5 Summary and Conclusion


4 Region of Interest Isolation
  4.1 Introduction
  4.2 Clothing Distortion Compensation
  4.3 Seeded Region Growing
    4.3.1 Binary Thresholding
    4.3.2 Connected Component Labelling
    4.3.3 Duplicate Removal
  4.4 Static Threshold vs Seeded Region Growing
  4.5 Software Execution Time
  4.6 Seeded Region Growing Hardware Design
    4.6.0.1 Two Pass CCL
    4.6.0.2 Multiple Pass CCL
    4.6.0.3 Parallel CCL
    4.6.0.4 Contour Tracing CCL
    4.6.0.5 Single Pass CCL
    4.6.1 Algorithm Timing Analysis
  4.7 SRG Hardware Architecture
    4.7.1 Seeded Region Growing Finite State Machine
    4.7.2 Binary Thresholding
    4.7.3 Connected Component Labelling
      4.7.3.1 Label Assignment
      4.7.3.2 Row Length Buffer
      4.7.3.3 Region Parameter Storage
    4.7.4 Region Filtering
      4.7.4.1 Aspect Ratio
      4.7.4.2 Location Based Filtering
    4.7.5 Duplicate Removal
  4.8 Resource Utilisation
  4.9 Hardware Accelerated SRG Execution Times
  4.10 Hardware Accelerated SRG Results
  4.11 Summary and Conclusion

5 Region of Interest Classification
  5.1 Introduction
  5.2 Region of Interest Feature Extraction
    5.2.1 Histogram of Oriented Gradients
    5.2.2 Local Binary Patterns
    5.2.3 Computation of LBP Feature Vector
    5.2.4 HOG-LBP Features
    5.2.5 Training Data
    5.2.6 Support Vector Machine Classifier
  5.3 Pedestrian Tracking

  5.4 Classification Results
    5.4.1 Performance Metrics
    5.4.2 Classifier Performance
    5.4.3 Classifier Performance on FIR Video
  5.5 Execution Time
  5.6 Summary and Conclusion

6 Conclusions and Future Work
  6.1 Project Summary and Conclusions
  6.2 Primary Contributions
  6.3 Suggestions for Future Work

Bibliography

A Seeded Region Growing Documentation
  A.1 Introduction
  A.2 Binary Thresholding
    A.2.1 Data Dictionary
  A.3 Connected Component Labelling
    A.3.1 Label Assignment
      A.3.1.1 Data Dictionary
    A.3.2 Row Length Buffer
      A.3.2.1 Data Dictionary
    A.3.3 Region Parameter Storage
      A.3.3.1 Data Dictionary
  A.4 Duplicate Removal
    A.4.1 Data Dictionary

B DMA Engine Overview
  B.1 Linux device driver overview
  B.2 DMA Engine Parameters
  B.3 Transfer Procedure
    B.3.1 Initialise DMA engine
    B.3.2 Write Data to FPGA from Host
    B.3.3 Read Data from FPGA to Host
    B.3.4 Close DMA Engine
  B.4 DMA Descriptor Table
  B.5 DMA Engine Testing

C Publications
  C.1 Journals
    C.1.1 Published
    C.1.2 In Submission

  C.2 Conferences
    C.2.1 Published

List of Figures

2.1 The electromagnetic spectrum
2.2 Flowchart displaying the stages in a typical FIR pedestrian detection algorithm
2.3 Images depicting various stages of FIR pedestrian detection algorithm
2.4 Static binary threshold operation performed on FIR image
2.5 Examples of various templates used in probabilistic template matching

3.1 Development prototype for the embedded pedestrian detection ADAS. The Intel N270 CPU is connected to the Altera Arria GX II FPGA over a PCI-express lane
3.2 Linux kernel space

4.1 Clothing distortion compensation
4.2 Binary Thresholding process
4.3 Connected Component Labelling process
4.4 Forward and backward scanning masks
4.5 CCL 4-way and 8-way masks applied to image
4.6 Connected Component Labelling applied to binary representation of FIR frame
4.7 Duplicate Removal process
4.8 Flowchart of stages in Seeded Region Growing algorithm
4.9 Results of different connected component labelling windows
4.10 Top level view of the proposed Seeded Region Growing hardware architecture
4.11 Two pass labelling method
4.12 Stages in contour tracing labelling
4.13 Stages in single pass CCL algorithm
4.14 Seeded region growing architecture
4.15 Seeded region growing architecture
4.16 Seeded region growing architecture
4.17 Example of a label merger scenario
4.18 Functional partition of Row Length Buffer


4.19 Example merger tables
4.20 Functional partition of Region Parameter Storage stage of SRG design
4.21 Various stages in distance estimation filtering
4.22 Functional partition of Duplicate Removal stage
4.23 Bounding box comparator of Duplicate Removal stage

5.1 Images from the various stages of generating a Histogram of Oriented Gradients (HOG) feature vector
5.2 Examples of structuring elements used in Local Binary Pattern (LBP) feature vector generation
5.3 Images from the various stages of generating a LBP feature vector
5.4 Examples of uniform and non-uniform local binary patterns
5.5 Examples of pedestrians and non-pedestrians found in the FIR database
5.6 ROC curves for a selection of feature vectors
5.7 Sample frames of output of SVM classifier trained with HOG, LBP and HOG-LBP feature vectors
5.8 Expanded output of SVM classifier trained with HOG-LBP feature vectors

A.1 Top level block diagram of FPGA Seeded Region Growing design
A.2 Top level block diagram of FPGA Seeded Region Growing design
A.3 Functional partition of binary thresholding architecture
A.4 Functional partition of the Label Assignment portion of the CCL design
A.5 Functional partition of Row Length Buffer stage
A.6 Functional partition of Region Parameter stage
A.7 Functional partition of Duplicate Removal architecture
A.8 Functional partition of Duplicate Removal architecture

List of Tables

2.1 Comparison of NIR and FIR sensors for night-time pedestrian detection
2.2 Summary of pedestrian detection methods used in FIR spectrum

3.1 Sample descriptor table used in DMA data transfer

4.1 Percentage of successfully isolated pedestrians in FIR video using static thresholding and seeded region growing
4.2 Execution times for Clothing Distortion Compensation and Seeded Region Growing software implementations on the Intel Atom CPU. Times are displayed in milliseconds (ms)
4.3 Two pass labelling merger table
4.4 FPGA based CCL performance comparison
4.5 Labelling configurations for seeded region growing and associated execution times. All times are displayed in milliseconds (ms)
4.6 Resources utilised by the hardware accelerated Seeded Region Growing implementation
4.7 Execution times of Clothing Distortion Compensation and hardware accelerated Seeded Region Growing

5.1 Parameters used in HOG feature vector generation
5.2 Parameters used in LBP feature vector generation
5.3 Confusion matrix displaying detection rates for multiple LBP structuring elements
5.4 Confusion matrix for SVM pedestrian classifier trained with various feature vectors on a database of FIR imagery
5.5 Results of pedestrian classification on streams of FIR footage
5.6 Execution times of ROI classification and pedestrian tracking stages on the CPU
5.7 Summary of pedestrian detection methods used in FIR spectrum

A.1 Data dictionary for binary thresholding architecture
A.2 Label Assignment data dictionary
A.3 Row Length Buffer data dictionary


A.4 Region Parameter Storage data dictionary
A.5 Duplicate Removal data dictionary
A.6 Bounding Box Comparator data dictionary

B.1 PCIe System Settings
B.2 PCIe Address Registers
B.3 PCIe Read Only Registers

List of Abbreviations

ADAS   Advanced Driver Assistance System
CCL    Connected Component Labelling
CPU    Central Processing Unit
DSP    Digital Signal Processor
DR     Duplicate Removal
FIR    Far-Infrared
FPGA   Field Programmable Gate Array
FPS    Frames Per Second
FSM    Finite State Machine
GPU    Graphical Processing Unit
HDL    Hardware Description Language
HOG    Histogram of Oriented Gradients
IPP    Integrated Performance Primitives
IR     Infrared
LBP    Local Binary Patterns
NIR    Near-Infrared
RBF    Radial Basis Function
ROI    Region Of Interest
ROC    Receiver Operating Characteristics
SVM    Support Vector Machine
VRU    Vulnerable Road User

Sponsor Acknowledgments

This research was funded by the Irish Research Council and Intel Shannon under the Enterprise Partnership Scheme.

Acknowledgments

Completing this doctorate has been the most challenging experience of my life so far. I have developed new skills, expanded upon old ones, and made many personal discoveries. There are many people who I need to thank for helping me throughout this work.

I would like to thank my mother and father, Geraldine and Denis. You have always supported me and shown interest in everything I have pursued in my life; I will never be able to thank you enough. My thanks also to my brothers Damien and Donnacha for their support. Thanks to all of the Hurney, O'Kane, Hession and Forde families. A very special thanks to Julia Forde, who taught me at a young age to value my education and helped me achieve what I am truly capable of.

I wish to thank my supervisors Dr. Edward Jones, Dr. Fearghal Morgan, and Dr. Martin Glavin. They are without a doubt the greatest supervisors a student could ever ask for. They were always there to guide me through the best and worst times I experienced throughout this research, and encouraged me at every opportunity possible. I also wish to thank the other staff of Electrical and Electronic Engineering in NUI Galway, in particular Mary Costello, Martin Burke, Myles Meehan, Paddy Melia, Pat McGrath, Liam Kilmartin, Prof. Gearoid O'Laighin and Prof. Ger Hurley.

I would like to thank Peter Waldron of Intel Shannon, who acted as industrial liaison for this project. His knowledge and continued advice aided me greatly throughout this work. I would also like to acknowledge the Irish Research Council and Intel for their financial support.

A special thanks to my friends Dr. Ronan O'Malley, Dr. Dallan Byrne, Kevin Burke, Martin Gallagher, and all of the residents of 89 Wellpark Grove. You all helped me through very difficult times over the course of this work, and without you none of this research would have been possible.

Thanks to all of the past and present members of the Connaught Automotive Research Group, Dr. Ciaran Hughes, Dr. Diarmuid O'Cualain, Dr. Robert McFeely,


Dr. Shane Tuohy, Dr. Anthony Winterlich, Damien Dooley and Dr. Brian McGinley. I would also like to thank my other friends in Electrical and Electronic Engineering, Dr. Shane Lowe, Dr. Seamus Cawley, Dr. Sandeep Pande, Dr. Martin O'Halloran, Dr. Garry Higgins, Dr. Sean Finn, Dr. Racquel Conceição, Dr. Ciaran Feeney, Darren Craven, Darragh Mullins, Atif Shahzad, Adnan Elahi, Bárbara Oliveira, Grazia Cappiello, Declan O'Loughlin, Richie Harte, Ana Cimpian, Dean Sweeney, David Newell, Tireoin McCabe, Kevin Farrell, and John Maher. The time spent in the canteen in both Nuns Island and the Engineering building around the foosball table with all of you made this work that much easier.

Dad, I wish you were still here to see this, I know you would have been proud of what I have accomplished. Dedicated to my father Denis Hurney, no more cattle, no more bales...

Grant me O Lord, a hurler's skill,
With strength of arm and speed of limb
Unerring eye for the flying ball
And courage to match whate'er befall
May my stroke be steady and my aim be true
My actions manly and my misses few
No matter what way the game may go
May I rest in friendship with every foe
When the final whistle for me has blown
And I stand at last before God's judgement throne
May the great referee when he calls my name
Say, you hurled like a man; you played the game.

Chapter 1

Introduction

1.1 Motivation and Background

Road fatalities are among the top 10 leading causes of death in many countries throughout the world [1]. Approximately 1.24 million fatalities a year are attributed to road accidents, with half of these deaths being Vulnerable Road Users (VRU) [2]. The OECD defines VRU as those unprotected by an outside shield, namely pedestrians and two-wheelers. Pedestrians are twice as likely to be involved in a fatal accident as vehicle occupants in the UK [3]. Consumers are becoming increasingly safety conscious and automotive manufacturers also recognise that safety is commonly a high priority when purchasing a new vehicle. In recent years, automotive manufacturers have begun to focus on the protection of VRU. One area that will potentially reduce the number of road fatalities is the adoption of Advanced Driver Assistance Systems (ADAS) [4–7] to assist the driver in making decisions that avert possibly dangerous situations. Some examples of current ADAS are vehicle detection [8–10], lane departure [11–13] and pedestrian detection [14,15]. The number of deaths on Irish roads in 2015 was 166, 32 of whom were pedestrians [16]. In 2014 the total number of pedestrian deaths in the European Union (EU) was approximately 7,600 [17], making up 29% of all road deaths across the EU; over half

of these deaths occurred at night, where visibility was a contributing factor. The National Highway Traffic Safety Administration (NHTSA) in the USA claims that 70% of pedestrian fatalities in 2012 occurred during night-time hours (6pm to 5:59am) [18]. In Japan, 55% of road fatalities occur during hours of low visibility [19]. These figures are of particular interest as they show that a disproportionate number of accidents occur at night, even though only approximately 28% of traffic occurs during the hours of darkness [20]. It has been estimated that, per vehicle mile, the fatality rate is 3 to 4 times higher in darkness than in daylight conditions [21]. The EU has laid out plans to halve the number of deaths on roads by 2020 by implementing a number of strategies [22] such as better road infrastructure, increased law enforcement, and more education focused on road safety. A particular area of interest highlighted in the report is the adoption of ADAS that can automatically detect other road users and warn the driver of potential hazards to avert potentially fatal collisions.

Current trends within the automotive industry indicate that cameras will provide the best means for reducing the number of collisions on roads in comparison with other technologies such as Radar and [23]. A report by the NHTSA in 2006 stated that "no efficient sensing-based solution exists for preventing many accidents, but camera-based systems may have the greatest potential to provide drivers with reliable assistance at identifying people in the path of the vehicle" [24]. A number of automotive manufacturers have recently begun to include cameras that operate in the visible spectrum on their vehicles for obstacle detection, such as parking assistance systems in premium Audi [25], Volkswagen [26], Toyota [27] and Ford models [28].

A number of important considerations should be taken into account when designing an ADAS suitable for vision-based applications in a vehicle. The system should consume low power, have a high degree of accuracy, and perform in real-time [29].

Visible spectrum cameras are limited to conditions where there is a source of illumination (sunlight, artificial lighting), and as a result they function poorly at detecting pedestrians in low light conditions. In night-time conditions a vehicle's headlights are used to illuminate the scene, but full headlights are not always suitable as they can temporarily blind other road users; dimmed headlights can be used, but visibility is then reduced. To address this shortcoming, Infrared (IR) sensors that rely solely on heat signatures in the environment to form an image can be used. Numerous studies have demonstrated the advantage of IR based night-vision systems in assisting drivers [30–34]. Tsimhoni et al. claim that an IR night-vision system allows drivers to see pedestrians at distances of up to 150m from the vehicle [30].

The average stopping distance of a mid-sized vehicle travelling at a speed of 50km/h is approximately 23m [35]. This figure includes a reaction distance of 9m, equivalent to a time of approximately 0.75 seconds [35]. Reaction times can vary between 0.75 and 1.5 seconds depending on the driver [35]. A braking distance of 14m is also included, assuming that the brakes are in good working order and the road surface is dry.
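As a small worked illustration of the figures above, the sketch below applies the usual two-term model: stopping distance equals reaction distance (speed multiplied by reaction time) plus braking distance (speed squared divided by twice the deceleration). The reaction time and deceleration values are illustrative assumptions rather than the figures from [35], so the result only roughly matches the quoted 9 m + 14 m ≈ 23 m.

```python
# Worked sketch of the stopping-distance arithmetic; values are assumptions.
def stopping_distance(speed_kmh, reaction_time_s=0.75, deceleration_ms2=7.0):
    v = speed_kmh / 3.6                       # convert km/h to m/s
    reaction = v * reaction_time_s            # distance covered before braking starts
    braking = v ** 2 / (2 * deceleration_ms2) # distance covered while braking
    return reaction, braking, reaction + braking

print(stopping_distance(50))   # roughly (10.4, 13.8, 24.2) metres
```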
Driver reaction time can be further increased due to factors such as low luminance [36], fatigue [37], alcohol [37], age [38] and glare from oncoming headlights [39], leading to a longer overall stopping distance. In night-time conditions on rural roads where street lighting is largely absent, the driver's view is generally restricted to the path illuminated by the vehicle's headlights; as a result, the driver may only notice and begin to react to a pedestrian when they are in the path of the vehicle, by which time it may be too late to avoid a collision. A pedestrian detection ADAS that relays data in real-time can give the driver more time to react to the situation and avoid a potential collision. The authors in [40] state that a minimum frame rate of 15 frames per second (fps) is required for humans to be satisfied with the quality of video playback, so the ADAS should be able to display the data to the driver at this speed or higher; a lower frame rate is not fluid and can be distracting to the driver.

Methods of presenting the data to the driver without causing distraction are currently being investigated by a number of research groups and automotive manufacturers; some methods include illumination of the windscreen (Head-Up Display) to identify where the pedestrian is located [41] and haptic feedback, e.g. vibrating the [42].

The embedded ADAS described in this thesis obtains a detection rate of 98% and a false positive rate of 1% when tested on a database of 2,000 still FIR frames and video containing 15,000 frames. The pedestrian detection ADAS hardware consists of an x86 based Intel Atom CPU and an Altera Arria GX II FPGA and performs in real-time at a rate of 25fps while consuming 3.12W of power.

1.2 Contributions of this Thesis

The objective of this thesis is the development of a low power, low cost embedded system to automatically detect pedestrians in Far-Infrared (FIR) video in real-time for automotive applications. More specifically, the contributions of this work are:

1. A review of state-of-the-art in FIR pedestrian detection systems is provided.

2. A CPU-FPGA architecture for an embedded system to perform vision-based ADAS tasks in real-time. The system contains an Intel Atom N270 microprocessor and an Altera Arria GX II FPGA. A device driver was developed to facilitate communication between the devices over a PCI-express connection using the Direct Memory Access (DMA) engine on the FPGA.

3. A seeded region growing architecture that isolates ROI from the background in FIR video using dedicated FPGA hardware. The seeded region growing method is robust to variations in the environment, and out-performs previous methods that utilised static binary threshold values to isolate pedestrians from background clutter.

4. A location-based filtering method that refines the ROI selection process, resulting in fewer ROI requiring classification without lowering the detection rate of the system.

5. A classification algorithm for pedestrian detection in FIR images utilising both Histogram of Oriented Gradient (HOG) vectors and Local Binary Pattern (LBP) vectors for detection. HOG and LBP features of isolated ROI are calculated and concatenated to form a HOG-LBP feature vector for SVM training purposes.

1.3 Chapter by Chapter Overview

This thesis consists of 6 chapters, including this introductory chapter. The contents of the document are as follows:

A detailed literature review of existing FIR pedestrian detection algorithms is provided in Chapter 2. Chapter 3 presents a review of technologies suitable for developing an embedded ADAS for pedestrian detection, including a detailed description of the embedded system architecture used in the ADAS in this work. The communication protocols between the CPU and FPGA to transfer FIR video are also described.

A comparison is made between two ROI isolation methods in Chapter 4. A ROI isolation architecture to execute seeded region growing in real-time using dedicated hardware is then presented. Seeded region growing is used to isolate ROI that may contain a pedestrian, with the co-ordinates of each isolated ROI sent to the CPU where they are used to extract the ROI for classification as either a pedestrian or non-pedestrian.

Classification of the ROI isolated by the seeded region growing method is described in Chapter 5. HOG and LBP features are calculated for each ROI, and concatenated to form a feature vector for classification by a SVM classifier. A comparison is made between the performance of the HOG, LBP, and HOG-LBP features on streams of FIR video containing multiple pedestrians.

Finally, a summary of the work in this thesis is given in Chapter 6, conclusions are drawn, and future work that can be carried out in the area of real-time FIR pedestrian detection is proposed.

Chapter 2

Literature Review

2.1 Introduction

This chapter gives an in-depth description of the state of the art in night-time pedestrian detection systems. Firstly, an introduction to the IR portion of the electromagnetic spectrum is provided, along with a description of the sensors used to detect the IR wavelengths emitted by pedestrians. Next, an overview of pedestrian detection systems and the challenges associated with these systems is discussed. A number of existing techniques used to isolate and classify Regions of Interest (ROI) as pedestrian or non-pedestrian objects are discussed, and the use of tracking in streams of FIR video is then examined. Finally, the chapter concludes with a table summarising the methods of the night-time pedestrian detection algorithms discussed in this chapter.


2.2 Infrared Technology

This section describes the IR portion of the electromagnetic spectrum. An overview of sensors used to detect specific wavelengths in the IR spectrum is also provided.

2.2.1 Infrared Radiation

Figure 2.1: The electromagnetic spectrum. The IR portion of the spectrum is located between the microwave and visible spectra.

The spectral region referred to as the IR spectrum was discovered in 1800 by Sir William Herschel. Infrared is the portion of the electromagnetic spectrum with a wavelength between 0.75-1000µm, between the visible and microwave spectra, as shown in Figure 2.1. Any object with a temperature above absolute zero (-273.15°C) radiates energy in the IR spectrum. The IR spectrum is broken up into 3 sub-spectra: namely the Near-Infrared (NIR), Medium-Infrared (MIR) and Far-Infrared (FIR) spectra.

The NIR and FIR spectra are commonly used for pedestrian detection in night-time conditions. A comparison of the NIR and FIR spectra in the context of pedestrian detection is provided in the next section.

2.2.2 Infrared Sensors

Technology to form pictures from IR sources, called thermographs, was developed around 1840, followed by the invention of the bolometer in 1880 by the American astronomer Samuel Pierpont Langley. The bolometer was composed of two blackened strips of platinum forming two branches of a Wheatstone Bridge. The device was said to have been able to detect the heat from a cow situated 400 meters from the sensor. Modern-day microbolometers use semiconductors for increased image resolution and contrast [43]. The first applications to take advantage of IR radiation were developed for the Korean War (1950-1953) by the American military. These IR applications were developed to allow infantry to see opposing forces in night-time conditions based on their heat signature. The first commercial IR products were introduced in 1983 and opened up a large area of new applications in agriculture [44,45], housing inspection [46,47], and human detection [14,48–52]. There are currently two types of IR sensor employed in the detection of pedestrians in dark conditions, Near-Infrared and Far-Infrared:

Near-Infrared sensors are an active form of IR sensor that capture wavelengths in the range of 0.75-1.4µm. The sensor is usually stored in a sealed, cryogenically cooled casing to keep the sensor at a temperature of between 4K and 110K (-269.15°C to -163.15°C). The cooling allows for a significant contrast between the sensor and the objects in the field of view of the sensor. To form an image the scene must be illuminated with IR energy. Without adequate cooling, the sensor can be affected by its own heat, leading to distortion of captured frames, potentially reducing the effectiveness of a pedestrian detection system.

Far Infrared sensors are a passive form of IR sensor that capture wavelengths in the region of 6-15µm (FIR is also referred to as Thermal Infrared in some works). They have a number of advantages over active NIR sensors, such as:

• Initial setup is far simpler as data can be captured immediately after the sensor is powered on.

• FIR sensors are smaller than NIR sensors and more portable as a result.

• There is minimal danger of interference with other ADAS present in the vehicle.

• There are no active components, therefore no cooling is required as they operate at ambient temperatures.

• No IR illumination of the environment is required, unlike NIR sensors.

A microbolometer consists of a pixel matrix of IR absorbing vanadium oxide (VOx) or amorphous silicon (a-Si) sitting on a silicon substrate [53]. Readout circuitry measures a change in resistance proportional to the amount of FIR energy the sensor is exposed to. This resistance is converted into temperatures that can be represented graphically. Thermal IR energy from a human peaks in the 8-14µm range, therefore a FIR sensor is more suitable for pedestrian detection in the IR spectrum. According to [20,54], FIR yields better response times from drivers than NIR due to the less cluttered frames generated by FIR sensors. The feasibility of FIR sensors in automotive applications has increased in recent years due to decreasing sensor costs, improved resolution, and increased frame rates. A comparison of the strengths and weaknesses of each sensor is shown in Table 2.1.

Table 2.1: Comparison of NIR and FIR sensors for night-time pedestrian detection.

NIR strengths: high image resolution.
NIR weaknesses: sensitive to light from oncoming vehicles; requires the scene to be illuminated with IR energy.
FIR strengths: superior detection range; emphasises pedestrians in frame; images contain less visual clutter.
FIR weaknesses: lower contrast between objects of similar temperatures.

2.3 Overview of Pedestrian Detection Systems

Pedestrian detection systems in the visible and IR spectra generally follow a set approach to detect pedestrians within the environment. These steps can be summarised as follows:

• Preprocessing is performed to remove noise in the image or to perform other operations such as contrast enhancement.

• A search is performed within the current frame to locate ROI that may potentially contain a pedestrian; this step is commonly referred to as ROI generation.

• The isolated regions are classified as either a pedestrian or non-pedestrian region with a suitable classifier such as a Support Vector Machine (SVM) or Neural Network (NN).

• If the system operates on video, the successfully classified pedestrians are tracked to improve system robustness and extrapolate the pedestrian’s location if the classification stage temporarily fails.

A flow chart of the stages in a typical pedestrian detection algorithm is shown in Figure 2.2; a simplified code sketch of the same flow is given after the figure.

Figure 2.2: Flowchart displaying the stages in a typical FIR pedestrian detection algorithm.
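The four stages listed above can be summarised in code. The sketch below is a minimal, hypothetical skeleton of such a pipeline; the stage functions are placeholders for illustration and do not correspond to any particular published implementation.

```python
# Minimal sketch of the generic FIR pedestrian-detection pipeline described above.
# All stage functions are placeholder assumptions, not the thesis implementation.
import numpy as np

def preprocess(frame):
    # e.g. noise removal or contrast enhancement; identity here for illustration
    return frame

def generate_rois(frame):
    # return a list of (x, y, w, h) candidate boxes; empty placeholder
    return []

def classify(frame, roi):
    # return True if the ROI is judged to contain a pedestrian; placeholder
    return False

def track(detections, history):
    # associate current detections with previous frames; pass-through placeholder
    history.append(detections)
    return history

def process_video(frames):
    history = []
    for frame in frames:
        frame = preprocess(frame)
        rois = generate_rois(frame)
        detections = [r for r in rois if classify(frame, r)]
        history = track(detections, history)
    return history

# usage with dummy 8-bit FIR-like frames
frames = [np.zeros((480, 640), dtype=np.uint8) for _ in range(3)]
process_video(frames)
```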

There are numerous challenges associated with pedestrian detection in FIR video; these challenges can be summarised as follows:

• Pedestrians are unpredictable; they can change speed or direction without warning.

• The appearance or shape of pedestrians can vary greatly in automotive FIR video frames. Factors that can influence a pedestrian’s appearance include clothing, ambient temperature, variations in pose, and the pedestrian’s height and weight. The appearance of pedestrians can also be distorted by objects in the environment e.g. vehicles, street lamps, buildings, and railings.

• From the perspective of a moving vehicle, both the camera and pedestrians are in relative motion making the scene highly dynamic. As a result of this, pedestrian detection algorithms must be capable of dealing with significant variation in the environment, particularly background changes.

• In a typical road scene, there may be a number of objects besides pedestrians that appear bright in a FIR image, such as vehicle exhausts and street lamps. These objects can cause false positives during the classification stage.

• The computational resources required to implement a pedestrian detection system can be very demanding. In order to effectively deploy a pedestrian detection system in an embedded ADAS, the algorithm must perform reliably in real-time, and the power consumption must be kept to a minimum.

• The resolution of FIR cameras currently available in the market is quite low in comparison with visible spectrum cameras, primarily due to the physical constraints of the microbolometer used to capture FIR wavelengths. Low resolution can make the detection of pedestrians at a distance from the sensor difficult as very little textural information is available in the resultant image.

Increased resolution requires a larger lens to allow more thermal radiation to reach the sensor. The low transmittance of glass for thermal radiation makes it unsuitable for the lens of an IR sensor, so a different material must be used. Germanium (Ge) is commonly used in lenses for IR cameras, but its cost is far higher than that of glass, leading to a significant increase in sensor cost.

Detection of pedestrians in FIR images requires a number of processing stages, the first of which commonly involves isolating ROI from the current FIR frame. This generally involves traversing the image and extracting regions that might potentially contain a pedestrian. After the image has been traversed and ROI have been isolated, they are passed to a classifier that determines if the ROI is either a pedestrian or non-pedestrian object. The aim of the classification stage is to maximise the detection rate while keeping the false positive and false negative rates low. Some detection algorithms also utilise tracking in order to increase system robustness by tracking pedestrians between frames, and using this data to estimate the location of a pedestrian in a frame if classification temporarily fails.

A sequence of images displaying the stages involved in a typical FIR pedestrian detection algorithm is shown in Figure 2.3. The isolated ROI are highlighted in yellow in Figure 2.3(b). Regions of interest are then classified as either "pedestrian" or "non-pedestrian" objects. Bounding boxes containing pedestrians are highlighted in blue and non-pedestrian objects are discarded in Figure 2.3(c). The estimated path of each pedestrian is then determined based on the tracked trajectory over previous frames; this is denoted by a red arrow in Figure 2.3(d).

2.4 Region of Interest Isolation

Accurate isolation of ROI in a FIR image is an important task in pedestrian detection. It involves scanning the image for regions that might potentially contain a pedestrian, and marking those regions for classification. Background subtraction has been used to assist this process in some applications [55,56], but the dynamic, often chaotic nature of the environment imaged by a vehicle-mounted camera prevents this method from being effective. Numerous methods of isolating ROI have been explored in the literature and are discussed in this section.


Figure 2.3: FIR image containing pedestrians at a range of distances from the FIR sensor. The original unprocessed frame is shown in 2.3(a). Isolated ROI are surrounded by a yellow bounding box in 2.3(b). The ROI are then classified in 2.3(c); pedestrians are highlighted blue while non-pedestrian objects are discarded. Tracking is performed on each successfully classified pedestrian in Figure 2.3(d), estimating the position of the pedestrian based on information from previous frames (indicated by the direction of the red arrows).

In an automotive FIR image, pedestrians generally appear brighter than the surrounding environment due to their body temperature being higher than the ambient temperature of the environment [48,57,58]. As a result of this contrast, a binary threshold can be used as a starting point to isolate pedestrians in an FIR frame.

Static binary thresholding was examined in [59], where the threshold value was based on the difference between the maximum pixel intensity found in the FIR frame and a fixed constant. The disadvantage of a static threshold is that a threshold that is too low can cause background objects to merge with the pedestrian silhouette. A threshold that is too high can fragment the appearance of a pedestrian's body (usually the head and legs), leading to the problem of having to accurately group regions together to isolate a pedestrian. An example of a pedestrian being split into separate regions after a high intensity threshold can be seen in Figure 2.4. The head, torso, and legs appear as separate white regions; the authors in [14] refer to these as "Body-Ground-Candidates". Active contours are used in [60,61] after binary thresholding to group the fragmented regions into suitable ROI. A morphological closing operation is performed prior to thresholding in [48] to remove the darker region present in the pedestrian's mid-section; the authors refer to this as "Clothing Distortion", as it is generally caused by insulating clothing worn by a pedestrian, and it makes detection challenging. The closing operation prevents the pedestrian silhouette in the binary image from fragmenting into discrete sections. The authors claim that a vertically orientated kernel is preferable to a circular kernel as it prevents merging of bright regions situated close to one another horizontally. However, this method is only suitable for the removal of clothing distortion from upright pedestrians, as it cannot remove distortion from pedestrians in a sitting or other non-upright position.

Dynamic binary thresholding is more robust to fluctuations in greyscale intensities between regions in the current frame, which can be caused by fluctuations in environmental conditions or the appearance/disappearance of bright objects such as pedestrians and street lamps. Shifts in background intensity can be particularly evident when transitioning from urban areas to rural areas. A dynamic threshold value is derived in [14] based on the mean and maximum intensities encountered within the FIR frame, whereas the dynamic threshold value, as computed in [62], is based on the proportion of the area under the smoothed image histogram.

Figure 2.4: Static binary threshold operation performed on FIR image (T = 0.9). Greyscale intensities above the threshold value appear white (warm regions) while the remaining values appear black (cold regions).

Static and dynamic binary thresholding methods often fail to address the potential differences in appearance between individual pedestrians in the same frame. A Seeded Region Growing (SRG) approach is used in [48,63] to isolate ROI from the background. Seeded region growing first isolates "seeds" from the FIR image with a high intensity binary threshold. A series of progressively lower thresholds are then applied to the original image while monitoring properties of each of the previously identified seeds at each iteration (such as the seed's height and width). This method of ROI segmentation has demonstrated robustness to fluctuations in ambient temperatures and variations in pedestrian poses between successive frames [48]. The main weakness of SRG is that if two pedestrians are situated in close proximity to one another, the regions may merge when grown, causing the group to be dismissed by subsequent processing stages. A large amount of data is generated by SRG, and it can be difficult to execute in real-time on traditional CPU hardware without hardware acceleration.

Head detection is performed in [64] using the P-tile method to detect pedestrians in FIR images. This method is based on the assumption that the desired object occupies a fixed proportion of the image, which can be problematic in the automotive environment where pedestrians appear in a variety of sizes. The threshold value is automatically varied until the desired fraction of image pixels are above this threshold value, which is effective in [64] because it deals with a stationary camera. This algorithm can estimate the size of the pedestrian's head by estimating how far away the pedestrian will be in the image, and only allows for one pedestrian per image. Head detection may not be suitable as the primary means of detection of pedestrians in an on-road application; however, it would be useful for the detection of pedestrian groups (notoriously difficult to detect because of merging and occlusion, leading to loss of aspect-ratio, which is assumed by most algorithms).

The authors in [65] segment ROI using grey-level symmetry and edge information (density, symmetry) checks, based on the assumption that an upright pedestrian has an inherent symmetry associated with their silhouette, and that the edge of a pedestrian should cause a significant contrast in greyscale intensities, as the pedestrian is generally much brighter than their surroundings in an FIR image [66].

Keypoint generation presents an alternative to binary thresholding and involves locating keypoints within the image; these keypoints are then used to isolate bounding boxes for classification. The Smallest Univalue Segment Assimilating Nucleus (SUSAN) method [67] of keypoint generation is implemented in [50]. In [49], keypoints of ROI are determined with the use of Speed-Up Robust Features (SURF), which are based on 2D Haar wavelet responses and are rotation and scale invariant. SURF features are also used in [68] to detect pedestrians in both visible spectrum and FIR images.

ROI isolation methods that utilise stereo configurations have been proposed in numerous works such as [51,52,69–74].
Disparity information provided by the stereo FIR sensors is used to isolate ROI from the background scenery. Stereo systems generally provide high pedestrian detection rates but can also provide information for other ADAS applications such as distance estimation, and based on the speed of the vehicle could potentially estimate time-to-collision.

The number of ROI isolated can be reduced by filtering according to various properties commonly associated with pedestrians. One property commonly used is the ROI bounding box aspect-ratio [48,65,70] (a measure of the ratio of the bounding box height to width). It is reasonable to assume that a pedestrian that is standing or walking is taller than they are wide. A similar method is used in [14,75] where the authors exclude ROI candidates in regions of the image deemed not to be in the immediate path of the vehicle. Road detection is used in [14] to identify the more salient parts of the image for filtering ROI in an FIR frame. Background subtraction is a useful tool for segmenting moving ROI from stationary thermal imagery [55,76], but it is not very useful in the highly dynamic environment encountered by a car-mounted camera-based ADAS.
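As an illustration of the seeded region growing idea described above, the following is a minimal sketch using OpenCV. The vertical closing kernel, threshold values, step size and growth-stopping rule are illustrative assumptions, not the parameters used in [48,63] or in the architecture developed later in this thesis.

```python
# Minimal sketch of clothing-distortion compensation plus simplified seeded
# region growing on an 8-bit greyscale FIR frame. Parameter values are assumptions.
import cv2
import numpy as np

def isolate_rois(frame, seed_thresh=220, low_thresh=140, step=10, max_aspect=1.0):
    # Clothing-distortion compensation: a tall, vertically orientated closing kernel
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 15))
    closed = cv2.morphologyEx(frame, cv2.MORPH_CLOSE, kernel)

    # High threshold isolates the "seed" regions (hot spots such as heads/torsos)
    _, seeds = cv2.threshold(closed, seed_thresh, 255, cv2.THRESH_BINARY)
    n, _, stats, _ = cv2.connectedComponentsWithStats(seeds)
    boxes = [tuple(stats[i, :4]) for i in range(1, n)]   # (x, y, w, h) per seed

    # Grow each seed by progressively lowering the threshold and re-measuring the
    # connected component that lies under the seed's centre
    for t in range(seed_thresh - step, low_thresh - 1, -step):
        _, mask = cv2.threshold(closed, t, 255, cv2.THRESH_BINARY)
        n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
        grown = []
        for (x, y, w, h) in boxes:
            label = labels[y + h // 2, x + w // 2]
            if label == 0:
                grown.append((x, y, w, h))
                continue
            gx, gy, gw, gh = stats[label, :4]
            # stop growing a region once it becomes wider than it is tall
            if gw / max(gh, 1) > max_aspect:
                grown.append((x, y, w, h))
            else:
                grown.append((gx, gy, gw, gh))
        boxes = grown
    return boxes

print(isolate_rois(np.zeros((480, 640), dtype=np.uint8)))   # empty frame -> no ROI
```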

2.5 Region of Interest Classification

Classification is used in a pedestrian detection system to determine if an isolated ROI contains a pedestrian or non-pedestrian object. In a pedestrian detection ADAS, accurate classification is critical because a classifier that does not accurately classify a pedestrian could be the cause of a potentially fatal collision. The most commonly used classification methods are Support Vector Machines (SVM) [48,49,77], AdaBoost [78] and Artificial Neural Networks (ANN) [79,80]. These classifiers are computationally expensive, but this cost is mitigated somewhat by the fact that they operate only on the isolated ROI and not on the full FIR frame. The authors in [14] claim that greyscale classification performs better than binary classification as the binary candidates were too shape sensitive.

Probabilistic template matching is applied by Bertozzi et al. in [65] and [70]. Template matching attempts to find a correlation between a probabilistic model template and each ROI to determine if the ROI contains a pedestrian. Examples of templates (based on a standing upright pedestrian facing the sensor) used in [65] and [70] are shown in Figure 2.5. A range of templates of different sizes are used in [81], with each template representing a pedestrian located at a different distance from the sensor. The authors claim a detection rate of 90% when objects are located at a maximum of 45m from the FIR sensor. A probabilistic template is used in [15] and [75] to detect pedestrians at distances up to 50m from the FIR sensor; the authors claim a detection rate of 69%. Template matching is used in conjunction with AdaBoost in [78,82]. AdaBoost creates a strong classifier by combining multiple weak classifiers in an iterative manner, where each new classifier focuses on misclassified instances.

Figure 2.5: Examples of various templates used in probabilistic template matching.
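A minimal sketch of the correlation step behind template matching is given below, using OpenCV's normalised cross-correlation. The template image, ROI size and acceptance threshold are illustrative assumptions; the probabilistic templates of [65] and [70] are more elaborate than this simple comparison.

```python
# Sketch of matching a pedestrian template against a candidate ROI; assumptions only.
import cv2
import numpy as np

def matches_template(roi, template, threshold=0.6):
    # resize the template to the ROI so the correlation map reduces to a single score
    resized = cv2.resize(template, (roi.shape[1], roi.shape[0]))
    score = cv2.matchTemplate(roi, resized, cv2.TM_CCOEFF_NORMED)[0, 0]
    return score >= threshold, float(score)

# usage with dummy 8-bit greyscale patches
roi = np.random.randint(0, 256, (64, 32), dtype=np.uint8)
template = np.random.randint(0, 256, (64, 32), dtype=np.uint8)
print(matches_template(roi, template))
```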

Histogram of Oriented Gradients (HOG) was first proposed as a feature vector for pedestrian classification with a linear SVM in the visible spectrum by Dalal and Triggs [83]. HOG vectors are very useful for providing information about an object's shape. Bertozzi et al. describe a stereo combination of FIR cameras used in [51] to isolate ROI that are then classified with a SVM trained with HOG feature vectors. They claim a detection rate of 91% on a database containing 500 pedestrian and 500 non-pedestrian objects. F. Suard et al. [52] expanded on the stereo method proposed by Bertozzi, claiming a detection rate of 97% by obtaining improved parameters for HOG feature vector generation. In [48], a single FIR camera is used to isolate ROI and generate HOG vectors for isolated ROI; the HOG vector is then classified as a pedestrian or non-pedestrian object by a SVM using a Radial Basis Function (RBF) kernel. The authors report a 96% detection rate and 1% false positive rate with this method, which is comparable to the stereo method of Suard and an improvement on the stereo work of Bertozzi.

Intensity Self Similarity (ISS) descriptors are proposed for pedestrian detection in [49]. An ISS feature vector is generated by first scaling the isolated ROI to 64 x 128 pixels. The ROI is then divided into n blocks of 8 x 8 pixels, and histograms are generated for each block. Once all of the histograms have been generated, a similarity vector of n(n-1)/2 elements is computed by comparing the contents of the histogram of each block with all of the remaining histograms. The ISS vector is classified as a pedestrian or non-pedestrian object by a SVM. The authors claim that the execution time of computing an ISS vector is too high for real-time performance on their chosen CPU.

Local Binary Patterns (LBP) are a form of feature vector originally proposed for texture classification but recently featured in pedestrian detection systems in the FIR spectrum [50,84]. The advantage of LBP features for classification is that they are quite simple and have very little processing overhead. As a result of this efficiency they can be quite well suited to a system that must perform in real-time. LBP features are used in [50] to classify pedestrians in FIR imagery in real-time with a SVM. The authors achieve a detection rate of 91% with a false positive rate of 1%.

Haar-wavelets were originally proposed for object detection by Papageorgiou et al. in [85] and have been used in pedestrian detection [78,86,87]. Haar wavelets are used as weak classifier inputs for AdaBoost in [78] to classify pedestrians using a cascade of classifiers. A classifier cascade aims to reject a large number of ROI in the early, computationally inexpensive stages and to classify the remaining regions with the computationally expensive classifiers. The authors claim a detection rate of 95% with a false positive rate of 1%. Haar-wavelets are used in [87] with AdaBoost to train a cascade classifier; the authors claim a detection rate of 92%.

The combination of HOG and ISS features is performed in [49] to create a descriptor for a SVM. This has advantages over a purely HOG based SVM as it provides information not only about the shape of the ROI contained within the bounding box, but also the similarities between regions within the bounding box.
The authors claim a detection rate of 97% when tested on a database of FIR images; however, no information is provided about the percentage of false positives and false negatives that occur. The overall computation time of HOG-ISS vectors for a frame on a Core 2 Duo CPU with a clock speed of 2.4GHz is very high (185ms), making it unsuitable for a real-time pedestrian detection application.

In recent years Deep Learning using Convolutional Neural Networks (CNN) has been explored as a method of classifying objects in machine learning applications. CNNs have featured in the automotive environment in areas such as vehicle detection [88], sign detection [89], and pedestrian detection in the visible spectrum [90,91]. Pedestrian detection in the FIR spectrum has not been explored in great detail with CNNs; this could be attributed to a lack of easily accessible annotated data, as CNNs require large amounts of annotated data to prevent over-fitting of a network to a particular dataset. John et al. [92] proposed a deep CNN architecture for classifying pedestrians in FIR images; the authors claim higher accuracy than HOG-SVM based approaches on a database of FIR images.
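To make the feature-combination idea concrete, the sketch below builds a concatenated HOG-LBP style descriptor and classifies it with an SVM using scikit-image and scikit-learn. The window size, cell size, LBP parameters and kernel choice are illustrative assumptions and are not the exact configuration evaluated in later chapters.

```python
# Minimal sketch of HOG-LBP feature extraction and SVM classification for an
# isolated ROI. Parameter values are illustrative assumptions.
import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.svm import SVC

def hog_lbp_descriptor(roi):
    # roi: 8-bit greyscale patch rescaled beforehand to 64 x 128 pixels
    hog_vec = hog(roi, orientations=9, pixels_per_cell=(8, 8),
                  cells_per_block=(2, 2), block_norm='L2-Hys')
    # uniform LBP with 8 neighbours at radius 1, summarised as a histogram
    lbp = local_binary_pattern(roi, P=8, R=1, method='uniform')
    lbp_hist, _ = np.histogram(lbp, bins=np.arange(0, 11), density=True)
    return np.concatenate([hog_vec, lbp_hist])

def train(rois, labels):
    # labels: 1 = pedestrian, 0 = non-pedestrian
    X = np.array([hog_lbp_descriptor(r) for r in rois])
    clf = SVC(kernel='rbf')        # an RBF kernel, as used for HOG features in [48]
    clf.fit(X, labels)
    return clf

# usage with dummy data
rois = [np.random.randint(0, 256, (128, 64), dtype=np.uint8) for _ in range(10)]
labels = [0, 1] * 5
clf = train(rois, labels)
print(clf.predict([hog_lbp_descriptor(rois[0])]))
```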

2.6 Pedestrian Tracking

Tracking has been used in numerous FIR pedestrian detection algorithms where it has demonstrated significant improvements in overall detection rates. The main function of tracking is to associate detections in the current frame with detections in previous frames. Tracking has several purposes within a pedestrian detection system, such as predicting a pedestrian's position based on data from previous frames, which can be useful if classification temporarily fails due to partial or full occlusion of the pedestrian. Tracking can also reduce the number of false positives encountered by the system by keeping track of the number of frames an object has appeared in. False positives generally only occur in a single frame, and the tracking algorithm can weight the importance of objects based on how many successive frames an object has appeared in.

Kalman filters [93] are a common tool for tracking objects in automotive video in both the FIR and visible spectra [94]. A Kalman filter functions by estimating the state of an object based on a series of incomplete or noisy measurements [93]. It is suited to pedestrian detection applications as it can track pedestrians when factors such as occlusion by objects in the environment prevent successful classification. The pedestrian detection system in [48] uses a Kalman filter to track the bounding boxes of ROI that have been classified as pedestrians; the parameters associated with the bounding box are the x-position, y-position, width and height in pixels. When an object is detected, the system attempts to associate it with a pedestrian detected in the previous frame. If a pedestrian is detected in five consecutive frames, it is tracked, and if detection of a pedestrian temporarily fails, a "time-to-live" counter is decremented. If the pedestrian is re-acquired before the counter reaches zero, the counter is reset. If the counter reaches zero, the system ceases to track the pedestrian in subsequent frames and will not try to associate any new pedestrian bounding boxes with it.

The authors in [14] use a Kalman filter to track the heads of pedestrians, rather than a bounding box containing the full body of the pedestrian, between frames based on past history analysis. The head is chosen as the shape varies very little between frames and is generally one of the brightest regions on a pedestrian in an FIR frame. The authors claim that inclusion of tracking in the system increased overall detection rates in video streams from 35% (tracking disabled) to 94% (tracking enabled).

A simplified Kalman filter, known as an alpha-beta tracker, is used in [79] to estimate object state parameters. It uses pre-estimated steady-state gains and a constant velocity model. Particle filters are also used in pedestrian detection systems to track successfully classified ROI [78,95,96]. A particle filter uses a set of random samples spread in the state space and calculates their weighted mean value to obtain the expected state of an object. Tracking of classified pedestrians is performed using a particle filter in [78]. A modified particle filter is implemented in [96]; this method determines intensity and edge cues for each successfully classified pedestrian and combines them with the particle filter framework by an adaptive integration scheme, leading to more accurate tracking.
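The sketch below shows bounding-box tracking with a constant-velocity Kalman filter, in the spirit of the approach described above for [48], using OpenCV. The state layout, noise covariances and initial state are illustrative assumptions.

```python
# Minimal sketch of bounding-box tracking with a constant-velocity Kalman filter.
# The state holds (x, y, w, h) plus velocities of the box position; values are assumptions.
import cv2
import numpy as np

def make_box_tracker(x, y, w, h):
    kf = cv2.KalmanFilter(6, 4)                 # 6 state variables, 4 measurements
    kf.transitionMatrix = np.array([
        [1, 0, 0, 0, 1, 0],                     # x  += vx
        [0, 1, 0, 0, 0, 1],                     # y  += vy
        [0, 0, 1, 0, 0, 0],                     # w
        [0, 0, 0, 1, 0, 0],                     # h
        [0, 0, 0, 0, 1, 0],                     # vx
        [0, 0, 0, 0, 0, 1]], dtype=np.float32)  # vy
    kf.measurementMatrix = np.eye(4, 6, dtype=np.float32)    # we observe x, y, w, h
    kf.processNoiseCov = np.eye(6, dtype=np.float32) * 1e-2
    kf.measurementNoiseCov = np.eye(4, dtype=np.float32) * 1e-1
    kf.errorCovPost = np.eye(6, dtype=np.float32)
    kf.statePost = np.array([x, y, w, h, 0, 0], dtype=np.float32).reshape(6, 1)
    return kf

kf = make_box_tracker(100, 60, 20, 48)
for box in [(104, 61, 20, 48), None, (112, 63, 21, 49)]:
    predicted = kf.predict()                    # still estimates when detection fails (None)
    if box is not None:
        kf.correct(np.array(box, dtype=np.float32).reshape(4, 1))
    print(predicted[:4].ravel())
```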

2.7 Summary and Conclusions

This chapter has presented a comprehensive review of existing techniques used for FIR pedestrian detection in the automotive environment. The challenges associated with detection of pedestrians in low light conditions with a FIR sensor are discussed, along with the justification for choosing a FIR sensor over a NIR sensor in a pedestrian detection system. An overview of pedestrian detection algorithms is provided, including an in-depth discussion of each stage within a pedestrian detection algorithm in automotive FIR video. A summary of each of the most prominent algorithms discussed in this chapter is presented in Table 2.2. The table highlights the methods used in each pedestrian detection algorithm to isolate, classify, and track pedestrians in each frame, including the sensor configuration employed and the detection rate achieved by each algorithm.

A large proportion of the literature reviewed in this chapter is executed on a desktop CPU and there is little emphasis on an embedded implementation. The choice of hardware for an embedded system plays a key role in how well the algorithm will perform in terms of functional performance and power consumption; these are key elements in automotive embedded system design. The initial ROI isolation requires a large number of low-level pixel operations to determine if a region in the image could potentially contain a pedestrian. The ROI classification stage of a pedestrian detection algorithm generally requires complex floating point mathematical operations to determine if the isolated ROI contains a pedestrian or non-pedestrian object with a relevant classifier. The next chapter presents a review of existing hardware platforms in the automotive domain and proposes a novel, low-power, low-cost embedded system developed to execute a FIR pedestrian detection algorithm in real-time using a single FIR sensor.

Table 2.2: Summary of pedestrian detection methods used in FIR spectrum. Columns: Authors, Sensor Config., ROI Isolation, Object Classification, Tracking, Detection Rate. Works summarised: Tsuji et al. (2003) [74], Bertozzi et al. (2003) [65], Fang et al. (2004) [59], Fardi et al. (2004) [59], Bertozzi et al. (2005) [70], Xu et al. (2005) [14], Suard et al. (2006) [52], Bertozzi et al. (2007) [51], O'Malley et al. (2010) [48], Xia et al. (2010) [50], Miron et al. (2012) [49].

Chapter 3

Embedded System Architecture

3.1 Introduction

This chapter proposes an embedded system architecture that can be used in a vision-based ADAS to detect pedestrians in real-time. A review of processing platforms currently used in the automotive environment is first provided. Based on this review, suitable technologies for an embedded vision-based ADAS are selected. The embedded ADAS is then described, prior to the implementation of the FIR pedestrian detection algorithm.

3.2 Processing Platforms

This section assesses candidate processing platforms to determine their suitability for FIR pedestrian detection. Many of the algorithms presented in the literature have been implemented on a desktop CPU with little consideration of running the algorithm in real-time on an embedded platform. Without accurate execution times, an algorithm cannot be benchmarked and evaluated for implementation on an embedded system. There are currently four technologies commonly used for data processing in the automotive environment:

• Central Processing Unit


• Digital Signal Processor

• Graphics Processing Unit

• Field Programmable Gate Arrays

3.2.1 Central Processing Unit (CPU)

A wide variety of in-vehicle applications are controlled using CPUs, which generally provide relatively low cost, medium performance levels, and sequential processing of data. Multi-core versions of CPUs capable of running a number of tasks concurrently on physical and virtual cores exist, but the scale of achievable concurrency is far lower than with the other technologies listed. Examples of such systems include Electronic Control Units (ECU) for adaptive suspension [97], in-vehicle infotainment [98] and vehicle detection [99]. Numerous libraries exist that are optimised for execution on a CPU, such as the OpenCV image processing library [100], which contains a large number of highly optimised open source implementations of common image processing operations. A further performance boost can be achieved through the use of the Intel Integrated Performance Primitives [101]. Intel's low power Atom CPUs [102] have moderately high clock speeds and low power consumption. These qualities are very useful in sequential image post-processing tasks such as classification and tracking of pedestrians, i.e. tasks that require rapid sequential processing.

3.2.2 Digital Signal Processor (DSP)

Digital Signal Processors (DSP) are specifically designed for processing complex, math-intensive applications such as image processing, and have been used in numerous applications such as facial detection [103], communications [104,105], and automotive applications [106–109]. DSPs differ from conventional CPUs in their architecture and instruction set. The instruction sets used by DSPs are developed to efficiently process digital signals such as video and audio by including specialised operations such as Multiply-Accumulate (MAC) instructions to reduce execution times when compared to traditional microprocessors. DSPs can be broadly separated into two categories: fixed-point and floating-point. Fixed-point DSPs are used in a greater number of high volume applications and are typically less expensive than floating-point DSPs due to the scale of manufacturing. DSPs are highly prominent in the automotive industry and feature in a wide array of in-vehicle systems; however, recent advances in both GPUs and FPGAs could challenge that dominance in the future.

3.2.3 Graphics Processing Units (GPU)

Graphics Processing Units (GPUs) are constructed using a large number of processing cores to allow for massively parallel designs. The cores generally operate at slower clock speeds than desktop CPUs and have smaller caches, but each core can be assigned to a specific task, leading to significantly reduced execution times compared with CPUs. GPUs have been used in a range of automotive image processing research [108, 110]. They are well suited to processing large amounts of data quickly and in parallel, they support floating-point operations, and they can be reconfigured through software updates. Early GPUs were very specific in their application to graphics processing, but their use in scientific computing has expanded quickly with the introduction of libraries that allow for more general purpose computing. NVIDIA's CUDA [111] libraries facilitate rapid development of algorithms with a large number of parallel branches running on each CUDA core on the chosen device. GPU based solutions exist in visible spectrum imaging for detecting pedestrians in real-time [112–114]. Visible spectrum images are more complex to process than their FIR equivalents due to 32-bit pixels containing red, green, and blue channels, compared to 8-bit greyscale pixels in FIR images. The power consumption of many GPUs makes them unsuitable for the automotive environment, although these devices are becoming more efficient and may become a viable option in the future. The NVIDIA GPU used in [112] detects pedestrians at a rate of 30fps but consumes approximately 244W of power. The authors in [113] present a pedestrian detector capable of processing FIR frames at a rate of 50fps with an NVIDIA Geforce GTX 470 [115] that consumes approximately 150W of power.

3.2.4 Field Programmable Gate Arrays (FPGA)

Field Programmable Gate Arrays (FPGAs) are composed of programmable logic components and programmable interconnects that can be used to create complex functions and high performance parallel pipelined data processing structures. An FPGA processes data using dedicated hardware rather than software, as is the case with a GPU or DSP, resulting in faster operation since instructions do not need to be fetched and queued. Other advantages of FPGAs for automotive image processing applications are reconfigurability, low power and low cost. Automotive grade FPGAs exist that are more robust to environmental noise than consumer grade models and can operate in a wider range of conditions without impacting performance. FPGA clock speeds are generally lower than those of the other technologies listed; however, their ability to run a large number of processes in parallel can compensate for the lower clock speed and allow for equal or better performance depending on the application. Some existing FPGA-based automotive algorithm implementations include day-time pedestrian detection [116], road sign detection [117], and lane departure warning [11]. The authors in [56] presented a real-time FIR pedestrian detection system on a Xilinx Virtex-II Pro XC2VP30 FPGA using a static threshold with connected component labelling to detect pedestrians at a rate of 25fps with a wall mounted FIR sensor. Using Xilinx power estimation tools [118], the system draws less than 2 Watts of power.

The review of FIR pedestrian detection algorithms in the previous chapter effectively separated pedestrian detection algorithms into two sections (ROI Isolation, ROI Classification) with a third optional section (Tracking). The ROI isolation stage requires a large number of repetitive low-level pixel operations to be applied to a frame to isolate regions that may contain a pedestrian; performing these operations in parallel can result in significantly reduced execution time. The ROI classification stage requires a suitable classifier, such as an SVM, to perform complex floating point operations to extract features from an ROI and determine if the features are associated with a pedestrian or non-pedestrian object.

Based on the above technological review, this work proposes the combination of a low power x86 based CPU and an FPGA to create a flexible, highly efficient vision-based ADAS. The CPU functions as the host and performs ROI classification and pedestrian tracking, as its floating-point architecture allows for increased accuracy and faster development times over a fixed-point implementation. The open source OpenCV image processing libraries allow for rapid development of image processing algorithms on the CPU. The function of the FPGA is to perform hardware acceleration of the low level pixel operations required to isolate ROI from the current frame.

3.3 Embedded ADAS

The embedded ADAS presented in this work uses an Intel Atom N270 single core CPU [102] and an Altera Arria GX II FPGA [119] that communicate over a PCI express bus. A prototype of the embedded CPU-FPGA system is shown in Figure 3.1.

3.3.1 CPU

The Intel Atom N270 CPU has a clock speed of 1.6GHz and a Thermal Design Power (TDP) of 2.5W. The Atom line of processors has featured in in-vehicle infotainment systems in Mercedes, Hyundai, and Nissan vehicles [120]. This CPU allows for rapid prototyping of the pedestrian detection algorithm with the open-source C/C++ based OpenCV image processing libraries; a boost in performance can be achieved in OpenCV on an x86 architecture using the Intel Integrated Performance Primitive (IPP) libraries [101]. The Linux Operating System (OS) Ubuntu [121] (kernel version 2.6.32-33) was run on the Intel CPU. Communication over a PCI express lane is also possible between the CPU and compatible hardware through the use of Linux libraries; this allows for transfer speeds of up to 4Gb/s.

Figure 3.1: Development prototype for the embedded pedestrian detection ADAS. The Intel N270 CPU is connected to the Altera Arria GX II FPGA over a PCI-express lane on a modified LGA 775 socket motherboard.

3.3.2 FPGA

The Altera Arria GX II FPGA will be used to accelerate the ROI isolation portion of the pedestrian detection system, due to the large number of pixel-level operations that can be run in parallel on dedicated hardware. The FPGA functions are captured using the Verilog Hardware Description Language (HDL). A number of tools exist that are capable of generating Verilog code based on an existing C++ implementation; however, these tools are not always the best choice due to inefficiencies in the conversion process [122,123].

3.3.3 DMA Engine Device Driver

A Linux driver was developed to transfer FIR frames between the CPU and FPGA over a PCI-express connection. The DMA engine device driver resides in the Linux kernel space (the portion of the Linux kernel that manages any hardware connected to the system); subroutines or functions forming part of the Linux kernel are considered to be part of kernel space. The user space is where the software-based image processing functions reside; these functions process frames from the FIR sensor and communicate with the FPGA on the PCIe lane through the kernel space. The relationship between the user space, kernel space, and the hardware is shown in Figure 3.2.

The CPU initialises communication between the devices by creating a DMA descriptor table detailing the location and length of the frame in CPU RAM and transfers it to the FPGA's on-board DMA engine. The DMA engine then reads the relevant data from CPU memory. The DMA transfer process is based on Altera's Avalon ST specification [124].

3.4 System Testing

A descriptor table contains a DMA header that describes the total number of descriptor packets available for transfer, and identifies their location in memory. Each descriptor packet within the table contains the address of the data in memory (the FIR frame), the FPGA endpoint address the data will be transferred to, and the total number of 32-bit DWORDs that are to be transferred (a 32-bit DWORD contains four FIR pixels). An outline of a DMA descriptor table is displayed in Table 3.1.

Figure 3.2: User space, where the pedestrian detection application resides, and kernel space, where the device driver for communication between the CPU and FPGA resides.

The PCIe link between the two devices was tested by transferring FIR frames from the host to the FPGA and returning each frame unaltered. The original frame on the CPU was compared to the transferred frame after the transfer was completed to ensure no errors were introduced during the transfer process.

Table 3.1: Sample descriptor table used in DMA data transfer. The table contains a single header and can contain numerous descriptor packets.

Desc. Header:  DW0 = Num. of Descriptor Packets, DW1 = Upper Address of Desc. Table, DW2 = Lower Address of Desc. Table, DW3 = EPLAST Value
Desc. Packet:  DW0 = Length of Packet in DWORDs, DW1 = Endpoint Address, DW2 = Upper Address, DW3 = Lower Address
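As an illustration of the layout summarised in Table 3.1, the descriptor header and descriptor packet could be modelled in software roughly as follows. The field names are invented for readability; the authoritative register layout is defined by Altera's Avalon-ST DMA reference design [124], not by this sketch.

```cpp
#include <cstdint>

// Illustrative software view of the descriptor layout outlined in Table 3.1.
struct DescriptorHeader {
    uint32_t numDescriptorPackets;   // DW0: number of descriptor packets in the table
    uint32_t descTableAddrUpper;     // DW1: upper address of the descriptor table
    uint32_t descTableAddrLower;     // DW2: lower address of the descriptor table
    uint32_t eplastValue;            // DW3: EPLAST value
};

struct DescriptorPacket {
    uint32_t lengthInDwords;         // DW0: packet length; one 32-bit DWORD holds four FIR pixels
    uint32_t endpointAddress;        // DW1: FPGA endpoint address for the transfer
    uint32_t dataAddrUpper;          // DW2: upper address of the frame data in CPU RAM
    uint32_t dataAddrLower;          // DW3: lower address of the frame data in CPU RAM
};
```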

A testing framework was developed for automated testing of the FPGA design to allow for faster development times. The FPGA hardware and CPU software functions are executed in parallel and results are compared to determine if the hardware design implemented on the FPGA functions as planned. It can also be used for comparing the execution speed of each function, and determining the relative increase in performance when accelerated in hardware on the FPGA. A more detailed description of the DMA engine and Linux device driver is included in Appendix A.

3.5 Summary and Conclusion

In this chapter, an embedded ADAS suitable for real-time execution of vision-based algorithms was presented. A review of processing platforms currently in use in the automotive domain was provided. Based on this review, an Intel Atom N270 CPU and an Altera Arria GX II FPGA were chosen as the technologies for an embedded ADAS implementation for real-time detection of pedestrians in FIR video. The combination of these architectures will allow low level pixel-based ROI isolation operations to be accelerated with dedicated FPGA hardware described in Verilog, while the CPU performs classification of ROI and tracking of successfully detected pedestrians using the C based OpenCV libraries. The next chapter describes a method of isolating ROI in streams of FIR video in a number of settings in real-time using dedicated hardware on the Altera FPGA; the architecture can isolate pedestrians in FIR video in real-time with a high degree of accuracy.

Chapter 4

Region of Interest Isolation

4.1 Introduction

This chapter presents a method of isolating ROI in FIR video that could potentially contain a pedestrian. Accurate isolation of pedestrians in FIR video is a key task in a pedestrian detection system; if ROI are not accurately segmented, they may be mis-classified and go undetected by the system. A brief introduction to the methods used to label regions of high intensity in binary images is first presented. A comparison is then made between the use of static binary threshold values and Seeded Region Growing (SRG) to isolate ROI in FIR images. A design for the acceleration of the SRG algorithm using an FPGA is presented. The resource utilisation and execution times of the hardware accelerated design are provided, and the results of the hardware accelerated SRG algorithm when tested on streams of FIR video are then presented. Finally, conclusions are drawn based on the performance of the hardware accelerated SRG algorithm.


4.2 Clothing Distortion Compensation

An FIR pedestrian detection system relies on the fact that the temperature of a pedestrian is generally higher than that of the surrounding environment [14, 54, 59, 64]. In real-world environments, temperatures fluctuate throughout the year and pedestrians dress in appropriate clothing for each season, thus distorting the shape of a pedestrian (particularly when heavily insulating clothing is worn in cold weather). The authors in [48] refer to this as "Clothing Distortion" and remove it from the image using a morphological closing operation. Examples of this distortion can be seen in Figure 4.1(a), where two pedestrians are situated in close proximity with clothing distortion present in their torso regions. A circular kernel would cause groups of bright regions in close proximity to one another to merge, as shown in Figure 4.1(b); this would subsequently cause the classifier to falsely classify the ROI as a non-pedestrian object. To prevent merging of pedestrians in close proximity, a vertically oriented kernel (3x30 pixels) is used to remove the clothing distortion, as shown in Figure 4.1(c). The vertically oriented kernel preserves the vertical shape of a pedestrian, which aids segmentation of ROI from the background. It should be noted that the closing operation does not affect pedestrians with no visible clothing distortion on their body, and their appearance is maintained.

Morphological closing attenuates dark artefacts or noise in an image, while leaving bright detail relatively undisturbed. As for the binary image operator, greyscale closing is a dilation followed by an erosion of an input image (f) by a structuring element (b):

f • b = (f ⊕ b) ⊖ b    (4.1)

A structuring element is the kernel that convolves with the image in morphological operations. The greyscale dilation (eq. 4.2) at a point (x0, y0) takes the maximum value of the image under the structuring element:


Figure 4.1: Images from the clothing distortion compensation stage. (a) The original FIR image; (b) image closed with a circular kernel, where the pedestrians have merged; (c) FIR image closed with a vertically oriented kernel, with no merging of pedestrians.

(f ⊕ b)(x0, y0) = max{ f(x0 − x, y0 − y) + b(x, y) | (x0 − x), (y0 − y) ∈ Df ; (x, y) ∈ Db }    (4.2)

where Df and Db are the domains of f and b respectively. The dual of this operation, greyscale erosion (eq. 4.3), is then performed, where the value at a point is the minimum value under the structuring element:

(f ⊖ b)(x0, y0) = min{ f(x0 + x, y0 + y) − b(x, y) | (x0 + x), (y0 + y) ∈ Df ; (x, y) ∈ Db }    (4.3)
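A minimal software equivalent of this step, using OpenCV, might look as follows. The rectangular shape of the structuring element is an assumption; the text specifies only a vertically oriented 3x30 kernel.

```cpp
#include <opencv2/imgproc.hpp>

// Sketch of clothing distortion compensation: greyscale closing with a vertically
// oriented 3x30 structuring element applied to the 8-bit greyscale FIR frame.
cv::Mat compensateClothingDistortion(const cv::Mat& firFrame) {
    // cv::getStructuringElement takes (width, height): 3 wide, 30 tall.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 30));
    cv::Mat closed;
    cv::morphologyEx(firFrame, closed, cv::MORPH_CLOSE, kernel);   // dilation then erosion
    return closed;
}
```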

4.3 Seeded Region Growing

This section gives a brief description of each of the stages used in the Seeded Region Growing (SRG) algorithm. The individual stages of the SRG algorithm are:

1. Binary Thresholding

2. Connected Component Labelling

3. Duplicate Removal

4.3.1 Binary Thresholding

The SRG algorithm relies on the application of a high intensity binary threshold to the frame; thresholds of successively lower intensity are then applied until a cut-off point is reached. Binary thresholding is performed on an FIR image containing a pedestrian, e.g. Figure 4.2(a). A high intensity threshold is first used in Figure 4.2(b) to find "seeds" within the image. The seeds grow larger with each successive threshold value until the set cut-off point (T = 174 in this example) is reached in Figure 4.2(d). Threshold values lower than the cut-off resulted in pedestrians merging with the background and offered no increase in detection rates.


Figure 4.2: Stages in binary thresholding: (a) image segment containing a pedestrian, (b) high intensity threshold applied to find seeds in the image, (c) lower intensity threshold applied, showing the silhouette of the pedestrian, (d) pedestrian silhouette merged with the background.
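In software, the seeding and growing thresholds could be produced with a loop such as the sketch below. The cut-off default of 174 follows the example above; the starting value of 250 and the unit step are assumptions (the hardware design described later in this chapter applies 64 thresholds).

```cpp
#include <opencv2/imgproc.hpp>
#include <vector>

// Sketch of the successive binary thresholds used by SRG: a high intensity threshold
// is applied first and successively lower thresholds follow until the cut-off.
std::vector<cv::Mat> thresholdSweep(const cv::Mat& closedFrame,
                                    int tStart = 250, int tCutOff = 174) {
    std::vector<cv::Mat> binaryImages;
    for (int t = tStart; t >= tCutOff; --t) {
        cv::Mat bin;
        cv::threshold(closedFrame, bin, t, 255, cv::THRESH_BINARY);
        binaryImages.push_back(bin);         // each binary image is labelled by CCL next
    }
    return binaryImages;
}
```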

4.3.2 Connected Component Labelling

A very common image processing algorithm is Connected Component Labelling (CCL) [125], which is used to label Regions of Interest (ROI) in binary images for further analysis, such as pedestrians in an automotive scene [48, 126], or malignant tumours in biomedical applications [127, 128]. CCL is an operation where groups of connected pixels are classified as disjoint objects with unique identifiers. The pixels in a binary image (B) of dimensions N x M are processed in a raster fashion from B(0, 0) to B(N, M). An example of the labelling process is illustrated in Figure 4.3. When a foreground pixel is encountered, a label is assigned to it. For each new pixel encountered, its neighbours are analysed and a label is assigned based on their values; in this case the remaining pixels all have a neighbour that has been assigned the label '1'.


Figure 4.3: Connected Component Labelling process. (a) Foreground pixels are highlighted in grey, background pixels are white, (b) foreground pixels assigned the label '1'.

CCL is performed by traversing the binary image with a mask. Two types of mask are commonly found in the literature: an 8-way mask and a 4-way mask. Two variations of each mask exist, the forward scan and backward scan mask, as illustrated in Figure 4.4. The 8-way mask is more resource intensive but generally results in fewer regions and more accurate labelling. The 4-way mask uses fewer resources but often results in more regions. An example of regions labelled using both masks is shown in Figure 4.5. The 8-way mask results in a single region in Figure 4.5(c), while the 4-way mask results in four regions in Figure 4.5(b).


Figure 4.4: Forward and backward scanning masks (a) Forward scan 8-way mask, (b) Backward scan 8-way mask, (c) Forward scan 4-way mask, (d) Backward scan 4-way mask.


Figure 4.5: CCL 4-way and 8-way masks applied to an image. (a) Input binary frame, (b) four regions resulting from the use of a 4-way mask, (c) one region resulting from the use of an 8-way mask.

The output of the CCL process when performed on a binary representation of a FIR frame containing a pedestrian is displayed in Figure 4.6. The white regions have been labelled and a bounding box placed around each region.

Figure 4.6: Connected Component Labelling applied to a binary representation of an FIR frame. White regions in the frame are highlighted by a blue bounding box.
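For reference, recent OpenCV releases provide a ready-made CCL routine that returns one bounding box per labelled region, giving a quick software equivalent of the labelling shown in Figure 4.6. This is purely illustrative and is not the FPGA implementation developed later in this chapter.

```cpp
#include <opencv2/imgproc.hpp>
#include <vector>

// Label the white regions of a binary image and return their bounding boxes.
std::vector<cv::Rect> labelRegions(const cv::Mat& binaryImage) {
    cv::Mat labels, stats, centroids;
    int n = cv::connectedComponentsWithStats(binaryImage, labels, stats, centroids, 8);
    std::vector<cv::Rect> boxes;
    for (int i = 1; i < n; ++i) {            // label 0 is the background
        boxes.emplace_back(stats.at<int>(i, cv::CC_STAT_LEFT),
                           stats.at<int>(i, cv::CC_STAT_TOP),
                           stats.at<int>(i, cv::CC_STAT_WIDTH),
                           stats.at<int>(i, cv::CC_STAT_HEIGHT));
    }
    return boxes;                            // one bounding box per labelled region
}
```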

4.3.3 Duplicate Removal

Upon completion of the labelling stage there are a large number of labelled FIR frames, each containing bounding boxes for the ROI within them, and many of the bounding boxes share similar co-ordinates. Regions isolated by the highest intensity threshold (T) are compared to the regions isolated by the threshold value immediately below it (T - 1). Each subsequent comparison is between the results of the previous comparison and the next lowest threshold value (T - 2, T - 3, T - 4). This process is repeated until all of the regions isolated by every threshold value have been compared and the best fit bounding boxes for the ROIs in the current frame are determined. A match is made between regions if the difference in the top left and bottom right co-ordinates of the two bounding boxes is less than a set threshold; in this application a total distance of 8 pixels was found to be sufficient. The duplicate removal process is illustrated in Figure 4.7. The co-ordinates of the green, red, and blue boxes in Figure 4.7(a) differ by an amount less than the threshold limit, so the smaller red and blue boxes are removed and only the largest green box remains in Figure 4.7(b). In Figure 4.7(c) the red and blue boxes differ by less than the threshold, so the red box is removed; the difference between the green and blue box co-ordinates is greater than the threshold value, and as a result both the blue box and the green box remain in Figure 4.7(d).


Figure 4.7: Duplicate removal process: (a) image containing three bounding boxes (green, red, blue) whose co-ordinates differ by 1 pixel in each direction; (b) the smaller red and blue regions are removed and the largest green bounding box remains; (c) the red and blue box co-ordinates differ by 1 pixel in each direction while the green box's bottom-right vertex differs by 10 pixels; (d) the duplicate red region is removed, leaving the blue and green bounding boxes.
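A software sketch of this comparison is given below. Interpreting the 8-pixel budget as the summed difference of the top-left and bottom-right corner co-ordinates is an assumption, since the text does not state the exact distance measure; as in Figure 4.7, the larger box is retained when two regions match.

```cpp
#include <opencv2/core.hpp>
#include <cstdlib>
#include <vector>

// Two boxes are treated as the same ROI when their corner co-ordinates differ by
// at most 'maxDiff' pixels in total (8 pixels in this work).
static bool sameRegion(const cv::Rect& a, const cv::Rect& b, int maxDiff = 8) {
    int diff = std::abs(a.x - b.x) + std::abs(a.y - b.y)
             + std::abs((a.x + a.width)  - (b.x + b.width))
             + std::abs((a.y + a.height) - (b.y + b.height));
    return diff <= maxDiff;
}

// higherT: regions from threshold T; lowerT: regions from the next threshold down (T-1).
std::vector<cv::Rect> removeDuplicates(const std::vector<cv::Rect>& higherT,
                                       const std::vector<cv::Rect>& lowerT) {
    std::vector<cv::Rect> kept;
    std::vector<bool> matched(lowerT.size(), false);
    for (const cv::Rect& a : higherT) {
        bool found = false;
        for (std::size_t i = 0; i < lowerT.size(); ++i) {
            if (!matched[i] && sameRegion(a, lowerT[i])) {
                kept.push_back(lowerT[i]);   // keep the slightly larger lower-threshold box
                matched[i] = true;
                found = true;
                break;
            }
        }
        if (!found) kept.push_back(a);       // no counterpart at the lower threshold
    }
    for (std::size_t i = 0; i < lowerT.size(); ++i)
        if (!matched[i]) kept.push_back(lowerT[i]);   // regions found only at T-1
    return kept;                             // compared against T-2 in the next iteration
}
```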

The complete SRG process is illustrated in Figure 4.8. The raw FIR frame is first captured and binary thresholds of successively lower intensity are applied to the frame. Each frame is then labelled using CCL. The bounding boxes for the ROI in each frame are then compared to find the best-fitting bounding box for each ROI within the frame. Finally, the co-ordinates of the best-fitting bounding box are overlaid on the original frame.

Figure 4.8: Flowchart of stages in the Seeded Region Growing algorithm. A range of high intensity thresholds are applied to the input image. The resulting binary images are labelled using CCL. The Duplicate Removal stage compares the regions in each labelled image and the best-fitting regions are retained resulting in a single bounding box for the pedestrian in the frame.

4.4 Static Threshold vs Seeded Region Growing

This section provides a detailed comparison of the detection rates obtained using a static threshold and the SRG method. The work in [56] applies a static threshold prior to CCL to isolate pedestrians. This method is not robust to variations in the ambient temperature and can lead to a large number of missed detections. The software-based SRG method proposed in [48] uses a range of threshold values to accurately isolate pedestrians. This method is tolerant of variations in the ambient temperature and pedestrian pose, leading to highly accurate isolation of ROI. The drawback of this method is the low frame rate; the authors claim a frame rate of 2fps on a 2.4GHz CPU.

The static thresholding and SRG methods were prototyped in software using Matlab to determine the number of pedestrians that can be accurately detected in a number of scenes using each method. Static threshold values ranging from 170 to 250 were applied; these threshold values were chosen because values greater than 250 failed to isolate hot regions containing any significant data relating to pedestrians, and values less than 170 caused the background objects to merge with the pedestrians. FIR footage was captured using a vehicle mounted camera on a car travelling at speeds ranging between 0-100km/h in ambient temperatures between -3 and 8°C. The performance of each static threshold value and of SRG when tested on over 15,000 frames of FIR footage is displayed in Table 4.1. The percentage isolated figure is the proportion of frames in which a pedestrian was correctly isolated, out of the frames in which the pedestrian appeared. A number of sample frames containing regions isolated by static thresholding or SRG are displayed in Figure 4.9. The SRG method correctly isolated pedestrians in 86% of frames, 34 percentage points higher than the best performing static threshold value (T = 210, 52%), indicating that the SRG method is more suitable for isolation of pedestrians from the background in FIR images.

Table 4.1: Percentage of successfully isolated pedestrians in FIR video using static thresholding and seeded region growing.

Threshold Value    Percentage Isolated
170                31
180                37
190                41
200                48
210                52
220                42
230                34
240                22
SRG                86

4.5 Software Execution Time

The clothing distortion compensation and SRG algorithms were implemented in software using the C based OpenCV image processing libraries on the low power CPU to determine where potential processing bottlenecks in the ROI isolation stage of the algorithm may occur. Their execution times were determined using the Linux gprof profiler [129], and the results are displayed in Table 4.2.

Table 4.2: Execution times for Clothing Distortion Compensation and Seeded Region Growing software implementations on the Intel Atom CPU. Times are displayed in milliseconds (ms).

                                   Minimum   Maximum   Average
Clothing Distortion Comp. (ms)     3         3         3
Seeded Region Growing (ms)         168       390       236
Execution Time (ms)                171       393       239


Figure 4.9: Results of CCL performed with various static binary threshold values and SRG: (a) - (d) T = 240, (e) - (h) T = 210, (i) - (l) T = 190, (m) - (p) output of SRG. The SRG method encloses the full pedestrian in a bounding box, leading to more accurate ROI isolation for subsequent classification. The static values miss the pedestrians or crop the pedestrian's silhouette (the pedestrian's feet in (a)-(d)), which could result in a missed detection by a classifier.

The performance figures show that the SRG algorithm is not suitable for real-time implementation on the chosen low power CPU, as it requires up to approximately 390ms of execution time, corresponding to a frame rate of approximately 2.5fps; a real-time frame rate of 25fps requires a maximum execution time of 40ms per frame. The SRG method is suitable for hardware acceleration on the FPGA, as both the binary thresholding and CCL stages can benefit from a parallel pipelined architecture. The next section proposes a hardware architecture that significantly reduces the execution time of SRG to allow for real-time execution of the pedestrian detection algorithm on the embedded ADAS.

4.6 Seeded Region Growing Hardware Design

This section describes the hardware architecture of the SRG algorithm and each component within the system. A performance evaluation of a number of CCL algorithms suitable for implementation on dedicated FPGA hardware is first presented. A timing analysis of the algorithm's execution time is then performed in order to determine the best configuration. This is followed by an in-depth description of the system architecture.

Figure 4.10: Top level view of the proposed Seeded Region Growing hardware architecture. Pixels are transferred from the CPU to the FPGA over the PCI-express connection where Seeded Region Growing is applied. The co-ordinates of isolated ROI are transferred back to the host CPU for classification and tracking.

An overview of the proposed architecture is presented in Figure 4.10. A signal is first sent to the FPGA from the host to indicate that a transfer of pixels is about to begin and to initiate the binary thresholding and CCL stages. Pixels are streamed to the FPGA from a host CPU connected to an FIR sensor, and binary thresholding is applied. The resulting binary pixels are then assigned labels by the CCL stage. After the thresholding and CCL stages are completed, the regions found in each binary image are read by the duplicate removal stage and compared. The co-ordinates of the remaining regions after duplicate removal are returned to the CPU to be classified as a pedestrian or non-pedestrian. A more detailed description of the SRG architecture is provided in Appendix A.

The binary thresholding and duplicate removal stages of the algorithm are implemented using a large number of comparison operations. A number of algorithms to perform CCL on an FPGA have been proposed in the literature [125, 130–133]. The main differences between these algorithms are the number of passes they make over the image, their memory requirements, and their execution time. A number of algorithms were evaluated and compared and, based on this comparison, a CCL method was chosen for implementation on the FPGA. The candidate CCL methods are:

• Two Pass

• Multiple Pass

• Parallel

• Contour Tracing

• Single Pass

4.6.0.1 Two Pass CCL

The original CCL algorithm proposed in the literature, often referred to as the "classical" two pass algorithm, was introduced by Rosenfeld et al. (1966) [125].

FPGA implementations of the two-pass method are found in [134,135]. The classical approach for CCL requires two consecutive passes through a binary image. During the initial scan, foreground pixels are assigned preliminary labels, and label collisions are stored in a merger table (Table 4.3). During the second scan, the labels are reassigned with the labels stored in the merger table as shown in Figure 4.11(d). When all of the labels have been reassigned, data relating to each ROI can be isolated and placed in a data table. The two-pass labelling approach is illustrated in Figure 4.11.


Figure 4.11: Two pass labelling method: (a) input image, grey pixels denote a foreground pixel (Ff), white pixels denote a background pixel (Fb); (b) image partially labelled, three labels in use; (c) image fully labelled after the first pass, with a labelling conflict between regions '1' and '3' and the merger table updated to reflect the merger; (d) image scanned again and labels reassigned based on the contents of the merger table (Table 4.3).

Table 4.3: Two pass labelling merger table used in Figure 4.11.

Provisional Label    Equivalence Label
1                    1
2                    2
3                    1
4                    -

Processing complex images with multiple label collisions can be time consuming if a label merges with a large number of other labels in an image; this requires all of the merging conflicts to be resolved before the second scan can begin. For a hardware-based video processing system this can be completed during horizontal or vertical blanking periods. During the second image scan, the preliminary labels are reassigned with the labels from the merger table. The memory consumption in bits for the two-pass CCL algorithm can be calculated as:

Total Mem. = [log2(Lmax + Lmerge + 1)] ∗ (N ∗ M) + memMT + memDT    (4.4)

where Lmax is the maximum number of labels required, Lmerge is the number of mergers, memMT is the memory required for the merger table, memDT is the memory required for the data table, and N and M are the height and width of the image in pixels. The total execution time in clock cycles of the two pass method can be calculated as:

Two PassExe = 2 ∗ (N ∗ M)    (4.5)
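The classical approach can be summarised with the following software reference, which assigns provisional labels in a first scan, records collisions in a merger (equivalence) table, and reassigns labels through that table in a second scan. It is a minimal illustration of the method of [125], not the optimised FPGA designs of [134,135].

```cpp
#include <opencv2/core.hpp>
#include <vector>

// Minimal two-pass CCL reference: binary is CV_8U with 0 = background;
// the returned CV_32S image holds the final region label of every pixel.
cv::Mat twoPassLabel(const cv::Mat& binary) {
    cv::Mat labels(binary.size(), CV_32S, cv::Scalar(0));
    std::vector<int> parent{0};                                 // merger table; entry 0 = background
    auto root = [&](int l) { while (parent[l] != l) l = parent[l]; return l; };

    for (int y = 0; y < binary.rows; ++y)
        for (int x = 0; x < binary.cols; ++x) {
            if (!binary.at<uchar>(y, x)) continue;
            int best = 0;                                       // smallest neighbouring root label
            const int nx[4] = {x - 1, x - 1, x, x + 1};         // D, A, B, C forward-scan neighbours
            const int ny[4] = {y,     y - 1, y - 1, y - 1};
            for (int i = 0; i < 4; ++i) {
                if (nx[i] < 0 || ny[i] < 0 || nx[i] >= binary.cols) continue;
                int l = labels.at<int>(ny[i], nx[i]);
                if (!l) continue;
                l = root(l);
                if (!best || l < best) {
                    if (best) parent[best] = l;                 // record label collision
                    best = l;
                } else if (l != best) {
                    parent[l] = best;
                }
            }
            if (!best) {                                        // no labelled neighbour: new label
                best = static_cast<int>(parent.size());
                parent.push_back(best);
            }
            labels.at<int>(y, x) = best;
        }

    for (int y = 0; y < binary.rows; ++y)                       // second pass: resolve equivalences
        for (int x = 0; x < binary.cols; ++x)
            if (int l = labels.at<int>(y, x)) labels.at<int>(y, x) = root(l);
    return labels;
}
```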

4.6.0.2 Multiple Pass CCL

A CCL approach based on multiple image scans was introduced by Suzuki et al. [130]. The image is scanned multiple times in forward and backward directions, while a one-dimensional table that memorises label equivalences is used to unite equivalent labels. The execution time is directly proportional to the number of pixels in connected components in the image. The algorithm is based on sequential local operations, therefore there is no need to employ large equivalence tables that often require large amounts of memory and processing time. The number of scans required to label the image is dependent on the complexity of the image and can be difficult to predict. Hardware implementations of this method are described in [136,137].

4.6.0.3 Parallel CCL

A number of CCL algorithms have been developed that rely on heavily parallel architectures [131,138,139]. These architectures require one logical processing element per pixel; as a result they require a large amount of memory. The authors in [138] split the image into vertical slices to process the image in real-time and reuse labels to significantly decrease the memory requirement of the system.

4.6.0.4 Contour Tracing CCL

Contour tracing was introduced by Chang and Chen in [132]. This is an alternative to algorithms based on progressive scan methods. Here, the image frame is also scanned in raster order; however, thanks to the contour tracing capability, all the connected components can be distinguished in a single image scan, as the problem of label collisions does not apply.

In order to label a binary image frame, the input frame has to be buffered in memory. The image is scanned until a foreground pixel (B(x, y) = Ff) is encountered, as shown in Figure 4.12(a). A complete trace of the contour is performed until the starting pixel is reached (indicating that a full contour has been mapped), as shown in Figure 4.12(b). The contour pixels are labelled with index lk for L ⊂ N, where 3 ≤ k ≤ K and K defines the total number of connected components within a frame. Horizontal neighbouring pixels are labelled with the border label (lk = 2).


Figure 4.12: Contour tracing method: (a) input image, grey pixels denote a foreground pixel (Ff = 1), white pixels denote a background pixel (Fb = 0); (b) start tracing the contour of the first region, assigning horizontal neighbouring pixels the border label (lk = 2); (c) continue the raster scan until the next region is encountered and trace its contour; (d) continue the image scan until all pixels are labelled correctly.

Once the contour is fully labelled, the index k is incremented by 1 and the algorithm resumes scanning, as shown in Figure 4.12(d). At this point, one of several pixel types can be encountered:

• background pixel: B(x, y) = Fb

• unlabelled foreground pixel: B(x, y) = Ff

• already labelled contour pixel: B(x, y) > 1

• horizontal border pixel: x = H

When a background pixel is encountered, the algorithm keeps scanning the image and no further action is required. However, for B(x, y) = Ff the algorithm starts the contour tracing procedure as described above. Once the border pixel B(x, y) = 2 is encountered, the algorithm reads the label of the subsequent contour pixel B(x, y) = lk and keeps scanning within the contour pixels, assigning label lk to all pixels B(x, y) = Ff until another pixel B(x, y) = 2 is encountered. Once the last pixel in a row is reached (x = H), the scan continues from the first pixel in the next row according to the raster scan. A detailed description of the contour tracing algorithm implementation can be found in [132]. The total memory consumption in bits of the contour tracing method can be calculated with:

Total Mem. = [log2(Lmax + 3)] ∗ (N ∗ M) + memDT    (4.6)

The total execution time in clock cycles can be calculated as:

Contour TracingExe = 3 ∗ (N ∗ M)    (4.7)

4.6.0.5 Single Pass CCL

The single-pass CCL method is a relatively new method introduced by Bailey et al. [133] and has advantages over previous methods, such as:

• This method operates on streamed data so there is no need to buffer an entire input frame.

• A single raster scan of the image is all that is required to label and extract regions from the current frame.

• There is no need to buffer a temporary image with labelled objects leading to a significant reduction in required resources compared to other CCL methods.

The single pass labelling method is illustrated in Figure 4.13. In Figures 4.13(a) - (c), the neighbours of the current foreground pixel (p1) being processed are all background pixels, so a new label is assigned (L = 1) and the associated entry in the data table is initialised; three parameters are stored: the location of the label, and the width and height of the region. In Figure 4.13(d) a merger scenario is encountered between labels '1' and '3'. The data associated with label '3' is merged with label '1' and the entry in the data table for label '3' is reset. All subsequent reads from the row buffer for the current row with the label '3' are set to '1' based on the merger table.


Figure 4.13: Stages in the single pass CCL algorithm. (a)-(c) The forward scan window encounters a pixel with no labelled neighbours and creates a new entry in the data table for each region. (d) A merger scenario occurs, leading to the data associated with label '3' being fused with label '1' and the corresponding entry in the data table being reset. The merger table is also updated; any subsequent pixels assigned the label '3' will be read as '1'.

FPGA implementations of the single pass algorithm are described in [133,140,141]. In order to extract ROI in a single image scan, the algorithm requires a fully pipelined architecture. Multiple memory units are required to store labels from the previous row, equivalence tables, and the data associated with each region. The total memory consumption in bits of the single pass algorithm can be calculated with:

Total Mem. = memMT + memDT + memRB    (4.8)

where memRB is the memory required to implement a row buffer to store label data from the previous row. The execution time in clock cycles can be calculated as

Single PassExe = (N ∗ M)    (4.9)

The memory requirements and execution times of each CCL algorithm assuming a resolution of 320x240, and a maximum label count of 255 labels are presented in Table 4.4. The results show that the single pass method yields the shortest execution time and also consumes the smallest amount of resources, similar to the findings in [56,142]. These performance figures make it the most suitable method of CCL for hardware implementation of the SRG algorithm presented in this thesis.

Table 4.4: FPGA based CCL performance comparison for 320x240 images with a maximum label count of 255. Memory is presented in kilobits (kb), and execution time is presented in clock cycles.

CCL Method        Memory Requirements (kb)    Clock Cycles
Two Pass          1,280                       2 * (N*M)
Contour Tracing   158                         ≤ 3 * (N*M)
Single Pass       12                          (N*M)

4.6.1 Algorithm Timing Analysis

Before implementing the SRG algorithm in a Hardware Description Language (HDL), a preliminary timing analysis of the algorithm, using the findings of the CCL review, was conducted to determine the best use of FPGA resources while still achieving real-time performance. This analysis determined the execution time of the algorithm and the number of parallel branches of SRG required to achieve real-time performance. The analysis was conducted based on an image size of 320x240 pixels (QVGA), the application of 64 binary thresholds, and a maximum of 32 labels. An FPGA clock speed of 125MHz (8ns per clock cycle) was used, as this matched the FPGA's on-board PCI express bus clock speed and allowed for simpler communication between the CPU and FPGA. A binary threshold followed by a label assignment process takes 2 clock cycles to perform; therefore the total execution time for the thresholding and CCL stages is equal to the number of pixels in the frame in FPGA clock cycles, plus an additional clock cycle for the initial thresholding operation. The total execution time of the thresholding and CCL stages is calculated with:

SRG Threshold CCL ExecutionTime = (P ∗ (N ∗ M)) + 1    (4.10)

where P is the number of stages in each parallel branch. The duplicate removal stage cannot be included in this calculation, as the CCL stage needs to be completed to ensure every region has been labelled. The execution time for the duplicate removal stage is calculated with:

SRG DR ExecutionTime = T ∗ L²    (4.11)

where T is the number of labelled binary images and L is the maximum number of labels in the image. A list of calculated execution times for multiple parallel combinations is shown in Table 4.5.

Table 4.5: Labelling configurations for seeded region growing and associated execution times. All times are displayed in milliseconds (ms).

Parallel Config.    Exe. Time CCL    Exe. Time DR    Exe. Time Total
64 x 1              0.61             0.52            1.13
32 x 2              1.22             0.52            1.74
16 x 4              2.45             0.52            2.97
8 x 8               4.91             0.52            5.43
4 x 16              9.83             0.52            10.32
2 x 32              19.6             0.52            20.12
1 x 64              39.3             0.52            39.82
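As a check on the entries in Table 4.5, consider the 8 x 8 configuration under equations 4.10 and 4.11: the thresholding and CCL stages require (8 ∗ (320 ∗ 240)) + 1 = 614,401 clock cycles, approximately 4.91ms at the 125MHz (8ns) clock, while duplicate removal requires 64 ∗ 32² = 65,536 cycles, approximately 0.52ms, giving the total of approximately 5.43ms listed in the table.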

The execution time for the 64x1 implementation is 1.13ms, resulting in a fast execution time, but more resources are required to perform 64 operations in parallel in 2 clock cycles. The execution time for the 1x64 implementation is 39.82ms; this configuration requires the least resources, but the execution time does not leave enough time for the remaining stages of the algorithm to perform in real-time. The 8x8 configuration results in an execution time of 5.43ms, which was deemed to be an acceptable trade-off between resources and execution time. It also allows sufficient time for classification of ROI and tracking of pedestrians on the CPU.

4.7 SRG Hardware Architecture

The SRG design consists of four key elements:

1. Seeded Region Growing Finite State Machine

2. Binary Thresholding

3. Connected Component Labelling

4. Duplicate Removal

A functional partition containing the various stages of the proposed system is shown in Figure 4.14.

4.7.1 Seeded Region Growing Finite State Machine

A Finite State Machine (FSM) is used to control the SRG design. The FSM stays in an idle state until the SRG_en signal is asserted by the host CPU. The FSM then begins the binary thresholding and CCL stages, which run until the thresh_ccl_done signal is asserted. The next stage of the pipeline is the duplicate removal stage; this examines the bounding boxes isolated in each thresholded image and attempts to find the best fitting bounding box for each region in the current frame. When the duplicate removal stage is completed, the dr_done and SRG_done signals are asserted and the FSM notifies the CPU that the ROIs for the current frame have been isolated.

4.7.2 Binary Thresholding

A pixel is transferred to the FPGA every 8 FPGA clock cycles from the CPU over the PCIe bus. Eight thresholds are performed by each of the 8 parallel binary thresholding blocks in the SRG design. A functional partition of the binary thresholding block is shown in Figure 4.15. A counter is used to decrement the threshold value at each clock cycle; the counter is reset after eight clock cycles and the next pixel is selected from the input bus. The result of the binary threshold operation is sent to the CCL stage in the next clock cycle, as the system operates on streamed pixel data.

Figure 4.14: Overview of seeded region growing architecture.

Figure 4.15: Functional partition of Binary Thresholding stage of SRG design.

4.7.3 Connected Component Labelling

The CCL design described in this thesis is based on a combination of the single-pass approach presented in [133] and a novel pipelined architecture developed for this pedestrian detection ADAS. Multiple pixels are labelled in a single clock cycle, compared to the original single pass method that labelled a single pixel per clock cycle. The architecture of the CCL block can be broken down into the following sections:

• Label Assignment

• Row Length Buffer

• Region Parameter Storage

4.7.3.1 Label Assignment

The label assignment operation can be described as the assignment of a unique label l, taken from a set of integral values L ⊂ C, to each connected component. Consider a binary image frame B in which every pixel corresponds either to the background (Fb = 0) or to a foreground object (Ff = 1); B is transformed into a frame where each pixel is represented by a label corresponding to the region it is part of. Here, 1 ≤ k ≤ K defines the total number of connected components within the frame. For an N x M binary image, each pixel is denoted by p(x, y), where 0 ≤ x ≤ (N-1) and 0 ≤ y ≤ (M-1). Labelling of B can be written as g : B → C, where p(x, y) is described with:

p(x, y) = Fb          if I(x, y) = Fb
p(x, y) = lk, lk ∈ L  if I(x, y) = Ff        (4.12)

The label assignment block is a key component in the SRG design, assigning labels to the input pixel based on its neighbours' values. A functional partition of the label assignment stage is shown in Figure 4.16. A forward scanning mask is used when labelling the current pixel. A total of four scenarios can occur during the label assignment process:

• Background pixels are assigned a value of ‘0’.

• If the neighbouring pixels are background pixels a new label is created and assigned.

• If neighbouring pixels share a common label the current pixel is assigned the common label.

• If neighbouring pixels have different labels, the lower value of the conflicting labels is assigned to the current pixel and the merger table and data table are updated to reflect the merger.

The input 'pixelIn' is a binary pixel input to the label assignment block from the binary thresholding stage. It is first determined whether the input pixel is a foreground or background pixel. If the input is a foreground pixel, the neighbouring pixel B is analysed. If B is a foreground pixel, its associated label can be assigned to the input pixel without checking the remaining region labels; this is because a merger can only occur if B is a background pixel, and a new label can only be assigned if all of the neighbouring pixels are background pixels.

Figure 4.16: Functional partition of Label Assignment stage of SRG design.

A label merger occurs if a region is encountered and its neighbours have been assigned two different labels. In this situation, one of the two labels will continue to be used, and all instances of the other label need to be replaced with the retained label. A merger will only occur if the current pixel's neighbour B: (x, y-1) is a background pixel and neighbours C: (x+1, y-1), A: (x-1, y-1) and D: (x-1, y) hold different labels; in all other cases the labels will already be equivalent from prior processing. If a merger does occur, the label with the lower value is assigned to the current pixel and the dimensions of the region associated with the higher label are merged with the lower label's region data. An example of a merger is shown in Figure 4.17: the conflict occurs at the location denoted by 'x', where the neighbour D has been assigned '1' and the neighbour C has been assigned '2'. The labels are resolved and the data related to region 1 is updated, while the data related to region 2 is set to NULL.

Figure 4.17: Example of a label merger scenario. A merger occurs at the location denoted by 'x'; the neighbour D in the forward scan mask has been assigned '1' while the neighbour C has been assigned '2'.

A label active index is maintained at all times so that labels that have been merged can be re-assigned at the earliest opportunity. If a label is active, its entry in the index is asserted; if a merger occurs and the region label is no longer in use, it is de-asserted. This reduces the resources required by the system through label reuse. The current pixel's co-ordinates can also influence the neighbouring labels: for example, if the current pixel is on the right edge of the image, the neighbouring C label is automatically set to a background pixel. At the beginning of a row (B(0, y)), both the A and D neighbour labels are set to background pixels.
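A simplified software reference model of this decision logic is sketched below; it is not the Verilog implementation. Region-dimension bookkeeping and the label-active index are omitted, and the neighbour labels are assumed to have already been resolved through the merger table, as in the row length buffer described in the next subsection.

```cpp
#include <algorithm>
#include <cstdint>
#include <initializer_list>
#include <vector>

// a, b, c, d are the labels of the A, B, C and D neighbours in the forward scan
// mask (0 = background); 'mergerTable' plays the role of the merger look-up table.
uint16_t assignLabel(bool foreground, uint16_t a, uint16_t b, uint16_t c, uint16_t d,
                     std::vector<uint16_t>& mergerTable, uint16_t& nextLabel) {
    if (!foreground) return 0;                        // background pixels keep the value '0'
    if (b) return b;                                  // B labelled: reuse its label, no merger possible
    if (!a && !c && !d) {                             // all neighbours background: create a new label
        if (mergerTable.size() <= static_cast<std::size_t>(nextLabel))
            mergerTable.resize(nextLabel + 1);
        mergerTable[nextLabel] = nextLabel;           // a new label initially points to itself
        return nextLabel++;
    }
    // At most two distinct labels can appear here, since A and D are adjacent to each other.
    uint16_t lo = 0, hi = 0;
    for (uint16_t n : {a, c, d}) {
        if (!n) continue;
        if (!lo || n < lo) { hi = std::max(hi, lo); lo = n; }
        else if (n != lo)  { hi = std::max(hi, n); }
    }
    if (hi && hi != lo) mergerTable[hi] = lo;         // merger: the higher label now maps to the lower
    return lo;                                        // the lower label is assigned to the current pixel
}
```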

4.7.3.2 Row Length Buffer

The labels from the previous row are stored in a FIFO in the row storage block. The output of each FIFO is sent to the current image's corresponding merger table. Shift registers are used to implement the A, B, C and D registers (forward scan 8-way mask). The registers are capable of holding 8 x log2(L) bits, where L is the number of labels required for the application. A functional partition of the Row Length Buffer stage is shown in Figure 4.18.

Figure 4.18: Functional partition of Row Length Buffer.

The merger table serves as a look-up table on the output of the row storage FIFO. If a merger occurs, this table is updated to reflect the merger; the label that is replaced is updated to point to the retained label. A read and a write have to be performed in the same clock cycle as the data is streamed, since a delay would cause errors in the labelling process; as a result, the merger table was implemented using dual-port Block RAM (BRAM). An example of the merger table process is displayed in Figure 4.19. The merger table in Figure 4.19(a) has not yet experienced a merger within the current frame, whereas a merger has occurred between labels 2 and 4 in Figure 4.19(b).


Figure 4.19: Example merger tables. (a) No merger has occurred in the current frame: a 2 is input and a 2 is output from the table. (b) A merger has occurred between regions 2 and 4 in the current frame and the table has been updated: a 4 is input to the table and a 2 is output from the table.

The output of the merger table is placed into the 'C' register (Figure 4.18) to be sent to the label assignment stage. When the next label is output from the merger table, the data in 'C' is transferred to 'B' and the data previously in 'B' is sent to 'A'. This ensures the forward scan mask always contains the correct labels for the current pixel being analysed in the label assignment stage.

4.7.3.3 Region Parameter Storage

If a foreground pixel is encountered within the image, the data table (Figure 4.20) is updated to reflect the parameters associated with the current label; these parameters are (xcoor, ycoor, xlen, ylen).

Figure 4.20: Functional partition of Region Parameter Storage stage of SRG design.

A functional partition of the Region Parameter Storage design is displayed in Figure 4.20. If the signal newLabel is asserted, a new region is defined in the Data Table BRAM and is assigned the current (x, y) co-ordinates, and its length in both the x and y directions is set to 1. If existLabel is asserted, the existing region data is read from the Data Table, updated with the current data, and written back over the previous entry. If mergerLabel is asserted, the data present at both regions' locations is read and the data associated with the higher label is merged with that of the lower label; the higher label's data is cleared to 0 and stored in the data table along with the updated data for the lower label. The Region Parameter Storage is implemented using FPGA dual-port RAM. A total of eight BRAMs are used to store information related to each binary frame, amounting to a total of 64 BRAMs over the entire parallel CCL design.

When all of the binary pixels for the current frame have been labelled, the regions are filtered according to their geometric properties as they are streamed to the Duplicate Removal stage of the SRG implementation.

4.7.4 Region Filtering

The number of regions labelled after the CCL process has completed can be reduced by filtering the regions based on their aspect-ratio and location within the current FIR frame.

4.7.4.1 Aspect Ratio

Pedestrians are generally taller than they are wide if they are in an upright position. Based on this assumption the labelled regions can be filtered based on their aspect-ratio:

AspectRatio = w / h    (4.13)

where w is the width of the bounding box and h is the height of the bounding box. This is a measure of the proportion of the pedestrian's width to its height. Pedestrians in an image from a camera mounted on a vehicle will generally appear taller than they are wide (this assumes that the pedestrians are not seated, crouched or lying down). The aspect-ratio test can be applied by discarding the least significant bit of the region's height (a one-bit right shift, effectively halving the value) and comparing the result to the region's width. If the width is less than the halved height, the ROI meets the requirements and is stored for further examination.
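In software terms the test reduces to a single shift and compare, as in the sketch below.

```cpp
#include <cstdint>

// The height is halved by a one-bit right shift and the region passes when its
// width is smaller than that value, i.e. width/height < 0.5.
bool passesAspectRatio(uint16_t width, uint16_t height) {
    return width < static_cast<uint16_t>(height >> 1);
}
```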

4.7.4.2 Location Based Filtering

The number of ROI can be further reduced based on the position and relative size of their bounding boxes within the FIR frame. A pedestrian situated far from the sensor will have a bounding box with an upper vertex located close to the horizon, and the height and width of the ROI will be quite small. Pedestrians located close to the sensor will have an upper vertex situated further from the horizon and will have a taller, wider bounding box. Based on these observations, threshold values were empirically derived and used to define an Area Of Interest (AOI) within the image. ROI that have an upper vertex located outside of this AOI are discarded. ROI with an upper vertex located inside the AOI are further analysed and, based on the location of the upper vertex relative to the height of the ROI, each is either discarded or stored for classification. The values used to filter the ROI are scaled based on the location of the upper vertex within the AOI. This method removes a large number of common false positives, such as street lights and wheel arches, while preserving the true positives within the image for classification.

A number of samples of the output from the ROI isolation stages are shown in Figure 4.21. Figures 4.21(a) and 4.21(b) display the output of the seeded region growing stage, where an ROI encompassed by a yellow bounding box has satisfied the aspect ratio metric; the boundaries of the AOI are overlaid (the area located between the green lines in the example frames). Figures 4.21(c) and 4.21(d) display the resulting regions filtered according to their location and size. The large bounding boxes in the background of Figure 4.21(c) are removed, and the bounding boxes of the traffic lights and the building windows, whose upper vertices lie outside of the AOI, are discarded in Figure 4.21(d). This method of ROI isolation is more adaptable to different pedestrian poses and variations in ambient temperature than a window based method. This leads to fewer classification operations by the SVM, resulting in fewer opportunities for the SVM to generate false positives and a lower computational overhead for the system.


Figure 4.21: Sample images from the outputs of seeded region growing: (a)-(b) ROI that satisfy aspect ratio filter are highlighted in yellow, the AOI is located between the green lines within the image, (c)-(d) ROI that satisfy the distance estimation filter are highlighted in purple. The large candidates in the background containing non-pedestrians are removed prior to classification.

4.7.5 Duplicate Removal

The filtered ROI are then compared based on their co-ordinates. The regions that are being compared in this stage are referred to as ROI A and ROI B. The data table containing the regions of the highest intensity threshold (T) is compared to the next highest intensity (T − 1) and results are stored in a temporary BRAM until the final comparison, where the results of the comparison are sent to the host CPU for classification. A functional partition of the duplicate removal process is shown in Figure 4.22.

Figure 4.22: Functional partition of Duplicate Removal stage.

A total of three scenarios can occur during the ROI bounding box comparison process:

• A match is found between ROI A and ROI B

• No match is found for ROI A

• No match is found for ROI B

If a match is made between ROI A and ROI B, the data present in ROI B is stored, as this will be slightly larger than ROI A due to the binary threshold value used to isolate the region. The address of ROI B is marked in the B active index.

If no match is found for ROI A after all 32 values of ROI B have been analysed, then ROI A is stored. After all 32 values of ROI A have been examined, the B active index is examined and any regions that have not been marked as matched with another region are read from BRAM B and stored. This method results in an accurate bounding box for ROI within the image and can lead to more accurate classification. A more comprehensive view of the internal structure of the bounding box comparator is displayed in Figure 4.23.
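The comparison loop can be summarised in software as follows. Each ROI is represented as an (x, y, width, height) tuple; the containment test used as the match criterion is an assumption made for illustration, since the exact hardware comparison is not reproduced here.

    # Software sketch of the duplicate-removal comparison between the ROI list
    # of threshold T (rois_a) and the list of threshold T-1 (rois_b).
    def merge_duplicates(rois_a, rois_b):
        def matches(a, b):
            ax, ay, aw, ah = a
            bx, by, bw, bh = b
            # ROI B, isolated at the lower threshold, is expected to enclose ROI A
            return bx <= ax and by <= ay and bx + bw >= ax + aw and by + bh >= ay + ah

        kept = []
        b_matched = [False] * len(rois_b)
        for a in rois_a:
            for i, b in enumerate(rois_b):
                if matches(a, b):
                    kept.append(b)        # keep the slightly larger ROI B
                    b_matched[i] = True
                    break
            else:
                kept.append(a)            # no match found for ROI A: keep it
        # any ROI B that was never matched is also kept
        kept.extend(b for i, b in enumerate(rois_b) if not b_matched[i])
        return kept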

Figure 4.23: Bounding box comparator of Duplicate Removal stage.

4.8 Resource Utilisation

The SRG hardware design was captured in Verilog using Altera’s Quartus 12 software and simulated in ModelSim. The design was implemented on the Altera Arria GX II EP2AGX125EF35C FPGA [119]. The resources utilised by the hardware design are shown in Table 4.6.

Table 4.6: Resources utilised by the hardware accelerated Seeded Region Growing implementation.

Device Utilisation           Thresholding   Connected Component Labelling   Duplicate Removal   Total
Logic Utilisation            <1%            48%                             2%                  51%
Combinational ALUTs          128 (<1%)      33,041 (34%)                    1,476 (1%)          34,645 (36%)
Memory ALUTs                 0              0                               0                   0
Dedicated Logic Registers    60 (<1%)       25,546 (9%)                     133 (<1%)           25,739 (10%)
BRAM Bits                    0              262,144 (54%)                   1,024 (<1%)         263,168 (55%)

4.9 Hardware Accelerated SRG Execution Times

The updated execution times of the ROI isolation stage are presented in Table 4.7. The results indicate a significant improvement over the previous software implementation. A 97.93% reduction in the execution time of the ROI isolation stage was achieved by accelerating the SRG algorithm with dedicated hardware (390ms down to 5.1ms). There remains a total of 31.57ms after CDC (3.3ms) and SRG (5.1ms) for subsequent classification of isolated ROI and tracking of successfully classified ROI in the subsequent stages of the FIR pedestrian detection algorithm.

Table 4.7: Execution times of Clothing Distortion Compensation and hardware accelerated Seeded Region Growing. Times shown are in milliseconds (ms).

Stage                                  Execution Time (ms)
Clothing Distortion Compensation       3
Seeded Region Growing                  5.31
Total                                  8.31

4.10 Hardware Accelerated SRG Results

The algorithm was tested on a database of 15,000 frames of FIR footage captured using an automotive grade FIR microbolometer, the same footage used to compare static thresholding and SRG in software. The hardware accelerated SRG algorithm isolated 86% of pedestrians in the footage achieving the same level of ROI isolation as its software counterpart.

4.11 Summary and Conclusion

This chapter proposed a novel SRG architecture that can accurately isolate ROI in FIR images in real-time using dedicated FPGA hardware. A comparison based on the number of pedestrians isolated was made between the use of static threshold values and SRG. The maximum number of pedestrians isolated using the static threshold method was 52%, while the SRG method isolated 86%. This leads to a higher detection rate as more ROI are available for classification, but also results in an execution time outside the bounds of real-time performance. A review of FPGA based CCL methods was performed and the single pass method was chosen to be implemented on the FPGA as it required the least amount of resources and yielded a lower execution time than the other CCL methods listed. The hardware accelerated SRG method decreases the execution time from a worst-case scenario of 390ms to 5.31ms, allowing real-time isolation of ROI from FIR video.

Finally, the resources required to implement the SRG design on the Altera Arria GX II FPGA were identified; the design uses 51% of the FPGA on-board resources. The functional performance of the hardware accelerated SRG algorithm is equal to that of the software implementation, with 86% of pedestrians isolated. The results of this chapter indicate that hardware acceleration can result in a significant decrease in the execution time of low-level pixel operations in an embedded ADAS without sacrificing detection rates; the detection rates of the FPGA based SRG architecture are equal to those of the CPU based SRG algorithm (86%). Hardware acceleration of low level image processing tasks could potentially be extended beyond pedestrian detection in the FIR spectrum to a number of other automotive applications that could benefit from a highly pipelined parallel architecture. The next chapter describes classification of the ROI isolated by SRG as either pedestrian or non-pedestrian objects on the x86 CPU, using a SVM classifier trained with novel HOG-LBP features.

Chapter 5

Region of Interest Classification

5.1 Introduction

Accurate classification of data is an extremely important stage in any application based on machine vision. This is especially true in a pedestrian detection system, where false information conveyed to the driver could result in them losing trust in the system, or worse, it could lead to a serious collision with a pedestrian. This chapter proposes the combination of Histogram of Oriented Gradients (HOG) and Local Binary Patterns (LBP) for ROI classification in FIR images. The combination of HOG and LBP feature vectors has been demonstrated in day-time pedestrian detection systems [143] where it has achieved higher detection rates than systems that exclusively used HOG features to train a Support Vector Machine (SVM). The extraction of HOG and LBP features from ROI isolated by the FPGA-based SRG method and the process of training a SVM classifier for pedestrian classification is described. The chapter then discusses pedestrian tracking between frames for added system robustness. Classification results for tests on a database of FIR images and streams of FIR video are then presented. The execution times of the classification and tracking stages in software are then determined. Finally, conclusions are drawn based on the findings of this chapter and the detection results of the FIR pedestrian detection system described in this thesis are compared to other works.

5.2 Region of Interest Feature Extraction

This section discusses the extraction of HOG, LBP, and HOG-LBP features from isolated ROIs. These features are used by the classifier to determine if the isolated ROI contains a pedestrian or non-pedestrian object.

5.2.1 Histogram of Oriented Gradients

To generate a HOG feature vector for a ROI, the ROI must first be scaled to a fixed size, in this case 20x40 pixels [83].


Figure 5.1: Images from the stages of generating a Histogram of Oriented Gradients (HOG) feature vector: (a) pedestrian ROI, scaled to 20x40 pixels, (b) gradient image, (c) ROI divided into cells of 5x5 pixels, resulting in 4 cells x 8 cells, (d) HOG descriptor for the ROI showing the gradient orientation histograms in each cell.

The gradient image of the ROI is first calculated by performing a convolution with horizontal and vertical gradient kernels and combining the results. An example of a gradient image is shown in Figure 5.1(b). The gradients used are:

Gx = [−1, 0, 1] ∗ A (5.1)

Gy = [−1, 0, 1]^T ∗ A    (5.2)

where A is the current ROI. The resultant gradient image is divided into cells of 5x5 pixels, as shown in Figure 5.1(c). A histogram of edge orientations is computed for each cell, where each pixel votes into a histogram bin according to the orientation of the gradient at that pixel, weighted by the gradient magnitude. Each bin effectively represents the "strength" of an edge through 9 orientations from 0◦ to 180◦ for a total of 9 histogram bins. When all of the histograms have been computed, a single feature vector descriptor is formed by concatenating all of the histograms. Contrast normalisation must be applied to ensure good performance. The resulting feature vector is normalised with the L2-norm [144]. The normalised vector (v) corresponding to an un-normalised vector (u) containing the histograms of a block is:

v = u / √(||u||₂² + e²)    (5.3)

where e is a small constant (e = 0.001) whose purpose is to prevent division by 0. The value of each element of v is limited to a maximum value of 0.2 [48, 144]. This is referred to by Dalal and Triggs as L2-Hys normalisation [83]. This results in a descriptor vector that describes a greyscale ROI of 20x40 pixels. A HOG feature is generated for each 20x40 FIR image (an example of a HOG descriptor is shown in Figure 5.1(d)). A summary of the parameters used for HOG feature generation is shown in Table 5.1.
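The following Python sketch follows the parameters of Table 5.1 (20x40 pixel ROI, 5x5 pixel cells, 9 orientation bins, L2-Hys style normalisation) but omits the 0.5 block overlap for brevity, so it approximates rather than reproduces the descriptor used in this work.

    import numpy as np

    # Simplified HOG sketch: 20x40 ROI, 5x5-pixel cells, 9 orientation bins over
    # 0-180 degrees, L2 normalisation with clipping at 0.2 (L2-Hys style).
    def hog_descriptor(roi, cell=5, bins=9, eps=0.001, clip=0.2):
        roi = roi.astype(np.float32)
        gx = np.zeros_like(roi); gy = np.zeros_like(roi)
        gx[:, 1:-1] = roi[:, 2:] - roi[:, :-2]          # [-1, 0, 1] kernel
        gy[1:-1, :] = roi[2:, :] - roi[:-2, :]          # [-1, 0, 1]^T kernel
        mag = np.hypot(gx, gy)
        ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation

        h, w = roi.shape
        hist = []
        for cy in range(0, h, cell):
            for cx in range(0, w, cell):
                m = mag[cy:cy + cell, cx:cx + cell].ravel()
                a = ang[cy:cy + cell, cx:cx + cell].ravel()
                idx = np.minimum((a / (180.0 / bins)).astype(int), bins - 1)
                hist.append(np.bincount(idx, weights=m, minlength=bins))
        v = np.concatenate(hist)                         # 4 x 8 cells x 9 bins = 288 values
        v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)       # L2 normalisation
        v = np.minimum(v, clip)                          # clip at 0.2
        v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)       # renormalise
        return v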

Table 5.1: Parameters used in HOG feature vector generation.

Parameter               Value
Image Height            40 pixels
Image Width             20 pixels
Cell Width              5 pixels
Cell Height             5 pixels
No. of Orientations     9 (0-180◦)
Overlap                 0.5
Normalisation Method    L2-Hys

5.2.2 Local Binary Patterns

Local Binary Patterns (LBP) were originally proposed as a method of texture classification in [145], but have been extended in recent years to applications such as face detection and day-time pedestrian detection, where they have shown promising results. Multiple structuring elements can be used to construct a LBP feature vector. The authors in [145] claim that a large structuring element resulted in a higher detection rate than small structuring elements on images with a resolution of 256x256 pixels. Examples of LBP structuring elements are shown in Figure 5.2.


Figure 5.2: Examples of structuring elements used in LBP feature vector generation, x denotes the current pixel, circles denote a sampling point: (a) 3x3 (b) 5x5 (c) 7x7.

5.2.3 Computation of LBP Feature Vector


Figure 5.3: Images from the stages of generating a Local Binary Pattern (LBP) feature vector: (a) pedestrian ROI, scaled to 20x40 pixels with 3x3 structuring element highlighted in red, (b) outer values of LBP 3x3 structure thresholded against centre value (206), (c) 8-bit binary number (14) generated from structuring element converted to decimal for storage in histogram bin.

Local Binary Pattern features are determined for a ROI by traversing the ROI one pixel at a time with a structuring element; for example, the authors of [145] used a structuring element of 3x3 pixels. In essence, the LBP value for a structuring element is calculated by thresholding the outer pixels against the centre pixel of the structuring element (Figure 5.3(b)), and the resultant 8-bit binary number is then converted to an integer (Figure 5.3(c)). Once the image has been traversed and integers have been computed for all pixels, a histogram is generated. The structuring element can be extended to use differing radii; the authors in [145] found that a larger number of sampling points and radii provided higher detection rates in classifying textures in images with a resolution of 256x256 pixels.

This thesis uses the notation "LBPP,R" to denote a LBP structure, where P is the number of sampling points used and R is the radius of the structuring element. When a sampling point does not fall on integer co-ordinates, the pixel value at that point is bilinearly interpolated. The LBP label for the centre pixel (p(x, y)) of the structuring element is denoted by:

LBP_{P,R}(x, y) = Σ_{i=0}^{P−1} s(n_i − n_c) 2^i,   s(x) = { 1, x ≥ 0; 0, x < 0 }    (5.4)

where n_c is the value of the centre pixel and n_i is the value of the i-th sampling point. To reduce the total number of values generated by the LBP stage, uniform patterns are employed. In [145] the authors state that uniform patterns make up approximately 90% of textures when a 3x3 structuring element is used for LBP feature generation. A uniform pattern is achieved when the number of spatial transitions in the output is less than or equal to 2. Examples of uniform patterns are "00110000" and "11110111"; examples of various LBP patterns are shown in Figure 5.4. Uniform patterns are shown in Figures 5.4(a) and 5.4(b), non-uniform patterns are shown in Figures 5.4(c) and 5.4(d). The total number of uniform patterns that can be achieved with a LBP8,1 structuring element is 58. When calculating LBP patterns for an FIR frame, 58 histogram bins are used for the various uniform patterns and 1 extra bin is used for storing the total number of non-uniform patterns within the image. The number of uniform patterns generated for larger structuring elements scales according to the number of sampling points used. The parameters used for LBP feature generation are shown in Table 5.2.
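A software sketch of the LBP8,1 computation with uniform-pattern binning is given below. The eight immediate neighbours are used, so no bilinear interpolation is needed; larger radii would require interpolated sampling points as described above.

    import numpy as np

    # Sketch of LBP8,1 feature extraction with uniform-pattern binning.
    def is_uniform(code, bits=8):
        # circular comparison of adjacent bits; uniform if at most 2 transitions
        rotated = (code >> 1) | ((code & 1) << (bits - 1))
        return bin((code ^ rotated) & 0xFF).count("1") <= 2

    UNIFORM_CODES = [c for c in range(256) if is_uniform(c)]   # 58 uniform codes
    BIN_OF = {c: i for i, c in enumerate(UNIFORM_CODES)}

    def lbp_histogram(roi):
        roi = roi.astype(np.int32)
        hist = np.zeros(len(UNIFORM_CODES) + 1)                # 59 bins in total
        # offsets of the 8 neighbours, starting east and moving counter-clockwise
        offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]
        h, w = roi.shape
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                centre = roi[y, x]
                code = 0
                for i, (dy, dx) in enumerate(offsets):
                    if roi[y + dy, x + dx] >= centre:          # s(n_i - n_c)
                        code |= 1 << i
                # non-uniform codes all fall into the final histogram bin
                hist[BIN_OF.get(code, len(UNIFORM_CODES))] += 1
        return hist / max(hist.sum(), 1)                       # normalised histogram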

Table 5.2: Parameters used in LBP feature vector generation.

Parameter     Value
Window Size   3x3 pixels
Radius        1
Points        8


Figure 5.4: Examples of uniform and non-uniform local binary patterns, gray circles denote a ‘0’ and white circles denote a ‘1’: (a) - (b) Uniform patterns, number of spatial transitions ≤ 2. (c) - (d) Non-uniform patterns, number of spatial transitions ≥ 3.

5.2.4 HOG-LBP Features

When the HOG and LBP features have been calculated for the ROI, they are concatenated to form a single HOG-LBP feature vector. The fusion of HOG features with LBP features as training data for a SVM has been utilised for pedestrian detection in the visible spectrum, where increased detection rates have been achieved compared with a SVM trained purely using HOG vectors [143].

5.2.5 Training Data

For training and validation purposes, a database of greyscale images was generated by manually extracting and annotating ROI from on-road FIR video data captured in urban, sub-urban and rural environments. A total of 2,000 ROI were extracted and manually labelled (1,000 pedestrian, 1,000 non-pedestrian). A variety of pedestrian poses with differing levels of clothing distortion were used to provide a representative sample of real-world pedestrian targets. Figure 5.5 illustrates some examples of pedestrian and non-pedestrian database entries. Both HOG and LBP feature vectors were calculated for each image in the database. These features were then concatenated and used to form HOG-LBP features to train the SVM classifier.


Figure 5.5: Examples of pedestrians and non-pedestrians used for training the SVM classifier. All images are resized to 20x40 pixels for training purposes: (a)-(h) pedestrians, (i)-(p) non-pedestrians.

5.2.6 Support Vector Machine Classifier

A SVM classifier is used in conjunction with the combined HOG-LBP feature vectors for classification of isolated ROI. SVMs have been used in a range of fields including facial recognition [146, 147], gesture recognition [148], breast cancer detection [149, 150] and pedestrian detection [14, 52, 62]. A SVM operates by calculating the optimal separating hyperplane between classes in higher dimensional space. In this work, a SVM is used to classify ROI isolated during the previous stage as either pedestrian or non-pedestrian based on their HOG-LBP features. A Radial Basis Function (RBF) has been used as a kernel (K) for SVM classification in this thesis:

K(x, y) = e^(−γ||x−y||²)    (5.5)

where x and y are input feature vectors and γ determines the width of the Gaussian (bell curve) that the RBF kernel uses to map feature vectors into higher-dimensional space. A RBF kernel is well suited to training with a small number of features. A grid search was used to determine the optimal SVM parameters, cost (C) and gamma (γ), based on the suggestions in [151].
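The parameter search can be illustrated with the sketch below. The thesis does not state which SVM library was used, so scikit-learn is used here purely as a stand-in; the exponentially spaced grid is the commonly recommended search for RBF kernels and is sized so that the operating point reported later (C = 2048, γ = 10) falls within it.

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Illustrative grid search for the RBF-kernel SVM parameters C and gamma.
    def train_svm(features, labels):
        # features: (n_samples, n_features) HOG-LBP vectors, labels: 0/1
        param_grid = {
            "C": [2 ** k for k in range(-5, 16, 2)],
            "gamma": [2 ** k for k in range(-15, 4, 2)] + [10],
        }
        search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)
        search.fit(features, labels)
        return search.best_estimator_, search.best_params_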

5.3 Pedestrian Tracking

Tracking of pedestrians between frames adds robustness to the FIR pedestrian detection system. Information extracted from previous frames can be used to refine the search parameters for previously detected pedestrians in the current frame. Tracking can also be used to estimate the location of a pedestrian if classification temporarily fails due to partial or full occlusion. A Kalman filter [93] has been used in this work to track pedestrians in sequential frames. Kalman filters have been used in automotive applications such as pedestrian detection [14, 94], lane departure warning [11] and vehicle detection [8, 9, 152]. Targets are tracked using four parameters associated with the pedestrian bounding box: x-position, y-position, width and height, and these parameters are used to form a measurement vector (z). Predictions of the state vector (x̂) and state error covariance matrix (P⁻) are generated for a ROI at time k:

x̂⁻_k = A x̂_{k−1}    (5.6)

P⁻_k = A P_{k−1} Aᵀ + Q    (5.7)

where A is the state transition matrix and Q is the process noise covariance matrix. The role of A is to relate the state at the previous time step (k - 1) to the current time step (k), in the absence of either a driving function or process noise. For the proposed system, A is:

  1 0 0 0 1 0 0 0     0 1 0 0 0 1 0 0     0 0 1 0 0 0 1 0     0 0 0 1 0 0 0 1 A =   (5.8)   0 0 0 0 1 0 0 0     0 0 0 0 0 1 0 0     0 0 0 0 0 0 1 0   0 0 0 0 0 0 0 1 The resulting updated state vector (ˆx) is: Chapter 5. Region of Interest Classification 91

x̂⁻_k = [ x_{k−1} + ∆x_{k−1},  y_{k−1} + ∆y_{k−1},  w_{k−1} + ∆w_{k−1},  h_{k−1} + ∆h_{k−1},  ∆x_{k−1},  ∆y_{k−1},  ∆w_{k−1},  ∆h_{k−1} ]ᵀ    (5.9)

The resulting state vector predictions are used to associate targets in previous frames with detections in the current frame. The measurement matrix (H ) at time k for the proposed system is:

  1 0 0 0 0 0 0 0     0 1 0 0 0 0 0 0 Hk =   (5.10)   0 0 1 0 0 0 0 0   0 0 0 1 0 0 0 0

The measurement noise covariance matrix R of the Kalman filter determines the sensitivity of the tracker to updates. Higher values of measurement covariance will result in smoother movement and less weighting on detections in the current frame, while a small value will result in a more responsive tracker and heavier weighting on the current measurements. However, a value that is too small can cause the tracker to become unstable during detection failures. For the video data used in this research (320x240 pixels, 25Hz frame rate) a value of 0.1 for the diagonal elements of R was found to be suitable for ensuring the tracker was responsive and remained stable in the presence of noise caused by variations in the road surface. For the proposed pedestrian detection system R is:

  0.1 0 0 0      0 0.1 0 0  R =   (5.11)    0 0 0.1 0    0 0 0 0.1

The co-ordinates of each classified pedestrian are stored between frames; the previous frame's co-ordinates are then compared to classified ROI in the current frame. If there is a low correlation between the current pedestrian co-ordinates and the co-ordinates found within the previous frame, a new tracker object is defined. If there is a high degree of correlation between the current set of pedestrian position co-ordinates being analysed and co-ordinates found in the previous frame, a counter is incremented to keep track of the number of times a pedestrian has been classified in a series of consecutive frames. If a ROI is classified in more than 5 consecutive frames, the system then creates a tracker object for the ROI and draws a bounding box around the ROI on the in-vehicle display. Tracking has been found to remove a large number of false positives from the system, as false positives tend to occur for only a short duration (generally 1-2 frames at a time). If a target goes undetected for more than 10 frames then it is deemed out of view of the camera and its associated tracker object is discarded; this allows regions to go undetected for a short period of time without their trackers being removed prematurely. This method of tracking pedestrians allows for more robust system performance and has been shown to generate fewer false positives than a system without tracking [14].
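A minimal constant-velocity Kalman tracker built from the A, H and R matrices of equations (5.6)-(5.11) is sketched below. The process noise Q and the initial error covariance P are not specified in the text, so the values used here are assumptions included only to make the sketch runnable.

    import numpy as np

    # Minimal constant-velocity Kalman tracker for a bounding box (x, y, w, h).
    class BoxKalman:
        def __init__(self, box):
            x, y, w, h = box
            self.A = np.eye(8)
            self.A[:4, 4:] = np.eye(4)                 # position += velocity (eq. 5.8)
            self.H = np.hstack([np.eye(4), np.zeros((4, 4))])   # eq. (5.10)
            self.R = 0.1 * np.eye(4)                   # measurement noise (eq. 5.11)
            self.Q = 0.01 * np.eye(8)                  # assumed process noise
            self.P = np.eye(8)                         # assumed initial covariance
            self.x = np.array([x, y, w, h, 0, 0, 0, 0], dtype=float)

        def predict(self):
            self.x = self.A @ self.x                   # eq. (5.6)
            self.P = self.A @ self.P @ self.A.T + self.Q   # eq. (5.7)
            return self.x[:4]                          # predicted bounding box

        def update(self, z):
            z = np.asarray(z, dtype=float)             # measured (x, y, w, h)
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
            self.x = self.x + K @ (z - self.H @ self.x)
            self.P = (np.eye(8) - K @ self.H) @ self.P
            return self.x[:4]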

5.4 Classification Results

FIR video has been captured with an FIR microbolometer sensor at a rate of 25fps and a resolution of 320x240 pixels (QVGA). The sensor was mounted on the front of a vehicle. The majority of video was captured in low temperature (−3◦C to 8◦C) winter environments, since pedestrians are more likely to be wearing insulating clothing, which is particularly challenging for the morphological clothing compensation algorithm. Video footage has been captured in urban, sub-urban and rural environments at speeds ranging from 0-100km/h. The HOG, LBP, and HOG-LBP trained classifiers were tested on the database of 2,000 FIR images extracted from recorded FIR video; the database is similar in size to the databases used in [51, 52]. Testing was also performed on 15,000 frames of recorded FIR video; the testing streams of FIR video are independent of the captured video used to create the database. The performance of the SVM classifier is presented in two subsections: the first focuses on detection of pedestrians in individual frames, and the second focuses on the performance of the classifier on streams of FIR footage. All of the data within the FIR streams is independent of the data used to train the SVM classifier.

5.4.1 Performance Metrics

A range of approaches have been used in the literature to quantify performance in FIR pedestrian detection algorithms. A performance evaluation of pedestrian detection systems was presented in [153], noting that Receiver Operating Characteristic (ROC) curves are an effective tool for quantifying performance. Results are presented in the form of a detection rate with/without tracking and a false positive rate. The detection rate is the proportion of frames in which a pedestrian is successfully detected (d), out of the total number of frames in which they are present (n):

Detection rate = d / n    (5.12)

The detection with tracking rate is the proportion of frames in which a pedestrian is successfully detected or tracked (t), out of the total number of frames in which they are present:

Detection with tracking rate = t / n    (5.13)

The recording and measuring of the number of false positives is an important factor in a pedestrian classification system. The false positive rate is a measure of the number of regions the classifier falsely determines to be a pedestrian (f) out of the total number of frames:

False positive rate = f / n    (5.14)
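The three rates of equations (5.12)-(5.14) reduce to simple per-frame ratios, as in the following sketch; the example counts are purely illustrative.

    # Rates of equations (5.12)-(5.14): n frames containing a pedestrian,
    # d frames with a successful detection, t frames with a detection or a
    # successful track, and f falsely classified regions.
    def performance_metrics(n, d, t, f):
        return {
            "detection_rate": d / n,
            "detection_with_tracking_rate": t / n,
            "false_positive_rate": f / n,
        }

    # illustrative counts only
    print(performance_metrics(n=1000, d=940, t=975, f=12))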

5.4.2 Classifier Performance

The combination of multiple structuring elements has been shown to provide better detection rates for texture classification [145]; therefore, a range of structuring elements have been used. The results of tests with multiple structuring elements are presented in Table 5.3. These results show that combinations of different scales of LBP structuring elements yield no improvement in detection rates in this application. The structuring elements that yield the best results were LBP8,1 and LBP16,2. These are fused with HOG features to create HOG-LBP8,1 and HOG-LBP16,2 features to determine the arrangement that yields the best detection rates in SVM training. The larger structuring element LBP24,3 does not perform as well as the smaller structuring elements since the dimensions of the structuring element exceed those of the scaled image, i.e. 20x40 pixels.

Cross validation has been used to test the performance of the SVM when trained with each of the various feature vector sets. Cross validation has been performed with K=10, which divided the database of images used for training into 10 groups. Nine groups were used to train the SVM and one group for testing; this process is repeated with different groups until every group has been classified. Cross-validation increases the time it takes to train the classifier, but it ensures that each image is used for both training and testing. This gives a more comprehensive characterisation of classifier performance on unseen data.

Table 5.3: Confusion matrix displaying detection rates for multiple LBP structuring elements.

Structuring Element      Predicted Class
LBP8,1                   Pedestrian: 0.91 (TP)   0.17 (FP)
                         Non-Ped:    0.09 (FN)   0.83 (TN)
LBP16,2                  Pedestrian: 0.90 (TP)   0.09 (FP)
                         Non-Ped:    0.10 (FN)   0.91 (TN)
LBP24,3                  Pedestrian: 0.72 (TP)   0.12 (FP)
                         Non-Ped:    0.28 (FN)   0.88 (TN)
LBP8,1+16,2              Pedestrian: 0.90 (TP)   0.11 (FP)
                         Non-Ped:    0.10 (FN)   0.89 (TN)
LBP8,1+24,3              Pedestrian: 0.72 (TP)   0.09 (FP)
                         Non-Ped:    0.28 (FN)   0.91 (TN)
LBP16,2+24,3             Pedestrian: 0.73 (TP)   0.09 (FP)
                         Non-Ped:    0.27 (FN)   0.91 (TN)
LBP8,1+16,2+24,3         Pedestrian: 0.76 (TP)   0.08 (FP)
                         Non-Ped:    0.24 (FN)   0.92 (TN)

The ROC curve for each type of feature vector is presented in Figure 5.6(a); the upper left hand corner of the curve is enlarged for clarity in Figure 5.6(b). The ROC curves presented are generated based on the performance of the classifier on the database of 2,000 images. The confusion matrix for the SVM classifier is presented in Table 5.4. The operating point for the confusion matrix has been chosen at a detection rate of 0.98 and a false positive rate of 0.01 for the HOG-LBPP,R + SVM classifier. This point gave the best trade-off between detection rates and false positive rates.


Figure 5.6: ROC curves for a selection of feature vectors: (a) ROC curve containing results of cross-validation tests for HOG, LBP and HOG-LBP, (b) enlarged view of upper left portion of ROC curve.

The SVM parameters that yielded this performance were a cost (C) of 2048 and an RBF kernel parameter (γ) of 10.

Table 5.4: Confusion matrix for SVM pedestrian classifier trained with various feature vectors on a database of FIR imagery. True positive (TP), false positive (FP), true negative (TN) and false negative (FN) detection rates are all displayed.

Feature          Predicted Class
HOG              Pedestrian: 0.94 (TP)   0.02 (FP)
                 Non-Ped:    0.06 (FN)   0.98 (TN)
LBP8,1           Pedestrian: 0.90 (TP)   0.16 (FP)
                 Non-Ped:    0.10 (FN)   0.84 (TN)
HOG-LBP8,1       Pedestrian: 0.98 (TP)   0.01 (FP)
                 Non-Ped:    0.02 (FN)   0.99 (TN)
HOG-LBP16,2      Pedestrian: 0.87 (TP)   0.01 (FP)
                 Non-Ped:    0.14 (FN)   0.99 (TN)

The detection rates presented in this work illustrate that a SVM using a fusion of HOG and LBP feature vectors provides more accurate detection results in FIR imagery than a SVM using individual HOG or LBP features. The true positive detection rate of 98% is a significant improvement over the HOG-only based classifier, which demonstrated a detection rate of 94%, and the LBP classifier, with a detection rate of 90%. The false positive rate of the HOG-LBP classifier is 1%; lower than the HOG only classifier rate of 2% and the LBP only classifier rate of 16%. These figures show that the combination of HOG and LBP feature vectors results in more accurate pedestrian detection in automotive FIR imagery than using HOG and LBP in isolation.

5.4.3 Classifier Performance on FIR Video

The classification results on real-world FIR footage show that HOG-LBP feature vectors yield significantly higher detection rates and lower false positive rates than HOG or LBP trained classifiers alone. The performance of the classifier is shown in Table 5.5. A large number of pedestrians are detected at close range and are classified correctly with an SVM trained with HOG-LBP feature vectors. It is difficult to classify pedestrians that are a great distance from the camera (>40m) [75], mainly due to the relatively low resolution of FIR cameras, which results in very little textural information. Examples of pedestrians classified with all three feature sets are presented in Figure 5.7. An expanded number of frames classified using HOG-LBP features are shown in Figure 5.8.

Table 5.5: Results of pedestrian classification on streams of FIR footage.

Feature        Number of Frames   Detection Rate (%)   False Positive Rate (%)
HOG            15,000             94.6                 0.017
LBP8,1         15,000             91.6                 0.14
HOG-LBP8,1     15,000             98.41                0.008


Figure 5.7: Sample frames of output of SVM classifier trained with different feature vectors: (a) - (c) HOG + SVM. (d) - (f) LBP8,1 + SVM. (g) - (i) HOG-LBP8,1 + SVM. True positives are outlined in blue and false positives are outlined in red.


Figure 5.8: Sample frames of output of SVM classifier trained with HOG-LBP feature vectors: (a) - (c) close range. (d) - (f) long range. (g) - (i) very little or non-existent clothing distortion. (j) - (l) multiple pedestrians present in frame. (m) - (o) false positives and a failed detection. True positives are highlighted with a blue bounding box, false positives are outlined with a red bounding box.

5.5 Execution Time

The execution times of the classification stage and tracking stage on the CPU were determined using the Linux gprof profiler [129] and are displayed in Table 5.6. The combined worst case execution time of the classification and tracking stages is 20ms, equivalent to a frame rate of 50fps. This execution time allows for real-time implementation of the FIR pedestrian detection algorithm on the embedded ADAS when combined with the 8.1ms total execution time of the CDC and hardware accelerated SRG stages presented in Chapter 4.

Table 5.6: Execution times of ROI classification and pedestrian tracking stages on the CPU. Times shown are in milliseconds (ms).

Stage                    Minimum   Maximum   Average
HOG-LBP Classification   5         10        6
Kalman Tracking          4         10        5
Total                    9         20        11

5.6 Summary and Conclusion

Three methods of classifying isolated ROI were presented in this chapter: HOG, LBP, and a combination of HOG and LBP feature vectors; each feature was calculated for an isolated ROI and classified by a SVM.

HOG and LBP features are calculated for each segmented region and concatenated to form a single HOG-LBP feature vector. The feature is passed to a SVM that has been trained with a database of 2,000 pedestrian and non-pedestrian images. The feature vector is classified as either pedestrian or non-pedestrian by the SVM. If an ROI has been classified as a pedestrian in multiple consecutive frames, it is tracked with a Kalman filter. The location of the ROI can then be estimated in frames where detection may have failed due to occlusion of the pedestrian or a false classification by the SVM.

The SVM trained with HOG-LBP features achieves higher detection rates than systems in the literature that utilised either HOG or LBP feature vectors alone to train a SVM classifier for pedestrian classification in FIR imagery. The detection rates of the best pedestrian detection algorithms from the literature are compared with the proposed HOG-LBP system in Table 5.7; the detection rate accomplished with HOG-LBP vectors clearly outperforms previous methods. The HOG-LBP classifier achieves a true positive detection rate of 98%, which is an improvement of 4% on the classifier trained with HOG features alone (94% TP) and an improvement of 7% on a classifier trained with LBP features alone (91% TP). The HOG-LBP trained SVM also achieved a lower false positive rate. The detection rate of the HOG-LBP classifier is also an improvement on previous literature that achieved a maximum detection rate of 95% [14, 48, 87]. The execution time of the classification and tracking stages on the CPU also allows for real-time performance when combined with the ROI isolation stage of the algorithm. The combination of this improved real-time pedestrian detection ADAS with more stringent driver testing, road infrastructure and policing could significantly reduce the number of road fatalities that occur during the hours of darkness, resulting in a much safer road environment for pedestrians.

In the next chapter final conclusions will be drawn, the main contributions of this work will be stated and directions for possible future work will be discussed.

Table 5.7: Summary of pedestrian detection methods used in the FIR spectrum, comparing sensor configuration, ROI isolation method, object classification method, tracking method and detection rate for Bertozzi et al. (2005) [70], Xu et al. (2005) [14], Bertozzi et al. (2007) [51], Ge et al. (2009) [154], O'Malley et al. (2010) [48], Sun et al. (2011) [87], Miron et al. (2012) [49] and the proposed system (Hurney et al.: mono sensor, seeded region growing, HOG-LBP features with a SVM, Kalman tracking, 98% detection rate). The reported detection rates of the other methods range from 90% to 97%.

Chapter 6

Conclusions and Future Work

6.1 Project Summary and Conclusions

This thesis describes the development, testing, and application of an embedded Advanced Driver Assistance System to detect pedestrians in real-time in FIR video. The research was motivated largely by the fact that a disproportionate number of pedestrian fatalities occur during night-time hours, where visibility is a key factor. Providing a system that warns the driver if a pedestrian is within the vehicle's path could significantly reduce the number of on-road fatalities and serious injuries. A review of methods used to automatically detect pedestrians in FIR video was presented, and based on this review it was concluded that there was a need for a low power system that could perform in real-time while maintaining high detection rates. To achieve this goal, an embedded ADAS containing an Intel Atom N270 microprocessor and an Altera Arria GX II FPGA was proposed. The FPGA accelerates low level pixel operations used to isolate ROI within the current frame. The CPU is used to classify the isolated ROI as either pedestrians or non-pedestrians and to track successfully classified pedestrians using a Kalman filter. Captured FIR frames are first subject to morphological closing to remove distortion found on the pedestrian caused by insulating clothing.

The ROI isolation stage was profiled in software using a Linux profiler, where it was discovered that the SRG stage of the algorithm caused a significant processing bottleneck in the system. To decrease the execution time and allow for real-time performance, the SRG algorithm was implemented on a dedicated FPGA co-processor. The architecture reduced the execution time of the seeded region growing stage from a worst case of 390ms to 5.1ms; this is a 98.8% decrease in execution time compared to the software implementation on the CPU. Comparisons with previous methods in the literature that used static thresholds prior to labelling revealed that the hardware accelerated SRG method proposed in this thesis is more adept at isolating ROI than previous efforts.

A method to reduce the number of regions resulting from the ROI isolation stage was proposed. This method filtered regions based on the dimensions of the bounding box encompassing the region and the location of the region within the frame relative to a defined AOI. This approach eliminated a large number of regions prior to classification, such as on-street lighting and hot vehicle components, that can potentially contribute to false positives within a night-time pedestrian detection system.

A novel feature to improve the accuracy of a SVM based classifier for pedestrians in FIR images has been proposed and described: both HOG and LBP features are extracted from a database of labelled pedestrian and non-pedestrian images to train a SVM classifier. Isolated ROI are classified as either pedestrian or non-pedestrian objects by the classifier based on their respective features. Successfully classified pedestrians are tracked using a Kalman filter; this extrapolates the pedestrian's position during periods where detection has temporarily failed and can estimate the future position of the target based on information from previous frames.

Detection rates of 98% and false positive rates of 1% have been achieved by the HOG-LBP trained classifier; this is a 3% improvement on previously reported detection rates and indicates that the proposed combination of HOG and LBP features can lead to increased detection rates in a FIR pedestrian detection system when compared to systems that utilise HOG features exclusively. The embedded ADAS functions in real-time at a rate of 25fps (the native frame rate of the microbolometer). The total power consumption of the system is approximately 3.6W; this power consumption is significantly lower than previous real-time pedestrian detection systems targeted for low-light conditions.

The results of the pedestrian detection ADAS described in this thesis validate the use of a low power CPU and FPGA for an embedded system to perform image processing algorithms in real-time in the automotive domain, using hardware acceleration to significantly decrease the execution time of the ROI isolation stage and the CPU to classify and track regions and control data transfer within the system. The system could be extended to function in other automotive processes such as vehicle tail-light detection, lane detection, or any other image processing operation that relies on low level pixel operations to extract ROI followed by a stage to determine if the isolated region belongs to a certain class.

The pedestrian detection ADAS presented in this thesis could be used to significantly reduce the number of pedestrian fatalities.
The system highlights pedestrians in real-time and presents them to the driver of the vehicle, allowing the driver to take action in time to avoid the pedestrian.

6.2 Primary Contributions

An embedded ADAS to detect pedestrians in FIR video in real-time has been presented in this thesis. The primary contributions of this thesis can be defined as:

1. A low-power embedded system that can be used in the development of vision-based ADAS. The system contains an x86 based Intel N270 CPU and an Altera Arria GX II FPGA. The devices communicate over a PCI-express lane by DMA transfer. (Real-time Detection of Pedestrians in Far-Infrared Imagery with an Embedded CPU-FPGA System - Springer Journal of Real-Time Image Processing (JRTIP) - April 2015)

2. A seeded region growing architecture has been developed to isolate ROI from the background imagery in FIR video in real-time on FPGA hardware. The architecture is robust to variations in pedestrian pose and appearance between frames. (FPGA Hardware Acceleration of Seeded Region Growing for Night-Time Pedestrian Detection - IEEE Transactions on Circuits and Systems for Video Technology - September 2016)

3. A method of filtering isolated ROI based on their location within the current frame has been proposed. This significantly reduces the number of ROI that need to be classified. (Detection of Pedestrians in Night-time Environments with HOG-LBP Vectors - IET Intelligent Transportation Systems - April 2014 )

4. Investigation of LBP feature vectors using a range of sampling points and radii for pedestrian classification in FIR video. (Detection of Pedestrians in Night-time Environments with HOG-LBP Vectors - IET Intelligent Transportation Systems - April 2014)

5. A feature vector for accurate classification of pedestrians in FIR video has been proposed. The feature vector is created by calculating the HOG and LBP features of an isolated ROI and concatenating them into a single feature vector. (Detection of Pedestrians in Night-time Environments with HOG-LBP Vectors - IET Intelligent Transportation Systems - April 2014)

6. A comparison of HOG, LBP, and HOG-LBP feature vectors to determine their suitability for pedestrian detection in FIR imagery was performed. (Detection of Pedestrians in Night-time Environments with HOG-LBP Vectors - IET Intelligent Transportation Systems - April 2014)

7. A pedestrian detection system that performs in real-time on the proposed embedded ADAS has been developed. (Real-time Detection of Pedestrians in Far-Infrared Imagery with an Embedded CPU-FPGA System - Springer Journal of Real-Time Image Processing (JRTIP) - April 2015)

6.3 Suggestions for Future Work

There are several areas that could be investigated in future work:

1. Detecting the path of a pedestrian and fusing this data with the vehicle's trajectory. This would allow the system to distinguish between pedestrians that are going to step into the path of the vehicle, and pedestrians that are walking away from the vehicle.

2. One of the main challenges with pedestrian detection is occlusion caused by groups of pedestrians. Head detection could be a viable option, as pedestrians' heads are rarely occluded in groups. This is a very complex and difficult problem.

3. Convolutional Neural Networks (CNN) could be investigated as a method of classifying pedestrians in FIR video. This would provide a benchmark to compare with the SVM classifier used in this thesis.

4. The detection of other VRUs present on roads, such as cyclists, should also be considered. Adapting the ROI isolation stage to filter various types of ROI based on features and using dedicated classifiers could result in a system that detects all VRUs.

5. Finally, developing a method to detect forward facing and tail lights of vehicles in FIR frames in addition to pedestrians. Fusing this data with the pedestrian detection algorithm described in this work would create a detection system capable of detecting far more road-users, leading to a safer road environment during hours of low-light.

Bibliography

[1] World Health Organisation, “Top 10 Causes of Death Worldwide,” http://www.who.int/mediacentre/factsheets/fs310/en/ Last accessed April 2016.

[2] World Health Organisation, “Road Traffic Injuries,” http://www.who.int/mediacentre/factsheets/fs358/en/ Last accessed April 2016.

[3] J.R. Crandall, K.S. Bhalla and N.J. Madeley, “Designing road vehicles for pedestrian protection,” British Medical Journal, vol. 324, pp. 1145–1148, May 2002.

[4] W. Jones, "Building safer cars," Spectrum, IEEE, vol. 39, pp. 82–85, January 2002.

[5] L. Vlacic, M. Parent, and F. Harashima, "Preface," in Intelligent Vehicle Technologies (L. Vlacic, M. Parent, and F. Harashima, eds.), Automotive Engineering Series, pp. xiii–xv, Oxford: Butterworth-Heinemann, 2001.

[6] A. Shaout, D. Colella, and S. Awad, “Advanced driver assistance systems - past, present and future,” in Computer Engineering Conference (ICENCO), 2011 Seventh International, pp. 72–82, Dec 2011.

[7] D. Geronimo, A. Lopez, A. Sappa, and T. Graf, "Survey of pedestrian detection for advanced driver assistance systems," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 32, pp. 1239–1258, July 2010.

[8] R. O'Malley, M. Glavin, and E. Jones, "Vision-based detection and tracking of vehicles to the rear with perspective correction in low-light conditions," Intelligent Transport Systems, IET, vol. 5, pp. 1–10, March 2011.

[9] R. O'Malley, E. Jones, and M. Glavin, "Rear-lamp vehicle detection and tracking in low-exposure color video for night conditions," Intelligent Transportation Systems, IEEE Transactions on, vol. 11, pp. 453–462, June 2010.

[10] V. D. Nguyen, T. T. Nguyen, D. D. Nguyen, S. J. Lee, and J. W. Jeon, "A fast evolutionary algorithm for real-time vehicle detection," Vehicular Technology, IEEE Transactions on, vol. 62, pp. 2453–2468, July 2013.

[11] E. Shang, J. Li, X. An, and H. He, "A real-time lane departure warning system based on fpga," in Intelligent Transportation Systems (ITSC), 2011 14th International IEEE Conference on, pp. 1243–1248, October 2011.

[12] D. O. Cualain, C. Hughes, M. Glavin, and E. Jones, "Automotive standards-grade lane departure warning system," Intelligent Transport Systems, IET, vol. 6, no. 1, pp. 44–57, 2012.

[13] D. Cualain, M. Glavin, and E. Jones, “Multiple-camera lane departure warning system for the automotive environment,” Intelligent Transport Systems, IET, vol. 6, pp. 223–234, September 2012.

[14] F. Xu, X. Liu, and K. Fujimura, "Pedestrian detection and tracking with night vision," Intelligent Transportation Systems, IEEE Transactions on, vol. 6, pp. 63–71, March 2005.

[15] H. Nanda and L. Davis, "Probabilistic template based pedestrian detection in infrared videos," in Intelligent Vehicle Symposium, 2002. IEEE, vol. 1, pp. 15–20, June 2002.

[16] Road Safety Authority, "Irish Road Safety Statistics," http://www.rsa.ie/RSA/Road-Safety/Our-Research/Deaths-injuries-on-Irish-roads/ Last accessed April 2016.

[17] EU Road Safety, “European Road Safety Statistics,” Last accessed April 2016.

[18] National Highway Traffic Safety Administration (NHTSA), “Traffic safety facts, 2012 data, pedestrians,” tech. rep., 2012.

[19] H. Oya, K. Ando, and H. Kanoshima, "A research on interrelation between illuminance at intersections and reduction in traffic accidents," Journal of Light and Visual Environment, vol. 26, pp. 29–34, March 2002.

[20] U. Meis, W. Ritter, and H. Neumann, "Detection and classification of obstacles in night vision traffic scenes based on infrared imagery," in Intelligent Transportation Systems, 2003. Proceedings. 2003 IEEE, vol. 2, pp. 1140–1144, October 2003.

[21] C. Fors and S.-O. Lundkvist, "Night-time traffic in urban areas - a literature review on road user aspects," Tech. Rep. Rapport 650Av, VTI, 2009.

[22] EU Road Safety Council, "Towards a european road safety area: policy orientations on road safety 2011-2020," European Union, 2011.

[23] D. Zuby, "Advance Notice of Proposed Rulemaking; 49 CFR Part 571 Federal Motor Vehicle Safety Standards, Rearview Mirrors," Insurance Institute for Highway Safety, 2009.

[24] National Highway Traffic Safety Administration, US Department of Transportation, "Vehicle Backover Avoidance Technology Study - Report to Congress," 2006.

[25] Audi Parking Aid Advanced, http://www.audi.com/aola/brand/en lc/tools/advice/glossary/audi parking aid advanced.browser.html Last accessed April 2016.

[26] Volkswagen Parking Assistance System, http://www.volkswagen.co.uk/technology/ parking-and-manoeuvring/park-assist Last accessed April 2016.

[27] Toyota Parking Assistance System, http://www.toyota-global.com/innovation/safety technology/safety technology/parking/ Last accessed April 2016.

[28] Ford Parking Assistance System, http://www.ford.ie/Technology/ParkingAid Last accessed April 2016.

[29] S. Shreejith, S. Fahmy, and M. Lukasiewycz, “Reconfigurable computing in next-generation automotive networks,” Embedded Systems Letters, IEEE, vol. 5, pp. 12–15, March 2013.

[30] O. Tsimhoni, T. Minoda, M.J. Flanagan, "Pedestrian detection with night vision systems enhanced by automatic warnings," Proceedings 50th Annual Conf. Human Factors Ergonomics Soc., pp. 2443–2447, 2006.

[31] J. M. Sullivan and M. J. Flanagan, “Characteristics of pedestrian risk in darkness,” Proceedings 50th Annual Conf. Human Factors Ergonomics Soc., pp. 2443–2447, 2001.

[32] E. Hollnagel and J. Kallhammer, “Effects of a night vision enhancement system (nves) on driving: Results from a simulator study,” Proceedings of the Second International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, 2003.

[33] B. N. Schenkman and K. Brunnstrom, "Camera position and presentation scale for infrared night vision systems in cars," Human Factors and Ergonomics in Manufacturing, vol. 17, no. 5, pp. 457–473, 2007.

[34] B. N. Mahlke, K. Rösler, K. Seifert, K. Krems and M. Thüring, "Evaluation of six night vision enhancement system: Qualitative and quantitative support for intelligent image processing," Human Factors and Ergonomics in Manufacturing, vol. 49, pp. 670–677, September 2010.

[35] M. Green, "How long does it take to stop? - methodological analysis of driver perception-brake times," Transportation Human Factors, vol. 2, no. 3, pp. 195–216, 2000.

[36] S. Plainis, I. J. Murray, and I.G. Pallikaris, "Road traffic casualties: understanding the night-time death toll," Injury Prevention, vol. 12, pp. 125–128, April 2006.

[37] T. Akerstedt, G. Kecklund, and L.-G. Horte, “Night driving, season, and the risk of highway accidents.,” Sleep, vol. 24, pp. 401–406, June 2001.

[38] Y. Chen, K. Shen, and S. Wang, “Forward collision warning system considering both time-to-collision and safety braking distance,” in Industrial Electronics and Applications (ICIEA), 2013 8th IEEE Conference on, pp. 972–977, June 2013.

[39] J. T. Andre, “Visual functioning in challenging conditions: Effects of alcohol consumption, luminance, stimulus motion, and glare on contrast sensitivity.,” Journal of Experimental Psychology: Applied, vol. 23, pp. 250–269, July 1996.

[40] Y. Ou, T. Liu, Z. Zhao, Z. Ma and Y. Wang, “Modeling the impact of frame rate on perceptual quality of video,” in Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on, pp. 689–692, October 2008.

[41] A. Doshi, S. Y. Cheng, and M. M. Trivedi, "A novel active heads-up display for driver assistance," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, pp. 85–93, Feb 2009.

[42] Jaguar Land Rover, “Jaguar Land Rover vibration based warning system,” Last accessed April 2016.

[43] R. Gade and T. Moeslund, “Thermal cameras and applications: a survey,” Machine Vision and Applications, vol. 25, no. 1, pp. 245–262, 2014.

[44] J. F. Hurnik, S. D. Boer, and A. B. Webster, “Detection of health disorders in dairy cattle utilizing a thermal infrared scanning technique,” Canadian Journal of Animal Science, vol. 64, no. 4, pp. 1071–1073, 1984.

[45] L.E. Yanmaz, Z. Okumus and E. Dogan., “Instrumentation of thermography and its applications in horses,” Journal of Animal and Veterinary Advances, vol. 6, pp. 858–862, 2007.

[46] D. Iwaszczuk, L. Hoegner, and U. Stilla, “Matching of 3d building models with ir images for texture extraction,” in Urban Remote Sensing Event (JURSE), 2011 Joint, pp. 25–28, April 2011.

[47] B. Sirmacek, L. Hoegner, and U. Stilla, “Detection of windows and doors from thermal images by grouping geometrical features,” in Urban Remote Sensing Event (JURSE), 2011 Joint, pp. 133–136, April 2011.

[48] R. O'Malley, E. Jones, and M. Glavin, "Detection of pedestrians in far-infrared automotive night vision using region-growing and clothing distortion compensation," Infrared Physics & Technology, vol. 53, no. 6, pp. 439–449, 2010.

[49] A. Miron, B. Besbes, A. Rogozan, S. Ainouz, and A. Bensrhair, "Intensity self similarity features for pedestrian detection in far-infrared images," in Intelligent Vehicles Symposium (IV), 2012 IEEE, pp. 1120–1125, June 2012.

[50] D. Xia, H. Sun, and Z. Shen, “Real-time infrared pedestrian detection based on multi-block LBP,” in Computer Application and System Modeling (ICCASM), 2010 International Conference on, vol. 12, pp. 139–142, October 2010.

[51] M. Bertozzi, A. Broggi, M. Del Rose, M. Felisa, A. Rakotomamonjy, and F. Suard, "A pedestrian detector using histograms of oriented gradients and a support vector machine classifier," in Intelligent Transportation Systems Conference, 2007. ITSC 2007. IEEE, pp. 143–148, October 2007.

[52] F. Suard, A. Rakotomamonjy, A. Bensrhair, and A. Broggi, "Pedestrian detection using infrared images and histograms of oriented gradients," in Intelligent Vehicles Symposium, 2006 IEEE, pp. 206–212, June 2006.

[53] M. Gereon, “Smart systems for safe, sustainable and networked vehicles,” in Advanced Microsystems for Automotive Applications 2012 (M. Gereon, ed.), Springer, 2012.

[54] M. Hanqvist, "An object detection system," May 2008, Patent No. WO 2008/057042 A1, Autoliv Development AB.

[55] A. E. Maadi and X. Maldague, “Outdoor infrared video surveillance: A novel dynamic technique for the subtraction of a changing background of IR images,” Infrared Physics & Technology, vol. 49, no. 3, pp. 261–265, 2007. Workshop on Advanced Infrared Technology and Applications.

[56] R. Walczyk, A. Armitage, and T. Binnie, “An embedded real-time pedestrian detection system using an infrared camera,” in Signals and Systems Conference (ISSC 2009), IET Irish, pp. 1–6, June 2009.

[57] G. Bauer, F. Homm, L. Walchshusl, and D. Burschka, "Multi spectral pedestrian detection and localization," in Advanced Microsystems for Automotive Applications 2008 (J. Valldorf and W. Gessner, eds.), VDI-Buch, pp. 21–35, Springer Berlin Heidelberg, 2008. 10.1007/978-3-540-77980-3 3.

[58] M. Bertozzi, A. Broggi, S. Ghidoni, and M. Del Rose, "Pedestrian shape extraction by means of active contours," in Field and Service Robotics (C. Laugier and R. Siegwart, eds.), vol. 42 of Springer Tracts in Advanced Robotics, pp. 265–274, Springer Berlin / Heidelberg, 2008. 10.1007/978-3-540-75404-6 25.

[59] Y. Fang, K. Yamada, Y. Ninomiya, B. Horn, and I. Masaki, "A shape-independent method for pedestrian detection with far-infrared images," Vehicular Technology, IEEE Transactions on, vol. 53, pp. 1679–1697, November 2004.

[60] G. Bauer, F. Homm, L. Walchshusl, and D. Burschka, "Multi spectral pedestrian detection and localization," in Advanced Microsystems for Automotive Applications 2008 (J. Valldorf and W. Gessner, eds.), VDI-Buch, pp. 21–35, Springer Berlin Heidelberg, 2008. 10.1007/978-3-540-77980-3.

[61] M. Bertozzi, A. Broggi, S. Ghidoni, and M. Del Rose, "Pedestrian shape extraction by means of active contours," in Field and Service Robotics, pp. 265–274, Springer, 2008.

[62] R. O’Malley, M. Glavin, and E. Jones, “An efficient region of interest generation technique for far-infrared pedestrian detection,” in Consumer Electronics, 2008. ICCE 2008. Digest of Technical Papers. International Conference on, pp. 1–2, January 2008.

[63] M. Bertozzi, A. Broggi, C. Caraffi, M. D. Rose, M. Felisa, and G. Vezzoni, "Pedestrian detection by means of far-infrared stereo vision," Computer Vision and Image Understanding, vol. 106, no. 23, pp. 194–204, 2007. Special issue on Advances in Vision Algorithms and Systems beyond the Visible Spectrum.

[64] M. Yasuno, S. Ryousuke, N. Yasuda, and M. Aoki, “Pedestrian detection and tracking in far infrared images,” in Intelligent Transportation Systems, 2005. Proceedings. 2005 IEEE, pp. 182–187, September 2005.

[65] M. Bertozzi, A. Broggi, P. Grisleri, T. Graf, and M. Meinecke, "Pedestrian detection in infrared images," in Intelligent Vehicles Symposium, 2003. Proceedings. IEEE, pp. 662–667, June 2003.

[66] M. Bertozzi, A. Broggi, A. Fascioli, T. Graf, and M.-M. Meinecke, “Pedestrian detection for driver assistance using multiresolution infrared vision,” Vehicular Technology, IEEE Transactions on, vol. 53, pp. 1666–1678, November 2004.

[67] S. Smith and J. Brady, “SUSAN-a new approach to low level image process- ing,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 23, pp. 45–78, July 1997.

[68] B. Besbes, A. Rogozan, and A. Bensrhair, “Pedestrian recognition based on hierarchical codebook of SURF features in visible and infrared images,” in In- telligent Vehicles Symposium (IV), 2010 IEEE, pp. 156–161, June 2010.

[69] S. Krotosky and M. Trivedi, “A comparison of color and infrared stereo ap- proaches to pedestrian detection,” in Intelligent Vehicles Symposium, 2007 IEEE, pp. 81–86, June 2007.

[70] M. Bertozzi, A. Broggi, A. Lasagni, and M. Rose, “Infrared stereo vision-based pedestrian detection,” in Intelligent Vehicles Symposium, 2005. Proceedings. IEEE, pp. 24–29, June 2005.

[71] S. J. Krotosky and M. M. Trivedi, “Mutual information based registration of multimodal stereo videos for person tracking,” Computer Vision and Image Understanding, pp. 270–287, December 2007.

[72] X. Liu and K. Fujimura, “Pedestrian detection using stereo night vision,” Vehicular Technology, IEEE Transactions on, vol. 53, pp. 1657–1665, November 2004.

[73] M. Bertozzi, A. Broggi, M. Felisa, G. Vezzoni, and M. Del Rose, “Low-level pedestrian detection by means of visible and far infra-red tetra-vision,” in Intelligent Vehicles Symposium, 2006 IEEE, pp. 231–236, June 2006.

[74] T. Tsuji, H. Hattori, M. Watanabe, and N. Nagaoka, “Development of night-vision system,” Intelligent Transportation Systems, IEEE Transactions on, vol. 3, pp. 203–209, September 2002.

[75] M. Bertozzi, A. Broggi, S. Ghidoni, and M. Meinecke, “A night vision module for the detection of distant pedestrians,” in Intelligent Vehicles Symposium, 2007 IEEE, pp. 25–30, June 2007.

[76] A. Leykin and R. Hammoud, “Robust multi-pedestrian tracking in thermal-visible surveillance videos,” in Computer Vision and Pattern Recognition Workshop, 2006. CVPRW ’06. Conference on, pp. 136–136, June 2006.

[77] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge Univ. Press, 2000.

[78] R. Miyamoto, H. Sugano, and Y. Nakamura, “Pedestrian recognition suitable for night vision systems,” IJCSNS International Journal of Computer Science and Network Security, 2007.

[79] D. Gavrila and J. Giebel, “Shape-based pedestrian detection and tracking,” in Intelligent Vehicle Symposium, 2002. IEEE, vol. 1, pp. 8–14, June 2002.

[80] L. Zhao and C. Thorpe, “Stereo and neural network-based pedestrian detection,” in Intelligent Transportation Systems, 1999. Proceedings. 1999 IEEE/IEEJ/JSAI International Conference on, pp. 298–303, 1999.

[81] D. Olmeda, A. De la Escalera, and J. Armingol, “Detection and tracking of pedestrians in infrared images,” in Signals, Circuits and Systems (SCS), 2009 3rd International Conference on, pp. 1–6, November 2009.

[82] J. Davis and M. Keck, “A two-stage approach to person detection in thermal imagery,” in Proc. Workshop on Applications of Computer Vision, January 2005.

[83] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1, pp. 886–893, June 2005.

[84] P. Hurney, P. Waldron, F. Morgan, E. Jones, and M. Glavin, “Night-time pedestrian classification with histograms of oriented gradients-local binary patterns vectors,” Intelligent Transport Systems, IET, vol. 9, no. 1, pp. 75–85, 2015.

[85] C. Papageorgiou, M. Oren, and T. Poggio, “A general framework for object detection,” in Computer Vision, 1998. Sixth International Conference on, pp. 555–562, January 1998.

[86] Y. Benezeth, B. Emile, H. Laurent, and C. Rosenberger, “A real time human detection system based on far infrared vision,” in Image and Signal Processing (A. Elmoataz, O. Lezoray, F. Nouboud, and D. Mammass, eds.), vol. 5099 of Lecture Notes in Computer Science, pp. 76–84, Springer Berlin Heidelberg, 2008.

[87] H. Sun, C. Wang, and B. Wang, “Night vision pedestrian detection using a forward-looking infrared camera,” in Multi-Platform/Multi-Sensor Remote Sensing and Mapping (M2RSM), 2011 International Workshop on, pp. 1–4, 2011.

[88] C. M. Bautista, C. A. Dy, M. I. Maalac, R. A. Orbe, and M. Cordel, “Convolutional neural network for vehicle detection in low resolution traffic videos,” in 2016 IEEE Region 10 Symposium (TENSYMP), pp. 277–281, May 2016.

[89] Z. Zhu, D. Liang, S. Zhang, X. Huang, B. Li, and S. Hu, “Traffic-sign detection and classification in the wild,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2110–2118, June 2016.

[90] D. Tomè, F. Monti, L. Baroffio, L. Bondi, M. Tagliasacchi, and S. Tubaro, “Deep convolutional neural networks for pedestrian detection,” Signal Processing: Image Communication, vol. 47, pp. 482–489, 2016.

[91] A. Angelova, A. Krizhevsky, V. Vanhoucke, A. Ogale, and D. Ferguson, “Real-time pedestrian detection with deep network cascades,” in Proceedings of BMVC 2015, 2015.

[92] V. John, S. Mita, Z. Liu, and B. Qi, “Pedestrian detection in thermal images using adaptive fuzzy c-means clustering and convolutional neural networks,” in 2015 14th IAPR International Conference on Machine Vision Applications (MVA), pp. 246–249, May 2015.

[93] G. Welch and G. Bishop, “An introduction to the Kalman filter,” Tech. Rep. TR95-041, University of North Carolina at Chapel Hill, Department of Computer Science, 2003.

[94] L. Guo, L. Li, Y. Zhao, and M. Zhang, “Study on pedestrian detection and tracking with monocular vision,” in Computer Technology and Development (ICCTD), 2010 2nd International Conference on, pp. 466–470, November 2010.

[95] M. Mählisch, M. Oberländer, O. Löhlein, D. Gavrila, and W. Ritter, “A multiple detector approach to low-resolution FIR pedestrian recognition,” in Intelligent Vehicles Symposium, 2005. Proceedings. IEEE, pp. 325–330, June 2005.

[96] X. Wang and Z. Tang, “Modified particle filter-based infrared pedestrian tracking,” Infrared Physics & Technology, vol. 53, no. 4, pp. 280–287, 2010.

[97] R. Dwivedi, N. Kandpal, and A. Shukla, “Adaptive suspension system,” in Information and Financial Engineering (ICIFE), 2010 2nd IEEE International Conference on, pp. 694–697, September 2010.

[98] G. Macario, M. Torchiano, and M. Violante, “An in-vehicle infotainment software architecture based on Google Android,” in Industrial Embedded Systems, 2009. SIES ’09. IEEE International Symposium on, pp. 257–260, July 2009.

[99] B.-F. Wu, H.-Y. Huang, C.-J. Chen, Y.-H. Chen, C.-W. Chang, and Y.-L. Chen, “A vision-based blind spot warning system for daytime and nighttime driver assistance,” Computers & Electrical Engineering, vol. 39, no. 3, pp. 846–862, 2013. Special issues on Image and Video Processing and on Recent Trends in Communications and Signal Processing.

[100] OpenCV, “OpenCV,” http://opencv.willowgarage.com/wiki/, Last accessed April 2016.

[101] Intel, “Intel Integrated Performance Primitives,” http://software.intel.com/en-us/articles/intel-ipp/, Last accessed February 2016.

[102] Intel, “Intel Atom CPU,” http://www.intel.com/content/www/us/en/processors/atom/atom-processor.html, Last accessed April 2016.

[103] P. Aby, A. Jose, L. Dinu, J. John, and G. Sabarinath, “Implementation and optimization of embedded face detection system,” in Signal Processing, Communication, Computing and Networking Technologies (ICSCCN), 2011 International Conference on, pp. 250–253, July 2011.

[104] L. Codrescu, W. Anderson, S. Venkumanhanti, M. Zeng, E. Plondke, C. Koob, A. Ingle, C. Tabony, and R. Maule, “Hexagon DSP: An architecture optimized for mobile multimedia and communications,” Micro, IEEE, vol. 34, pp. 34–43, March 2014.

[105] S. Agarwala, A. Rajagopal, A. Hill, M. Joshi, S. Mullinnix, T. Anderson, R. Damodaran, L. Nardini, P. Wiley, P. Groves, J. Apostol, M. Gill, J. Flores, A. Chachad, A. Hales, K. Chirca, K. Panda, R. Venkatasubramanian, P. Eyres, R. Veiamuri, A. Rajaram, M. Krishnan, J. Nelson, J. Frade, M. Rahman, N. Mahmood, U. Narasimha, S. Sinha, S. Krishnan, W. Webster, D. Bui, S. Moharii, N. Common, R. Nair, R. Ramanujam, and M. Ryan, “A 65nm C64x+ multi-core DSP platform for communications infrastructure,” in Solid-State Circuits Conference, 2007. ISSCC 2007. Digest of Technical Papers. IEEE International, pp. 262–601, February 2007.

[106] H.-Y. Lin, L.-Q. Chen, Y.-H. Lin, and M.-S. Yu, “Lane departure and front collision warning using a single camera,” in Intelligent Signal Processing and Communications Systems (ISPACS), 2012 International Symposium on, pp. 64– 69, November 2012.

[107] M. Turturici, S. Saponara, L. Fanucci, and E. Franchi, “Low-power DSP system for real-time correction of fish-eye cameras in automotive driver assistance applications,” Journal of Real-Time Image Processing, vol. 9, no. 3, pp. 463–478, 2014.

[108] A. Chavan and S. Yogamani, “Real-time DSP implementation of pedestrian detection algorithm using HOG features,” in ITS Telecommunications (ITST), 2012 12th International Conference on, pp. 352–355, November 2012.

[109] T. Wilson, M. Glatz, and M. Hodlmoser, “Pedestrian detection implemented on a fixed-point parallel architecture,” in Consumer Electronics, 2009. ISCE ’09. IEEE 13th International Symposium on, pp. 47–51, May 2009.

[110] A. Piriyakumar Douglas, M. Prasad, B. Sunil Gowtham, A. Kalyansundar, V. Swaminathan, and R. Chattopadhyay, “An efficient DSP implementation of real-time stationary vehicle detection by smart camera at outdoor conditions,” in Image Processing, 2006 IEEE International Conference on, pp. 3273–3276, October 2006.

[111] NVIDIA, “NVIDIA CUDA,” http://www.nvidia.com/object/cuda_home_new.html, Last accessed April 2016.

[112] A. Gepperth, M. Ortiz, and B. Heisele, “Real-time pedestrian detection and pose classification on a GPU,” in Intelligent Transportation Systems (ITSC), 2013 16th International IEEE Conference on, pp. 348–353, October 2013.

[113] R. Benenson, M. Mathias, R. Timofte, and L. Van Gool, “Pedestrian detection at 100 frames per second,” in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pp. 2903–2910, June 2012.

[114] C. Beleznai, D. Schreiber, and M. Rauter, “Pedestrian detection using GPU-accelerated multiple cue computation,” in Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on, pp. 58–65, June 2011.

[115] NVIDIA, “NVIDIA GeForce GTX 470,” http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-470/specifications, Last accessed April 2016.

[116] T. P. Cao, G. Deng, and D. Mulligan, “Implementation of real-time pedestrian detection on FPGA,” in Image and Vision Computing New Zealand, 2008. IVCNZ 2008. 23rd International Conference, pp. 1–6, November 2008.

[117] S. Martelli, R. Marzotto, A. Colombari, and V. Murino, “FPGA-based robust ellipse estimation for circular road sign detection,” in Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on, pp. 53–60, June 2010.

[118] Xilinx, “Xilinx Power Estimator (XPE),” http://www.vcipl.okstate.edu/otcbvs/bench/, Last accessed September 2014.

[119] Altera, “Altera Arria GX II FPGA Specifications,” http://www.altera.com/devices/fpga/arria-fpgas/arria-ii-gx/aiigx-index.jsp Last accessed April 2016.

[120] Intel, “Intel Atom In-vehicle Infotainment,” Last accessed April 2016.

[121] Ubuntu, “Ubuntu Operating System,” http://www.ubuntu.com/ Last accessed April 2016.

[122] J. Teich, “Hardware/software codesign: The past, the present, and predicting the future,” Proceedings of the IEEE, vol. 100, pp. 1411–1430, May 2012.

[123] F. Stein, “The challenge of putting vision algorithms into a car,” in Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on, pp. 89–94, June 2012.

[124] Altera, “Altera Avalon Specification,” http://www.altera.com/literature/manual/mnl_avalon_spec.pdf, Last accessed April 2016.

[125] A. Rosenfeld and J. L. Pfaltz, “Sequential operations in digital picture process- ing,” J. ACM, vol. 13, pp. 471–494, October 1966.

[126] P. Borges, “Pedestrian detection based on blob motion statistics,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 23, pp. 224–235, February 2013.

[127] B. Senthilkumar, G. Umamaheswari, and J. Karthik, “A novel region growing segmentation algorithm for the detection of breast cancer,” in Computational Intelligence and Computing Research (ICCIC), 2010 IEEE International Conference on, pp. 1–4, December 2010.

[128] L. Arbach, A. Stolpen, and J. Reinhardt, “Classification of breast MRI lesions using a backpropagation neural network (BNN),” in Biomedical Imaging: Nano to Macro, 2004. IEEE International Symposium on, pp. 253–256 Vol. 1, April 2004.

[129] Linux gprof profiler, https://sourceware.org/binutils/docs/gprof/, Last accessed April 2016.

[130] K. Suzuki, I. Horiba, and N. Sugie, “Linear-time connected-component labeling based on sequential local operations,” Computer Vision and Image Understanding, vol. 89, no. 1, pp. 1–23, 2003.

[131] Q. Gu, T. Takaki, and I. Ishii, “Fast FPGA-based multiobject feature extraction,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 23, pp. 30–45, January 2013.

[132] F. Chang, C.-J. Chen, and C.-J. Lu, “A linear-time component-labeling algorithm using contour tracing technique,” Computer Vision and Image Understanding, vol. 93, no. 2, pp. 206–220, 2004.

[133] C. Johnston and D. Bailey, “FPGA implementation of a single pass connected components algorithm,” in Electronic Design, Test and Applications, 2008. DELTA 2008. 4th IEEE International Symposium on, pp. 228–231, January 2008.

[134] M. Jablonski and M. Gorgon, “Handel-C implementation of classical component labelling algorithm,” in Digital System Design, 2004. DSD 2004. Euromicro Symposium on, pp. 387–393, August 2004.

[135] R. Haralick, “Some neighborhood operations,” Real Time/Parallel Computing Image Analysis, pp. 11–35, 1981.

[136] D. Crookes and K. Benkrid, “FPGA implementation of image component labeling,” in Photonics East’99, pp. 17–23, International Society for Optics and Photonics, 1999.

[137] K. Benkrid, S. Sukhsawas, D. Crookes, and A. Benkrid, “An FPGA-based image connected component labeller,” in Field Programmable Logic and Application (P. Y. K. Cheung and G. Constantinides, eds.), vol. 2778 of Lecture Notes in Computer Science, pp. 1012–1015, Springer Berlin Heidelberg, 2003.

[138] M. Klaiber, D. Bailey, S. Ahmed, Y. Baroud, and S. Simon, “A high-throughput FPGA architecture for parallel connected components analysis based on label reuse,” in Field-Programmable Technology (FPT), 2013 International Conference on, pp. 302–305, December 2013.

[139] A. Moga and M. Gabbouj, “Parallel image component labelling with watershed transformation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 19, pp. 441–450, May 1997.

[140] N. Ma, D. Bailey, and C. Johnston, “Optimised single pass connected components analysis,” in ICECE Technology, 2008. FPT 2008. International Conference on, pp. 185–192, December 2008.

[141] M. Huang, C. Wang, and Y. Liu, “High-speed video transfer and real-time infrared spots detection based on FPGA,” in Computer Science and Engineering, 2009. WCSE ’09. Second International Workshop on, vol. 2, pp. 154–159, October 2009.

[142] R. Walczyk, A. Armitage, and T. Binnie, “FPGA implementation of hot spot detection in infrared video,” in Signals and Systems Conference (ISSC 2010), IET Irish, pp. 233–238, June 2010.

[143] X. Wang, T. X. Han, and S. Yan, “An HOG-LBP human detector with partial occlusion handling,” in Computer Vision, 2009 IEEE 12th International Conference on, pp. 32–39, October 2009.

[144] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, pp. 91–110, 2004. 10.1023/B:VISI.0000029664.99615.94.

[145] T. Ojala, M. Pietikäinen, and T. Mäenpää, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, pp. 971–987, July 2002.

[146] Z. Niu and X. Qiu, “Facial expression recognition based on weighted principal component analysis and support vector machines,” in Advanced Computer Theory and Engineering (ICACTE), 2010 3rd International Conference on, vol. 3, pp. 174–178, August 2010.

[147] B. Heisele, P. Ho, and T. Poggio, “Face recognition with support vector machines: global versus component-based approach,” in Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, vol. 2, pp. 688–694 vol.2, 2001.

[148] J. Molina, M. Escudero-Viñolo, A. Signoriello, M. Pardàs, C. Ferrán, J. Bescós, F. Marqués, and J. Martínez, “Real-time user independent hand gesture recognition from time-of-flight camera video using static and dynamic models,” Machine Vision and Applications, pp. 1–18, 2011. 10.1007/s00138-011-0364-6.

[149] K. Polat and S. Güneş, “Breast cancer diagnosis using least square support vector machine,” Digital Signal Processing, vol. 17, no. 4, pp. 694–701, 2007.

[150] M. F. Akay, “Support vector machines combined with feature selection for breast cancer diagnosis,” Expert Systems with Applications, vol. 36, no. 2, Part 2, pp. 3240–3247, 2009.

[151] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, “A practical guide to support vector classification,” http://www.csie.ntu.edu.tw/∼cjlin/papers/guide/guide.pdf, Last accessed April 2016.

[152] S. Teoh and T. Bräunl, “Symmetry-based monocular vehicle detection system,” Machine Vision and Applications, pp. 1–12, 2011. 10.1007/s00138-011-0355-7.

[153] M. Bertozzi, A. Broggi, P. Grisleri, A. Tibaldi, and M. Rose, “A tool for vision based pedestrian detection performance evaluation,” in Intelligent Vehicles Symposium, 2004 IEEE, pp. 784–789, June 2004.

[154] J. Ge, Y. Luo, and G. Tei, “Real-time pedestrian detection and tracking at nighttime for driver-assistance systems,” Intelligent Transportation Systems, IEEE Transactions on, vol. 10, no. 2, pp. 283–298, 2009.

Appendix A

Seeded Region Growing Documentation

A.1 Introduction

This appendix contains the documentation relevant to the FPGA-based seeded region growing implementation. Functional partitions and corresponding data dictionaries are provided for each section of the SRG algorithm. A detailed overview of the SRG system is shown in Figure A.2.

Figure A.1: Top level block diagram of FPGA Seeded Region Growing Design

Figure A.2: Seeded Region Growing functional partition

A.2 Binary Thresholding

A functional partition of the binary thresholding architecture is shown in Figure A.3.

Figure A.3: Functional partition of binary thresholding architecture

A.2.1 Data Dictionary

The signals relevant to the binary thresholding section of the design are shown in Table A.1.

Signal Name       Direction   Size          Description
pixel_in          I           [7:0][7:0]    Pixel data from CPU host; contains 8 greyscale 8-bit pixels
clk_in            I           1             On-board PCI reference clock (125MHz)
thresh_en         I           1             Enable signal for thresholding operation
binary_pixel      O           1             Output of binary thresholding operation
pixel_in_sel      -           [2:0]         Selects the current pixel to be thresholded from the pixel_in signal
thresh_val_sel    -           [2:0]         Current threshold value to use in threshold operation

Table A.1: Data dictionary for binary thresholding architecture

A.3 Connected Component Labelling

The CCL stage contains the following modules:

• Label Assign

• Row Length Buffer

• Region Parameter Storage

A.3.1 Label Assignment

A.3.1.1 Data Dictionary

The signals relevant to the Label Assignment portion of the CCL design are shown in Table A.2.

Pin Name          Direction   Size    Description
clk               I           1       125MHz clock
binary_pixel      I           1       Binary representation of input pixel
ccl_en            I           1       Enable signal for CCL stage
A                 I           [4:0]   Neighbour of current pixel (x-1, y-1)
B                 I           [4:0]   Neighbour of current pixel (x, y-1)
C                 I           [4:0]   Neighbour of current pixel (x+1, y-1)
D                 I           [4:0]   Neighbour of current pixel (x-1, y)
currLabel         O           [4:0]   Label assigned to the current pixel
merger_data       O           [4:0]   Lower value of the conflicting labels
merger_addr_wr    O           [4:0]   Higher value of the conflicting labels
newLabel          O           1       Indicates that a new region was found
existLabel        O           1       Indicates that a neighbouring pixel was encountered
mergerLabel       O           1       Indicates a merger scenario has occurred
newRow            O           1       Asserted at the beginning of a new row
x_coor            O           [8:0]   Current x co-ordinate
newRow            O           1       Asserted HIGH at the end of the current row
y_coor            O           [7:0]   Current y co-ordinate
frame_done        O           1       Asserted HIGH at the end of the current frame

Table A.2: Label Assignment data dictionary.

Figure A.4: Functional partition of the Label Assignment stage.
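The interaction of the newLabel, existLabel and mergerLabel flags in Table A.2 can be summarised with the behavioural C sketch below. Label 0 is assumed to denote background, and the rule of keeping the lower of two conflicting labels follows the descriptions of merger_data and merger_addr_wr; the exact priority encoding used in the RTL is not reproduced here.

#include <stdint.h>

typedef struct {
    uint8_t new_label;    /* newLabel: a new region was found            */
    uint8_t exist_label;  /* existLabel: a labelled neighbour was found  */
    uint8_t merger_label; /* mergerLabel: two different labels collided  */
    uint8_t curr_label;   /* label assigned to the current pixel         */
    uint8_t merger_lo;    /* lower of the two conflicting labels         */
    uint8_t merger_hi;    /* higher of the two conflicting labels        */
} label_decision_t;

/* Behavioural sketch of the Label Assignment stage (Table A.2).
 * A, B, C and D are the labels of the neighbours at (x-1, y-1), (x, y-1),
 * (x+1, y-1) and (x-1, y). Label 0 is assumed to mean background and
 * *next_free_label should start at 1; hardware labels are 5 bits wide. */
static label_decision_t assign_label(uint8_t binary_pixel,
                                     uint8_t A, uint8_t B, uint8_t C, uint8_t D,
                                     uint8_t *next_free_label)
{
    label_decision_t d = {0};
    if (!binary_pixel)
        return d;                          /* background pixel: no label */

    uint8_t n[4] = {A, B, C, D};
    uint8_t lo = 0, hi = 0;
    for (int i = 0; i < 4; i++) {
        if (n[i] == 0) continue;
        if (lo == 0 || n[i] < lo) lo = n[i];
        if (n[i] > hi) hi = n[i];
    }

    if (lo == 0) {                         /* no labelled neighbours     */
        d.new_label  = 1;
        d.curr_label = (*next_free_label)++;
    } else {
        d.exist_label = 1;
        d.curr_label  = lo;                /* propagate the lower label  */
        if (hi != lo) {                    /* two regions meet: merger   */
            d.merger_label = 1;
            d.merger_lo = lo;
            d.merger_hi = hi;
        }
    }
    return d;
}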

A.3.2 Row Length Buffer

A.3.2.1 Data Dictionary

The signals relevant to the Row Length Buffer portion of the CCL design are shown in Table A.3.

Pin Name                 Direction   Size    Description
clk                      I           1       125MHz clock
current_label            I           [4:0]   Label assigned to current pixel
thresh_ccl_en            I           1       Enable signal for CCL design
merger_label_0           I           [4:0]   Lower value of the conflicting labels
merger_label_1           I           [4:0]   Higher value of the conflicting labels
A                        O           [4:0]   Neighbour of current pixel (x-1, y-1)
B                        O           [4:0]   Neighbour of current pixel (x, y-1)
C                        O           [4:0]   Neighbour of current pixel (x+1, y-1)
D                        O           [4:0]   Neighbour of current pixel (x-1, y)
row_storage_select       -           [2:0]   FIFO select bits
merger_table_select      -           [2:0]   Select current merger table
FIFO_data_in (0..7)      -           [4:0]   Label input to current FIFO
FIFO_data_out (0..7)     -           [4:0]   Output of FIFO containing labels for current row
label_select             -           [2:0]   Select which A, B, C values to be sent to Label Assignment block
merger_table(0..7)_out   -           [4:0]   Output of merger table

Table A.3: Row Length Buffer data dictionary.

Figure A.5: Functional partition of Row Length Buffer stage.
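The purpose of the Row Length Buffer is to delay the labels of the previous row so that the neighbours A, B and C, together with D from the current row, can be presented to the Label Assignment block one pixel per clock. The C sketch below models that behaviour with two simple row arrays; the FIFO-per-row organisation and the merger tables shown in Figure A.5 are not reproduced, and the frame width of 320 pixels is an assumption.

#include <stdint.h>
#include <string.h>

#define ROW_WIDTH 320   /* assumed frame width; set to the sensor resolution */

typedef struct {
    uint8_t prev_row[ROW_WIDTH];  /* labels of row y-1; zero-initialise per frame */
    uint8_t curr_row[ROW_WIDTH];  /* labels of row y                              */
} row_buffer_t;

/* Fetch the neighbours A, B, C and D for pixel (x, y); neighbours outside
 * the frame read as 0 (background). */
static void row_buffer_neighbours(const row_buffer_t *rb, int x,
                                  uint8_t *A, uint8_t *B, uint8_t *C, uint8_t *D)
{
    *A = (x > 0)             ? rb->prev_row[x - 1] : 0;
    *B =                       rb->prev_row[x];
    *C = (x < ROW_WIDTH - 1) ? rb->prev_row[x + 1] : 0;
    *D = (x > 0)             ? rb->curr_row[x - 1] : 0;
}

/* Record the label just assigned to (x, y); once the row is complete it
 * becomes the previous row for the next line of the frame. */
static void row_buffer_push(row_buffer_t *rb, int x, uint8_t curr_label)
{
    rb->curr_row[x] = curr_label;
    if (x == ROW_WIDTH - 1)
        memcpy(rb->prev_row, rb->curr_row, ROW_WIDTH);
}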

A.3.3 Region Parameter Storage

A functional partition of the Region Parameter Storage architecture is shown in Figure A.6.

A.3.3.1 Data Dictionary

The signals relevant to the Region Parameter Storage portion of the CCL design are shown in Table A.4.

Pin Name              Direction   Size     Description
clk                   I           1        125MHz clock
current_label         I           [4:0]    The label assigned to the current pixel
merger_label_0        I           [4:0]    The lower value of the conflicting labels
merger_label_1        I           [4:0]    The higher value of the conflicting labels
newLabel              I           1        Indicates that a new region was found
existingLabel         I           1        Indicates that a neighbouring pixel was encountered
mergerLabel           I           1        Indicates a merger scenario has occurred
newRow                I           1        Asserted at the beginning of a new row
curr_x_coor           I           [8:0]    Current x co-ordinate
curr_y_coor           I           [7:0]    Current y co-ordinate
dr_rd_address         I           [4:0]    Address to read data from for Duplicate Removal process
data_out              O           [31:0]   Data read by Duplicate Removal process
data_table_addr_A     -           [4:0]    Address to write data to in data storage BRAM
data_table_addr_B     -           [4:0]    Address to write data to in data storage BRAM
data_table_addr_sel   -           1        Address to write data to in data storage BRAM
data_table_data_in_A  -           [31:0]   Data to write to BRAM
data_table_data_in_B  -           [31:0]   Data to write to BRAM
data_table_wrEn       -           1        Write enable signal for Data Storage BRAM
data_table_out_0      -           [4:0]    Address to read data from for Duplicate Removal process
data_table_out_1      -           [4:0]    Address to read data from for Duplicate Removal process
data_out_sel          -           [4:0]    Select bit for multiplexer on output of data storage BRAM
temp_label_data_A     -           [4:0]    Temp data after exist or merge condition
temp_label_data_B     -           [4:0]    Temp data after exist or merge condition

Table A.4: Region Parameter Storage data dictionary.

Figure A.6: Functional partition of Region Parameter stage
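The Region Parameter Storage block keeps one set of bounding-box co-ordinates per label and extends it as pixels are labelled; when a merger occurs, the two conflicting entries are combined under the lower label. The C sketch below is an assumed software analogue of that behaviour; the 32-bit packed co-ordinate format held in the data storage BRAM is not modelled.

#include <stdint.h>

#define MAX_LABELS 32   /* 5-bit labels in the hardware design */

typedef struct {
    uint16_t x_min, x_max;
    uint16_t y_min, y_max;
    uint8_t  valid;
} region_t;

/* Extend the bounding box of the given label with the current pixel co-ordinate. */
static void region_update(region_t regions[MAX_LABELS],
                          uint8_t label, uint16_t x, uint16_t y)
{
    region_t *r = &regions[label];
    if (!r->valid) {
        r->x_min = r->x_max = x;
        r->y_min = r->y_max = y;
        r->valid = 1;
        return;
    }
    if (x < r->x_min) r->x_min = x;
    if (x > r->x_max) r->x_max = x;
    if (y < r->y_min) r->y_min = y;
    if (y > r->y_max) r->y_max = y;
}

/* On a merger, fold the higher label's box into the lower label's box. */
static void region_merge(region_t regions[MAX_LABELS], uint8_t lo, uint8_t hi)
{
    region_t *a = &regions[lo], *b = &regions[hi];
    if (!b->valid) return;
    if (!a->valid) { *a = *b; b->valid = 0; return; }
    if (b->x_min < a->x_min) a->x_min = b->x_min;
    if (b->x_max > a->x_max) a->x_max = b->x_max;
    if (b->y_min < a->y_min) a->y_min = b->y_min;
    if (b->y_max > a->y_max) a->y_max = b->y_max;
    b->valid = 0;
}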

A.4 Duplicate Removal

A functional partition of the duplicate removal architecture is shown in Figure A.7.

A.4.1 Data Dictionary

The signals relevant to the Duplicate Removal section of the design are shown in Table A.5.

Pin Name              Direction   Size          Description
clk                   I           1             125MHz clock
dr_en                 I           1             Duplicate removal enable signal
dr_data_in_0          I           [31:0]        Input co-ordinate data from CCL process from BRAM T0 to DR stage
dr_data_in (1-63)     I           [31:0][5:0]   Input co-ordinate data from CCL process from BRAMs T1-T63
dr_read_addr (0-63)   O           [4:0]         Address to read from in CCL BRAM T0-T63
dr_comp               O           1             Duplicate removal completion signal
ROI_coor_out          O           [31:0]        Co-ordinates of isolated ROI
temp_dr_data          -           [31:0]        Result of bounding box comparison operation
temp_dr_wr_addr       -           [4:0]         Address to write result of bounding box comparison operation
temp_wren             -           1             Write enable for storage of temporary comparison result
temp_comp_data        -           [31:0]        Temporary comparison data
temp_comp_rd_addr     -           [4:0]         Address to read temporary comparison data
dr_data_A_filt        -           [31:0]        Data from CCL BRAM T0 filtered based on AR and location
dr_data_A_sel         -           1             Select current ROI A data for ROI bounding box comparator
data_in_A             -           [31:0]        Current ROI A bounding box data
mem_sel               -           [5:0]         Select CCL BRAM T1-T63 to read ROI B bounding box data
data_in_B_filt        -           [31:0]        Data from CCL BRAM T1-T63 filtered based on AR and location

Table A.5: Duplicate removal data dictionary.

Figure A.7: Functional partition of Duplicate Removal architecture

A functional partition of the Bounding Box Comparator stage of the Duplicate Removal architecture is shown in Figure A.8, and the signals relevant to this stage are shown in Table A.6.

Pin Name          Direction   Size     Description
clk               I           1        125MHz clock
dr_en             I           1        Duplicate removal enable signal
data_in_A_filt    I           [31:0]   Input co-ordinate data from CCL process from BRAM T0 to DR stage
data_in_B_filt    I           [31:0]   Input co-ordinate data from CCL process from BRAM T1-T63 to DR stage
dr_done           O           1        Duplicate removal completion signal
temp_dr_data      O           1        Result of bounding box comparison
temp_dr_wr_addr   O           [4:0]    Address to write result of comparison
temp_wren         O           1        Enable signal for storing comparator result
addr_rd_A         O           [4:0]    Address to read from in CCL BRAM (T0)
addr_rd_B         O           [4:0]    Address to read from in CCL BRAM (T1-T63)
A_match_B         -           1        Asserted if bounding box A and B are a match
mem_sel           -           [5:0]    Selects the current bounding box to be examined in memory
mem_sel_rst       -           1        Reset memory select counter to 0
addr_rd_A_rst     -           1        Reset addr_rd_A to 0
addr_rd_B_rst     -           1        Reset addr_rd_B to 0

Table A.6: Bounding Box Comparator data dictionary.

Figure A.8: Functional partition of Duplicate Removal architecture
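Table A.6 does not state the exact criterion behind the A_match_B flag. The C sketch below assumes a simple overlap test in which two bounding boxes are treated as duplicates if they intersect, and keeps the larger of the two; this is only an illustrative interpretation of the comparator, not the rule implemented in the RTL, which also filters boxes on aspect ratio and location (Table A.5).

#include <stdint.h>

typedef struct { uint16_t x_min, x_max, y_min, y_max; uint8_t valid; } box_t;

/* Return non-zero if the two boxes overlap (assumed duplicate criterion). */
static int boxes_overlap(const box_t *a, const box_t *b)
{
    return a->x_min <= b->x_max && b->x_min <= a->x_max &&
           a->y_min <= b->y_max && b->y_min <= a->y_max;
}

static uint32_t box_area(const box_t *b)
{
    return (uint32_t)(b->x_max - b->x_min + 1) * (b->y_max - b->y_min + 1);
}

/* Compare every pair of boxes; when two overlap, invalidate the smaller one. */
static void remove_duplicates(box_t boxes[], int n)
{
    for (int i = 0; i < n; i++) {
        if (!boxes[i].valid) continue;
        for (int j = i + 1; j < n; j++) {
            if (!boxes[j].valid || !boxes_overlap(&boxes[i], &boxes[j]))
                continue;
            if (box_area(&boxes[j]) > box_area(&boxes[i])) {
                boxes[i].valid = 0;      /* keep the larger box        */
                break;                   /* boxes[i] is gone; move on  */
            }
            boxes[j].valid = 0;
        }
    }
}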

Appendix B

DMA Engine Overview

An in-depth explanation of the DMA transfer process is provided in this appendix. The DMA engine in this work allows the FPGA hardware to access memory independently of the CPU.

B.1 Linux device driver overview

Linux manages the embedded system’s hardware. The kernel and its device drivers form a bridge, or interface, between the end-user and the hardware. Any subroutines or functions forming part of the kernel (modules and device drivers) are considered to be part of kernel space. End-user programs, such as the FIR pedestrian detection algorithm proposed in this thesis, reside in user space. These applications need to interact with the system’s hardware; however, they do not do so directly, but through kernel-supported functions.


B.2 DMA Engine Parameters

The parameters used to configure the DMA engine using Altera’s MegaWizard tools are shown in Tables B.1, B.2 and B.3.

PCIe Core Type Hard IP

PHY Type Arria II GX

Lanes x1

Port Type Native Endpoint

Xcvr ref-clk 100MHz

PCI Express Version 2.0

Test Out Width 9 bits

PCIe Reconfig Disable

Table B.1: PCIe System Settings

BAR   BAR Type                  BAR Size
0     32-bit Non-prefetchable   256 MBytes - 28 bits
1     32-bit Non-prefetchable   256 KBytes - 18 bits
2     32-bit Non-prefetchable   256 KBytes - 18 bits

Table B.2: PCIe Address Registers

Device ID 0xE001

Vendor ID 0x1172

Subsystem ID 0x1410

Subsystem vendor ID 0xA106

Revision ID 0x01

Class code 0xFF0000

Table B.3: PCIe Read Only Registers
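The Vendor ID and Device ID listed in Table B.3 are what the Linux device driver uses to bind to the FPGA endpoint on the PCI-express bus. The kernel-module fragment below sketches that binding for a 2.6.32-era kernel; the driver name fpga_dma and the minimal probe/remove bodies are placeholders for illustration and do not reproduce the actual driver developed in this work.

#include <linux/init.h>
#include <linux/module.h>
#include <linux/pci.h>

/* Match the FPGA endpoint using the IDs from Table B.3. */
static const struct pci_device_id fpga_dma_ids[] = {
    { PCI_DEVICE(0x1172, 0xE001) },   /* Vendor ID 0x1172, Device ID 0xE001 */
    { 0, }
};
MODULE_DEVICE_TABLE(pci, fpga_dma_ids);

static int fpga_dma_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
    /* The real driver maps the BARs, sets up interrupts and allocates the
     * DMA descriptor memory here; this placeholder only enables the device. */
    return pci_enable_device(pdev);
}

static void fpga_dma_remove(struct pci_dev *pdev)
{
    pci_disable_device(pdev);
}

static struct pci_driver fpga_dma_driver = {
    .name     = "fpga_dma",           /* hypothetical driver name */
    .id_table = fpga_dma_ids,
    .probe    = fpga_dma_probe,
    .remove   = fpga_dma_remove,
};

static int __init fpga_dma_init(void)  { return pci_register_driver(&fpga_dma_driver); }
static void __exit fpga_dma_exit(void) { pci_unregister_driver(&fpga_dma_driver); }
module_init(fpga_dma_init);
module_exit(fpga_dma_exit);
MODULE_LICENSE("GPL");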

B.3 Transfer Procedure

The transfer of data is controlled by a Linux device driver developed for the 2.6.32-33 Linux kernel. The device driver communicates with the user space, where the main functions of the algorithm are present. There are five stages associated with the DMA engine (a user-space sketch of the sequence follows the list):

• Load the DMA control module in the Linux Kernel

• Initialise the DMA engine

• Write data to the FPGA from the Host

• Read data from the FPGA to the Host

• Close the DMA engine
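From user space, these stages reduce to a conventional open/write/read/close sequence on the character device exported by the driver. The sketch below assumes a hypothetical device node /dev/fpga_dma, assumes that write() triggers the host-to-FPGA frame transfer and read() the ROI read-back, and assumes a 320x240 8-bit frame; the real driver interface is not documented here, so the snippet only illustrates the ordering of the five stages.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define FRAME_BYTES (320 * 240)   /* assumed microbolometer frame size   */
#define ROI_WORDS   32            /* 32 labels holding ROI co-ordinates  */

int main(void)
{
    /* Stage 1, loading the kernel module, is performed beforehand (insmod). */
    int fd = open("/dev/fpga_dma", O_RDWR);             /* Stage 2: initialise */
    if (fd < 0) { perror("open"); return 1; }

    static uint8_t  frame[FRAME_BYTES];                 /* far-infrared frame  */
    static uint32_t roi[ROI_WORDS];                     /* ROI co-ordinates    */

    if (write(fd, frame, sizeof frame) != (ssize_t)sizeof frame)   /* Stage 3 */
        perror("write");
    if (read(fd, roi, sizeof roi) != (ssize_t)sizeof roi)          /* Stage 4 */
        perror("read");

    close(fd);                                          /* Stage 5: close engine */
    return 0;
}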

B.3.1 Initialise DMA engine

The FPGA is assigned a PCI channel. Virtual memory is allocated to the device driver for data transfers.

The FPGA memory regions are mapped into kernel virtual address space after verifying their sizes. The virtual addresses reserved for the DMA operation remain allocated until the engine is closed (Section B.3.4).

B.3.2 Write Data to FPGA from Host

The location of the current frame in memory is recorded and the address pointer is passed from user space to the device driver. The pointer to the data is placed in a DMA descriptor table. A DMA header is then created detailing the locations of the descriptors in host memory. The DMA packet is then transferred to the DMA engine on the FPGA and the transfer begins. When the operation is finished, the device raises a hardware interrupt to notify the host that the data transfer is completed.

B.3.3 Read Data from FPGA to Host

Memory is reserved in the host CPU’s RAM. The address of this memory is passed to the driver along with a DMA write request and the amount of data to write back to the host; in this case, 32 labels storing the ROI co-ordinates. The DMA descriptors are then created based on the data passed to the device driver. When the operation is finished, the device raises a hardware interrupt to notify the host that the data transfer is completed.

B.3.4 Close DMA Engine

To prevent errors and memory leaks, the DMA engine must be closed before ending the task. This function deallocates the memory assigned to the virtual addresses and notifies the device that the operation is completed.

B.4 DMA Descriptor Table

As seen in Chapter 3, a descriptor table containing all of the relevant frame information must be created. This lets the DMA engine know where to read the data from on the host, and where to write it in FPGA memory.
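The descriptor table is described above only in terms of where the data lies in host memory and where it is to be placed on the FPGA. A minimal sketch of such a table is given below; the field names and widths are assumptions for illustration and do not correspond to the actual Altera descriptor layout used by the DMA engine.

#include <stdint.h>

/* Hypothetical DMA descriptor: one entry per contiguous block of a frame.
 * Field names and widths are assumptions, not the Altera descriptor format. */
struct dma_descriptor {
    uint64_t host_addr;   /* physical address of the block in host RAM  */
    uint32_t fpga_addr;   /* destination offset in FPGA endpoint memory */
    uint32_t length;      /* transfer length in bytes                   */
};

/* Hypothetical header placed ahead of the descriptors in host memory so
 * that the DMA engine can locate and walk the table. */
struct dma_header {
    uint64_t table_addr;  /* host address of the first descriptor       */
    uint32_t num_desc;    /* number of descriptors in the table         */
    uint32_t flags;       /* e.g. direction, interrupt on completion    */
};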

B.5 DMA Engine Testing

The engine was tested by performing a loopback test. A virtual buffer was allocated in host memory. Half of the buffer was filled with data and passed to the device. The data was then read back from the FPGA endpoint memory and stored in the second half of the buffer. The two halves were compared to determine if they were identical.
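A software version of that loopback check might look like the following; the device node and the write()/read() semantics are the same assumptions as in the earlier user-space sketch (Section B.3), and only the fill-and-compare logic reflects the test described above.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define HALF_BYTES 4096   /* assumed size of each half of the loopback buffer */

int main(void)
{
    static uint8_t buf[2 * HALF_BYTES];

    /* Fill the first half of the buffer with a known pattern. */
    for (int i = 0; i < HALF_BYTES; i++)
        buf[i] = (uint8_t)i;

    int fd = open("/dev/fpga_dma", O_RDWR);   /* hypothetical device node */
    if (fd < 0) { perror("open"); return 1; }

    /* Send the first half to the FPGA, then read it back into the second half. */
    if (write(fd, buf, HALF_BYTES) != HALF_BYTES)              perror("write");
    if (read(fd, buf + HALF_BYTES, HALF_BYTES) != HALF_BYTES)  perror("read");
    close(fd);

    /* Compare the two halves: any mismatch indicates a transfer error. */
    if (memcmp(buf, buf + HALF_BYTES, HALF_BYTES) == 0)
        printf("Loopback test passed\n");
    else
        printf("Loopback test FAILED\n");
    return 0;
}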

Appendix C

Publications

The publications related to the work detailed in this thesis are presented in this appendix. Copies of published articles are included for completeness, while other work relating to this research can be found at http://car.nuigalway.ie/students/phurney/index.html.

C.1 Journals

C.1.1 Published

• Patrick Hurney, Peter Waldron, Fearghal Morgan, Edward Jones, Martin Glavin, “Night-time Pedestrian Classification with HOG-LBP Vectors”, IET Intelligent Transportation Systems, Vol. 9, Issue No. 1, pp. 75-85, Published April 2015.

• Patrick Hurney, Peter Waldron, Fearghal Morgan, Edward Jones, Martin Glavin, “Review of Pedestrian Detection Techniques in Automotive Far-Infrared Video”, IET Intelligent Transportation Systems, Vol. 9, Issue No. 8, pp. 824-832, Published October 2015.


C.1.2 In Submission

• Patrick Hurney, Peter Waldron, Fearghal Morgan, Edward Jones, Martin Glavin, “Real-time Detection of Pedestrians in Far-Infrared Imagery with an Embedded CPU/FPGA System”, Springer Journal of Real-Time Image Processing. Submitted April 2015

• Patrick Hurney, Peter Waldron, Martin Glavin, Edward Jones, Fearghal Morgan, “Hardware Acceleration of Feature Based Seeded Region Growing”, IEEE Transactions on Circuits and Systems for Video Technology. Submitted September 2016

C.2 Conferences

C.2.1 Published

• Patrick Hurney, Peter Waldron, Fearghal Morgan, Edward Jones, Martin Glavin, “Embedded CPU/FPGA Architecture for Pedestrian Detection in Automotive Infrared Images”, 20th IET Irish Signals and Systems Conference, National University of Ireland, Maynooth, May 2012

Please, don’t worry so much. Because in the end, none of us have very long on this Earth. Life is fleeting... - Robin Williams