Motion Tracking on Embedded Systems -

Vision-based Vehicle Tracking using

Image Alignment with Symmetrical Function

CHEUNG, Lap Chi

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Philosophy in Computer Science and Engineering

© The Chinese University of Hong Kong August 2007

The Chinese University of Hong Kong holds the copyright of this thesis. Any person(s) intending to use a part or whole of the materials in the thesis in a proposed publication must seek copyright release from the Dean of the Graduate School.

Thesis/Assessment Committee

Professor WONG Kin Hong (Chair)
Professor MOON Yiu Sang (Thesis Supervisor)
Professor JIA Jiaya Leo (Committee Member)
Professor FAIRWEATHER Graeme (External Examiner)

Abstract

The large number of rear-end collisions due to driver inattention has been identified as a major automotive safety issue in Intelligent Transportation Systems (ITS), one of whose goals is to apply automotive technology to improve driver safety and efficiency. In this thesis, we describe a 3-phase vehicle tracking methodology that uses a single moving camera mounted on the driver's automobile as input for detecting rear vehicles on highways and city streets, for the purpose of preventing potential rear-end collisions. In the first phase, a rear vehicle is detected by symmetrical measurement and the Haar transform. A template is also created for tracking the vehicle using image alignment techniques in the second phase. The template is continuously monitored and updated to handle abrupt changes in surrounding conditions in the third phase. Different from most previous methods, which simply detect vehicles in each frame without correlating successive frames, our methodology gives precise tracking, an essential requirement for estimating the distances between the rear vehicles and the driver's car using the newly proposed formula. To implement our methodology on real-time embedded systems, we further enhance the efficiency of our algorithms with the aid of the symmetrical function and simplified image alignment techniques. Preliminary testing results show that our system achieves a successful tracking rate of over 97%.

論文摘要 (Chinese Abstract)

Rear-end collision accidents caused by low driver alertness have become a major issue in Intelligent Transportation Systems, which apply automation technology to improve driving safety and efficiency. In this thesis, we describe a 3-phase vehicle tracking method that takes a single-lens camera mounted on the driver's vehicle as the system input to detect rear vehicles and prevent rear-end collisions. In the first phase, we use symmetrical measurement and the Haar transform to detect rear vehicles, and create a template for object tracking by image alignment in the second phase. The template is continuously monitored; once an abrupt environmental change affects the tracking performance, the third phase immediately updates the template to maintain precise vehicle tracking. The newly proposed tracking method emphasizes the interaction between successive frames, unlike existing methods that simply perform detection on single frames. By applying the 3-phase vehicle tracking method and the newly proposed inter-vehicle distance formula, the distance to rear vehicles can be computed accurately. With the aid of symmetrical measurement and simplified image alignment, this real-time multiple-vehicle tracking method can be realized on embedded systems, and the preliminary experimental tracking rate stays above 97 percent.

To my dearest family and friends

Acknowledgement

I wish to thank my supervisor, Prof. Moon Yiu Sang, for his continuous guidance and supervision, for supporting me to attend academic conferences, and for all his teachings in research, learning and living. He also gave me many opportunities to lead projects and competitions. I wish to express my gratitude to Prof. Wong Kin Hong and Prof. JIA Jiaya Leo for being members of my thesis committee. I would also like to thank Prof. G. Fairweather for being my external marker.

Special thanks are due to the Robotics Institute at Carnegie Mellon University for making their image alignment algorithm implementation publicly available. Without their contribution, this research work would not have had such a good start. I gratefully acknowledge helpful discussions with Mr. Chen Jiansheng, Mr. Chan Ka Cheong, Mr. Chan Fai and Mr. Tang Xinmin, who shared their research insight and ideas with me. My sincere thanks to Wong Kwong Wa and Tse Wing Cheong for doing the initial study of the vehicle tracking system. I must thank Tony, Kim, Oscar, Simon and other technical staff for their patience with me and excellent work in maintaining a stable computing environment. I would like to thank the Chinese University of Hong Kong for providing such a beautiful campus, a resourceful library and excellent recreational facilities, and Shaw College for providing such a wonderful college life and a comfortable living place.

Then there is the list of friends and colleagues that deserve my deepest gratitude. Many thanks to Kuen, Sunny, Siu Ming, Felix, Yu Tsz and Fung, for bringing me the memorable joy in the Cup competition, including working overnight at room 1024, the trip to Shanghai and Beijing, and all of the follow-up events; Ming, Brittle, Kit, Brian, Cheong, Ocean, CJ, Ma Fai, Kevin, Lo Wing and Chan Fai, for creating such an enjoyable and relaxing atmosphere; all of the friends in room 1003, for letting me know what is meant by efficient and hardworking; all of the hostel tutors in Student Hostel II, Shaw College, for working together to organize joyful and meaningful hostel activities.

And, my heartfelt thanks to those who have helped me one way or the other, even though both you and I may not be aware of it. I wish to express my deepest gratitude to Ming, Chan Fai, Cyrus and Peng Xiang for giving me humble advice and chatting with me during troubled times. Thanks to Ming and Chan Fai, my TA partners, for sharing the workload when I was busy with research. You are both enthusiastic in teaching! My special thanks go to Peng Xiang, who gave me the motivation to finish this thesis by telling me that he had already finished his in April. To the researchers at the SPIE and MVA conferences, thank you for your comments and suggestions, which helped me keep improving my thesis.

I would like to thank the world for being such a wonderful place to live in, to appreciate and to explore. Thanks to the Department of Computer Science and Engineering, the Chinese University of Hong Kong. I am glad to have studied here for 5 years of undergraduate and postgraduate study.

And most of all, thanks to my family members, Mum, Father, Sister, Brother and Ching, for giving me love, support and concern during troubled times.

Table of Contents

1. INTRODUCTION 1
1.1. BACKGROUND 1
1.1.1. Introduction to Intelligent Vehicle 1
1.1.2. Typical Vehicle Tracking Systems for Rear-end Collision Avoidance 2
1.1.3. Passive VS Active Vehicle Tracking 3
1.1.4. Vision-based Vehicle Tracking Systems 4
1.1.5. Characteristics of Computing Devices on Vehicles 5
1.2. MOTIVATION AND OBJECTIVES 6
1.3. MAJOR CONTRIBUTIONS 7
1.3.1. A 3-phase Vision-based Vehicle Tracking Framework 7
1.3.2. Camera-to-vehicle Distance Measurement by Single Camera 9
1.3.3. Real Time Vehicle Detection 10
1.3.4. Real Time Vehicle Tracking using Simplified Image Alignment 10
1.4. EVALUATION PLATFORM 11
1.5. THESIS ORGANIZATION 11
2. RELATED WORK 13
2.1. STEREO-BASED VEHICLE TRACKING 13
2.2. MOTION-BASED VEHICLE TRACKING 16
2.3. KNOWLEDGE-BASED VEHICLE TRACKING 18
2.4. COMMERCIAL SYSTEMS 19
3. 3-PHASE VISION-BASED VEHICLE TRACKING FRAMEWORK 22
3.1. INTRODUCTION TO THE 3-PHASE FRAMEWORK 22
3.2. VEHICLE DETECTION 23
3.2.1. Overview of Vehicle Detection 23
3.2.2. Locating the Vehicle Center — Symmetrical Measurement 25
3.2.3. Locating the Vehicle Roof and Bottom 28
3.2.4. Locating the Vehicle Sides — Over-complete Haar Transform 30
3.3. VEHICLE TEMPLATE TRACKING - IMAGE ALIGNMENT 37
3.3.5. Overview of Vehicle Template Tracking 37
3.3.6. Goal of Image Alignment 41
3.3.7. Alternative Image Alignment - Compositional Image Alignment 42
3.3.8. Efficient Image Alignment - Inverse Compositional Algorithm 43
3.4. VEHICLE TEMPLATE UPDATE 46
3.4.1. Situation of Vehicle Lost 46
3.4.2. Template Fitting by Updating the Positions of Vehicle Features 48
3.5. EXPERIMENTS AND DISCUSSIONS 49
3.5.1. Experiment Setup 49
3.5.2. Successful Tracking Percentage 50
3.6. COMPARING WITH OTHER TRACKING METHODOLOGIES 52
3.6.1. 1-phase Vision-based Vehicle Tracking 52
3.6.2. Image Correlation 54
3.6.3. Continuously Adaptive Mean Shift 58
4. CAMERA-TO-VEHICLE DISTANCE MEASUREMENT BY SINGLE CAMERA 61
4.1. THE PRINCIPLE OF LAW OF PERSPECTIVE 61
4.2. DISTANCE MEASUREMENT BY SINGLE CAMERA 62
5. REAL TIME VEHICLE DETECTION 66
5.1. INTRODUCTION 66
5.2. TIMING ANALYSIS OF VEHICLE DETECTION 66
5.3. SYMMETRICAL MEASUREMENT OPTIMIZATION 67
5.3.1. Diminished Gradient Image for Symmetrical Measurement 67
5.3.2. Replacing Division by Multiplication Operations 71
5.4. OVER-COMPLETE HAAR TRANSFORM OPTIMIZATION 73
5.4.1. Characteristics of Over-complete Haar Transform 73
5.4.2. Pre-computation of Haar Block 74
5.5. SUMMARY 77
6. REAL TIME VEHICLE TRACKING USING SIMPLIFIED IMAGE ALIGNMENT 78
6.1. INTRODUCTION 78
6.2. TIMING ANALYSIS OF ORIGINAL IMAGE ALIGNMENT 78
6.3. SIMPLIFIED IMAGE ALIGNMENT 80
6.3.1. Reducing the Number of Parameters in Affine Transformation 80
6.3.2. Size Reduction of Image Alignment Matrixes 85
6.4. EXPERIMENTS AND DISCUSSIONS 85
6.4.1. Successful Tracking Percentage 86
6.4.2. Timing Improvement 87
7. CONCLUSIONS 89
8. BIBLIOGRAPHY 91

List of Tables

Table 1. Specification of the evaluation system - Intel ECX platform 11
Table 2. Setting of camera mounted on the testing vehicle 50
Table 3. Information and tracking percentage of road image sequences 51
Table 4. The successful tracking percentage of 3-phase structure versus 1-phase structure 53
Table 5. Experiment information of the adaptive correlation vehicle tracking 56
Table 6. Timing information of the vehicle detection 67
Table 7. Timing information of the vehicle detection after optimization 77
Table 8. Information of the image sequence in the timing analysis 79
Table 9. Timing information of the original vehicle template tracking per frame of one vehicle 79
Table 10. Timing information of the original vehicle tracking methodology per frame 80
Table 11. Parameters involved in each basic affine transformation 81
Table 12. The visualization of matrixes in the original and the simplified image alignment 85
Table 13. Tracking percentage of simplified image alignment 86
Table 14. Timing of the simplified vehicle template tracking per frame of one vehicle 88
Table 15. Timing of the simplified vehicle tracking methodology per frame 88

List of Figures

Figure 1. The 8 major problem areas in Intelligent Transportation Systems 1
Figure 2. Typical vehicle tracking system for rear-end collision avoidance 3
Figure 3. An example of ROI (Region of Interest) in vehicle tracking 5
Figure 4. Intel ECX platform embedded with Intel Celeron M 600MHz processor 6
Figure 5. Testing vehicle used in the project "Machine Vision Based Vehicle Guidance" 13
Figure 6. Stereo image pair and its corresponding disparity maps 14
Figure 7. Objects above and on the road surface detected by the disparity map 15
Figure 8. Vehicles detected and tracked by the stereo-based vehicle tracking method 16
Figure 9. Flow field and object segmentation computed by ASSET 17
Figure 10. Vehicle tracking result of ASSET 17
Figure 11. Result of "Real-Time Multiple Vehicle Detection and Tracking from a Moving Vehicle" 19
Figure 12. Tracking result of EyeQ with distance information 20
Figure 13. Visualization of the 3-phase framework 23
Figure 14. Segmentation of the vehicle part from the whole image 24
Figure 15. Symmetrical measurement using gradient image with horizontal and vertical edges 27
Figure 16. Symmetrical measurement using gradient image with horizontal edges 28
Figure 17. Searching window with the symmetrical line of the vehicle as the centerline 29
Figure 18. Locating the car roof and car bottom within the searching window 30
Figure 19. The good vehicle template versus the bad template 31
Figure 20. An example of vehicle images on road environment 32
Figure 21. Multi-scaling Haar transformed car image 33
Figure 22. Steps of estimating the location of the vehicle's sides 34
Figure 23. The standard Haar Transform framework 35
Figure 24. The over-complete Haar Transform framework 36
Figure 25. Vertical coefficient of the vehicle's side image 37
Figure 26. The four basic transformations in affine transformation 38
Figure 27. Image warping in vehicle tracking 40
Figure 28. Visualization of the image alignment tracking process 42
Figure 29. The framework of the inverse compositional algorithm for vehicle template tracking 46
Figure 30. The situation of vehicle tracking lost 47
Figure 31. The framework of Vehicle Template Update 49
Figure 32. An example of wrongly detected vehicle 52
Figure 33. Vehicle tracking using adaptive correlation-based matching 57
Figure 34. Visual difference between the vehicle template and the sub-image 58
Figure 35. Vehicle tracking using CAMShift 59
Figure 36. Schematic diagram of the imaging geometry of inter-vehicles 61
Figure 37. Schematic diagram of the imaging geometry of vehicles with different height 62
Figure 38. Newly proposed formula for computing the camera-to-vehicle distance 63
Figure 39. The distance computed by the Mobileye® formula and the newly proposed formula 63
Figure 40. Result of vehicle tracking images with distance information 64
Figure 41. The steps involved in diminished gradient image for symmetrical measurement 68
Figure 42. The comparison of symmetrical measurement with original and diminished image input 70
Figure 43. Vehicle Template Update correcting the initial lateral position error 71
Figure 44. Wavelet computation in over-complete Haar Transform 74
Figure 45. Overlapping relation between Haar blocks 75
Figure 46. Average intensity of overlapping region shared by two blocks 75
Figure 47. Division and reuse of sub-Haar blocks 76
Figure 48. Variation of affine transformation parameters 84
Figure 49. Comparison of computed distances of simplified and original tracking methodology 87

List of Algorithms

Algorithm 1. Original implementation of symmetrical measurement 72
Algorithm 2. Implementation of symmetrical measurement without division 73

1. Introduction

1.1. Background

1.1.1. Introduction to Intelligent Vehicle

Despite advances over the past 100 years, no aspect of automotive technology has ever tried to accomplish what the human driver does with his or her own eyes. Providing drivers with additional in-vehicle information is a complex endeavor that, unless technologies are carefully designed, may even compromise driver safety and efficiency. For this reason, Intelligent Transportation Systems (ITS) has become an active research area over the last decade. An ITS is something like an intelligent navigator that can, in place of a human navigator sitting in the next seat, give the driver appropriate advice on safe and efficient driving.

Figure 1. The 8 major problem areas in Intelligent Transportation Systems

Researchers have carefully selected 8 major problem areas [1] that they consider "prime candidates" for improving driver performance because they improve safety or may impact safety. Figure 1 shows the 8 major areas. Among them, rear-end collisions account for one in four crashes, or over 1.5 million crashes a year. New technologies will be used to sense the presence and speed of vehicles behind and provide warnings to avoid collisions. Early versions will use extensions of adaptive cruise control capabilities to detect and classify stationary objects and to determine the level of threat from vehicles to the rear. As a result, moving vehicle detection is an important component of an ITS that can be used in a driver assistance system to give warning signals to drivers in case of potential collisions.

1.1.2. Typical Vehicle Tracking Systems for Rear-end Collision Avoidance

Generally, rear-end collisions occur when drivers become unaware, or aware too late, of potential dangers on the road. Vehicles are closer than they appear in convex-type side mirrors, so the driver may underestimate the approach of rear vehicles. Vehicle tracking on automobiles, which aims to provide automatic and continuous supervision of the road situation and to alert the driver to the driving environment and possible collisions with rear vehicles, has therefore attracted a lot of attention. A typical vehicle tracking system for rear-end collision avoidance includes capturing sensor input, recognizing the current traffic situation and generating appropriate driving advice. Figure 2 shows the architecture of a typical vehicle tracking system for rear-end collision avoidance. The road scene recognition subsystem recognizes the current traffic situation using the rear-end sensor mounted on the driver's vehicle as input. The advice generation subsystem generates appropriate advice and gives it to the driver. The driver may control the vehicle according to the given advice.

Figure 2. Typical vehicle tracking system for rear-end collision avoidance

1.1.3. Passive VS Active Vehicle Tracking

In the vehicle tracking system, the first task is to identify the rear vehicles. One of the common approaches to vehicle detection is using active sensors, such as lasers, light detection and ranging (lidar), or millimeter-wave radars, as the sensor input. They are called active because they detect the distance of an object by measuring the travel time of a signal emitted by the sensors and reflected by the object. Their main advantage is that they can measure certain quantities (e.g. distance) directly while requiring limited computing resources. Prototype vehicles employing active sensors have several drawbacks, such as low spatial resolution and slow scanning speed [2]. Moreover, when a large number of vehicles are moving simultaneously in the same direction, interference among sensors of the same type poses a big problem.

Optical sensors, such as normal cameras, are usually referred to as passive sensors because they acquire data in a non-intrusive way. One advantage of passive sensors over active sensors is cost. Optical sensors can be used to track more effectively cars entering a curve or moving from one side of the road to another. Also, visual information can be very important in a number of related applications, such as lane detection, traffic sign recognition, or object identification (e.g., pedestrians, obstacles), without requiring any modifications to road infrastructures. On the other hand, vehicle detection based on optical sensors is very challenging due to huge within-class variability. For example, vehicles may vary in shape, size and color. Vehicle appearance depends on its pose and is affected by nearby objects. Illumination changes, complex outdoor environments, unpredictable interactions between traffic participants and cluttered backgrounds are difficult to control.

1.1.4. Vision-based Vehicle Tracking Systems

Passive vehicle tracking systems have become popular because of the physical drawbacks of active approaches, such as low spatial resolution, slow scanning speed and interference, and because of the advances in vision-based vehicle tracking algorithms. It is therefore worthwhile to investigate completely vision-based approaches for vehicle tracking. The primary stage of detection and tracking is moving object segmentation, which is a class of pattern recognition problem. An accurate segmentation of the vehicles is required in order to obtain accurate tracking. ROI (Region of Interest) extraction is first performed to segment candidate vehicle regions, called vehicle templates; the commonly used methods include stereo-based, motion-based and knowledge-based approaches. Figure 3 shows an example of an ROI in vehicle tracking. The aim of vehicle tracking is to locate the updated vehicle location over the image sequence by minimizing the image difference between the extracted ROI and the current images.

Figure 3. An example of ROI (Region of Interest) in vehicle tracking

1.1.5. Characteristics of Computing Devices on Vehicles

Limited power supply, small space and a closed environment restrict the computational resources available on automobiles. Although technology keeps advancing, common computers are not suitable in the automobile environment due to high power consumption, large size and high heat dissipation. A typical computing platform for vehicles, such as the Intel ECX (Embedded Compact Extended) platform, one of whose design purposes is in-vehicle technology, has low power consumption, compact size and fanless capability. Figure 4 shows the Intel ECX platform.

However, the low operating frequency makes real-time image processing applications, such as real-time vehicle tracking, difficult to run on this type of platform. For example, the operating frequency of the Intel ECX platform is 600 MHz. Therefore, dedicated efforts must be made to optimize the performance of the real-time tracking algorithm.

Figure 4. Intel ECX platform embedded with Intel Celeron M 600MHz processor

1.2. Motivation and Objectives

The large number of rear-end collisions due to driver inattention has been identified as a major automotive safety issue. Even a short advance warning can significantly reduce the number and severity of collisions. We have developed a real-time rear vehicle detection system which can give an advance warning to a driver when the relative distance between a rear vehicle and the driver's car is too small, for the purpose of preventing rear-end collisions.

Our research is focused on developing a real-time vision-based vehicle detection system for operation in outdoor scenes that employs only a single moving camera mounted on the driver's automobile as input, for use in detecting multiple rear vehicles on highways and city streets. Several facts make this task rather challenging:

1) Real-time performance requires efficient algorithms
2) Cameras mounted in a moving vehicle are not static, which means most methods used by surveillance systems cannot be applied here
3) Scenes are dynamic, so less prior knowledge can be utilized
4) Illumination and contrast change, even in the same scene at different times
5) Objects' shapes and colors have broad ranges

In this thesis, we will propose a 3-phase vision-based framework for robust and precise vehicle tracking in outdoor scenes with camera-to-vehicle distance measurement using a single moving camera. Our goal is to show that real-time multiple vehicle tracking can be achieved on the computing devices of automobiles with a high successful tracking percentage.

1.3. Major Contributions

1.3.1. A 3-phase Vision-based Vehicle Tracking Framework

In our proposal, vehicle tracking is divided into three phases: vehicle detection, vehicle template tracking and vehicle template update. The vehicle is first located by the symmetrical function using a special feature of vehicles in images: symmetry. The lateral centers of vehicles in images are located at positions with high symmetrical measurement. With the aid of the symmetrical function, the region of interest in the image can be greatly reduced. Moreover, it makes the methodology relatively simple to implement using embedded system technology in the automobile environment. By locating car features like the car roof, car bottom and car sides using the techniques of horizontal edge response, horizontal pixel intensity response and the over-complete Haar transform respectively, the vehicle is detected in the image and the segmented region, called the vehicle template, is passed to the vehicle template tracking phase.

The 3-phase structure is used in our methodology instead of simple detection of vehicles in each frame, or a 1-phase vehicle detection framework, because traditional vehicle detection techniques for tracking cannot distinguish between changes in viewpoint or configuration of the vehicles relative to the camera [3]. Such a structure aims to determine the image configuration of a target region of a vehicle as it moves through a camera's field of view and to estimate the stereopsis and motion of vehicles. Its objective is to determine, in a global sense, the movement of an entire vehicle template over a long sequence of images. The information about changes in viewpoint and position varies in the tracking system, providing a clue for estimating the relative distance or the contact time of a rear vehicle.

After the vehicles have been located and segmented in the image, the segmented area is used as a template for object tracking employing the image alignment technique, which consists of moving, and possibly deforming, the template to minimize the difference between the template and the current image. Since the first use of the Lucas-Kanade optical flow algorithm [4], image alignment has become one of the most widely used techniques in computer vision. There are many image alignment algorithms, and the inverse compositional algorithm is adopted in our methodology because of its high efficiency, whose keys include switching the roles of the image and the template and the pre-computation of the template gradient, Jacobian, steepest descent images and Hessian matrix.

What makes vehicle tracking difficult is the potential variability in the images of vehicles over time. This variability arises from two principal sources: variation in vehicle pose or deformations, and variation in illumination [5]. When ignored, this variability is enough to cause a tracking algorithm to lose its target. This calls for the template update in phase 3.

The proposed 3-phase framework is robust, as it can tolerate frequent illumination and contrast changes under different lighting environments. In addition, it can handle vehicles with broad ranges of shapes and colors, as well as multiple vehicles simultaneously.

1.3.2. Camera-to-vehicle Distance Measurement by Single Camera

Our methodology provides an advance warning to a driver when the relative distance between a rear vehicle and the camera on the driver's car is below a certain threshold. The relative distance from the target vehicle to the camera on the driver's car can be found by the newly proposed formula, based on the equation proposed by an intelligent transportation system provider, Mobileye® [6], using the law of perspective with the height of vehicles in the image. Continuous distance information is provided by tracking the vehicles, and thus the change of their height, in the image.

The newly proposed formula relaxes the assumption that the top of the vehicle must be at the same level as the camera, making it independent of vehicle types.
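As a rough illustration ahead of Chapter 4's derivation: for a pinhole camera mounted at height $H$ above a flat road, with focal length $f$ in pixels and $y$ the image-row offset of the vehicle's road contact point below the horizon, the law of perspective gives the commonly used range relation

$$Z = \frac{fH}{y},$$

where $Z$ is the camera-to-vehicle distance. These symbols are standard textbook notation rather than the thesis's own; the newly proposed formula of Chapter 4 works instead from the tracked height of the vehicle in the image, which is what allows the distance estimate to be updated continuously.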

1.3.3. Real Time Vehicle Detection

To enable real-time performance, several computational techniques are adopted in our system. Firstly, it is not practical to search for the ROI of vehicles over the entire image. In our case, symmetrical measurement is employed to reduce the search area. This measurement is further optimized using the techniques of diminishing the image input and replacing division operations, so that more computational resources can be reserved for vehicle tracking and other advanced intelligent vehicle features. Secondly, we also optimize the over-complete Haar transform by pre-computation.

1.3.4. Real Time Vehicle Tracking using Simplified Image Alignment

The original image alignment tracking methodology is too computationally intensive to run on embedded systems in real time. To realize real-time tracking, we have simplified the computation by minimizing the number of image alignment parameters. The idea of simplifying image alignment stems from the 4 basic affine transformations: translation, scaling, shearing and rotation. From experience and experimental results, it is evident that shearing and rotation rarely apply between successive images of the same vehicle moving on the road and would not assist the vehicle tracking. Therefore, we propose to exclude the shearing and rotation transformations so that real-time vehicle tracking can be achieved on embedded systems.
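As a sketch of what excluding shearing and rotation means for the warp, consider a 4-parameter warp containing only scales and translations (the parameter names here are illustrative; the exact set retained in Chapter 6 may differ):

$$\mathbf{W}(\mathbf{x};\mathbf{p}) = \begin{pmatrix} (1+p_{s_x})\,x + p_{t_x} \\ (1+p_{s_y})\,y + p_{t_y} \end{pmatrix}.$$

Every alignment iteration then solves for 4 instead of 6 parameters, shrinking the Jacobian, steepest descent images and Hessian matrix accordingly.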

1.4. Evaluation Platform

Limited power supply, small space and a closed environment restrict the computational resources available on automobiles. Although technology keeps advancing, common desktop computers are not suitable in the automobile environment due to high power consumption, large size and high heat dissipation. The Intel ECX (Embedded Compact Extended) platform, which has low power consumption, compact size and fanless capability, is ideal for in-vehicle applications. An ECX development board is used for evaluation. Table 1 shows its specification. All of the experiments mentioned in this thesis were tested on this platform.

Processor    Intel Celeron M 600MHz
Memory       256MB DDR 266MHz
OS           Linux (kernel version 2.6.9)
Compiler     gcc (version 3.4.2)

Table 1. Specification of the evaluation system - Intel ECX platform

1.5. Thesis Organization

This chapter provides an overview of intelligent vehicles and vision-based vehicle tracking systems. A summary of the objectives and contributions of our work is also given. The remaining chapters of this thesis are organized as follows: Chapter 2 briefly reviews the related work in vehicle tracking with different approaches. Chapter 3 presents the framework of 3-phase vision-based vehicle tracking, including the method of vehicle detection, the procedure of vehicle template tracking using inverse compositional image alignment, and the principle of vehicle template update. Then, the technique of finding the relative distance from the target vehicle is introduced in Chapter 4. Chapters 5 and 6 present how the vehicle detection and vehicle tracking can be optimized to run in real time. Finally, Chapter 7 summarizes the research performed and describes some challenges encountered and possible directions for future research.

2. Related Work

Vision-based vehicle tracking is often classified into 3 categories: stereo-based, motion-based and knowledge-based [2]. In this chapter, a brief description and some sample tracking results will be given for each category. Most of the previous works differ from our 3-phase structure, since they simply detect vehicles in each frame of the image sequence. Although some tracking results are for front-end tracking, the methodologies can still be applied to rear-end tracking.

2.1. Stereo-based Vehicle Tracking

Stereo vision is widely used in 3D reconstruction applications. By computing the difference between corresponding pixels in the left and right images, 3D world information can be gained. The difference is called disparity, and the disparities of all the image points form the so-called disparity map. If the parameters of the stereo rig are known, the disparity map can be converted into a 3D map of the viewed scene.
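For concreteness, a disparity map of the kind shown in Figure 6 can be sketched with OpenCV's block matcher. This is not the cited project's implementation; the file names and matcher settings are placeholders.

    import cv2

    # Rectified stereo pair in grayscale (placeholder file names).
    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # Block matching; numDisparities must be a multiple of 16.
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = matcher.compute(left, right)  # fixed-point, scaled by 16

    # With known rig parameters: depth = focal_length * baseline / disparity.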

Figure 5. Testing vehicle used in the project "Machine Vision Based Vehicle Guidance"

The project "Machine Vision Based Vehicle Guidance" [7] is one of the projects that use the stereo vision technique for vehicle tracking. Figure 5 shows the testing vehicle mounted with a stereo camera rig [7]. The correspondences of image features between the left and right views are first resolved by computing the stereo disparity. Then obstacles are mapped to points of non-zero disparity, making them detectable. See Figure 6, where (a) and (b) are a stereo image pair, and (c) and (d) show points of interest on the corresponding images. Significant disparities correspond to the obstacles.

Figure 6. Stereo image pair and its corresponding disparity maps

From the residual disparity at an image location, we can obtain the 3D location of the points of interest. Residual disparities, which appear in the image after the ground plane disparity has been mapped to zero, indicate objects which appear above the ground plane. A simple threshold is used to distinguish between features lying on the ground plane and features due to objects lying above the ground plane. Figure 7 shows the result on a single frame. Objects detected as above the ground plane are colored red and those on the ground plane are colored green. Black regions indicate regions for which the disparity could not be accurately recovered.

Figure 7. Objects above and on the road surface detected by the disparity map

Finally, neighboring points with similar disparity values are connected together to form objects using a clustering technique. These connected components form the basis of potential vehicles which are to be tracked with time. If the same object appears in two consecutive frames, the object is determined to be a vehicle. Figure 8 shows the objects found by the method proposed in this project. The result shows that the vehicle can be tracked approximately. However, the vehicle roof and bottom cannot be located and tracked precisely. This can lead to inaccurate supervision of the traffic on the road and inaccurate distance measurements. The most important drawback is that computing the disparity map is very time-consuming. The synchronization and heavy I/O of two cameras also make this approach unsuitable for embedded systems.

Figure 8. Vehicles detected and tracked by the stereo-based vehicle tracking method

2.2. Motion-based Vehicle Tracking

In stereo-based vehicle tracking, as well as the knowledge-based method described later, spatial features are used to distinguish between vehicles and their background. An important cue that can also be used is the relative motion obtained via the calculation of optical flow. Vehicles approaching from opposite directions produce a diverging flow, which can be quantitatively distinguished from the flow caused by the car's ego-motion [8]. On the other hand, departing or overtaking vehicles produce a converging flow.
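For concreteness, the diverging or converging character of the flow can be probed with a dense optical flow routine. The sketch below uses OpenCV's Farneback method rather than ASSET's sparse feature tracker described next, and the frame names are placeholders.

    import cv2

    # Two consecutive grayscale road frames (placeholder file names).
    prev = cv2.imread("frame_t0.png", cv2.IMREAD_GRAYSCALE)
    curr = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

    # Dense optical flow; flow[y, x] = (dx, dy) displacement per pixel.
    flow = cv2.calcOpticalFlowFarneback(
        prev, curr, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

    # A region whose flow vectors point away from a common center is
    # diverging (an approaching object); vectors pointing inward
    # indicate a converging flow (a departing or overtaking object).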

The "ASSET: A Scene Segmenter Establishing Tracking" project is a representative work employing the motion-based method for vehicle tracking [9]. ASSET starts by tracking 2D image features as they move across the picture over time, to get a sparse image flow field. The flow field is then segmented into clusters which have internally consistent flow variation and which have flow different from each other and from the background. The green shapes in Figure 9 are examples of initial segmentation. The cluster shapes are filtered over time to give improved accuracy and robustness.

Figure 9. Flow field and object segmentation computed by ASSET

Figure 10. Vehicle tracking result of ASSET

From Figure 10, the objects tracked by ASSET seem to be unstable. The optical flow computation is very sensitive to noise. Given the difficulties posed by the moving camera scenario, obtaining a reliable dense optical flow is not an easy task. Besides, the vehicle roof and bottom cannot be located and tracked precisely, as in the stereo-based vehicle project discussed before.

2.3. Knowledge-based Vehicle Tracking

Knowledge-based methods employ a-priori knowledge to hypothesize vehicle locations in an image. The vehicle information includes symmetry, color, shadow, corners, horizontal and vertical edges, texture, and vehicle lights.

The project "Real-Time Multiple Vehicle Detection and Tracking from a Moving Vehicle" [10] is one of the projects applying the knowledge-based method for vehicle tracking. From the image sequence captured by the camera input, the approximate positions of vehicles in the images are first located by temporal differencing. The vehicle tracking regions are then further refined by locating the vehicle lights and corners. Figure 11 shows a sample tracking result of this project.

The advantage of knowledge-based vehicle tracking is that the vehicle features, being uniquely distinguished from the background objects, can often be identified easily. Nevertheless, searching for a particular vehicle feature over the entire image is often inefficient.

Figure 11. Tracking result of "Real-Time Multiple Vehicle Detection and Tracking from a Moving Vehicle"

2.4. Commercial Systems

Mobileye® [11] has developed an advance warning system on a family of SoCs (System-on-Chip) called EyeQ. The EyeQ supports vehicle detection, lane detection and pedestrian recognition.

The core of the vehicle detection process lies in the classification. The system scans the whole image captured by the camera mounted on the driver's vehicle for rectangular-shaped regions at all positions and all sizes. Each region of interest is given a score that represents the likelihood of that region being a vehicle. EyeQ uses several classification schemes throughout the system, and they can be rather degenerate, such as the nearest neighbor approach, with integration via a cascaded classifier such as the hierarchical SVM approach [12]. Figure 12 shows a sample tracking result of EyeQ.

Figure 12. Tracking result of EyeQ with distance information

Mobileye® has not disclosed the details of their approach to vehicle tracking. From Figure 12, we can see that the EyeQ SoC achieves a precise vehicle tracking result with distance measurement between the camera and the targeted vehicles. However, there are 11 computing processors working simultaneously at 332 MHz. It seems that the vehicle tracking algorithm running on EyeQ is extremely computationally intensive.

In this chapter, we have summarized several vision-based vehicle tracking projects with different approaches. Most of the projects can track multiple vehicles approximately but not precisely. Such results are not helpful for making accurate camera-to-vehicle distance estimations. Some of the projects are even too computationally intensive to be run practically on embedded systems. A relatively simple, practical and inexpensive system for real-time multiple vehicle tracking therefore becomes necessary.

3. 3-phase Vision-based Vehicle Tracking Framework

3.1. Introduction to the 3-phase Framework

The 3-phase framework is used in our methodology instead of simply detecting vehicles in each frame, or a 1-phase vehicle detection framework, because traditional vehicle detection techniques for tracking treat an image region simply as moving "stuff" and hence cannot distinguish between changes in viewpoint or configuration of the vehicles and changes in position relative to the camera. The information about changes in viewpoint and position is valuable in adaptive cruise control systems, since it gives a clue to the relative distance or the contact time between a rear vehicle and the camera on the driver's car, and thus to appropriate driving advice or advance warnings. The 3-phase framework vehicle tracking aims to determine the image configuration of a target region of a vehicle (the template segmented in phase 1) as it moves through a camera's field of view (tracking the template in phase 2). Vehicle tracking arises in stereopsis and motion estimation of vehicles. It differs, however, in that the goal is not to determine the exact correspondence for every vehicle location in a pair of images, but rather to determine, in a global sense, the movement of an entire vehicle template over a long sequence of images.

The visualization of the 3-phase framework is shown in Figure 13. In the first phase, the vehicle is detected in the image by symmetrical measurement and the over-complete Haar transform. The segmented region, called the vehicle template, is passed to the vehicle template tracking phase. In the second phase, the vehicle template is used for object tracking by applying image alignment, which consists of moving and deforming a template to minimize the difference between the template and an image. Images of vehicles have potential variations in pose and illumination which make the tracking difficult. This results in the necessity of the template update in phase 3, where the template is repositioned for best tracking. We will show how the proposed framework increases the successful tracking percentage.

Figure 13. Visualization of the 3-phase framework

3.2. Vehicle Detection

3.2.1. Overview of Vehicle Detection

Vehicle detection is the first phase of the 3-phase vehicle tracking methodology. Our first task is to detect and locate the candidate vehicle in an image and then pass the segmented image as a template to the vehicle template tracking phase. Vehicle detection, also interpreted as vehicle template segmentation, aims to segment the vehicle part from the whole image, as shown in Figure 14. The locating procedure is also known as Hypothesis Generation (HG) [2], where the locations of potential vehicles in an image are hypothesized.

Figure 14. Segmentation of the vehicle part (the red rectangle) from the whole image (the image is taken by the camera mounted on the driver's automobile)

A review of on-road vehicle detection using optical sensors is given by Sun, Bebis and Miller [2], where Hypothesis Generation is classified into one of the following three categories: (1) stereo-vision based, (2) motion-based, and (3) knowledge-based. Stereo-vision based methods use stereo information provided by two cameras mounted on the driver's automobile. By mapping the features from the image taken by the right camera to the image taken by the left camera, the contours of vehicles can be found [13]. In general, stereo-vision based methods are accurate if the stereo parameters have been estimated accurately, which is really hard to guarantee in the on-road scenario. Optical flow calculation of vehicles is the representative method in the motion-based category. Vehicles approaching from the opposite direction produce a diverging flow, which can be quantitatively distinguished from the flow caused by the car's ego-motion [8]. Three factors causing poor performance of the optical flow approach were summarized in [8]: (a) displacement between consecutive frames, (b) lack of textures, and (c) shocks and vibrations. Knowledge-based methods employ a-priori knowledge of vehicles like color, shadow, corners, horizontal/vertical edges, texture, and vehicle lights. Among these, locating the roof of vehicles by horizontal edge detection [10,14-20] and locating the bottom using shadow information as a sign pattern of vehicles [14,21,22] are popular techniques, since these features, which are quite distinct from background objects, act as the horizontal boundaries for the segmentation of vehicles. However, it is not practical to search for the roof and bottom of vehicles over the entire image. In this thesis, we propose using symmetrical measurement to reduce the search area for the roof, bottom and sides of vehicles.

3.2.2. Locating the Vehicle Center - Symmetrical Measurement

By minimizing the search area of the image, the computation time for finding the interesting objects can be decreased. In our proposal, the vehicle is first located by applying a symmetrical function to the image. To find a reliable measure of the degree of symmetry, we begin with an elementary result for one-dimensional functions [23]: a function $f(x)$ can be divided into two parts, a purely symmetric function $f_s(x)$ and a purely antisymmetric function $f_{as}(x)$, that is,

$$f(x) = f_s(x) + f_{as}(x).$$

Moreover, we have the following properties:

$$f_s(x) = \frac{f(x) + f(-x)}{2}, \qquad f_{as}(x) = \frac{f(x) - f(-x)}{2}.$$

Now a measure of the degree of symmetry is introduced. It seems logical to compare the contribution of the symmetric part of the function with the entire contribution of the function, as follows:

$$S\{f(x)\} = \frac{\int f_s^2(x)\,dx}{\int f_s^2(x)\,dx + \int f_{as}^2(x)\,dx}. \qquad (2)$$

With this definition, $S$ has the following properties:

$$S\{f(x)\} \in [0,1]; \quad S\{f(x)\} = 1 \text{ if } f(x) \text{ is purely symmetric}; \quad S\{f(x)\} = 0 \text{ if } f(x) \text{ is purely antisymmetric}.$$

In order to relate the symmetrical measurement to the intensity values of pixels, we now introduce the reflectional correlation coefficient $C\{f(x)\}$:

$$C\{f(x)\} = \frac{\int f(x)\,f(-x)\,dx}{\int f^2(x)\,dx} = \frac{\int f_s^2(x)\,dx - \int f_{as}^2(x)\,dx}{\int f_s^2(x)\,dx + \int f_{as}^2(x)\,dx},$$

where $f(x)$ is the grayscale intensity value of the pixels at lateral position $x$ of the image. Once we have calculated $C$, we can calculate $S$ as $S = (1 + C)/2$.
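To make the computation concrete, the following is a minimal discrete sketch of this measure evaluated at every candidate lateral position of an input image; Chapter 5 presents the thesis's optimized implementation, and the window half-width here is an assumed parameter.

    import numpy as np

    def symmetry_profile(edges, half_width):
        # edges: 2D gradient image (e.g. horizontal-edge responses).
        # half_width: reflection window half-width (assumed parameter).
        h, w = edges.shape
        scores = np.zeros(w)
        for cx in range(half_width, w - half_width):
            left = edges[:, cx - half_width:cx]           # left of the axis
            right = edges[:, cx + 1:cx + half_width + 1]  # right of the axis
            f = left[:, ::-1].astype(np.float64)          # reflect about axis
            g = right.astype(np.float64)
            denom = np.sum(f * f) + np.sum(g * g)
            if denom > 0.0:
                c = 2.0 * np.sum(f * g) / denom  # discrete C in [-1, 1]
                scores[cx] = 0.5 * (1.0 + c)     # S = (1 + C) / 2
        return scores  # peaks mark candidate vehicle centerlines

The lateral position of the highest peak of this profile corresponds to the dotted centerline shown in Figure 16.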

As the camera and target vehicles move, illumination variation may cause the target vehicles to appear unlike symmetrical objects in grayscale intensity. In Figure 15, the centre of the taxi (the position of the dashed line) and another position in the background (the position of the thick line) have high symmetrical measurement in the image simultaneously. To reduce this kind of influence, gradient images are used as input for the symmetrical measurement. In order to highlight the measurements surrounding the vehicles, the symmetrical measurement is further enhanced using horizontal edges only as input for vehicle detection. The result is shown in Figure 16, in which the center of the vehicle is marked by a dotted line.

Figure 15. Symmetrical measurement using a gradient image with horizontal and vertical edges, and its corresponding symmetrical measurement. The centre of the taxi (the position of the dashed line) and another position in the background (the position of the thick line) have high symmetrical measurement in the image simultaneously

Figure 16. Symmetrical measurement using a gradient image with horizontal edges as input. The center of the vehicle is marked by the dotted line

3.2.3. Locating the Vehicle Roof and Bottom

The region of the interesting vehicle is approximately located after the symmetrical measurement. The vehicle roof and the vehicle bottom can then be located in a relatively small searching window, shown in Figure 17, instead of searching the whole image. We have set the searching window width to about 1/10 of the image width.

Figure 17. Searching window with the symmetrical line of the vehicle as the centerline

In the searching window, the vehicle roof is located as the first horizontal position with a high horizontal edge response when searching from top to bottom. On the other hand, the vehicle bottom is located as the first horizontal position with a low horizontal pixel intensity response when searching from bottom to top. The idea is illustrated in Figure 18, and a brief sketch follows the figure. The roof and bottom features are easily located, since they are quite distinct from the background within the searching window.

Figure 18. Locating (a) the car roof by horizontal edge response and (b) the car bottom by horizontal pixel intensity response within the searching window
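A minimal sketch of this search, assuming a Sobel-based horizontal edge response and simple per-row thresholds (both thresholds are assumed parameters, not values from the thesis):

    import cv2
    import numpy as np

    def locate_roof_and_bottom(gray, x0, x1, edge_thresh, dark_thresh):
        # Scan the searching window [x0, x1) around the symmetry line.
        window = gray[:, x0:x1]
        # Horizontal edges = vertical intensity derivative (Sobel in y).
        edges = np.abs(cv2.Sobel(window, cv2.CV_32F, dx=0, dy=1, ksize=3))
        edge_rows = edges.mean(axis=1)        # per-row edge response
        intensity_rows = window.mean(axis=1)  # per-row mean intensity
        # Roof: first row from the top with a high edge response.
        roof = next((y for y in range(len(edge_rows))
                     if edge_rows[y] > edge_thresh), None)
        # Bottom: first row from the bottom that is dark (shadow).
        bottom = next((y for y in reversed(range(len(intensity_rows)))
                       if intensity_rows[y] < dark_thresh), None)
        return roof, bottom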

3.2.4. Locating the Vehicle Sides - Over-complete Haar Transform

The outcome of vehicle segmentation greatly affects the quality of the template used in vehicle tracking, which will be described in the next part. If the segmentation is not done well, the vehicle template may include some background pixels. This means the tracking is not done on the exact vehicle object, and eventually the vehicle may be lost. Figure 19 shows the comparison of a good vehicle template and a bad vehicle template.

Figure 19. The good vehicle template (the whole vehicle is segmented well in the image) versus the bad template (the vehicle template includes some background)

In the previous parts, we have shown how the roof and the bottom of vehicles can be found. The remaining task, which is much more difficult, is to locate the sides of vehicles. One of the common techniques is finding the most obvious vertical edges by searching from the center of the vehicle towards the sides of the image. This technique has three main weaknesses which make it difficult for practical usage. First, it is hard to define the degree of obviousness of vertical edges. Also, the sides of vehicles usually are not strictly vertical in two-dimensional images. Finally, the existence of vertical edges in the road background makes the sides of vehicles difficult to locate.

The moving and variant-illumination conditions make the ordinary pixel difference method unsuitable for representing the sides of vehicles clearly. In Figure 20, it should be easy to see that the pixel-based representation has a significant amount of variability that may lead to difficulties in segmenting the vehicles from the images. As a result, a compact representation of the vehicles in the images using the Haar wavelet transform, proposed by C. P. Papageorgiou [24] for vehicle learning, is used.

Figure 20. An example of vehicle images on road environment

The Haar wavelet was proposed by Alfred Haar [25] and is nowadays widely used in image compression due to the advantages of simple image scaling and multi-orientation representation [26]. The SVM (Support Vector Machine) based car image classifier proposed by C. P. Papageorgiou takes advantage of these properties: multi-scale images transformed by three oriented wavelets — vertical, horizontal and diagonal — are used to train the classifier, as shown in Figure 21. To provide a compact representation, Papageorgiou uses an over-complete dictionary of Haar wavelets in which there is a large set of features that respond to local intensity differences.

In our approach, we do not take advantage of the multi-scaling property of the Haar wavelet transform. Based on the work done by Papageorgiou, the wavelet transform is not computed for the whole image, because of the computational complexity; only the over-complete vertical Haar wavelet transform is applied to the estimated location of the vehicle's sides. There is no training process in our methodology, and our purpose is to locate the sides of the vehicle from the compact representation of vehicle images. We present an overview of the vertical Haar wavelet transform representation here. Details can be found in [27-28].

Figure 21. Multi-scaling Haar transformed car image. Positions with stronger wavelet feature response are darker; those with weaker response are lighter. (a)-(c) are vertical, horizontal, and diagonal wavelets at scale 32x32 pixels; (d)-(f) are the vertical, horizontal, and diagonal wavelets at scale 16x16 pixels

For a given pattern, the wavelet transform computes the responses of the wavelet filters over the image. Using the clues of the symmetrical line, roof and bottom of the vehicle, the location of the sides of the vehicle can be estimated from the reasonable height-to-width ratio of vehicles, as shown in Figure 22.

In our implementation, only the vertical orientation, which gives a high response to the sides of vehicles, is computed. In the traditional wavelet transform, the wavelets do not overlap; they are shifted by the size of the support of the wavelet in x and y. The standard Haar transform framework is shown in Figure 23. The resized image around the sides of the vehicle is divided into blocks and then passed to the Haar wavelet for transformation. The normalized vertical coefficient table, in which a higher coefficient value represents the existence of more vertical features, shows the normalized vertical response of the intensity around the sides of the vehicle. The lateral position with the highest coefficient value is the location of the vehicle's side. In Figure 23, however, the coefficient data is too coarse to figure out the location of the vehicle's sides.

Figure 22. Steps of estimating the location of the vehicle's sides

Figure 23. The standard Haar Transform framework

To achieve better spatial resolution and a richer set of features, Oren and Papageorgiou suggest shifting the transform by 1/4 of the size of the support of each wavelet [29], yielding an over-complete (quadruple density) dictionary of wavelet features. The framework of the over-complete Haar transform is shown in Figure 24. The vertical coefficients are more detailed and compact. We propose using the resulting high-dimensional vertical feature table to locate the sides of vehicles. By summing and averaging the coefficients of each column, the vertical coefficient at each lateral position is obtained, as shown in Figure 25. The lateral position with the highest vertical coefficient is the location of the vehicle's side. For the same size of vehicle side image (16x16 pixels), the vertical coefficient table computed by the over-complete Haar transform provides 4 times higher resolution along the lateral direction than that computed by the standard Haar transform. The high-resolution vertical coefficients of the over-complete Haar transform along the lateral direction allow the vehicle sides to be located precisely.
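A minimal sketch of this over-complete vertical response, as used to build the lateral profile of Figure 25; the 4-pixel support, the 1-pixel shift (1/4 of the support) and the normalization are illustrative assumptions rather than the thesis's exact settings.

    import numpy as np

    def vertical_haar_profile(patch, support=4):
        # patch: grayscale side image (e.g. resized to 16x16).
        patch = patch.astype(np.float64)
        h, w = patch.shape
        half, step = support // 2, support // 4
        coeffs = []
        # Shifting by support // 4 instead of support makes the
        # dictionary over-complete (quadruple density along x).
        for x in range(0, w - support + 1, step):
            left = patch[:, x:x + half].mean(axis=1)
            right = patch[:, x + half:x + support].mean(axis=1)
            coeffs.append(np.abs(left - right))  # vertical response
        coeffs = np.array(coeffs).T              # rows x positions
        profile = coeffs.mean(axis=0)            # average each column
        return profile / (profile.max() + 1e-9)  # normalized

    # The index of the maximum of the returned profile is the estimated
    # lateral position of the vehicle's side within the resized patch.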

Now the vehicle is segmented from the image as the template for vehicle tracking. The computed height-to-width ratio of the vehicle is useful for the template update later. By the above techniques, the vehicle is detected in the image and the segmented region, called the vehicle template, is passed to the vehicle template tracking phase.

Figure 24. The over-complete Haar Transform framework

[Figure: vertical coefficient plotted against lateral position (1 to 12) in the resized image; the vehicle's side is found at the peak of the over-complete Haar transform response]

Figure 25. Vertical coefficient of the vehicle's side image

3.3. Vehicle Template Tracking - Image Alignment

3.3.5. Overview of Vehicle Template Tracking

Our ultimate goal is to track the vehicles in a continuous sequence of images with distance information provided. After the vehicles are located and segmented in the image, as described in the previous vehicle detection stage, the segmented area is used as the template for vehicle tracking. In this section, we propose to use image alignment as the technique for vehicle tracking.

Image alignment consists of moving, and possibly deforming, a template to minimize the difference between the template and an image. Since its first use in the Lucas-Kanade optical flow algorithm [4], image alignment has become one of the most widely used techniques in computer vision. Its applications include tracking [5, 30], parametric and layered motion estimation, mosaic construction and medical image registration. A unifying framework, in which various algorithms are described, is given for the Lucas-Kanade algorithm in [31].

[Figure: theoretical representation and transformation examples in vehicles for (a) translation, (b) scaling, (c) shearing and (d) rotation]

Figure 26. The four basic transformations in affine transformation: (a) Translation, (b) Scaling, (c) Shearing and (d) Rotation. The red polygons represent the original polygons while the blue polygons represent the transformed polygons. The sample transformations in vehicle tracking are shown in the second row.

The moving and deforming process is essentially the affine transformation, consisting of translation, scaling, shearing, rotation or combinations of them. Figure 26 shows how a polygon transforms in a 2D space. Let W(x; p) denote the parameterized set of allowed warps, where p = (p_0, ..., p_{n-1})^T is a vector of parameters. The warp W(x; p) takes the pixel x in the coordinate frame of the template T and maps it to the sub-pixel location W(x; p) in the coordinate frame of the image I. If we are tracking an image patch, for example the vehicle image patch, we may consider employing the set of affine warps:

M" � � A)•义+ + (Po Pi PA fV(x;p)= = y (3) V

described by 6 parameters p = (po,pi, pi, P3, P4,Ps)^ as what has been done in [32].

These 6 parameters are enough to describe the 4 basic tTansformations and their

combinations in affme transformation of the image patch in 2D space. Image

alignment aims to compute the updated parameters for transformation in order to

minimize the differences among the templates (T) in consecutive images (I).
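As a small illustration of this parameterization, the sketch below applies the warp of Equation (3) to a set of template coordinates; it is a minimal Python rendering for clarity, not the embedded implementation.

```python
import numpy as np

def warp_points(points, p):
    """Apply the affine warp W(x; p) of Equation (3) to template points.

    points: an (N, 2) array of (x, y) template coordinates.
    p = (p0, p1, p2, p3, p4, p5):
        x' = (1 + p0) * x + p2 * y + p4
        y' = p1 * x + (1 + p3) * y + p5
    """
    p0, p1, p2, p3, p4, p5 = p
    A = np.array([[1.0 + p0, p2, p4],
                  [p1, 1.0 + p3, p5]])
    ones = np.ones((len(points), 1))
    return np.hstack([points, ones]) @ A.T

# With p = 0 the warp is the identity: warp_points(pts, np.zeros(6)) == pts.
```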

Before explaining the details of image alignment, we first describe how we apply image alignment to vehicle tracking. In a sequence of images, image alignment minimizes the difference between the template obtained in image n and the images n+1, n+2, n+3, and so on. Using vehicle tracking as an example, the size and location of the vehicles in the image change from image to image. Once a vehicle template is detected in image n, the template needs to be moved or deformed in order to minimize the difference between it and its correspondence in the subsequent images. Figure 27 shows how image alignment is done in vehicle tracking. The aligned position and the aligned image region are usually called the warped position and the warped image respectively.

[Figure: a vehicle is detected in image n and its template is marked with a red rectangle; the template is moved and deformed to match the vehicle in image n+1 (marked with a blue rectangle); the resulting aligned region is the warped image and the newly aligned position is the warped position]

Figure 27. Image warping in vehicle tracking

We will follow the unifying framework of image alignment proposed and described for the Lucas-Kanade algorithm [31] and apply it to vehicle template tracking.

3.3.6. Goal of Image Alignment

The goal of image alignment is to minimize the sum of squared errors between two images, the template T and the image I warped back onto the coordinate frame of the template, and thus achieve the template tracking. The first version of image alignment is the Lucas-Kanade algorithm, where the error is represented by:

\sum_x \left[ I(W(x; p)) - T(x) \right]^2   (4)

where p are the affine transformation parameters.

Warping I back to compute I(W(x; p)) requires interpolating the image I at the sub-pixel locations W(x; p). The minimization of the above expression is performed with respect to p, and the sum is performed over all of the pixels x in the template image T(x). The Lucas-Kanade algorithm assumes that a current estimate of p is known and then iteratively solves for increments Δp to the parameters; i.e. the following expression is (approximately) minimized:

\sum_x \left[ I(W(x; p + \Delta p)) - T(x) \right]^2   (5)

with respect to Δp, and then the parameters are updated:

p \leftarrow p + \Delta p   (6)

These two steps are iterated until the estimates of the parameters p converge or a target number of iterations is reached.

Subsequently, the image tracking process can be described as:

1) Aligning the warped position and updating its parameters: performing a first-order Taylor expansion of the image alignment expression (5) and minimizing its value by the least-squares method.

2) Minimizing the image alignment expression by iterating and updating the warp parameters p.

The visualization of the image alignment tracking process is shown in Figure 28.

[Figure: the current warped image I(W(x; p)) and the template image T, related by the affine transformation with updated warp parameters p = (p_0, ..., p_5)^T, feeding the Lucas-Kanade minimizing expression]

Figure 28. Visualization of the image alignment tracking process

3.3.7. Alternative Image Alignment - Compositional Image Alignment

The Lucas-Kanade algorithm approximately minimizes the expression (5) with respect to Δp and then updates the estimates of the parameters by equation (6). In fact, iterating these two steps is not the only way to minimize \sum_x [I(W(x; p)) - T(x)]^2. An alternative to the Lucas-Kanade algorithm is the compositional algorithm [33]. The compositional algorithm approximately minimizes:

\sum_x \left[ I(W(W(x; \Delta p); p)) - T(x) \right]^2   (7)

with respect to Δp in each iteration and then updates the estimate of the warp as:

W(x; p) \leftarrow W(x; p) \circ W(x; \Delta p)   (8)

i.e. the compositional approach iteratively solves for an incremental warp W(x; Δp) rather than an additive update to the parameters Δp. The compositional and additive approaches are proved to be equivalent to first order in Δp in the unifying framework of the Lucas-Kanade algorithm [31]. Throughout the vehicle tracking we concentrate on the inverse compositional algorithm, an efficient algorithm that works without any significant loss of accuracy [34].

3.3.8. Efficient Image Alignment - Inverse Compositional Algorithm

As mentioned by Baker and Matthews [31], one of the key points of efficiency is switching the roles of the image and the template, where an exchange of variables is made to switch or invert the roles of the template and the image [5]. To distinguish the previous algorithms from the new one, we refer to the original algorithms as the forwards additive (i.e. Lucas-Kanade) and the forwards compositional algorithms. The corresponding algorithm after the inversion, which we use for vehicle tracking, is called the inverse compositional algorithm.

The proof of equivalence between the inverse compositional algorithm and the original algorithm is given in the unifying framework of the Lucas-Kanade algorithm [31]. The result is that the inverse compositional algorithm minimizes:

\sum_x \left[ T(W(x; \Delta p)) - I(W(x; p)) \right]^2   (9)

with respect to Δp (note that the roles of I and T are reversed) and then updates the warp:

W(x; p) \leftarrow W(x; p) \circ W(x; \Delta p)^{-1}   (10)

Performing a first-order Taylor expansion on the expression of the inverse compositional algorithm gives:

\sum_x \left[ T(W(x; 0)) + \nabla T \frac{\partial W}{\partial p} \Delta p - I(W(x; p)) \right]^2   (11)

Assuming without loss of generality that W(x; 0) is the identity warp, the solution to this least-squares problem is:

\Delta p = H^{-1} \sum_x \left[ \nabla T \frac{\partial W}{\partial p} \right]^T \left[ I(W(x; p)) - T(x) \right]   (12)

where H is the Hessian matrix:

H = \sum_x \left[ \nabla T \frac{\partial W}{\partial p} \right]^T \left[ \nabla T \frac{\partial W}{\partial p} \right]

and the Jacobian ∂W/∂p is evaluated at (x; 0). The steps involved in the image alignment of one image using the inverse compositional algorithm are as follows (a Python sketch is given after the list):

Pre-compute:
(1) Evaluate the gradient ∇T of the template T(x)
(2) Evaluate the Jacobian ∂W/∂p at (x; 0)
(3) Compute the steepest descent images ∇T (∂W/∂p)
(4) Compute the Hessian matrix H

Iterate:
(5) Warp I with W(x; p) to compute I(W(x; p))
(6) Compute the error image I(W(x; p)) - T(x)
(7) Compute the steepest descent parameters \sum_x [\nabla T \frac{\partial W}{\partial p}]^T [I(W(x; p)) - T(x)]
(8) Compute \Delta p = H^{-1} \sum_x [\nabla T \frac{\partial W}{\partial p}]^T [I(W(x; p)) - T(x)]
(9) Update the warp W(x; p) \leftarrow W(x; p) \circ W(x; \Delta p)^{-1}

until ||Δp|| ≤ ε or after a constant number of iterations; typically, it is set to 10.

(The corresponding steps are shown in Figure 29)
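To make the data flow of steps (1)-(9) concrete, here is a minimal Python sketch of the inverse compositional algorithm under the affine warp of Equation (3). The bilinear sampler, the use of np.gradient and the convergence threshold are illustrative assumptions; only the structure (expensive steps pre-computed once, cheap steps inside the loop) mirrors the algorithm above.

```python
import numpy as np

def bilinear(img, xs, ys):
    """Sample img at float coordinates (xs, ys) with bilinear interpolation."""
    x0 = np.clip(np.floor(xs).astype(int), 0, img.shape[1] - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, img.shape[0] - 2)
    ax, ay = xs - x0, ys - y0
    return ((1 - ay) * ((1 - ax) * img[y0, x0] + ax * img[y0, x0 + 1])
            + ay * ((1 - ax) * img[y0 + 1, x0] + ax * img[y0 + 1, x0 + 1]))

def inverse_compositional(I, T, p, n_iter=10, eps=1e-3):
    """One run of steps (1)-(9) for a single frame; p holds (p0..p5)."""
    h, w = T.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xs, ys = xs.ravel().astype(float), ys.ravel().astype(float)
    # (1)-(3): template gradients and steepest descent images at (x; 0)
    Ty, Tx = np.gradient(T.astype(float))
    Tx, Ty = Tx.ravel(), Ty.ravel()
    sd = np.stack([Tx * xs, Ty * xs, Tx * ys, Ty * ys, Tx, Ty], axis=1)
    # (4): Hessian, inverted once outside the loop
    H_inv = np.linalg.inv(sd.T @ sd)
    M = np.array([[1 + p[0], p[2], p[4]],
                  [p[1], 1 + p[3], p[5]],
                  [0.0, 0.0, 1.0]])
    for _ in range(n_iter):
        # (5): warp I back to the template frame, I(W(x; p))
        wx = M[0, 0] * xs + M[0, 1] * ys + M[0, 2]
        wy = M[1, 0] * xs + M[1, 1] * ys + M[1, 2]
        error = bilinear(I.astype(float), wx, wy) - T.ravel()  # (6)
        dp = H_inv @ (sd.T @ error)                            # (7)-(8)
        # (9): compose the current warp with the inverse incremental warp
        dM = np.array([[1 + dp[0], dp[2], dp[4]],
                       [dp[1], 1 + dp[3], dp[5]],
                       [0.0, 0.0, 1.0]])
        M = M @ np.linalg.inv(dM)
        if np.linalg.norm(dp) < eps:
            break
    return np.array([M[0, 0] - 1, M[1, 0], M[0, 1],
                     M[1, 1] - 1, M[0, 2], M[1, 2]])
```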

The efficiency of the inverse compositional algorithm comes from the pre-computation of the template gradient, Jacobian, steepest descent images and Hessian matrix. As a result, all of the computationally demanding steps are performed once, avoiding their repetition in each iteration. The main loop simply consists of image warping (Step 5), image differencing (Step 6), image dot products (Step 7), multiplication with the inverse of the Hessian (Step 8), and the update to the warp (Step 9). The framework of the inverse compositional image alignment algorithm for vehicle template tracking is shown in Figure 29. The parameters are updated iteratively in order to align the tracking region to the new warped position with minimized difference between the template and the current image.

In this section, we have shown how vehicle tracking is achieved by the image alignment technique in an image sequence. By tracking the rear vehicles, together with the vehicle height, an advance warning can be given to the driver when the relative distance computed by the newly proposed formula introduced in Chapter 4 falls below a certain threshold.

Pre "Computed

二 jfr^iL 遍•聽 Hhm

Templale T(k) ^^^^^ ^I^BjjH^ Template gradienty Ttmpl«t9gr«d*»nl x J»cob»ao

• Steepest DMcem I巾ages

Warp PArAm»tAnI i Parameter Updates ^^^^^^^^ ^^^^^^^^ I ii^s •“ I ^a^^H m^sji 二 . ^p^—I

i^B^^BB SD Patamettti Updat»s

� ^ff^fi"?^^^ r . . • ^ Erroi irn»gs ‘“—~‘广 l(W<)r. p)) - T(x) Figure 29. The framework of the inverse compositional image alignment algorithm for vehicle template tracking. 3.4. Vehicle Template Update

3.4.1. Situation of Vehicle Loss

Variation in vehicle pose or deformations and variation in illumination can cause the tracking algorithm to fail. They can lead to the necessity of updating the template in order to keep the tracking process effective. Figure 30 shows the situation of discontinued tracking.

[Figure: best-fit tracking versus the warped tracking image; the previous warping (tracking) position is applied to successive images until the track is lost]

Figure 30. The situation of vehicle tracking lost

Recalling the goal of image alignment, we are going to minimize the difference between the template and the warped image by the inverse compositional algorithm:

\sum_x \left[ T(W(x; \Delta p)) - I(W(x; p)) \right]^2

As the illumination and the size of the vehicles change from image to image, the template and the warped image become increasingly dissimilar, and the error between them gradually increases. As a result, it becomes difficult to align the template to the vehicle in the current image, the error accumulates along the tracking, and eventually the warped image totally loses track of the vehicle. The error is usually measured as the root mean square error between the template and the warped image. Once the error exceeds a threshold value, the template should be updated.
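A minimal Python sketch of this monitoring step is given below; the threshold value is an assumed tuning parameter, chosen per the trade-off discussed in the next subsection.

```python
import numpy as np

def needs_template_update(template, warped, threshold=12.0):
    """Signal a template update when the root mean square error between
    the template and the current warped image exceeds a threshold.

    The threshold of 12.0 gray levels is an illustrative assumption;
    the thesis leaves it as a tuning parameter.
    """
    diff = warped.astype(np.float64) - template.astype(np.float64)
    rms = np.sqrt(np.mean(diff ** 2))
    return rms > threshold
```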

3.4.2. Template Fitting by Updating the Positions of Vehicle Features

Updating the template for tracking amounts to making the new template fit the vehicle in the current image. The procedure is similar to vehicle detection, but the search is limited to the image area around the currently warped image. The update consists of re-locating the vehicle's symmetrical line, roof and bottom:

1) Symmetrical line: Locating the new symmetrical line with the highest

symmetrical response within the current warped image.

2) Car roof: Locating the new horizontal position of the car roof by locating the

longest horizontal edge within the searching window.

3) Car bottom: Locating the new horizontal position of the car bottom by locating the darkest horizontal pixel line within the searching window (a minimal sketch of these window searches follows below).
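The sketch below illustrates the roof and bottom searches; the search-window bounds are assumed to come from the current warped position, and the "longest horizontal edge" criterion is approximated here by the strongest summed horizontal-gradient row.

```python
import numpy as np

def update_roof_and_bottom(gray, roof_window, bottom_window):
    """Relocate the car roof and bottom inside small search windows.

    gray: full grayscale frame; roof_window / bottom_window: (start, end)
    row ranges around the previous roof and bottom positions. Column
    bounds are omitted for brevity.
    """
    # Roof: the row with the strongest horizontal edge response
    grad_y = np.abs(np.diff(gray.astype(float), axis=0))
    r0, r1 = roof_window
    roof_row = r0 + int(np.argmax(grad_y[r0:r1].sum(axis=1)))
    # Bottom: the darkest horizontal pixel line (the shadow under the car)
    b0, b1 = bottom_window
    bottom_row = b0 + int(np.argmin(gray[b0:b1].sum(axis=1)))
    return roof_row, bottom_row
```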

The frequency of updating the tracking template greatly affects the performance of the vehicle tracking. If the updating frequency is too high (error threshold too low), the updating process wastes computational resources. If the updating frequency is too low (error threshold too high), the update cannot be completed well, since the vehicle falls outside the searching windows of the car roof and car bottom; the warped position then drifts toward the background image, as shown in Figure 30.

The framework of the template update is shown in Figure 31. In order to lower the computational load, the new car roof and car bottom are located within small searching windows, instead of over the whole image, using previous knowledge of the vehicle position. Since the height-to-width ratio is constant, the sides of the vehicle can be located without the Haar wavelet transform, based on the updated positions of the symmetrical line, car roof and car bottom and the ratio found in the vehicle detection part.

[Figure: tracking lost - the warped image drifts to the background instead of fitting the vehicle; the symmetrical line, car roof and car bottom are updated within small search windows; the height of the vehicle is updated using the height-to-width ratio found in the vehicle detection part; the width and sides of the vehicle are then updated, giving the updated warped image and the updated template]

Figure 31. The framework of Vehicle Template Update

3.5. Experiments and Discussions

3.5.1. Experiment Setup

We have conducted an experiment to analyze the reliability of the proposed 3-phase vehicle tracking methodology. There are 4 grayscale image sequences of road traffic taken by the camera mounted on our car (the testing vehicle). All of the sequences were taken in the daytime, with durations from 4 seconds to 68 seconds. The camera setting is shown in Table 2.

Camera model: Logitech webcam 4000
Frame rate: 15 frames / second
Frame size: 320 x 240 pixels
Table 2. Setting of the camera mounted on the testing vehicle

3.5.2. Successful Tracking Percentage

By counting the frames in which tracking is discontinued, the reliability of our methodology can be analyzed. Table 3 shows the information of the 4 grayscale image sequences and their successful tracking percentages. The successful tracking percentage is defined as [35]:

\text{Successful tracking percentage} = \frac{\text{Total number of frames} - \text{Number of tracking-lost frames}}{\text{Total number of frames}}

where the "number of tracking-lost frames" is the number of frames in which the ground truth contains at least one vehicle, while at least one tracked vehicle does not fall within the bounding box of any ground truth object.

The result shows that the proposed 3-phase methodology is able to track the

vehicles over 98% of frames. Failures occur when vehicles are running on

winding roads and when the appearance of the target vehicles deforms frequently

due to their unstable movement.

Besides, there is a wrong vehicle detection in Sequence A. One of the background objects is wrongly detected as a vehicle because its upper and lower boundaries are wrongly regarded as the roof and bottom of a vehicle, as shown in Figure 32.

Along the image sequences captured by the camera facing the rear of our car, horizontal features of background objects always move from the upper edge toward the center of the images. This is different from the moving behavior of a vehicle roof. To avoid background objects being wrongly detected as vehicles, we propose that horizontal features moving in this direction should first be filtered out. To further lower the false positive rate in vehicle detection, Sun, Bebis and Miller [2] suggest adding a second step, Hypothesis Verification (HV), after the step of Hypothesis Generation (HG), in order to verify the correctness of a hypothesis. Representative methods of Hypothesis Verification include Principal Component Analysis (PCA), Local Orientation Coding (LOC) and Neural Network (NN) based classifiers. Adding the verification step can also filter out background objects with many horizontal features when the camera faces ahead, where the uniqueness of the moving direction of background horizontal features is lost.

                                      Sequence A    Sequence B    Sequence C    Sequence D
Road type                             City streets  City streets  Highway       Highway
Duration                              49 seconds    4 seconds     68 seconds    7 seconds
Total number of frames (NF)           731           60            1001          101
Number of tracking-lost frames (LF)   9             0             13            0
Successful tracking percentage
= (NF - LF) / NF                      98.8%         100.0%        98.7%         100.0%
Number of wrong vehicle detections    1             0             0             0
Table 3. Information and tracking percentage of road image sequences

In this chapter, we have achieved multiple rear-vehicle tracking using a single camera in outdoor scenes with a high successful tracking percentage. However, the original image alignment tracking methodology introduced in this section is so computationally intensive that it cannot be implemented in a real-time embedded system based on a processor used for mobile applications. To realize real-time tracking, we have simplified the above computation by minimizing the number of image alignment parameters, as will be explained in Chapter 6.

[Figure: a background region marked as a wrongly detected vehicle]

Figure 32. An example of a wrongly detected vehicle

3.6. Comparing with other tracking methodologies

3.6.1. 1-phase Vision-based Vehicle Tracking

As mentioned at the beginning of this chapter, our methodology is based on the 3-phase structure: vehicle detection, vehicle template tracking and vehicle template update. However, most previous works, like the projects mentioned in Chapter 2, simply detect vehicles in each frame; that is, they are based on a 1-phase vehicle detection structure. These techniques treat an image region simply as moving "stuff" and hence cannot distinguish between changes in viewpoint or configuration of the vehicles and changes in position relative to the camera.

In order to exhibit the cooperative power and the necessity of the 3 phases, we have conducted an experiment to observe the performance of the 1-phase structure. This means the last 2 phases, vehicle template tracking and vehicle template update, are disabled, and the vehicles are detected using the techniques of the first phase in each frame individually. Table 4 shows the successful tracking percentage of the 1-phase structure compared with the 3-phase image alignment-based methodology. The result shows that the 1-phase structure can only track the vehicles in about 70% of frames. Failure occurs when there is a sudden illumination or contrast change of the vehicle region, or when the targeted vehicles move far away from the camera so that the vehicle roof, bottom and appearance become blurred.

                                      Sequence A        Sequence B        Sequence C        Sequence D
                                      3-phase  1-phase  3-phase  1-phase  3-phase  1-phase  3-phase  1-phase
Road type                             City streets      City streets      Highway           Highway
Duration                              49 seconds        4 seconds         68 seconds        7 seconds
Total number of frames (NF)           731               60                1001              101
Number of tracking-lost frames (LF)   9        188      0        18       13       227      0        2
Successful tracking percentage
= (NF - LF) / NF                      98.8%    74.3%    100.0%   70.0%    98.7%    77.3%    100.0%   98.0%
Table 4. The successful tracking percentage of the 3-phase structure versus the 1-phase structure

This experiment shows the robustness of the 3-phase structure in vehicle tracking. Our methodology greatly increases the successful tracking percentage because of the proper division of functionality among the 3 phases. Tracking is ignited once a vehicle is detected by Phase 1 (Vehicle Detection) along the image sequence, and Phase 2 (Vehicle Template Tracking) then keeps tracking it. Since the image alignment is robust, tracking continues once the vehicles are detected in any one of the frames. When a sudden change in illumination or contrast causes a large difference between the vehicle template and the current image, the 3-phase structure is not interrupted. Instead, Phase 3 (Vehicle Template Update) updates the vehicle template for the best fit.

3.6.2. Image Correlation

The correlation-based algorithm is extensively adopted in real tracking systems because of its high recognition ability, which enables it to track objects reliably in complicated environments with a low signal-to-noise ratio [36].

Generally, tracking methods based on image correlation work in the following way: an object template is pre-stored as the basis of recognition and positioning; it is then matched with each sub-image of the real-time images until the one that best matches the template is found. The normalized correlation function is one of the common matching criteria [36]:

R(i, j) = \frac{\sum_{m=1}^{M} \sum_{n=1}^{N} T(m, n)\, S_{i,j}(m, n)}{\sqrt{\left( \sum_{m=1}^{M} \sum_{n=1}^{N} [S_{i,j}(m, n)]^2 \right) \left( \sum_{m=1}^{M} \sum_{n=1}^{N} [T(m, n)]^2 \right)}}

where T denotes the template image (whose size is M x N) and S_{i,j} denotes the sub-image of the scene image at position (i, j). The best matches can be found as global maximums of R(i, j).
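A direct, brute-force transcription of this criterion in Python makes its cost visible; the naming below is illustrative.

```python
import numpy as np

def normalized_correlation(scene, template):
    """Brute-force evaluation of the normalized correlation R(i, j).

    A direct transcription of the matching criterion above; its
    O(W*H*M*N) cost is exactly the computational burden discussed below.
    """
    M, N = template.shape
    H, W = scene.shape
    t = template.astype(np.float64)
    t_norm = np.sqrt((t ** 2).sum())
    R = np.zeros((H - M + 1, W - N + 1))
    for i in range(H - M + 1):
        for j in range(W - N + 1):
            s = scene[i:i + M, j:j + N].astype(np.float64)
            R[i, j] = (t * s).sum() / (t_norm * np.sqrt((s ** 2).sum()) + 1e-12)
    return R  # best match: np.unravel_index(R.argmax(), R.shape)
```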

It is quite difficult to implement correlation-based tracking in real-time because of its large data and computational cost for correlation matching, which degrades its practicability greatly. The computational cost mainly depends on the search strategy adopted to find the optimal matching position. Unfortunately, most current correlation-based algorithms employ some traversal strategy, which makes it difficult to reduce the computation substantially.

Regardless of the computational complexity, the tracking robustness and adaptability of correlation look doubtful because the matching between the template and the current image is at a fixed scale. This means the size of the template remains constant throughout the tracking process, unlike other tracking methodologies such as image alignment, where the template is warped. This greatly restricts the usability of correlation-based tracking in applications with moving cameras and target objects, since the size of the target objects varies along the image sequence. From the view of affine transformation, correlation-based tracking can only provide the basic transformation of translation, but not scaling, shearing or rotation.

For applications like vehicle tracking, where target vehicles are tracked by the camera mounted on the driver's car, correlation-based tracking therefore seems not to be a good solution. There are previous works that use adaptive correlation tracking of targets with changing scale [37], where templates at different scales are provided for matching. The adaptive matching selects the best-matched template so that the template scale varies along the image sequence. We have implemented this method in OpenCV [38] for vehicle tracking. Table 5 shows the experiment information of the adaptive correlation vehicle tracking in OpenCV.

OpenCV matching function: cvMatchTemplate
Parameters: CV_TM_CCORR_NORMED (normalized correlation)
Image sequence: Sequence A in Table 3 (731 frames in total)
Template image: [vehicle template image]
Number of templates of different sizes: 5
Table 5. Experiment information of the adaptive correlation vehicle tracking
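A sketch of the adaptive, multi-scale matching using OpenCV's Python binding of cvMatchTemplate is shown below; the scale factors are assumptions standing in for the 5 fixed-size templates of Table 5.

```python
import cv2

def adaptive_correlation_track(frame, base_template,
                               scales=(0.8, 0.9, 1.0, 1.1, 1.2)):
    """Match templates at a few fixed scales and keep the best response.

    Mirrors the 5-template setup of Table 5; the scales are illustrative.
    """
    best = (-1.0, None, None)  # (score, top-left corner, (w, h))
    for s in scales:
        t = cv2.resize(base_template, None, fx=s, fy=s)
        result = cv2.matchTemplate(frame, t, cv2.TM_CCORR_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)
        if max_val > best[0]:
            best = (max_val, max_loc, (t.shape[1], t.shape[0]))
    return best
```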

The tracking result of the adaptive correlation-based matching is shown in Figure 33. The tracking result at the beginning of the sequence is acceptable but becomes unstable at Frame 120. The track is completely lost 4 frames later, at Frame 124. The poor performance demonstrates that this approach is not suitable for vehicle tracking.

There are 2 reasons for the poor performance of adaptive correlation. First, compared with image alignment, the number of templates of different sizes is finite in the adaptive correlation approach; for example, there are 5 templates of different sizes in our experiment. One may argue that we can increase the number of templates to improve the adaptability. However, more templates means more time consumed in matching, since the different templates are matched against the sub-image one by one. Usually, only a few templates are used.

The second reason is that the broad variations in vehicle illumination and contrast enlarge the difference between the template and the images as the target vehicle moves along the highway or city streets. Figure 34 shows the vehicle template and the sub-image just before the tracking loss occurred at Frame 123. The visual difference between these 2 images is quite large, which causes the global maximum of the normalized correlation function not to fall at the vehicle location.

[Figure: tracking results at Frames 20, 40, 60, 80, 100, 120, 121, 122, 123 and 124]

Figure 33. Vehicle tracking using adaptive correlation-based matching

[Figure: the vehicle template beside the sub-image at Frame 123]

Figure 34. Visual difference between the vehicle template and the sub-image

3.6.3. Continuously Adaptive Mean Shift

The Continuously Adaptive Mean Shift (CAMShift) was proposed by Bradski [39], originally for face tracking. It is a modification of the Meanshift algorithm, a robust statistical method of finding the mode (top) of a probability distribution. Typically the probability distribution is derived from color via a 1-D hue histogram sampled from the object in an HSV color-space version of the image. Details of CAMShift are given in [39].
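For reference, the essential usage pattern can be sketched as follows: a hue-histogram back-projection followed by the CAMShift iteration, written here with OpenCV's Python bindings. The histogram size and termination criteria are conventional assumptions rather than values taken from [39] or [40].

```python
import cv2

def camshift_track(frames, initial_window):
    """Hue-histogram CAMShift tracking; initial_window = (x, y, w, h),
    selected by hand as in the experiment below."""
    x, y, w, h = initial_window
    roi = frames[0][y:y + h, x:x + w]
    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    # 1-D hue histogram of the selected object
    hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)
    window = initial_window
    for frame in frames[1:]:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back_proj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        rot_box, window = cv2.CamShift(back_proj, window, term)
        yield rot_box, window
```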

There is an implementation of CAMShift in OpenCV. We follow the work done by T. Chateau [40], who conducted an experiment using this CAMShift implementation for vehicle tracking. Since CAMShift expects to track colored objects, a colored version of image Sequence A in Table 3 is used for the experiment. The initial vehicle tracking region of interest is selected by hand. Figure 35 shows the tracking result.

[Figure: tracking results at Frames 1 to 6]

Figure 35. Vehicle tracking using CAMShift

Compared with the correlation-based tracking algorithm, CAMShift tracking is capable of handling the translation, scaling, shearing and rotation of target objects. However, the tracking performance of CAMShift is even worse than that of adaptive correlation: tracking begins to fail at the very first frame. This is because CAMShift was originally designed for face and hand tracking, or other target objects with a large hue difference from the background. In the OpenCV implementation, a single channel (hue) is considered in the color model. This heuristic is based on the assumption that flesh color has a uniform hue value.

Difficulty arises when one wishes to use CAMShift to track objects for which the single-hue assumption cannot be made, such as vehicles. In particular, the algorithm may fail to track multi-hued objects, or objects where hue alone cannot distinguish the object from the background and other objects. Besides, the computational complexity of the CAMShift implementation makes it unsuitable for real-time processing: because the number of iterations necessary for convergence is not known beforehand, the average processing time is quite long.

In this section, we have applied 3 different methodologies to vehicle tracking. The results show that it is difficult for these methodologies to compete with the proposed 3-phase structure because of their low adaptability to variations in the illumination, contrast and size of vehicles. By the cooperation of the 3 phases, the vehicles can be tracked continuously in outdoor scenes with high reliability.

4. Camera-to-vehicle Distance Measurement by Single Camera

4.1. The Principle of the Law of Perspective

Continuous distance information can be provided while tracking the vehicles, since the changes of their heights in the images are available. In this chapter, we explain how the relative distance from the target vehicle to the camera can be found from the vehicle height by a simple formula.

[Figure: vehicles A, B and C on the road; the camera pinhole P and imaging plane I on vehicle A; distances Z1 and Z2 to the rear vehicles]

Figure 36. Schematic diagram of the imaging geometry of inter-vehicles

The newly proposed formula is based on the formula proposed by Mobileye® [6], using the law of perspective with the height of vehicles. The diagram in Figure 36 shows a schematic pinhole camera comprising a pinhole (P) and an imaging plane (I) placed at a focal distance (f) from the pinhole. The camera is mounted on vehicle (A) at a height (H). The rear of vehicle (B) is at a distance (Z1) from the camera. The point of contact between the vehicle and the road projects onto the image plane at a position (y1).

The Mobileye® formula is derived from the similarity of triangles:

Z = \frac{fH}{y}   (13)

where f is the focal length of the camera in pixels, H is the height of the camera from the ground, y is the height of the vehicle in pixels, and Z is the relative distance from the target vehicle to the camera.

4.2. Distance Measurement by Single Camera

The Mobileye® formula carries the assumption that the top of the vehicle is at the same level as the camera. However, different types of vehicles make this assumption invalid.

[Figure: vehicles B and C of different heights behind the camera; the camera height H, the image position y1, and the part K of the vehicle height above the horizontal line]

Figure 37. Schematic diagram of the imaging geometry of vehicles with different height

In order to relax this assumption, we first introduce the concept of the horizontal line in the image. If the position of the horizontal line in the image is known, the similar-triangle relation is still maintained, with H corresponding to Z1 and y1 corresponding to (H + K) in Figure 37. The value of K is the part of the vehicle height above the horizontal line. The new similar-triangle relation becomes:

\frac{y_1}{f} = \frac{H + K}{Z_1}   (14)

The proposed formula is based on the fact that the ratio of the vehicle body above the horizontal line to the vehicle body below the horizontal line remains invariant between the 2D image and the 3D world:

[Figure: substituting K into equation (14) yields the newly proposed formula

Z_1 = \frac{fH}{y_b}

where y_b is the number of pixels from the horizontal line to the bottom of the vehicle]

Figure 38. Newly proposed formula for computing the camera-to-vehicle distance
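Both formulas divide fH by a pixel count; only the choice of that count differs. A minimal sketch with invented numbers follows; the values of f, H and y_b below are illustrative assumptions, not measurements from our experiments.

```python
def mobileye_distance(f, H, y):
    """Equation (13): assumes the vehicle top is at camera level."""
    return f * H / y

def horizon_distance(f, H, y_b):
    """Newly proposed form: y_b is the pixel count from the horizontal
    line down to the vehicle bottom, so the vehicle height drops out."""
    return f * H / y_b

# Invented illustrative numbers: a 400-pixel focal length, a camera
# 1.2 m above the road, and a car whose bottom sits 40 pixels below
# the horizontal line:
print(horizon_distance(400, 1.2, 40))  # -> 12.0 meters
```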

[Figure: frame-by-frame comparison of the distance computed by the newly proposed formula and by the Mobileye® formula over about 700 frames]

Figure 39. The distance computed by the Mobileye® formula and the newly proposed formula

[Figure: tracking results at Frames 50 to 750, in steps of 50]

Figure 40. Result of vehicle tracking images with distance information (the number: the relative distance from the target to the camera in meters; the green rectangle: the tracking region of vehicles)

The comparison between the camera-to-vehicle distances computed by the formula proposed by Mobileye® and by the newly proposed formula is shown in Figure 39. The camera-to-vehicle distance computed by the Mobileye® formula is shorter than the actual distance, leading to the possibility of unnecessary warnings.

The vehicle tracking result for Sequence A, with camera-to-vehicle distances computed by the newly proposed formula, is shown in Figure 40, where multiple cars follow and overtake our car. The measured distances vary as the cars traverse at the rear end. The precise distance measurement is based on the accurate tracking given by the 3-phase structure. In the coming chapters, we will show how real-time multiple-vehicle tracking can be realized on embedded systems.

5. Real Time Vehicle Detection

5.1. Introduction

In the vehicle detection phase, symmetrical measurement is first applied to reduce the search area of the region of interest. However, symmetrical measurement is quite computationally intensive, and the computation of the gradient image is time consuming. Vehicle detection should be further optimized so that more computational resources can be reserved for vehicle tracking and other advanced intelligent vehicle features. The optimization techniques adopted are at the algorithm and instruction levels. The symmetrical measurement is optimized by diminishing the gradient image input and by replacing division operations. Besides, the over-complete Haar transform is optimized by pre-computation. These techniques are explained in this chapter.

5.2. Timing Analysis of Vehicle Detection

We have conducted an experiment to analyze the time complexity of the different steps in vehicle detection. The Intel ECX platform is used for evaluation, and Image Sequence A from Chapter 3 is used as the input data. There are 731 frames in this sequence, and Phase 1 tries to detect as many vehicles as possible in each new frame. Table 6 shows the time consumption of the individual steps, averaged over the 731 frames.

From Table 6, symmetrical measurement occupies nearly half of the time in vehicle detection. The complexity comes from the division operations when computing the symmetrical value. The optimization strategy is to minimize the number of divisions and replace them with multiplications. Besides, the cost of the symmetrical measurement is linearly proportional to the image width. We therefore decrease the dimensions of the input image and of the symmetrical measurement, accepting a small lateral position error of the vehicle center as a trade-off. The step of horizontal gradient filtering also benefits from the diminished image input.

The implementations of locating the vehicle roof and bottom follow the original approach; therefore, we believe their degree of improvement is limited. The over-complete Haar transform for locating the vehicles' sides is another complex step, and its optimization is also necessary.

Step                                        Time consumed (milliseconds)
Horizontal gradient filtering               5.417
Symmetrical measurement                     16.366
Locating vehicle roof and bottom            10.217
Over-complete Haar transform                12.813
Total time consumed in vehicle detection    44.813
Table 6. Timing information of the vehicle detection

5.3. Symmetrical Measurement Optimization

5.3.1. Diminished Gradient Image for Symmetrical Measurement

Gradient images are the input for symmetrical measurement, reducing the influence of illumination variation in Phase 1, Vehicle Detection. In order to highlight the measurements surrounding the vehicles, the symmetrical measurement is further enhanced by using horizontal edges only as the input for vehicle detection. We have shown the effectiveness of using the horizontal gradient image as input in Chapter 3.

There are many choices of gradient filter, such as Sobel, Prewitt and Canny [41]. We have chosen the Prewitt filter because of its simplicity: it consists only of additions and subtractions:

\frac{\partial f(x, y)}{\partial y} = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{bmatrix} * f(x, y)

On the other hand, the Sobel and Canny filters involve divisions and multiplications, which are relatively expensive operations on embedded systems. If large images are input for gradient filtering, the processing time becomes prohibitive.

In order to further reduce the complexity of gradient filtering and symmetrical measurement, we diminish the input image to 1/n of its original size per dimension. We have set n equal to 2 because of the ease of image resizing. For example, images of 320x240 pixels are captured by the camera in our setting and are resized to 160x120 pixels. Gradient filtering and symmetrical measurement benefit from this scheme, with 3/4 of the operations eliminated. The steps involved in the diminished gradient image for symmetrical measurement are shown in Figure 41.

[Figure: capture width x height images by camera; resize to width/n x height/n; gradient filtering and symmetrical measurement with the number of operations reduced; adjust the lateral position of the maximum symmetrical value by multiplying by a factor of n; locate vehicle roof, bottom and sides]

Figure 41. The steps involved in diminished gradient image for symmetrical measurement
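A minimal Python sketch of this pipeline stage is given below; the OpenCV calls stand in for our embedded C implementation and the function name is illustrative.

```python
import cv2
import numpy as np

def diminished_horizontal_gradient(frame, n=2):
    """Resize the frame to 1/n per side, then apply the horizontal
    Prewitt kernel (additions and subtractions only), as in Figure 41."""
    small = cv2.resize(frame, (frame.shape[1] // n, frame.shape[0] // n))
    prewitt_y = np.array([[1, 1, 1],
                          [0, 0, 0],
                          [-1, -1, -1]], dtype=np.float32)
    return cv2.filter2D(small.astype(np.float32), -1, prewitt_y)

# After measuring symmetry on the small gradient image, the peak's
# lateral position is multiplied back by n for the original-size frame.
```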

Original-size images are still used for locating the vehicle roof, bottom and sides in order to keep the feature details. Finally, the peak found in the symmetrical measurement is adjusted by multiplying its lateral position by a factor of n to map it back to the original-size image.

There is a trade-off between symmetrical measurement accuracy and timing improvement, because the resolution of the symmetrical measurement is reduced by a factor of n when the number of operations in gradient filtering and symmetrical measurement is only 1/4 of the original. Figure 42 shows the comparison of symmetrical measurement with the original image input and the diminished image input.

[Figure: symmetrical measurement plotted against the lateral position of the image for (a) the original image (320x240) and (b) the diminished image (160x120)]

Figure 42. The comparison of symmetrical measurement with original image input and diminished image input with n = 2. The symmetrical peak of (a) the original image is found at lateral position 199 while the symmetrical peak of (b) the diminished image is found at lateral position 99 (equal to 198 after adjusting)

In Figure 42, we can observe that the symmetrical measurement of the diminished image is not as detailed as that of the original image. In this example, the symmetrical peak found using the diminished image differs by 1 pixel in lateral position from that found using the original image because of the lower-resolution symmetrical measurement. In most cases, the lateral position difference is only 1 to 2 pixels, because the horizontal edges of the vehicles still dominate the symmetrical response at different image sizes. Even though there is a lateral position error caused by the diminished images, Phase 3 (Vehicle Template Update) corrects the error so that the tracking region fits the vehicle, as shown in Figure 43.

[Figure: Frame 120 (initial vehicle detection) and Frame 140 (vehicle template update)]

Figure 43. Vehicle Template Update correcting the initial lateral position error caused by vehicle detection

5.3.2. Replacing Division by Multiplication Operations

Division is an expensive operation. It is desirable to avoid it where possible.

Sometimes expressions can be rewritten so that a division becomes a

multiplication [42]. For example, (x / y) > z can be rewritten as x > (z * y) if it is

known that y is positive and y * z fits in an integer.

Algorithm 1 shows the pseudo code of the symmetrical measurement implementation. The goal of the symmetrical measurement is to locate the position of the peak. There is a conditional statement at line 14 to determine the maximum symmetrical measurement. Before that, the symmetrical value is calculated by a division operation at line 13, where the numerator and denominator are both positive integers. This statement occupies half of the CPU time in the SymmetricalMeasurement function.

SymmetricalMeasurement(yGradientImage)
 1: max_sym = 0
 2: for i ← scale to ImageWidth - scale - 1 do
 3:   numerator = 0
 4:   denominator = 0
 5:   for j ← 0 to ImageHeight - 1 do
 6:     for k ← i - scale to i
 7:       numerator = numerator + yGradientImage(j,k) * yGradientImage(j, k + i*2 - k*2)
 8:     end for
 9:     for k ← i - scale to i + scale
10:       denominator = denominator + yGradientImage(j,k) * yGradientImage(j,k)
11:     end for
12:   end for
13:   S = numerator / denominator
14:   if S > max_sym
15:     max_sym = S
16:     max_sym_pos = i
17:   end if
18: end for

Note 1: yGradientImage is the gradient image with horizontal edges. All values are integers.
Note 2: Since S is proportional to the normalized symmetrical value, the simpler ratio numerator / denominator is used in its place.
Algorithm 1. Original implementation of symmetrical measurement

In order to replace the expensive division operation by multiplication, SymmetricalMeasurement is re-implemented without the use of division, as shown in Algorithm 2. The new implementation still works because:

1) If S > max_sym, then numerator / denominator > max_sym_numerator / max_sym_denominator.

2) If numerator / denominator > max_sym_numerator / max_sym_denominator, then (numerator * max_sym_denominator) > (denominator * max_sym_numerator),

provided that denominator, numerator, max_sym_denominator and max_sym_numerator are all positive integers.

As a result, we use multiplication operations to replace the division operation. The timing improvement is shown in Section 5 of this chapter.

SymmetricalMeasurement_NoDiv(yGradientImage)
 1: max_sym_numerator = 1
 2: max_sym_denominator = 1000000
 3: for i ← scale to ImageWidth - scale - 1 do
 4:   numerator = 0
 5:   denominator = 0
 6:   for j ← 0 to ImageHeight - 1 do
 7:     for k ← i - scale to i
 8:       numerator = numerator + yGradientImage(j,k) * yGradientImage(j, k + i*2 - k*2)
 9:     end for
10:     for k ← i - scale to i + scale
11:       denominator = denominator + yGradientImage(j,k) * yGradientImage(j,k)
12:     end for
13:   end for
14:   if (numerator * max_sym_denominator) > (denominator * max_sym_numerator)
15:     max_sym_denominator = denominator
16:     max_sym_numerator = numerator
17:     max_sym_pos = i
18:   end if
19: end for
Algorithm 2. Implementation of symmetrical measurement without division

5.4. Over-complete Haar Transform Optimization

5.4.1. Characteristics of Over-complete Haar Transform

To achieve better spatial resolution and a richer set of features, we have applied the over-complete Haar transform for locating the vehicles' sides, as mentioned in Chapter 3. Like symmetrical measurement, the over-complete Haar transform is complex and time consuming because it involves many pixel manipulation operations, such as averaging the pixel intensity of a block of pixels.

Figure 44 shows the wavelet computation part of the over-complete Haar transform. Before being input into the Haar wavelet, the intensity inside each block of pixels, called a Haar block, is averaged. On the left of Figure 44, the Haar blocks come from different regions of the image, and overlapping occurs between blocks. This overlapping is the major source of wasted time.

[Figure: the intensity of each Haar block around the vehicle side is averaged and passed to the (+1, -1) Haar wavelet; neighbouring blocks overlap]

Figure 44. Wavelet computation in over-complete Haar Transform

5.4.2. Pre-computation of Haar Blocks

The over-complete Haar transform is an extension of the standard Haar transform in which the wavelet is shifted by 1/4 of its size, resulting in quadruple density. Figure 45 focuses on the overlapping relation between Haar blocks. Each Haar block is 32x32 pixels in size. In the standard Haar transform, the 1024 pixels inside each block are summed up to compute the average intensity of the Haar block, and the same step takes place for every other Haar block.

The idea of pre-computing Haar blocks comes from the overlap between Haar blocks. Each Haar block has 3/4 of its area overlapping with its vertical or horizontal neighbor and 9/16 of its area overlapping with its diagonal neighbor. Since all of the Haar blocks compute their average intensities, parts of the average intensity, in the form of sub-Haar blocks, can be shared among overlapping Haar blocks. For example, in Figure 46, the yellow area is the overlapping region of the two Haar blocks. The average intensity of this yellow area can be pre-computed and shared by the two Haar blocks, instead of being computed twice when the two blocks compute their own average intensities separately.

[Figure: 32x32-pixel Haar blocks shifted by 8 pixels, so that neighbouring blocks overlap]

Figure 45. Overlapping relation between Haar blocks

[Figure: two overlapping 32x32 Haar blocks; the shared (yellow) region needs to be summed only once]

Figure 46. Average intensity of overlapping region shared by two blocks

To generalize the case, we should further divide the overlapping area into smaller regions to maximize the reusability of the pre-computed average intensities. Because of the 1/4 shifting of the wavelet, the overlapping areas between blocks are combinations of 8x8 blocks, called sub-Haar blocks. Therefore, we can divide the image into many sub-Haar blocks, as shown in Figure 47. All of the average intensities of the sub-Haar blocks are pre-computed so that they can be reused by different Haar blocks.
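A sketch of this sharing scheme follows: the 8x8 sub-block sums are computed once over the whole image, and each 32x32 Haar block average is then assembled from 16 of them. The function names are illustrative.

```python
import numpy as np

def sub_block_sums(image, sub=8):
    """Pre-compute the pixel sums of all 8x8 sub-Haar blocks once."""
    h, w = image.shape
    cropped = image[:h - h % sub, :w - w % sub].astype(np.int64)
    return cropped.reshape(h // sub, sub, w // sub, sub).sum(axis=(1, 3))

def haar_block_mean(sums, r, c, block=32, sub=8):
    """Average intensity of the 32x32 Haar block whose top-left corner is
    at sub-block coordinates (r, c); reuses 16 pre-computed sums instead
    of re-adding 1024 pixels per overlapping block."""
    k = block // sub
    return sums[r:r + k, c:c + k].sum() / float(block * block)

# With 1/4-of-support shifts, neighbouring Haar blocks step by one
# sub-block, so each sub-block sum is shared by up to 16 Haar blocks.
```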

[Figure: a 128-pixel-wide image divided into 8x8 sub-Haar blocks; four sample 32x32 Haar blocks (blue, red, green and yellow), each assembled by summing the pre-computed sums of its 16 sub-Haar blocks]

Figure 47. Division and reuse of sub-Haar blocks. 4 sample Haar blocks are given as examples, each found by the reuse of sub-Haar blocks

5.5. Summary

After implementing all of the optimization techniques mentioned above, another experiment is conducted to analyze the timing improvement of the optimized vehicle detection. The timing improvements of the individual steps are listed in Table 7.

Name of step                                Time consumed before (ms)   Time consumed after (ms)
Image resize (n = 2)                        -                           5.789
Horizontal gradient filtering               5.417                       1.791
Symmetrical measurement                     16.366                      3.137
Locating vehicle roof and bottom            10.217                      10.601
Over-complete Haar transform                12.813                      11.750
Total time consumed in vehicle detection    44.813                      33.068
Table 7. Timing information of the vehicle detection after optimization

The result shows that our optimization strategy achieves a 25% timing improvement in vehicle detection. The steps of horizontal gradient filtering and symmetrical measurement show the greatest timing improvement. Although an extra step, image resizing, is introduced and consumes a proportion of the time, the symmetrical measurement, together with its pre-processing step of gradient filtering, contributes above 90% of the total timing improvement. However, the over-complete Haar transform does not show any noticeable timing improvement. One of the reasons is that the pre-computation of the sub-Haar blocks also consumes a reasonable amount of time and covers up the contribution of sub-Haar block sharing. Nevertheless, the target of reserving more computational resources for vehicle tracking has been reached.

6. Real Time Vehicle Tracking using Simplified Image

Alignment

6.1. Introduction

In Chapter 3, we have shown how the 3-phase structure achieves multiple-vehicle tracking in outdoor scenes. The experiment shows that our approach is practical in real vehicle tracking applications despite the large variability in vehicle shape, color, illumination and contrast. The precise tracking of the vehicle roof and bottom allows the newly proposed formula to give accurate distance measurements. The next step is to realize real-time multiple-vehicle tracking on embedded systems. We have simplified the computation of the original image alignment by minimizing the number of image alignment parameters, as explained in this chapter. In this chapter, the image alignment discussed before is named the "original image alignment" while the newly proposed image alignment is named the "simplified image alignment".

6.2. Timing Analysis of Original Image Alignment

We have conducted an experiment to analyze the time complexity of the original image alignment algorithm for vehicle tracking. The Intel ECX platform is used for evaluation, and Image Sequence A from Chapter 3 is used for the timing analysis. Table 8 shows the information of the image sequence. During the experiment, vehicles are tracked along the image sequence based on the vehicle templates described in the detection section. In the original image alignment algorithm, 6 parameters are used to describe the affine transformation from frame to frame. Table 9 and Table 10 show the timing information of the vehicle template tracking and of our original vehicle tracking methodology respectively. The time consumed is taken as the average over the 731 frames.

Capturing device: Logitech webcam 4000
Frame size: 320 x 240 pixels
Duration: 49 seconds
Frame rate: 15 frames / sec
Total frames: 731 frames
Maximum number of cars: 3
Size of template: About 80 x 80 pixels
Table 8. Information of the image sequence in the timing analysis

Step of image alignment                                      Time consumed (milliseconds)
Pre-compute:
(1) Gradient of template                                     0.156
(2) Jacobian                                                 0.368
(3) Steepest descent images                                  0.731
(4) Hessian matrix                                           0.933
Iterate (per iteration; there are 10 iterations in one frame):
(5) Warping image                                            3.587
(6) + (7) Steepest descent parameters                        0.155
(8) Compute Δp                                               0.005
(9) Update the warp                                          0.014
Total time consumed for 10 iterations:                       37.610
Total time consumed per frame for 1 vehicle:                 39.798
Table 9. Timing information of the original vehicle template tracking per frame of one vehicle

Step                                                    Time consumed (ms)   Frame rate (frames / sec)
Step 1: Vehicle Detection                               33.068               -
Step 2: Vehicle Template Tracking (for each vehicle)    39.798               -
Total time for 1 vehicle (Step 1 + Step 2)              72.866               13.723
Total time for 2 vehicles (Step 1 + 2 x Step 2)         112.664              8.875
Total time for 3 vehicles (Step 1 + 3 x Step 2)         152.462              6.559
Table 10. Timing information of the original vehicle tracking methodology per frame

The embedded system was not the target platform of our original image alignment algorithm, so it can be expected that the experimental frame rate cannot reach the target frame rate of 15 frames per second. Moreover, we should not assume there is only 1 vehicle following us. As a result, the original vehicle tracking methodology must be optimized for real-time multiple-vehicle tracking. The timing improvement from optimizing vehicle tracking is more pronounced than that from optimizing vehicle detection because:

1) The time consumption of vehicle template tracking is proportional to the number of target vehicles being tracked. The timing improvement is more significant when there are more target vehicles.

2) There are 10 iterations in each run of the image alignment tracking algorithm. The timing improvement is magnified when any one of the steps in the iteration is optimized.

6.3. Simplified Image Alignment

6.3.1. Reducing the Number of Parameters in Affine Transformation

The idea of simplified image alignment comes from the 4 basic affine transformations: translation, scaling, shearing and rotation. These transformations can be described by the 6 parameters proposed by Bergen and his colleagues, as shown in Equation (3) in Chapter 3. Table 11 shows the parameters involved in each basic transformation. By observing the sample transformations in Figure 26 in Chapter 3, it is evident that shearing and rotation are inappropriate for successive images of the same vehicle moving on the road and would not assist the vehicle tracking. Therefore, we propose to exclude the shearing and rotation transformations so that only 4 parameters (p0, p3, p4 and p5) instead of 6 are used to describe the affine transformation.

               p0    p1    p2    p3    p4    p5
Translation                              ✓     ✓
Scaling        ✓                  ✓
Shearing             ✓     ✓
Rotation       ✓     ✓     ✓     ✓
Table 11. Parameters involved in each basic affine transformation: translation, scaling, shearing and rotation

We have conducted another experiment to support our argument. The variation of the parameters along the image sequence is shown in Figure 48. In this figure, the parameters p0, p3, p4 and p5 vary along the image sequence while p1 and p2, which describe the shearing and rotation transformations, fluctuate with only small magnitudes. The near-zero values of p1 and p2 make the terms p1·x and p2·y in Equation (3) practically equal to zero, so we simply omit them from the computation. After this simplification, the number of multiplications in the affine transformation is greatly reduced in image warping (Step 5 of image alignment).
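For concreteness, the following minimal sketch contrasts the warping step before and after the simplification. The index assignment below, with p1 and p2 carrying the cross (shearing/rotation) terms, is our reading of Equation (3); the actual implementation may order the parameters differently.

```c
#include <stdio.h>

/* Warp of a template coordinate (x, y) under a Bergen-style 6-parameter
 * affine form; the index assignment is an assumption, not the exact
 * implementation. */
typedef struct { double x, y; } Point;

/* Full 6-parameter affine warp: 4 multiplications per pixel. */
static Point warp_full(Point in, const double p[6])
{
    Point out;
    out.x = p[0] * in.x + p[2] * in.y + p[4];
    out.y = p[1] * in.x + p[3] * in.y + p[5];
    return out;
}

/* Simplified warp with p1 = p2 = 0: only scaling (p0, p3) and
 * translation (p4, p5) remain - 2 multiplications per pixel. */
static Point warp_simplified(Point in, const double p[6])
{
    Point out;
    out.x = p[0] * in.x + p[4];
    out.y = p[3] * in.y + p[5];
    return out;
}

int main(void)
{
    double p[6] = {1.05, 0.0, 0.0, 1.05, 2.0, -1.0};  /* near-identity warp */
    Point q = {40.0, 40.0};  /* a pixel of the roughly 80 x 80 template */
    Point a = warp_full(q, p), b = warp_simplified(q, p);
    printf("full: (%.2f, %.2f)  simplified: (%.2f, %.2f)\n", a.x, a.y, b.x, b.y);
    return 0;
}
```

Since Step 5 touches every pixel of the roughly 80 x 80 template on each of the 10 iterations, halving the per-pixel multiplications contributes directly to the large warping speedup reported later in Table 14.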

[Figure 48. Variation of the affine transformation parameters along the image sequence (parameter value plotted against frame number, frames 0 to 200): (a) p3, (b) p4, (c) p5, (d) p0, (e) p1 and (f) p2]

6.3.2. Size Reduction of Image Alignment Matrices

Besides, because of the smaller number of parameters, the sizes of the Jacobian, the Steepest Descent Images, the Hessian Matrix and the inverse of the Hessian Matrix are reduced, as shown in Table 12. This results in a decrease in the number of multiplication operations and in time consumption. The timing improvement and the performance evaluation of the simplified image alignment will be given in Section 6.4.

[Table 12. Visualization of the Jacobian, the Steepest Descent Images, the Hessian Matrix and the inverse Hessian Matrix in the original image alignment (using 6 parameters for the affine transformation) and the simplified image alignment (using 4 parameters)]
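The size changes summarized in Table 12 follow directly from the Lucas-Kanade formulation; writing them out for n warp parameters and an N-pixel template (a standard consequence of the formulation, not something specific to our implementation):

\[
\frac{\partial W}{\partial p} \in \mathbb{R}^{2 \times n}, \qquad
SD = \nabla T \, \frac{\partial W}{\partial p} \in \mathbb{R}^{N \times n}, \qquad
H = SD^{\top} SD \in \mathbb{R}^{n \times n}
\]

Reducing n from 6 to 4 therefore shrinks the Hessian and its inverse from 6 x 6 to 4 x 4 and removes a third of the columns of the Jacobian and the steepest descent images, which is reflected in the timings of Table 14.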

6.4. Experiments and Discussions

We have conducted 2 experiments to evaluate the performance and the timing

improvement of the simplified vehicle tracking methodology. The Intel ECX platform mentioned earlier is used as the testing platform. The image sequences of the road traffic were taken by the camera mounted on the rear of our car, following the specification in Table 8.

6.4.1. Successful Tracking Percentage

The first experiment analyzes the reliability of the simplified vehicle tracking methodology. The 4 image sequences are the same as the experimental data used in Chapter 3. By observing the number of frames at which tracking became discontinuous, the reliability of the simplified methodology can be compared with that of the original one. Table 13 shows the information of the 4 image sequences and their successful tracking percentages.

                                      Sequence A             Sequence B             Sequence C             Sequence D
Road type                             City streets           City streets           Highway                Highway
Duration                              49 seconds             4 seconds              68 seconds             7 seconds
Total number of frames (NF)           731 frames             60 frames              1001 frames            101 frames

                                      Original  Simplified   Original  Simplified   Original  Simplified   Original  Simplified
Number of tracking lost frames (LF)   9         11           0         0            13        22           0         0
Successful tracking percentage,
(NF - LF) / NF                        98.8%     98.5%        100.0%    100.0%       98.7%     97.8%        100.0%    100.0%

Table 13. Tracking percentage of the simplified image alignment
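Spelling out the percentage used in the last row of Table 13, with NF the total number of frames and LF the number of tracking-lost frames:

\[
\text{Successful tracking percentage} = \frac{NF - LF}{NF} \times 100\%
\]

For instance, Sequence C with the simplified alignment gives (1001 - 22)/1001, or about 97.8%, as listed.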

The vehicle tracking methodology using the original image alignment is able to track the vehicles in over 98% of the frames. When using the simplified image alignment, the number of 'lost' frames increases slightly in some sequences. The main reason for the increase is the slight distortion that occurs when vehicles are moving on winding roads. In such situations, the 4 parameters are often insufficient to describe the shearing-like distortion. In fact, the lost tracking seldom occurs in successive frames, and the simplified vehicle tracking methodology still keeps the successful rate high, at over 97%.

[Figure 49. Comparison of the calculated camera-to-vehicle distance (meters) of the simplified (using 4 parameters) and the original (using 6 parameters) vehicle tracking methodology, plotted against frame number (frames 0 to 700)]

We compare the calculated distances of the simplified and the original vehicle tracking methodologies. From Figure 49, the mean absolute difference between the calculated distances of the two methodologies is only 23 cm, a discrepancy which is acceptable in the on-road environment.
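For reference, the mean absolute difference quoted above is computed in the usual way; the sketch below is illustrative, with made-up sample values rather than the real per-frame distance traces behind Figure 49.

```c
#include <stdio.h>
#include <math.h>

/* Mean absolute difference between two distance traces (meters). */
static double mean_abs_diff(const double *a, const double *b, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += fabs(a[i] - b[i]);
    return sum / n;
}

int main(void)
{
    /* Hypothetical sample values, not the measured data. */
    double d_orig[] = {8.10, 8.35, 8.62, 8.90};  /* 6-parameter trace */
    double d_simp[] = {8.04, 8.17, 8.88, 8.71};  /* 4-parameter trace */
    printf("mean abs diff: %.2f m\n", mean_abs_diff(d_orig, d_simp, 4));
    return 0;
}
```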

6.4.2. Timing Improvement

The second experiment analyzes the timing improvement of the simplified vehicle tracking methodology. The experimental setup is similar to that in Table 8, but now 4 parameters are used. Table 14 and Table 15 show the timing information of vehicle template tracking and of our simplified vehicle tracking methodology respectively.

Step in image alignment    Name of step                   Original time (millisecond)   Time consumed (millisecond)
Pre-compute:
(1)                        Gradient of template           0.156                         0.141
(2)                        Jacobian                       0.368                         0.179
(3)                        Steepest descent images        0.731                         0.282
(4)                        Hessian matrix                 0.933                         0.401
Iterate: (timing is in millisecond / iteration; there are 10 iterations in one frame)
(5)                        Warping image                  3.587                         0.772
(6) + (7)                  Steepest descent parameters    0.155                         0.095
(8)                        Compute Δp                     0.005                         0.004
(9)                        Update the warp                0.014                         0.013
Total time consumed for 10 iterations:                    37.610                        8.840
Total time consumed per frame for 1 vehicle:              39.798                        9.843

Table 14. Timing of the simplified vehicle template tracking per frame of one vehicle

Step                                                       Time consumed (millisecond)   Frame rate (frames / sec)
Step 1: Vehicle Detection                                  33.068                        -
Step 2: Vehicle Template Tracking (for each vehicle)       9.843                         -
Total time consumed for 1 vehicle (Step 1 + Step 2):       42.911                        23.304
Total time consumed for 2 vehicles (Step 1 + 2 x Step 2):  52.754                        18.955
Total time consumed for 3 vehicles (Step 1 + 3 x Step 2):  62.597                        15.975

Table 15. Timing of the simplified vehicle tracking methodology per frame

From the results, we can see that the computations of the Jacobian, the Steepest Descent Images, the Hessian Matrix and its inverse, and the Steepest Descent parameters all benefit from the size reduction brought by using 4 parameters. The image warping step even shows a roughly 80% timing improvement after the simplification. In particular, the vehicle template tracking time for each vehicle is greatly reduced, from around 40 milliseconds to below 10 milliseconds. Moreover, our methodology can track 3 vehicles simultaneously at 15 frames per second, realizing multi-vehicle tracking on embedded systems.

7. Conclusions

In this thesis, we have described a 3-phase vehicle tracking methodology for detecting multiple moving vehicles, with precise distance information, in image sequences captured by a single camera mounted on the driver's car, for the purpose of preventing automobile collisions. Our methodology differs from most previous methods, which detect vehicles in each frame without correlating the successive frames. Since the image alignment is sufficiently robust, tracking continues once a vehicle has been detected in any one of the frames. When a sudden change in illumination or contrast creates a large difference between the vehicle template and the current image, the 3-phase structure is not interrupted; instead, it updates the vehicle template for the best fit. The 3-phase structure provides valuable information to accommodate changes in the viewpoint and position of the vehicles. The tracking is precise and thus allows the distances from the target vehicles to our vehicle to be computed accurately using the newly proposed distance formula. Moreover, the cooperation of the 3 phases greatly improves the successful tracking results.

To realize real-time tracking of multiple vehicles on embedded systems, we have optimized and simplified the image alignment tracking algorithm by minimizing the number of parameters in the affine transformation. The time consumption of the improved vehicle tracking is only 25% of the original. Although the calculated distances differ slightly from the actual measurements, the successful tracking rate stays above 97%. Therefore, we can claim that our methodology is able to track multiple rear vehicles, with accurate distance information, on embedded systems in real-time.

Further enhancement can be made to reduce the failure rate, especially when the vehicles are traversing winding roads. When computing the distances from rear vehicles, we assume that the horizontal line remains in position in the image. Nevertheless, driving on a bumpy road may invalidate this assumption. In the future, there should be a method to update the position of the horizontal line dynamically so that the distance measurement formula can still be applied in different road situations. All image sequences used in the experiments were taken by a camera mounted on a real car traveling on highways and city streets, which is time and energy consuming. Given the advances in computer graphics, image sequences of road traffic synthesized by computers, like the image sequences in computer racing games, would be an alternative source of experimental data for simulation. Besides, because of the absence of a publicly available road traffic database, benchmarking our system's performance is difficult, making it hard to compare our results with others.

Providing inter-vehicle separation distances and advance warnings to a driver is just the beginning of the development of Intelligent Transportation Systems. Ultimately, we hope that the techniques developed in our system could lead to the development of other higher-level driving safety assistance, such as braking, steering, lane changing and vehicle trajectory prediction.

8. Bibliography

[1] K. Hartman and J. Strasser, "Saving Lives Through Advanced Vehicle Safety Technology," Cambridge Systematics, Inc., MA. [Online]. Available: http://www.itsdocs.fhwa.dot.gov/JPODOCS/REPTS_PR/14153_files/ivi.pdf

[2] Z. Sun, G. Bebis and R. Miller, "On-Road Vehicle Detection Using Optical Sensors: A Review," IEEE Intelligent Transportation Systems Conference, Oct. 2004.

[3] E. H. Adelson and J. R. Bergen, "The plenoptic function and the elements of early vision," in Computational Models of Visual Processing, M. Landy and J. A. Movshon, Eds., MIT Press: Boston, MA, pp. 1-20.

[4] B. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proceedings of the International Joint Conference on Artificial Intelligence, pages 674-679, 1981.

[5] G. D. Hager and P. N. Belhumeur, "Efficient region tracking with parametric models of geometry and illumination," IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(10):1025-1039, 1998.

[6] G. P. Stein, O. Mano and A. Shashua, "Vision-based ACC with a Single Camera: Bounds on Range and Range Rate Accuracy," IEEE Intelligent Vehicles Symposium, Jun. 2003, Columbus, OH.

[7] Q. T. Luong, J. Weber, D. Koller and J. Malik, "An integrated stereo-based approach to automatic vehicle guidance," in Proceedings of the 5th ICCV, June 1995.

[8] A. Giachetti, M. Campani and V. Torre, "The use of optical flow for road navigation," IEEE Transactions on Robotics and Automation, vol. 14, no. 1, pp. 34-48, 1998.

[9] S. M. Smith, "ASSET-2: Real-time motion segmentation and shape tracking," in Proceedings of the 5th International Conference on Computer Vision, pages 237-244, 1995.

[10] M. Betke, E. Haritaoglu and L. S. Davis, "Real-Time Multiple Vehicle Detection and Tracking from a Moving Vehicle," Journal of Machine Vision and Applications, vol. 12, no. 2, pp. 69-83, September 2000.

[11] "Advance warning system," http://www.mobileye.com

[12] I. Gal, M. Benady and A. Shashua, "Monocular Vision Advance Warning System for the Automotive Aftermarket," SAE World Congress & Exhibition 2005, April 2005, Detroit, USA.

[13] M. Bertozzi and A. Broggi, "GOLD: A parallel real-time stereo vision system for generic obstacle and lane detection," IEEE Transactions on Image Processing, vol. 7, pp. 62-81, 1998.

[14] C. Tzomakas and W. von Seelen, "Vehicle detection in traffic scenes using shadows," Technical Report 98-06, Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany, 1998.

[15] N. Matthews, P. An, D. Charnley and C. Harris, "Vehicle detection and recognition in grayscale imagery," Control Engineering Practice, vol. 4, pp. 473-479, 1996.

[16] C. Goerick, D. Noll and M. Werner, "Artificial neural networks in real-time car detection and tracking applications," Pattern Recognition Letters, vol. 17, pp. 335-343, 1996.

[17] U. Handmann, T. Kalinke, C. Tzomakas, M. Werner and W. von Seelen, "An image processing system for driver assistance," IEEE International Conference on Intelligent Vehicles, pp. 481-486, Stuttgart, Germany, 1998.

[18] P. Parodi and G. Piccioli, "A feature-based recognition scheme for traffic scenes," IEEE Intelligent Vehicles Symposium, pp. 229-234, 1995.

[19] N. Srinivasa, "A vision-based vehicle detection and tracking method for forward collision warning," IEEE Intelligent Vehicles Symposium, pp. 626-631, 2002.

[20] T. Bucher, C. Curio et al., "Image processing and behavior planning for intelligent vehicles," IEEE Transactions on Industrial Electronics, vol. 50, no. 1, pp. 62-75, 2003.

[21] H. Mori and N. Charkari, "Shadow and rhythm as sign patterns of obstacle detection," International Symposium on Industrial Electronics, pp. 271-277, 1993.

[22] E. Dickmanns et al., "The seeing passenger car 'VaMoRs-P'," International Symposium on Intelligent Vehicles '94, pp. 24-26, 1994.

[23] N. Kiryati and Y. Gofman, "Detecting Symmetry in Grey Level Images: The Global Optimization Approach," International Journal of Computer Vision, Vol. 29, No. 1, pp. 29-45, 1998.

[24] C. P. Papageorgiou and T. Poggio, "A Trainable Object Detection System: Car Detection in Static Images," MIT AI Memo No. 180, October 1999.

[25] C. K. Chui, "An Introduction to Wavelets," Academic Press, San Diego, 1992.

[26] C. Mulcahy, "Image Compression using the Haar Wavelet Transform," Spelman Science and Math Journal.

[27] S. G. Mallat, "A theory for multi-resolution signal decomposition: The wavelet representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7):674-693, July 1989.

[28] E. J. Stollnitz, T. D. DeRose and D. H. Salesin, "Wavelets for computer graphics: A primer," Technical Report 94-09-11, Department of Computer Science and Engineering, University of Washington, September 1994.

[29] M. Oren, C. Papageorgiou, P. Sinha, E. Osuna and T. Poggio, "Pedestrian detection using wavelet templates," in Computer Vision and Pattern Recognition, pages 193-199, 1997.

[30] M. Black and A. Jepson, "EigenTracking: Robust matching and tracking of articulated objects using a view-based representation," International Journal of Computer Vision, 36(2):101-130, 1998.

[31] S. Baker and I. Matthews, "Lucas-Kanade 20 Years On: A Unifying Framework," International Journal of Computer Vision, Vol. 56, No. 3, March 2004, pp. 221-255.

[32] J. R. Bergen, P. Anandan, K. J. Hanna and R. Hingorani, "Hierarchical model-based motion estimation," in Proceedings of the European Conference on Computer Vision, pages 237-252, 1992.

[33] H. Y. Shum and R. Szeliski, "Construction of panoramic image mosaics with global and local alignment," International Journal of Computer Vision, 16(1):63-84, 2000.

[34] S. Baker and I. Matthews, "Equivalence and efficiency of image alignment algorithms," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 1090-1097, 2001.

[35] F. Bashir and F. Porikli, "Performance Evaluation of Object Detection and Tracking Systems," IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS 2006), June 2006.

[36] G. Cao, J. Jiang and J. Chen, "An improved object tracking algorithm based on image correlation," in Proceedings of the 2003 IEEE International Symposium on Industrial Electronics (ISIE '03), Vol. 1, 9-11 June 2003.

[37] U. M. Cahn von Seelen and R. Bajcsy, "Adaptive correlation tracking of targets with changing scale," Technical Report, GRASP Laboratory, June 1996.

[38] "Open Source Computer Vision Library" (OpenCV), http://www.intel.com/technology/computing/opencv/

[39] G. R. Bradski, "Computer Vision Face Tracking For Use in a Perceptual User Interface," Intel Technology Journal, pages 123-140, 1998.

[40] T. Chateau, V. Gay-Bellile, F. Chausse and J. Lapreste, "Real-Time Tracking with Classifiers," Workshop on Dynamical Vision, European Conference on Computer Vision 2006.

[41] J. Canny, "A computational approach to edge detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, Issue 6 (Nov. 1986), pp. 679-698.

[42] "Writing Efficient C for ARM," Application Note, Advanced RISC Machines Ltd (ARM), 1998.
