Motion Tracking on Embedded Systems -
Vision-based Vehicle Tracking using
Image Alignment with Symmetrical Function
CHEUNG, Lap Chi
A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Philosophy in Computer Science and Engineering
© The Chinese University of Hong Kong August 2007
The Chinese University of Hong Kong holds the copyright of this thesis. Any person(s) intending to use a pari or whole of the materials in the thesis in a proposed publication must seek copyright release from the Dean of the Graduate School. Iti^Hj u 謹 ITY y^/j ^XLIBRARY SYSTEM^纷 Thesis/Assessment Committee
Professor WONG Kin Hong (Chair) Professor MOON Yiu Sang (Thesis Supervisor) Professor JIA Jiaya Leo (Committee Member) Professor FAIRWEATHER Graeme (External Examiner) i
Abstract
The large number of rear end collisions due to driver inattention has been identified as a major automotive safety issue in Intelligent Transportation Systems (ITS), part of whose goals is to apply automotive technology to improve driver safely with efficiency. In this thesis, we describe a 3-phase vehicle tracking methodology with a single moving camera mounted on the driver's automobile as input for detecting rear vehicles on highways and city streets for the purpose of preventing potential rear-end collisions. In the first phase, a rear vehicle is detected by symmetrical measurement and Haar transform. A template is also created for tracking the vehicle using image
alignment techniques in the second phase. The template is continuously monitored
and updated to handle abrupt changes in surrounding conditions in the third phase.
Different from most previous methods which simply delect vehicles in each frame
without correlating the successive frames, our methodology gives precise tracking, an
essential requirement for estimating the distances between the rear vehicles and the
driver's car using the newly proposed formula. To implement our methodology in
real-time embedded systems, we further enhance the efficiency of our algorithms with
the aid of symmetrical function and simplified image alignment techniques.
Preliminary testing results show that our system acquires a successful tracking rate
over 97%. ii
論文摘要
由於駕駛者的低警覺性造成的車輛尾端碰撞意外已成爲智能運輸系統(Intelligent
Transportation Systems)的一大議題°智能運輸系統就是透過應用自動化技術來提
高駕駛安全及效率。在本論文中,我們會描述一個三階段車輛追蹤法,以設於駕
駛者車輛上的單鏡攝像頭作爲系統輸入來偵測後逐車輛’防止尾端碰撞的發生。
在第一階段,我們使用對稱測量(Symmetrical Measurement)及哈爾轉換(Haar
Transform)兩方法來偵測後逐車輛,並創造模板供第二階段的圖像拼接法(Image
Alignment)作物體追蹤。其間模板會不斷受到監察,一旦遇到環境突變而影響追
蹤效能,第三階段會即時作出模板更新以保持精確的車輛追蹤。新提出的追蹤方
法著重於影像片間之交互作用’有別於現有的簡單單一影像片偵測法。透過應用
三階段車輛追縱法及最新提出的車輛相互距離計算方程式,便能對後逐車輛的距
離作出準確的運算°在對稱測量及簡化圖像拼接法(Simplified Image Alignment)
的幫助下,這實時多車輛追蹤法更能在嵌入式系統上實現,而初步的實驗追蹤率
更維持在97個巴仙以上。 iii
To my dearest family and friends iv
Acknowledgement
I wish to thank my supervisor, Prof. Moon Yiu Sang, for his continuous guidance and supervision, for supporting me to attend academic conferences and for all his teachings in research, learning and living. Also, gives me many opportunities being the leader of projects and competition. I wish to express my gratitude to Prof. Wong
Kin Hong and Prof. JIA Jiaya Leo for being members of my thesis committee. I would also like to thank Prof. G. Fairweather for being my external marker.
Special thanks are due to the Robotics Institute at the Carnegie Mellon University for making their image alignment algorithm implementation publicly available. Without their contribution, this research work would not have such a good start. I gratefully acknowledge helpful discussions with Mr, Chen Jiansheng, Mr. Chan Ka Cheong, Mr.
Chan Fai and Mr. Tang Xinmin for sharing their research insight and ideas with me.
My sincere thanks to Wong Kwong Wa and Tse Wing Cheong for doing the initial study of the vehicle tracking system. I must thank Tony, Kim, Oscar, Simon and other technical staff for their patience with me and excellent work in maintaining a stable
computing environment. I would like to thank the Chinese University of Hong Kong
for providing such a beautiful campus, a resourceful library and excellent recreational
facilities and Shaw College for providing such a wonderful college life and
comfortable living place.
Then there is the list of friends and colleagues that deserve my deepest gratitude.
Many thanks to Kuen, Sunny, Siu Ming, Felix, Yu Tsz and Fung, for bringing me the V
memorable joy in Intel Cup competition including working overnight at room 1024, trip lo Shanghai and Beijing and all of the follow-up events; Ming, Brittle, Kit, Brian,
Cheong, Ocean, CJ, Ma Fai, Kevin, Lo Wing and Chan Fai, for creating such enjoyable and relaxing atmosphere; all of the friends in room 1003, for letting me know what is meant by efficient and hardworking; all of the hostel tutors in Student
Hostel II,Shaw College, for working together to organize joyful and meaningful
hostel activities.
And, my heartfelt thanks to those who have helped me one way or the other, even
though both you and I may not be aware of it. I wish to express my deepest gratitude
to Ming, Chan Fai, Cyrus and Peng Xiang for giving me humble advice and chatting
with me during trouble limes. Thanks Ming and Chan Fai, my TA partners, for
sharing the workload when I am busy in research. You both are enthusiastic in
leaching! My special thanks goes to Peng Xiang, who gave me motivation to finish
this thesis by telling me thai he had already finished his in April. To the researchers in
SPIE and MVA conferences, thanks your comments and suggestions which help me
keep improving my thesis.
I would like to thank the world for being such a wonderful place to live in, to
appreciate and to explore. Thanks the Department of Computer Science and
Engineering, the Chinese University of Hong Kong. I am glad to study here for 5
years of undergraduate and postgraduate study.
And most of all, thanks to my family members, Mum, Father, Sister, Brother and
Ching for giving me love, support and concern during trouble limes. vi
Table of Contents
1. INTRODUCTION 1
1.1. BACKGROUND I
1.1.1. Introduction to Intelligent Vehicle 1
1.1.2. Typical Vehicle Tracking Systems for Rear-end Collision Avoidance 2 1.1.3. Passive VS Active Vehicle Tracking 3 1.1.4. Vision-hased Vehicle Tracking Systems 4 1.1.5. Characteristics of Computing Devices on Vehicles 5
1.2. MOTIVATION AND OBJECTIVES 6
1.3. MAJOR CONTRIBUTIONS 7
1.3.1. A 3-phase Vision-based Vehicle Tracking Framework 7 7.3.2. Camera-to-vehicle Distance Measurement by Single Camera 9 1.3.3. Real Time Vehicle Detection 10 1.3.4. Real Time Vehicle Tracking using Simplified Image Alignment JO
1.4. EVALUATION PLATFORM 11
1.5. THESIS ORGANIZATION 11
2. RELATED WORK 13
2.1. STEREO-BASED VEHICLE TRACKING 13
2.2. MOTION-BASED VEHICLE TRACKING 16
2.3. KNOWLEDGE-BASED VEHICLE TRACKING ] 8
2.4. COMMERCIAL SYSTEMS 19
3. 3-PHASE VISION-BASED VEHICLE TRACKING FRAMEWORK 22
3.1. INTRODUCTION TO THE 3-PHASE FRAMEWORK 22
3.2. VEHICLE DETECTION 23
3.2.1. Overview of Vehicle Detection 23 3.2.2. Locating the Vehicle Center — Symmetrical Measurement 25 3.2.3. Locating ihe Vehicle Roof and Bottom 28 3.2.4. Locating the Vehicle Sides 一 Over-complete Haar Transform 30
3.3. VEHICLE TEMPLATE TRACKING - IMAGE ALIGNMENT 37
3.3.5. Overview of Vehicle Template Tracking 57 3.3.6. Goal of Image Alignment 41 3.3.7. Allernative Image Alignment - Compositional Image Aligmnent 42
3.3.8. Efficient Image Alignment - Inverse Compositional AIgorithm 43 vii
3.4. VEHICLE TEMPLATE UPDATE 46
3.4.1. Situation of Vehicle lost 46 3.4.2. Template Fitting by Updating the posiliom of Vehicle Features 48
3.5. EXPERIMENTS AND DISCUSSIONS 49
3.5.1. Experiment Setup 49 3.5.2. Successful Tracking Percentage 50
3.6. COMPARING WITH OTHER TRACKING METHODOLOGIES 52
3.6.1. 1-phase Vision-based Vehicle Tracking 52 3.6.2. Image Correlation 54 3.6.3. Confinuously Adapiive Mean Shift 58
4. CAMERA-TO-VEHICLE DISTANCE MEASUREMENT BY SINGLE CAMERA 61
4.1. THE PRINCIPLE OF LAW OF PERSPECTIVE 61
4.2. DISTANCE MEASUREMENT BY SINGLE CAMERA 62
5. REAL TIME VEHICLE DETECTION 66
5.1. INTRODUCTION 66
5.2. TIMING ANALYSIS OF VEHICLE DETECTION 66
5.3. SYMMETRICAL MEASUREMENT OPTIMIZATION 67
5.3.1. Diminished Gradieni Image for Symmelrical Measurement 67 5.3.2. Replacing Division hy Multiplication Operations 77
5.4. OVER-COMPLETE HAAR TRANSFORM OPTIMIZATION 73
5.4.1. Char acl eristics of Over-complete Haar Transform 73 5.4.2. Pre-compntation of Haar block 74
5.5. SUMMARY 77
6. REAL TIME VEHICLE TRACKING USING SIMPLIFIED IMAGE ALIGNMENT 78
6.1. INTRODUCTION 78
6.2. TIMING ANALYSIS OF ORIGINAL IMAGE ALIGNMENT 78
6.3. SIMPLIFIED IMAGE ALIGNMENT 80 6.3.1. Reducing the Number of Parameters in AJfine Transformation 80 6.3.2. Size Rednclion of Image A ligmneni Matrixes 85
6.4. EXPERIMENTS AND DISCUSSIONS 85
6.4.1. Successful Tracking Percentage 86 6.4.2. Timing Improvement 87
7. CONCLUSIONS 89
8. BIBLIOGRAPHY 91 viii
List of Tables
Table 1. Specification of the evaluation system - Intel ECX platform 11 Table 2. Setting of camera mounted on the testing vehicle 50 Table 3. Information and tracking percentage of road image sequences 51 Table 4. The successful tracking percentage of 3-phase structure versus 1 -phase structure 53 Table 5: Experiment information of the adaptive correlation vehicle tracking 56 Table 6. Timing information of the vehicle detection 67 Table 7. Timing information of the vehicle detection after optimization 77 Table 8. Information of the image sequence in the timing analysis 79 Table 9. Timing information of the original vehicle template tracking per frame of one vehicle 79 Table 10. Timing information of the original vehicle tracking methodology per frame 80 Table 11. Parameters involved in each basic affine transformation 81 Table 12. The visualization of matrixes in the original the simplified image alignment 85 Table 13: Tracking percentage of simplified image alignment 86 Table 14. Timing of the simplified vehicle template tracking per frame of one vehicle 88 Table 15. Timing of the simplified vehicle tracking methodology per frame 88 ix
List of Figures
Figure 1. The 8 major problem areas in Intelligent Transportation Systems 1 Figure 2. Typical vehicle tracking system for rear-end collision avoidance 3 Figure 3. An example of ROI (Region of Interesting) in vehicle tracking 5 Figure 4. Intel ECX platform embedded with Intel Celeron M 6(){)MHz processor 6 Figure 5. Testing vehicle used in the project "Machine Vision Based Vehicle Guidance" 13 Figure 6. Stereo image pair and its corresponding disparity maps 14 Figure 7. Objects above and on the road surface detected by the disparity map 15 Figure 8. Vehicles detected and tracked by the stereo-based vehicle tracking method 16 Figure 9. Flow field and object segmentation computed by ASSET : 17 Figure 10. Vehicle tracking result of ASSET 17 Figure 11. Result of "Real-Time Multiple Vehicle Detection and Tracking from a Moving Vehicle" ...19 Figure 12. Tracking result of EyeQ with distance information 20 Figure 13. Visualization of the 3-phase framework 23 Figure 14. Segmentation of the vehicle part from the whole image 24 Figure 15. Symmetrical measurement using gradient image with horizontal and vertical edges 27 Figure 16. Symmetrical measurement using gradient image with horizontal edges 28 Figure 17. Searching window with the symmetrical line of the vehicle as the centerline 29 Figure 18. Locating the car roof and car bottom within the searching window 30 Figure 19. The good vehicle template versus the bad template 31 Figure 20. An example of vehicle images on road environment : 32 Figure 21. Multi-scaling Haar transformed car image 33 Figure 22. Steps of estimating the location of the vehicles, sides 34 Figure 23. The standard Haar Transform Framework 35 Figure 24. The over-complete Haar Transform Framework 36 Figure 25. Vertical coefficient of the vehicles, side image 37 Figure 26. The four basic transformations in affme transformation 38 Figure 27. Image warping in vehicle tracking 40 Figure 28. Visualization of the image alignment tracking process 42 Figure 29. The framework of the inverse compositional algorithm for vehicle template tracking 46 Figure 30. The situation of vehicle tracking lost 47 Figure 31. The framework of Vehicle Template Update 49 Figure 32. An example of wrongly detected vehicle 52
Figure 33: Vehicle tracking using adaptive correlation-based matching 57 Figure 34: Visual difference between the vehicle template and the sub-image 58 V
Figure 35. Vehicle tracking using CAMShift 59 Figure 36. Schematic diagram of the imaging geometry of inter-vehicles 61 Figure 37. Schematic diagram of the imaging geometry of vehicles with different height 62 Figure 38. Newly proposed formula for computing the camera-to-vehicle distance 63 Figure 39. The distance computed by Mobileye® formula and the newly proposed formula 63 Figure 40. Result of vehicle tracking images with distance information 64 Figure 41. The steps involved in diminished gradient image for symmetrical measurement 68 Figure 42. The comparison of symmetrical measurement with original diminished image input 70 Figure 43. Vehicle Template Update correcting the initial lateral position error 71 Figure 44. Wavelet computation in over-complete Haar Transform 74 Figure 45. Overlapping relation between Haar blocks 75 Figure 46. Average intensity of overlapping region shared by two blocks 75 Figure 47. Division and reuse of sub-Haar blocks 76 Figure 48. Variation of affine transformation parameters 84 Figure 49. Comparison of computed distances of simplified and original tracking methodology 87 xi
List of Algorithms
Algorithm �.Origina implementatiol n of symmetrical measurement 72 Algorithm 2. Implementation of symmetrical measurement without division 73 Chapter 1. Introduction 1
1. •ntroduction
1.1. Background
1.1.1. Introduction to Intelligent Vehicle
Despite advances over the past 100 years, no aspect of automotive technology has
ever tried to accomplish what the human driver does with his or her own eyes.
Providing drivers with additional in-vehicle information is a complex endeavor
that unless technologies are carefully designed may even compromise driver
safety and efficiency. For this reason, Intelligent Transportation Systems (ITS) has
become an active research area over the last decade. The ITS is something like an
intelligent navigator, thai can, in place of a human navigator sitting on the next
seal, gives the driver appropriate advice on safe and efficient driving.
f Rear-End Collision \ � Avoidance J
V y V Avoidance J
广 / Intelligent \ (Driver Condition \ Transportation A Road Departure \ V Warning / _ V Collision Avoidance / \ Systems j
( Vision \ I Enhancement J
Figure �.Th 8e major problem areas in Intelligent Transportation Systems Chapter 1. Introduction 2
Researchers have carefully selected 8 major problem areas [1] that they consider
“prime candidates” for improving driver performance because they improve safety
or may impact safety. Figure 1 shows the 8 major areas. Among the major areas,
rear-end collisions account for one in four crashes or over 1.5 million crashes a
year. New technologies will be used to sense the presence and speed of vehicles
behind and provide warnings to avoid collisions. Early versions will use
extensions of adaptive cruise control capabilities to delect and classify stationary
objects and to determine the level of threat from vehicles in rear. It results in the
moving vehicle detection being the important component in an ITS that can be
used in driver assistance system to give warning signals to drivers in case of
potential collisions.
1.1.2. Typical Vehicle Tracking Systems for Rear-end Collision Avoidance
Generally, rear-end collisions occur when the drivers become unaware or aware
too late of the potential dangers on the road. The vehicles are closer than they
appear in convex-type side mirrors so that the driver may underestimate the
approach of rear vehicles. Vehicle tracking on automobiles aims to provide an
automatic and continuous supervision on road situation aiming to alert a driver
about driving environments, and possible collision with rear-end vehicles have
attracted a lot of attention. A typical vehicle tracking system for rear-end collision
avoidance includes capturing sensor input, recognizing the current traffic situation
and generating appropriate driving advice. Figure 2 shows the architecture of a
typical vehicle tracking for rear-end collision avoidance. The road scene
recognition subsystem recognizes the current traffic situation using the rear-end
sensor mounted on the driver vehicle as input. The advice generation subsystem Chapter 1. Introduction 3
generates appropriate advice and gives it to the driver. The driver may control the
vehicle according to the given advice.
Vehicle Tracking System for Rear-end Collision Avoidance
Road Scene H;。" Advice Recognition _ Generation Subsystem Subsystem
Rear-end Advice or Sensor Input j� Warning
Figure 2. Typical vehicle tracking system for rear-end collision avoidance
1.1.3. Passive VS Active Vehicle Tracking
In the vehicle tracking system, the first task is to identify the rear vehicles. One of
the common approaches to vehicle detection is using active sensors such as lasers,
light detection and ranging, or millimeter-wave radars as the sensor input. They
are called active because they detect the distance of an object by measuring the
travel time of a signal emitted by the sensors and reflected by the object. Their
main advantage is that they can measure certain quantities (e.g. distance) directly
requiring limited computing resources. Prototype vehicles employing active
sensors have several drawbacks, such as low spatial resolution, and slow scanning
speed [2]. Moreover, when a large number of vehicles are moving simultaneously
in the same direction, interference among sensors of the same type poses a big
problem. Chapter 1, Introduction 4
Optical sensors, such as normal cameras, are usually referred to as passive sensors
because they acquire data in a non-intrusive way. One advantage of passive
sensors over active sensors is cost. Optical sensors can be used to track more
effectively cars entering a curve or moving from one side of the road to another.
Also, visual information can be very important in a number of related applications,
such as lane detection,traffic sign recognition, or object identification (e.g.,
pedestrians, obstacles), without requiring any modifications to road infrastructures.
On the other hand, vehicle detection based on optical sensors is very challenging
due to huge within class variability. For example, vehicles may vary in shape, size,
and color. Vehicle appearance depends on its pose and is affected by nearby
objects. Illumination changes, complex outdoor environments (e.g. illumination
conditions), unpredictable interactions between traffic participants, and cluttered
background are difficult to control.
1.1.4. Vision-based Vehicle Tracking Systems
The passive vehicle tracking system becomes popular because of the physical
drawbacks like low spatial resolution, slow scanning speed and interference of
active approaches and the advancing of vision-based vehicle tracking algorithm.
Therefore, it is worthwhile to investigate completely vision-based approaches for
vehicle tracking. The primary stage of detection and tracking is the moving
objects segmentation which is a class of pattern recognition problem. An accurate
segmentation of the vehicles is required in order to obtain an accurate tracking.
ROI (Region of Interesting) extraction is first done to segment candidate vehicle
regions, called vehicle template, of which the commonly used method includes
stereo-based, motion-based and knowledge-based. Figure 3 shows an example of
I Chapter 1. Introduction 5
ROI in vehicle tracking. The aim of vehicle tracking is to locate the updated
vehicle location over the image sequence by minimizing the image difference
between the extracted ROI and the current images.
Mm jw^
Jteajikessr^^C'
Figure 3. An example of ROI (Region of Interesting) in vehicle tracking
1.1.5. Characteristics of Computing Devices on Vehicles
Limited power supply, small space and closed environment restrict the
computational resources available on automobiles. Although the technology keep
advancing, common computers are not suitable in the automobile environment
due to high power consumption, large size and high heal dissipation. Typical
processors for computation on vehicles such as the Intel ECX (Embedded
Compact Extended) platform, one of its design purposes being the in-vehicle
technology, has low power consumption, compact size and fanless capacity.
Figure 4 shows the Intel ECX platform.
However, the low operating frequency makes the real-time image processing
applications, such as real-time vehicles tracking, difficult to be run on this type of Chapter 1. Introduction 6
platform. For example, the operating frequency of Intel ECX platform is 600 MHz.
Therefore, dedicated efforts must be made to optimize the performance of
real-time tracking algorithm.
�.i :; I I i ; _AA_U Ur_.. iHliff^l^fflfflBm^
^BlLll^^B^H 酬瞧i ihililniHiiiinijilllli ffi^HHHQl^^U
Figure 4. Intel ECX platform embedded with Intel Celeron M 6{)()MHz processor
1.2. Motivation and Objectives
The large number of rear end collisions due to driver inattention has been
identified as a major automotive safely issue. Even a short advance warning can
significantly reduce the number and severity of the collisions. We have developed
a real-time rear vehicle detection system, which can give an advance warning to a
driver when the relative distance between a rear vehicle and the driver's car is too
small, for the purpose of prevention of rear-end collisions.
Our research is focused on developing a real-time moving vision-based vehicle
detection system for operation in outdoor scenes thai employs only a single
moving camera mounted on the driver's automobile as input, for use in delecting Chapter 1. Introduction 7
multiple rear vehicles on highways and city streets. Several facts make this task
rather challenging:
1) Real-time performance requires efficient algorithms
2) Cameras mounted in a moving vehicle are not static, which means most
methods used by surveillance systems can not be applied here
3) Scenes are dynamic and so, less prior knowledge can be utilized
4) Illuminalion and contrast change, even in the same scene at different times
5) Objects' shape and color have broad ranges
111 this thesis, we will propose a 3-phase vision-based framework for robust and
precise vehicle tracking in outdoor scenes with camera-to-vehicle distance
measuremenl using single moving camera. Our goal is to show thai real-time
multiple vehicle tracking can be achieved on computing devices on automobiles
with high successful tracking percentage.
1.3. Major Contributions
1.3.1. A 3-phase Vision-based Vehicle Tracking Framework
In our proposal, the vehicle tracking is divided into three phases: vehicle detection,
vehicle template tracking and vehicle template update. The vehicle is first located
by symmetrical function using the special feature of vehicles, symmetry, in the
images. The lateral centers of vehicles in images are located with high
symmetrical measurement. With the aid of symmetrical function, the region of
interest in the image can be greatly reduced. Moreover, it makes the methodology
becomes relatively simple to implement using embedded system technology in the
automobile environment. By locating the car features like car roof, car bottom and Chapter 1. Introduction 8
car sides using the techniques of horizontal edge response, horizontal pixel
intensity response and over-complete Haar transform respectively, the vehicle is
delected in the image and the segmented region, called vehicle template, is passed
to the vehicle template tracking phase.
The 3-phase structure is used in our methodology instead of simple detection of
vehicles in each frame, or a 1 -phase vehicle detection framework, because
traditional vehicle detection techniques for tracking cannot distinguish between
changes in viewpoint or configuration of the vehicles relative to the camera [3].
Such a structure aims to determine the image configuration of a target region of a
vehicle as it moves through a camera's field of view and estimate the steropsis and
motion of vehicles. Its objective is to determine,in a global sense, the movement
of an entire vehicle template over a long sequence of images. The information of
change in viewpoint and position varies in the tracking system, leading to clue for
estimation of the relative distance or the contact time of a rear vehicle.
After the vehicles have been located, and segmented in the image, the segmented
area is used as a template for object tracking employing image alignment
technique consisting of moving, and possibly deforming the template lo minimize
the difference between the template and the current image. Since the first use of
the Lucas-Kanade optical flow algorithm [4], image alignment has become one of
the most widely used techniques in computer vision. There are many image
alignment algorithms and the inverse compositional algorithm is adopted in our
methodology because of its high efficiency whose keys include switching the role
of the image and the template and the pre-computation of template gradient,
Jacobian, steepest descent images and Hessian matrix. Chapter 1. Introduction 9
What makes vehicle tracking difficult is the potential variability in the images of
vehicles over time. This variability arises from two principal sources: variation in
vehicle pose or deformations and variation in illumination [5]. When ignored, the
variability is enough to cause a tracking algorithm to lose its target. This result
calls for the template update in phase 3.
The proposed 3-phase framework is robust as it can tolerate frequent illumination
and contrast changes under different lighting environments. In addition, it can
handle vehicles with broad ranges of shapes and colors as well as multiple
numbers of vehicles simultaneously.
1.3.2. Camera-to-vehicle Distance Measurement by Single Camera
Our methodology is to provide an advance warning to a driver when the relative
distance between a rear vehicle and the camera on the driver's car is below a
certain threshold. The relative distance from the target vehicle to the camera on
the driver's car can be found by the newly proposed formula based on the
equation proposed by an intelligent transportation system provider, Mobileye® [6]
using the law of perspective with the height of vehicles in the image. The
continuously distance information is provided by tracking the vehicles, and also
the change of height, in the image.
The newly proposed formula relaxes the assumption that the top of the vehicle
must be at the same level of the camera, independent of vehicle types. Chapter 1. Introduction 10
1.3.3. Real Time Vehicle Detection
To enable real time performance, several computational techniques are adopted in
our system. Firstly, it is not practical to search the ROI of vehicles from the entire
image. In our case, symmetrical measurement is employed to reduce the searching
area. Such measurement is further optimized using techniques of diminishing
image input and replacing division operations so that more computational
resources could be reserved for vehicle tracking and other advanced intelligent
vehicle features. Secondly, we also optimize the over-complete Haar transform by
pre-computation.
1.3.4. Real Time Vehicle Tracking using Simplified Image Alignment
The original image alignment tracking methodology is loo computational
intensive to run on embedded systems in real-time. To realize the real-time
tracking, we have simplified the computation by minimizing the number of image
alignment parameters. The idea of simplifying image alignment stems from the 4
basic affme transformations: translation, scaling, shearing and rotation. From
experience and experiment results, it is evident that the cases of shearing and
rotation are inappropriate to apply for successive images for the same vehicle
moving on the road and would not assist the vehicle tracking. Therefore, we
propose to exclude the shearing and rotation transformations so that the real-time
vehicle tracking can be achieved on embedded systems. Chapter 1. Introduction 11
1.4. Evaluation Platform
Limited power supply, small space and closed environment restrict the
computational resources available on automobiles. Although the technology keep
advancing, common desktop computers are not suitable in the automobile
environment due to high power consumption, large size and high heat dissipation.
The Intel ECX (Embedded Compact Extended) platform which has low power
consumption, compact size and fanless capacity is ideal for in-vehicle applications.
An ECX development board is used for evaluation. Table 1 shows its specification.
All of the experiments mentioned in this thesis are tested on this platform.
Processor Intel Celeron M 600MHz
Memory 256MB DDR 266MHz
OS Linux (kernel version 2.6.9)
Compiler gcc (version 3.4.2)
Table 1 • Specification of the evaluation system - Intel ECX platform
1.5. Thesis Organization
This chapter provides an overview of intelligent vehicle and vision-based vehicle
tracking system. A summary of the objectives and contributions of our work is
also given. The remaining chapters of this thesis are organized as follows: Chapter
2 briefly reviews the related work in vehicle tracking with different approaches.
Chapter 3 presents the framework of 3-phase vision-based vehicle tracking
including the method of vehicle detection method, the procedure of the vehicle
template tracking using inverse compositional image alignment and the principle
of vehicle template update. Then, the technique of finding the relative distance Chapter 1. Introduction 12
from the target vehicle is introduced in Chapter 4. Chapter 5 and 6 present how
the vehicle detection and vehicle tracking can be optimized to run in real-time.
Finally, Chapter 7 summarizes the research performed, and describes some
challenges encountered and possible directions for future research. Chapter 2. Related Work �3
2. Related Work
Vision-based vehicle tracking is often classified into 3 categories: stereo-based,
motion-based and knowledge-based [2]. In this chapter, a brief description and
some sample tracking results will be given for all categories. Most of the previous
works are different from our 3-phase structure since they simply detect vehicles in
each frame of image sequence. Although some tracking results are for front-end
tracking, the methodologies still can be applied for rear-end tracking.
2.1. Stereo-based Vehicle Tracking
Stereo vision is widely used in 3D reconstruction application. By computing the
difference in the left and right images between corresponding pixels, 3D world
information can be gained. The difference is called disparity and the disparities of
all the image points form the so-called disparity-map. If the parameters of the
stereo rig are known, the disparity map can be converted into a 3D map of the
viewed scene.
I stereo camera mounted on the test vehide |
Figure 5mm. Testing vehicle used in the project "Machine Vision Based Vehicl e Guidance’, Chapter 2. Related Work �14
The project "Machine Vision Based Vehicle Guidance” [7] is one of the projects
thai use the stereo vision technique for vehicle tracking. Figure 5 shows the testing
vehicle mounted with a stereo camera rig [7]. The correspondences of image
features in the left and right view are first resolved by computing the stereo
disparity. Then obstacles are mapped to points of non-zero disparity making Ihem
detectable. See Figures 6 where (a) and (b) are a stereo image pair, and (c) and (d)
show points of interest on the corresponding images. Significant disparities
correspond to the obstacles.
I m
(a) (b)
,,_ �V. l " “� ,._
/ . ';^' “ ?> . 't �:. ‘ s -
(C) (d)
Figure 6. Stereo image pair and its corresponding disparity maps
From the residual disparity in an image location we can obtain the 3D location of
the points of interest. Residual disparities, which appear in the image after the
ground plane disparity has been mapped to zero, indicate objects which appear
above the ground plane. A simple threshold is used to distinguish between features Chapter 2. Related Work �15
lying on the ground plane and features due to objects lying above the ground plane.
Figure 7 shows the result on a single frame. Objects detected as above the ground
plane are colored red and those on the ground plane are colored green. Black
regions indicate regions for which the disparity could not be accurately recovered.
im (a) (b) Figure 7. Objects above and on the road surface detected by the disparity map
Finally, neighboring points with similar disparity values are connected together to
form the objects using the clustering technique. These connected components
form the basis of potential vehicles which are to be tracked with lime. If the same
object appears in two consecutive frames, the object is determined as a vehicle.
Figure 8 shows the objects found by the proposed method in this project. The
result shows that the vehicle can be tracked approximately. However, the vehicle
roof and bottom cannot be located and tracked precisely. This can lead to
inaccurate supervision of the traffic on the road and distance measurements. The
most important drawback is that computing the disparity map is very consuming.
The synchronization and heavy I/O of two cameras also makes this approach
unsuitable for embedded systems. Chapter 2. Related Work �16
Figure 8. Vehicles detected and tracked by the stereo-based vehicle tracking method
2.2. Motion-based Vehicle Tracking
In stereo-based vehicle tracking as well as the knowledge-based method described
later, spatial features are used to distinguish between vehicles and their
background. An important cue thai can also be used is the relative motion obtained
via the calculation of optical flow. Approaching vehicles from opposite directions
produce a diverging flow, which can be quantitatively distinguished from the flow
caused by the car ego-motion [8]. On the other hand, departing or overtaking
vehicles produce a converging flow.
The “ASSET: A Scene Segmenter Establishing Tracking’,project is a
representative work employing the motion-based method for vehicle tracking [9].
ASSET starts by tracking 2D image features as they move across the picture over
time, to get a sparse image flow field. The flow field is then segmented into
clusters which have internally consistent flow variation and which have different
flow to each other and to the background. The green shapes in Figure 9 are
examples of initial segmentation. The cluster shapes are filtered over time to give Chapter 2. Related Work �17
improved accuracy and robustness.
Figure 9. Flow field and object segmentation computed by ASSET
(a) Frame 1
(b) Frame 3 晒 (c) Frame 9
Figure 10. Vehicle tracking result of ASSET
From Figure 10, the objects tracked by ASSET seem to be unstable. The optical
flow computation is very sensitive to noise. Given the difficulties faced by
moving camera scenario, getting a reliable dense optical flow is not an easy task. Chapter 2. Related Work �18
Besides, the vehicle roof and bottom cannot be located and tracked precisely as in
the stereo-based vehicle project discussed before.
2.3. Knowledge-based Vehicle Tracking
Knowledge-based methods employ a-priori knowledge to hypothesize vehicle
locations in an image. The vehicle information includes symmetry, color, shadow,
comers, horizontal and vertical edges, texture, and vehicle lights.
The project "Real-Time Multiple Vehicle Detection and Tracking from a Moving
Vehicle" [10] is one of the projects applying knowledge-based method for vehicle
tracking. From the image sequence captured by the camera input, the approximate
positions of vehicles in the images are first located by the temporal difference.
The vehicle tracking regions are then further refined by locating the vehicle lights
and comers. Figure 11 shows the sample tracking result of this project.
The advantage of knowledge-based vehicle tracking is that the vehicle features,
uniquely distinguished from the background objects, can often be identified easily.
Nevertheless, searching a particular vehicle feature from the entire image is often
inefficient. Chapter 2. Related Work �19
幽
(a) Frame 7 (b) Frame 25
:。:.:,:,:—Hej™™™™™
(c) Frame 173 (d) Frame 202
Figure 11. Tracking result of "Real-Time Multiple Vehicle Detection and Tracking from a Moving Vehicle"
2.4. Commercial Systems
Mobileye® [11] has developed an advance warning system on a family of SoC
(System-on-Chip) called EyeQ. The EyeQ supports vehicle detection, lane
detection and pedestrian recognition.
The core of the vehicle detection process lies in the classification. The system
scans the whole image captured by the camera mounted on driver's vehicle for
rectangular shaped regions at all positions and all sizes. Each region of interest is
given a score that represents the likelihood of thai region to be a vehicle. EyeQ
uses several classification schemes throughout the system and they can be rather
degenerate such as the nearest neighbor approach and integration via a cascaded Chapter 2. Related Work �20
classifier such as the hierarchical SVM approach [12]. Figure 12 shows the
sample tracking result of EyeQ.
lobi1Eye. version Nov?9-200?
•
(a)
^^ilEye. version MovZ9~Zp9||||H
(b) MobilEye. version NovZ9-2002
(c)
Figure 12. Tracking result of EyeQ with distance information
Mobileye® has not disclosed their approach used for vehicle tracking in details.
From Figure 12, we can see that the EyeQ SoC achieves a precise vehicle tracking Chapter 2. Related Work �21
result with distance measurement between the camera and targeted vehicles.
However, there are 11 computing processors working simultaneously at 332 MHz.
It seems that the vehicle tracking algorithm running on EyeQ is extremely
computational intensive.
In this chapter, we have summarized several vision-based vehicle tracking projects
with different approaches. Most of the projects can track multiple vehicles
approximately but not precisely. Such results are not helpful for making accurate
camera-to-distance estimations. Some of the projects are even loo computationally
intensive to be run practically on embedded systems. The interest of a relatively
simple, practical and inexpensive system for real-lime multiple vehicles tracking
becomes necessary. Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 22
3. 3-phase Vision-based Vehicle Tracking Framework
3.1. Introduction to the 3-phase Framework
The 3-phase framework is used in our methodology instead of simply detecting
vehicles in each frame, or a 1 -phase vehicle detection framework, because
traditional vehicle detection techniques for tracking treat an image region simply
as moving "stuff and hence cannot distinguish between changes in viewpoint or
configuration of the vehicles and changes in position relative to the camera. The
information of change in viewpoint and position is valuable in the adaptive cruise
control systems since it gives the clue to the relative distance or the contact time
between a rear vehicle and the camera on the driver's car and thus appropriate
driving advice or advance warning. The 3-phase framework vehicle tracking aims
to determine the image configuration of a target region of a vehicle (The template
segmented in phase 1) as it moves through a camera's field of view (Tracking the
template in phase 2). Vehicle tracking arises in stereopsis and motion estimation
of vehicles. It differs, however, in thai the goal is not to determine the exact
correspondence for every vehicle location in a pair of images, but rather to
determine, in a global sense, the movement of an entire vehicle template over a
long sequence of images.
The visualization of the 3-phase framework is shown in Figure 13. In the first
phase, the vehicle is detected by symmetrical measurement and the over-complete
Haar transform in the image. The segmented region, called vehicle template, is
passed to the vehicle template tracking phase. In the second phase, the vehicle
template is used for object tracking applying image alignment which consists of Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 23
moving and deforming a template to minimize the difference between the
template and an image. The images of vehicles have the potential variation in the
pose and illumination which make the tracking difficult. This results in the
necessity of template update in phase 3 where the template is repositioned for best
tracking. We will show how the proposed framework increases the successful
tracking percentage.
BffllBWffRfflBl • » /"> ;/
— L..i 一
"••••••..� V, 1 國
m-immmmmmmmmmm •濕
Figure 13. Visualization of the 3-phase framework
3.2. Vehicle Detection
3.2.1. Overview of Vehicle Detection
Vehicle detection is the first phase of the 3-phase vehicle tracking methodology.
Our first task is to detect and locate the candidate vehicle in an image and then
pass the segmented image as a template to the vehicle template tracking phase.
Vehicle detection, also interpreted as the vehicle template segmentation, aims to
segment the vehicle part from the whole image as shown in Figure 14. The
locating procedure is also named as Hypothesis Generation (HG) [2] where the Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 24
locations of potential vehicles in an image are hypothesized.
mm
.•, 1 " ‘
Figure 14. Segmentation of the vehicle part (The red rectangle) from the whole image (The image is taken by the camera mounted on the driver's automobile)
A review of on-road vehicle detection using optical sensors is given by Sun, Bebis
and Miller [2] where Hypothesis Generation is classified into one of the following
three categories: (1) stereo-vision based, (2) motion-based, and (3)
knowledge-based. Stereo-vision based method uses stereo information provided
by the two cameras mounted on driver's automobile. By mapping the features
from the image taken by the right camera to the image taken by the left camera,
the contours of vehicles can be found [13]. In general, stereo-vision based
methods are accurate if the stereo parameters have been estimated accurately,
which is really hard to guarantee in the on-road scenario. Optical flow calculation
of vehicles is the representative method in motion-based category. Approaching
vehicles at an opposite direction produce a diverging flow, which can be
quantitatively distinguished from the flow caused by the car ego-motion [8]. Three
factors causing poor performance of the optical flow approach were summarized
in [8]: (a) displacement between consecutive frames, (b) lack of textures, and (c)
shocks and vibrations. Knowledge-based methods employ a-priori knowledge to Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 25
vehicles like color, shadow, comers, horizontal/vertical edges, texture, and vehicle
lights. Among which locating the roof of vehicles by horizontal edge detection
[10,14-20] and locating the bottom using shadow information as a sign pattern of
vehicles [14,21,22] are popular techniques since these features, which are quite
unique from background objects, act as the horizontal boundaries for the
segmentation of vehicles. However, it is not practical to search the roof and
bottom of vehicles from the entire image. In this thesis, we propose using
symmetrical measurement to reduce the searching area of roof, bottom and sides
of vehicles.
3.2.2. Locating the Vehicle Center - Symmetrical Measurement
By minimizing the searching area of image, the compulation time of finding the
interesting objects could be decreased. In our proposal, the vehicle is first located
by applying a symmetrical function to the image. To find a reliable measure of the
degree of symmetry, we begin with an elementary theory of one dimensional
function [23], namely, a function /(x) can be divided in two parts: a purely
symmetric function/^(x) and a purely asymmetric fundi on/as(jc). that is
/Y� ( \ r ( \ \ with - JAx)
Moreover, we have the following properties:
f w = /M-/(--). • 'as 2 Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 26
Now, a measure of the degree of symmetry is introduced. It seems logical
to compare the contribution of the symmetrical part of the function with the entire
contribution of the function, as follows:
U'si4 (2� _一 y-r V~ r ( V r ( V ()
With this definition, has the following properties : S{f{x)}e[0,\] 5'{/(x)} = 1 if /(x)is purely symmetric ^{/(x)} = 0 if /(x)is purely antisymmetric
In order to make relationship between the symmetrical measurement and the
intensity value of pixels, we now introduce the reflectional correlation coefficient:
C{/(x)}.
./(X) /(-x) dx C{f{x)h j/W dx
^[fMlIM \fsi4- fasiA^
where f(x) is the intensity value of pixels in grayscale at lateral position x of
image. Once we have calculated C, we can calculate S, as S = .
As the camera and target vehicles move, the illumination variation may cause the
target vehicles to appear unlike a symmetrical object in grayscale intensity. The
centre of the taxi (The position of the dash line) and the other position (the
position of the thick line) in the background have high symmetrical measurement
in the image simultaneously in Figure 15. To reduce this kind of influence, Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 27
gradient images are input for symmetrical measurement. In order to highlight the
measurements surrounding the vehicles, the symmetrical measurement is further
enhanced using horizontal edges only as input for vehicle detection. The result is
shown in Figure 16 in which the center of the vehicle is marked by a dotted line.
I
一 i
;經:l|vf 1 .驗
-
JHHHIHI ‘u I 十 ~ 3UU JjiJ •b,“. b'.li.i
Lateral position of image
Figure 15. Symmetrical measurement using gradient image (a) with horizontal and vertical edges and its corresponding symmetrical measurement. The centre of the taxi (The position of the dash line) and the other position (The position of the thick line) in the background have high symmetrical measurement in the image simultaneously Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 28
I I I I
0.0
U . 4 W^^^EBjj^K^B^^^E^^^^^^S^^^^K^H^m 0.3 0.2
% 1 100 200 300 4(jo 500 ^ I Lateral position of image
Figure 16. Symmetrical measurement using gradient image with horizontal edges as input. The center of the vehicle is marked by the dotted line
3.2.3. Locating the Vehicle Roof and Bottom
The region of interesting vehicles is approximately located after the symmetrical
measurement. The vehicle roof and the vehicle bottom can be located in a
relatively small searching window instead of searching in the whole image which
is shown in Figure 17. We have set the searching window width to about 1/10 of
the image width. Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 29
Hfeii •L
liiM
Original Inagei j ^^^ Symmetri^l line |
Searching
e^^HHH:
% lUU 2UU 3UU Jlju � aUU BLiU 7ULI I Lateral position olMniage
Figure 17. Searching window with the symmetrical line of the vehicle as the cenlerline
In the searching window, the vehicle roof is located as the first horizontal position
when searching from the top to the bottom with high horizontal edge response. On
the other hand, the vehicle bottom is located as the first horizontal position when
searching from the bottom to the top with low horizontal pixel intensity response.
The idea is illustrated in Figure 18. The roof and bottom features are simply
located since they are quite unique from the background within the searching
window. Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 30
first horizontal edge: mmThe location of car moP ^BIIHb ••••••Searching window •
(a)
mm
, ‘ Tlie first dark horizontal pixel line: ^^1*^� = . I The location of car boHom
(b)
Figure 18. Locating the (a) car roof by horizontal edge response and (b) car bottom by horizontal pixel intensity response within the searching window
3.2.4. Locating the Vehicle Sides - Over-complete Haar Transform
The outcome of vehicle segmentation greatly affects the quality of template used
in vehicle tracking that will be mentioned in the next part. If the segmentation is
not done well, the vehicle template may include some background pixels. This Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 31
makes the tracking is not done on the exact vehicle objects. Eventually, the
tracking of vehicles may be loss. Figure 19 shows the comparison of the good
vehicle template and bad vehicle template.
I • -- ! ^ 务: PI
The whole vehicle is segmented well in the image The vehicle template includes some background (Good vehicle template) (Bad vehicle template) Figure ] 9. The good vehicle template versus the bad template
In the previous parts, we have shown how the roof and the bottom of vehicles can
be found. The remaining task, which is much more difficult, is to locate the sides
of vehicles. One of the common techniques is finding the most obvious vertical
edges by searching from the center of the vehicle to the sides of the image. The
technique has three main weaknesses which make it difficult for practical usage.
First, it is hard to define the degree of obviousness of vertical edges. Also, the
sides of vehicles usually are not strictly vertical in two dimensional images. The
existence of vertical edges in the road background makes the sides of vehicles
difficult to be located.
The moving and variant-illumination conditions make the ordinary pixel
difference method unsuitable to represent the sides of vehicles clearly. In Figure
20, it should be easy to see that the pixel-based representation have a significant
amount of variability that may lead to difficulties in segmenting the vehicles from
the images. As a result, a compact representation of the vehicles in the images Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 32
using Haar wavelet transform proposed by C. P. Papageorgiou [24] for vehicle
learning is used.
I —I —laBli 趣 IWMT ••I•國• Figure 20. An example of vehicle images on road environment
Haar wavelet was proposed by Alfred Haar [25] and nowadays, it is widely used
in image compression by the advantages of the simple image scaling and the
multi-orientation representation [26]. The SVM (Support Vector Machine) based
car image classifier proposed by C. P. Papageorgiou takes these advantages where
multi-scaling images transformed by three oriented wavelets — vertical, horizontal,
and diagonal are used to train the classifier as shown in Figure 21. To provide a
compact representation, Papageorgiou uses an over-complete dictionary of Haar
wavelets in which there is a large set of features that respond to local intensity
differences.
In our approach, we have not taken the advantage of multi-scaling property of
Haar wavelet transform. Based on the work done by Papageorgiou, the wavelet
transform is not computed for the whole image because of the complex
computation and only the over-complete vertical Haar wavelet transform is
applied to the estimated location of the vehicles' sides. There is no training
process in our methodology and our purpose is to locate the sides of vehicle from
the compact representation of vehicle images. We present an overview of the Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 33
vertical Haar wavelet transform representation here. Details can be found in
[27-28].
BS
Original vehicle image
_ I 基 f I if m ft
(a) (b) (c) (d) (e) . (f)
Figure 21. Multi-scaling Haar transformed car image. Positions with stronger wavelet feature response are darker, those weaker response are lighter, (a)-(c) are vertical, horizontal, and diagonal wavelets at scale 32x32 pixels, (d)-(f) are the vertical, horizontal, and diagonal wavelets at scale 16x16 pixels
For a given pattern, the wavelet transform computes the responses of the wavelet
filters over the image. By the clue of the symmetrical line, roof and bottom of
vehicles, the location of the sides of vehicles can be estimated by the reasonable
height-to-width ratio of vehicles as shown in Figure 22
In our implementation, only vertical orientation, which gives high response to the
sides of vehicles, is computed. In the traditional wavelet tTansform, the wavelets
do not overlap; they are shifted by the size of the support of the wavelet inx and y.
The standard Haar transform framework is shown in Figure 23. The resized image
around the sides of the vehicle is divided into blocks and then passed to the Haar
wavelet for transformation. The normalized vertical coefficient table, in which the
higher coefficient value represents the existence of much vertical feature, shows
the normalized vertical response of the intensity around the sides of the vehicle.
The lateral position with the highest coefficient value is the location of vehicles' Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 34
sides. In Figure 23, the coefficient data is too rough to figure out the location of
vehicles' sides.
i _ I ^^^^ - I Symmetrical line of the vehicle
BH \Jmm
^^mm-4^1 f…-耀 ri^^^�
Estimated location of / \ \ Estimated location of the left side / | \ the right side
mm 謝
Resized image to 16x16 in order / \ Resized image to 16x16 in order to average the illumination / \ to average the illumination
rr3 ^ -m
^HHH HH^
Figure 22. Steps of estimating the location of the vehicles, sides Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 35
standard vertical Haar wavelet transfor� m
The wavelet do not overlap Haar wavelet 1 I 1
, ,—_ _I I ^ I X 1 + X -1 j Averaging the intensity “ * � i \ I 1 I / \ .'-'Averaging the intensity ^ \ \ .z 1 \ ••• Z 11——I \ Z I II I / I I I I \ Normalized vertical o-f oT 0 [T • • V X1 + x-1 j . coefficients_
gjg ^^ � �� �� @02)6 0.42
“ 0 33 0 05 0.26 iilnnM £ • k'^ _ : 1门 / '•*•��-• ��’• / m^mmmmm , , \ —J �…“ (• |xi +1 x-1 j /
-1
Figure 23. The standard Haar Transform Framework
To achieve better spatial resolution and a richer set of features, Oren and
Papageorgiou suggest shifting the transform by 1/4 of the size of the support of
each wavelet [29], yielding an over-complete (quadruple density) dictionary of
wavelet features. The framework of the over-complete Haar transform is shown in
Figure 24. The vertical coefficients are more detailed and compact. We propose
using the resulting high dimensional vertical feature table to locate the sides of
vehicles. By summarizing and averaging the coefficients of each column, the
vertical coefficient at each lateral position is shown in Figure 25. The lateral
position with the highest vertical coefficient is the location of the vehicle's side.
With the same size of vehicle sides image (16x16 pixels), the vertical coefficient
table computed by the over-complete Haar transform provides 4-time resolution
higher than that computed by the standard Haar transform along the lateral
direction. The high-resolution vertical coefficients of the over-complete Haar Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 36
transform along the lateral direction make the vehicle sides being located
precisely.
Now, the vehicle is segmented from the image as the template for vehicle tracking.
The height-to-width ratio of vehicles computed is useful for template update in the
future. By the above techniques, the vehicle is detected in the image and the
segmented region, called vehicle template, is passed to the vehicle template
tracking phase.
Haar wavelet 1
,,-----• -4 I— -----• ( XI + X-1 I Averaging the mlensny .1 墨 • 0 1 \ • • /: ,••::::.•••••Avoragingth e intensity -1 /7 -攀攀 1 :
• • ,厂-rn 丨(厂门、
…+ -0 �-/ V Lj XI+U x-11
; : .1 / / :
: 1门 / / :
—一 — • I ~� ^^Tm ^ ••”• ~ 0 ~ 1 /•“-"•/ / {\ L^ i X1 + I—I x-1 /J. ‘•• 丨丨• • « ! • H/ ; / •• /V !/^ // Normalized vertica•• l coefficients y * i 0.22 0.04 0.00 0j04 0.30 0.39 0.39 0 52 0.43 0.43
^'3^0.43 0.26 0.04 0.04 ,6.09 0.13 0.26 017 0.03 0.39 0.03 � / 0.48 0 52 0.30 0.09 0.^ 0.35 0.17 0.13 0.26 0.30 0.30 0.09
0.43 0.35 0.13 0.22 (^43 0.48 0.09 0,17 0.30 0.43 0.17 0.04 Over-complete vertical /
Maar wavelet transform 0 22 0.13 0.09 0.1 y 0.30 0.30 0.00 0.22 0.22 0.39 0.00 0.09
“ 0.04 0.17 0.09 oie 0.22 0.04 0.04 0.00 0,04 0 00 0.22 0.04 rv^ ••• 0.09 0.30 0.13 /).13 0.04 0.26 0.09 0 30 0.36 0,48 0.30 0.09 I 1 0.00 0.26 0.Q6 0.09 0.04 0.22 0.48 0.65 0.65 074 0.30 0.09 The transform shifted by % of the /
size of the support of each wavelet 0.04 0.22 y.17 0.17 0.17 0.04 0.74 0.87 0.87 1.00 0.30 0.13
1 0.00 0.0a''0.09 0.13 0.22 0.30 0.87 0.96 0.83 0.83 0.26 0.17
0.09 0/3 0.13 0.17 0.30 0.39 0.70 0.70 0.57 0.52 0.17 0.17
0.09^.13 0.17 0.17 0.30 0.35 0.43 0.39 0.35 0.26 0.13 0.17
.0^0.17 0.17 0.17 0.22 0.22 0.22 0.17 0.09 0.09 0.09 0.13
Figure 24. The over-complete Haar Transform Framework Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 37
I Vehicle's side found by I Vofod Coeffidon I over-complete Haar transform I 0.5「 I
: /:i\ :-^/v 7 I \ liiiHiiibii^diteii^ Mnif I \
0.1 J^^^^HHBl QQ£j _ ,!,JHPM_P‘||-.••.丨!_,.丨丨'11 I “ I
0 I I I I I 1 I I I 1 1 j
1 2 3 4 5 6 7 8 9 |lO 11 12
Lateral position in the njsizad image I
Figure 25. Vertical coefficient of the vehicles" side image
3.3. Vehicle Template Tracking - Image Alignment
3.3.5. Overview of Vehicle Template Tracking
Our ultimate goal is to track the vehicles in a continuous sequence of images with
distance information provided. After the vehicles are located, and segmented in
the image as described in the previous vehicle detection stage, the segmented area
is used as the template for vehicle tracking. In this section, we propose to use
image alignment as the technique of vehicle tracking.
Image alignment consists of moving, and possibly deforming, a template to
minimize the difference between the template and an image. Since the first use of
image alignment in the Lucas-Kanade optical flow algorithm [4], image alignment Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 38
has become one of the most widely used techniques in computer vision. The
applications of image alignment include tracking [5, 30], parametric and layered
motion estimation, mosaic construction and medical image registration. The
unifying framework is proposed and various algorithms are described in the
unifying framework of Lucas-Kanade algorithm [31].
Theoretical representation Transformation example in vehicles
y
ff…夢 1 1 (a) Translation - /SSm^ �/r—,/ I 1
Mnwi
ol ol y
厂7 /?…, (b) Scaling ^ -
^ ol y y
厂q 厂]一巧 (c) Shearing / /
-o| M ''''
(d) Rotation / / ^^^ /
o| —o|
Figure 26. The four basic transformations in affine transformation: (a) Translation, (b) Scaling, (c) Shearing and (d) Rotation. The red polygons represent the original polygons while the blue polygons represent the transformed polygons. The sample transformations in vehicle tracking are shown in the second row. Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 39
The moving and deforming process is essentially the affme transformation
consisting of translation, scaling, shearing, rotation or combinations of them.
Figure 26 shows how a polygon transforms in a 2D space. Let W(x; p) denote the
parameterized set of allowed warps, where p = (po, ... pn-i)^ is a vector of
parameters. The warp W(x; p) takes the pixel x in the coordinate frame of the
template T and maps it to the sub-pixel location W(x; p) in the coordinate frame
of the image I. If we are tracking an image patch, for example, the vehicle image
patch,we may consider employing the set of affme warps:
M" � � A)•义+ + (Po Pi PA fV(x;p)= = y (3) V
described by 6 parameters p = (po,pi, pi, P3, P4,Ps)^ as what has been done in [32].
These 6 parameters are enough to describe the 4 basic tTansformations and their
combinations in affme transformation of the image patch in 2D space. Image
alignment aims to compute the updated parameters for transformation in order to
minimize the differences among the templates (T) in consecutive images (I).
Before explaining the details of image alignment, we should first describe how we
apply image alignment in vehicle tracking. In a sequence of images, image
alignment is to minimize the difference between the template obtained in image n
and the images n+1, n+2, n+3 Using vehicle tracking as an example, the size
and location of the vehicles in the image are changing from image to image. Once
a vehicle template is detected in image n, the template needs to be moved or
deformed in order to minimize the difference between it and its correspondence in
the subsequent images. Figure 27 shows how image alignment is done in vehicle
tracking. The aligned position and the aligned image region are usually called Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 40
warped position and warped image respectively.
蓋^^ I
^^灣 j^�
A vehicle is detected in image n (The template is marked with red rectangle) The template got in image n
I Warped image: Resulting aligned image region K computed by the current frame � A 欢arpec/ position: The newly position [3
Kfflil^ i “ . 5〈忍
Warped position computed W / / T ' ' T T A A in the previous frame
The template in image n (marked with red rectangle) is moved and deformed to match with the vehicle in image n+1 (marked with blue rectangle) Figure 27. Image warping in vehicle tracking
We will follow the unifying framework of image alignment proposed and
described in the Lucas-Kanade algorithm [31 ] and apply it for vehicle template
tracking. Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 41
3.3.6. Goal of Image Alignment
The goal of the image alignment is to minimize the sum of square error between
two images, the template T and the image I warped back onto the coordinate
frame of the template and thus achieve the template tracking. The first version of
image alignment is the Lucas-Kanade algorithm where the error is represented by:
Y,[imx;p))-T{x)Y �
.V
where p are the affine transformation parameters.
Warping I back to compute /(W(x;p)) requires interpolating the image I at the
sub-pixel locations W(x; p). The minimization of the expression in the above
equation is performed with respect to p and the sum is performed over all of the
pixels X in the template image T{\). The Lucas-Kanade algorithm assumes that a
current estimate of p is known and then iteratively solves for increments to the
parameters Ap; i.e. the following expression is (approximately) minimized:
Y\I{W{x•p + ^p))-T{x)\ (5) v
with respect to Ap, and then the parameters are updated:
p�p+A p (6)
These two steps are iterated until the estimates of the parameters p converge or set
the target number of iterations.
Subsequently, the image tracking process can be described as:
1) Aligning the warped position and updating the parameters of the warped
position; performing the first order Taylor expression of the image alignment
equation (5) and minimize its value by the least-squares method. Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 42
2) Minimizing the image alignment equation by iterating and updating the
warped parameters p.
The visualization of the image alignment tracking process is shown in Figure 28.
Updated warp parameters: p = (pi,p2, p^, P4, p5’ P6)丁
I I I Current warped image: l(W(x: p))"| | Template image: T StaP^^ « » JL^jl^ Affine Transformation ^^sS&iU^^L ,T'T 1?. ‘ Ifi"',, ^gftirn I iVi ‘ gssgjBwWI^ ^^^^^^^ j Lucas-Kanade minimizing equation: 辟职起殺》—.厂码“ X Figure 28. Visualization of the image alignment tracking process 3.3.7. Alternative Image Alignment 一 Compositional Image Alignment The Lucas-Kanade algorithm approximately minimizes the expression (5) with respect to Ap and then updates the estimates of the parameters by equation (6). In fact, iterating these two steps is not the only way to minimize the expression of Z j(W(x- p)) - T(x)f ^^ . The alternative to the Lucas-Kanade algorithm is the compositional algorithm [33]. The compositional algorithm approximately minimizes: Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 43 with respect to Ap in each iteration and then updates the estimate of the warp as: W(x; p) i.e. the compositional approach iteratively solves for an incremental warp W(x;Ap) rather than an additive update to the parameters A p. The compositional and additive approaches are proved to be equivalent lo first order in Ap in the unifying framework of Lucas-Kanade algorithm [31]. Throughout the vehicle tracking we concentrate on the inverse compositional algorithm, an efficient algorithm which works without any significant loss of efficiency [34]. 3.3.8. Efficient Image Alignment - Inverse Compositional Algorithm As mentioned by Baker and Matthews [31], one of the key points of efficiency is switching the role of the image and the template, where an exchange of variables is made to switch or invert the roles of the template and the image [5]. To distinguish the previous algorithms from the new ones, we will refer to the original algorithms as the forwards additive (i.e. Lucas-Kanade) and iht forwards compositional algorithm. The corresponding algorithm after the inversion that we used for vehicle tracking is called inverse compositional algorithm. The proof of equivalence between the inverse compositional algorithm and the original algorithm is in the unifying framework of Lucas-Kanade algorithm [31]. The result is that the inverse compositional algorithm minimizes: ^[TiWix-,Ap))-I{W{x;p))f (9) Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 44 with respect to A p (note that the roles of I and T are reversed) and then updates the warp: W(x; p) Performing a first order Taylor expansion on the expression of the inverse compositional algorithm gives: r OTT7- Z nw{xm+vT—Ap-i{w{x- p)) (11) Assuming without loss of generality that W(x; 0) is the identity warp, the solution to this least-squares problem is: 厂 I?" O 71/ = [/�(I;尸))—n^ ] (12) 卞L ^p� where H is the Hessian matrix: X L 办 J L ^p. ^ JTT and the Jacobian ——is evaluated at (x; 0). The steps involved for the image alignment of one image using the inverse compositional algorithm are shown as followings: Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 45 Pre-compute: (1) Evaluate the gradient VI of the template T(x) (2) Evaluate the Jacobian —at (x; 0) (3) Compute the steepest descent images VT — dp (4) Compute the Hessian matrix H Iterate: (5) Warp/with W(x; p) to compute I(W(x; p)) (6) Compute the error image I(W(x; p)) - T(x) � -iT ” (7) Compute the steepest descent parameter V vr— ll(fF(x-p)) - T(x) - -ir (8) Compute Ap by = VT— [/(ff(x;;?))- ^L 尔」 (9) Update the warp W(x; p) until \\Ap\\ <£ OT after a constant number of iterations. Typically, it is set to 10. (The corresponding steps are shown in Figure 29) The efficiency of the inverse compositional algorithm comes from the pre-computation of template gradient, Jacobian, steepest descent images and Hessian matrix. As a result, all of the computationally demanding steps are performed once to avoid performing them in each iteration. The main algorithm simply consists of image warping (Step 5),image differencing (Step 6),image dot product (Step 7), multiplication with the inverse of the Hessian (Step 8),and the update to the warp (Step 9). The framework of the inverse compositional image alignment algorithm for vehicle template tracking is shown in Figure 29. The parameters are updated iteratively in order to align the tracking region to the new warped position with minimized difference between the template and the current image. Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 46 In this section, we have shown that how vehicle tracking is achieved by image alignment technique in an image sequence. By tracking the rear vehicles, with the vehicle height, an advance warning to a driver can be given to a driver when the relative distance computed by the newly proposed formula introduced in Chapter 4 is below a certain threshold. Pre "Computed 二 jfr^iL 遍•聽 Hhm Templale T(k) ^^^^^ ^I^BjjH^ Template gradienty Ttmpl«t9gr«d*»nl x J»cob»ao • Steepest DMcem I巾ages Warp PArAm»tAnI i Parameter Updates ^^^^^^^^ ^^^^^^^^ I ii^s •“ I ^a^^H m^sji 二 . ^p^—I i^B^^BB SD Patamettti Updat»s � ^ff^fi"?^^^ r . . • ^ Erroi irn»gs ‘“—~‘广 l(W<)r. p)) - T(x) Figure 29. The framework of the inverse compositional image alignment algorithm for vehicle template tracking. 3.4. Vehicle Template Update 3.4.1. Situation of Vehicle lost Variation in vehicle pose or deformations and variation in illumination can cause the tracking algorithm fail. They can lead lo the necessity of updating the template in order to let the tracking process remain effective. Figure 30 shows the situation Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 47 of tracking discontinued. I Best frt tracking I Warped tracking image _ H 舊{ 譯 % IP « “ "S » t » t » ^ \ L • “ ‘ \ L “ ‘ gtt^x^^ Template \ Template \ Template dJ! \ \ f^j) • 、.力fc :-• 畜 \ • ‘ 凡 \ fc ^mm. ^ ; Applying the previous >u ^mmrnmi^]' Applying the previous • i^TfT*! l/l f flff*^ Applying the previous * UHU&iflBB ' warping (tracking) position BS&Ji^fiSBff, fe warping (tracking) position warping (tracking) position ^^Zg^^flfc 碎ffin \ to the current HUl^HBii^y •�MBBMH ^ to the image j | Warped Image Warped ln«ge 广…; � … Tracking lost Figure 30. The situation of vehicle tracking lost Recalling the goal of image alignment, we are going to minimize the difference between the template and the warped image by the inverse compositional algorithm: ^[TiW{x;Ap))-IiW{x;p))Y X As the illumination and the size of vehicles change from image to image, the template and the warped image become unlike and make the error between them gradually increases. As a result, it is difficult to align the template to the vehicle in the current image and the error will accumulate along the tracking and eventually the warped image will totally lose the track of vehicles. The error is usually measured as the root mean square error between the template and the warped image. Once the error exceeds a threshold value, the template should be updated. Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 48 3.4.2. Template Fitting by Updating the positions of Vehicle Features Updating the template for tracking is simple by "Making the new template fitting the vehicle in the current image”. The procedure is similar to vehicle detection but the searching is limited lo the image area around the currently warped image. The update is composed of updating the new positions of vehicle symmetrical line, roof and bottom. The update includes: 1) Symmetrical line: Locating the new symmetrical line with the highest symmetrical response within the current warped image. 2) Car roof: Locating the new horizontal position of the car roof by locating the longest horizontal edge within the searching window. 3) Car bottom: Locating the new horizontal position of the car bottom by locating the darkest horizontal pixel line within the searching window. The frequency of updating the tracking template will greatly affect the performance of the vehicle tracking. If the updating frequency is too high (error threshold too low), the updating process will waste the computational resource. If the updating frequency is too low (error threshold too high), the update process will not be completed well since the vehicle is out of the searching windows of the car roof and car bottom. The warped position will tend to the background image as shown in Figure 30. The framework of the template update is shown in Figure 31. In order to lower the computational loading, the new car roof and car bottom are located within a small searching window, instead of the whole image using previous knowledge. Since the height-to-width ratio is constant, without Haar wavelet transform, the sides of Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 49 the vehicles can be located based on the updated positions of the symmetrical line, car roof, car bottom and the ratio found in the vehicle detection part. Tracking lost: The warped image is shifting lo the background image instead of fitting the vehicle FcuiT^t warped image | ^ •••• …/ - It^y I Synimetriualline update | „ IH^ I Car Roof Update | .. '~Car BotSm Update J ...龙 ^^XJS ^^u^irJS ^^Vs^rdtiity window ^•HB^BBIBUB^ —— i • : • d-^v \ \ .zz \ • ... , 、\ The height of the vehicle is updated j ratio to^d ] 、 J in the part of vehicle detection | \ ^ i .一-•--,一•••••一 、 The width and sides of the vehicle is updated / B updated warped image | ^ j^M I, ShiiisM Ihhi^^A^ updated template Figure 31. The framework of Vehicle Template Update 3.5. Experiments and Discussions 3.5.1. Experiment Setup We have conducted an experiment to analyze the reliability of the proposed 3-phase vehicle tracking methodology. There are 4 grayscale image sequences of the road traffic which is taken by the camera mounted on our car (testing vehicle). Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 50 All of the sequences are taken in the daytime with duration from 4 seconds to 68 seconds. The camera setting is shown in Table 2. Camera model: Logitech webcam 4000 Frame rate: 15 frames / second Frame size: 320 x 240 pixels Table 2. Setting of camera mounted on the testing vehicle 3.5.2. Successful Tracking Percentage By observing the number of frame that the tracking is discontinued, the reliability of our methodology can be analyzed. Table 3 shows the information of the 4 grayscale image sequences and their successful tracking percentage. The successful tracking percentage is defined as [35]: Successful tracking percentage = Total number of frame - Number of tracking lost frame Total number of frame where "Number of tracking lost frame" is the number of frames where ground truth contains at least one vehicle, while there is at least one tracked vehicle not falling within the bounding box of any ground truth object. The result shows that the proposed 3-phase methodology is able to track the vehicles over 98% of frames. Failures occur when vehicles are running on winding roads and when the appearance of the target vehicles deforms frequently due to their unstable movement. Besides, there is wrong vehicle detection in Sequence A. One of the background is wrongly detected as a vehicle because its upper and lower boundary are wrongly regarded as the roof and bottom of vehicle respectively as shown in Figure 32. Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 51 Along the image sequences captured by the camera facing behind of our car, horizontal features of background objects always move from the upper edge to the center of images. It is different from the moving behavior of vehicle roof. To avoid background objects wrongly detected as vehicles, we propose the horizontal features moving in this direction in the images should be first filtered out. To further lower the false positive rate in vehicle detection, Sun, Bebis and Miller [2] suggest adding the second step, Hypothesis Verification (HV), after the step of Hypothesis Generation (HG) in order to verify the correctness of a hypothesis. The representative methods of Hypothesis Verification include Principal Component Analysis (PCA), Local Orientation Coding (LOC) and Neural Network (NN) based classifier. Adding the verification step can also filter out the background objects with much horizontal features while the uniqueness of the moving direction of horizontal features in background loses when the camera faces ahead. Sequence A SequenceB SequenceC SequenceD I^M mM ^^taS^ ^^^fe� ’ ^HHHB r •liliiiMlr^W''.-'' ••--• ••••HKBHIH t � ., Sequence A SequenceB SequenceC SequenceD Road type City streets City streets Highway Highway Duration 49 seconds 4 seconds 68 seconds 7 seconds Total number of frame (NF) 731 frames 60 frames 1001 frames 101 frames Number of tracking lost frame (LF) 9 frames 0 frames 13 frames 0 frame Successful tracking percentage 98..8% 100.0% 98.7% 100.0% =NF-LF NF Number of wrong vehicle detection 10 0 0 Table 3. Information and tracking percentage of road image sequences Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 52 In this chapter, we have achieved multiple rear vehicles tracking using a single camera in outdoor scenes with high successful tracking percentage. However, the original image alignment tracking methodology introduced in this section is computational intensive enough so that it is not possible to be implemented in a real time embedded system based on a processor used for mobile applications. To realize the real-time tracking, we have simplified the above computation by minimizing the number of image alignment parameters which will be explained in Chapter 6. ^^^W^^^^^HMIii^lBK^IM^fctfhJ^^^Bi^ Wrongly detected vehidT"] Figure 32. An example of wrongly detected vehicle 3.6. Comparing with other tracking methodologies 3.6.1. 1-phase Vision-based Vehicle Tracking As mention at the beginning of this chapter, our methodology is based on the 3-phase structure: vehicle detection, vehicle template tracking and vehicle template update. However, most of previous works mentioned like the projects mentioned in Chapter 2 is simply detecting vehicles in each frame, or based on the a 1 -phase vehicle detection structure. These techniques treat an image region simply as moving “stuff’ and hence cannot distinguish between changes in viewpoint or configuration of the vehicles and changes in position relative to the camera. Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 53 111 order to exhibit the cooperation power and essentiality of the 3 phases, we have conducted an experiment to observe the performance of the 1 -phase structure. It means the last 2 phases: vehicle template tracking and vehicle template update are disabled and the vehicles are detected using the techniques in the first phase from frame to frame individually. Table 4 shows the successful tracking percentage of the 1 -phase structure comparing with the 3-phase image alignment-based methodology. The result shows that the 1 -phase structure can only track the vehicles in about 70% of frames. Failure occurs when there is sudden illumination and contrast change of the vehicle region or the targeted vehicles moving far away from the camera when the vehicle roof, bottom and appearance become blurred. Sequence A Sequence B SequenceC SequenceD l^^^jM mM .“� Sequence A Sequence B SequenceC SequenceD 3-phase 1-phase 3-phase 1-phase 3-phase 1-phase 3-phase 1-phase Road type City streets City streets Highway Highway Duration 49 seconds 4 seconds 68 seconds 7 seconds Total number of 731 frames 60 frames 1001 frames 101 frames frame (NF) Number of tracking 9 188 0 18 13 227 0 2 lost frame (LF) Successful tracking ‘ NF TF 98..8% 74.3% 100.0% 70.0% 98.7% 77.3% 100.0% 98.0% percentage: i� -r ir NF Table 4. The successful tracking percentage of 3-phase structure versus 1 -phase structure This experiment shows the robustness of the 3-phase structure in vehicle tracking. Our methodology greatly increases the successful tracking percentage because of Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 54 the proper division of functionality of 3 phases. The tracking will be ignited once the vehicle is delected by phase 1 (Vehicle Detection) along the image sequence and let the phase 2 (Vehicle Template Tracking) keep tracking. Since the image alignment is robust enough, the tracking would be continuing once the vehicles are delected at any one of the frames. While there is sudden change in illuminalion or contrast making large difference between vehicle template and the current image, the 3-phase structure would not be interrupted. Instead, it will let phase 3 (Vehicle Template Update) to update the vehicle template for best fitting. 3.6.2. Image Correlation The correlation-based algorithm is extensively adopted in real tracking systems because of its high recognition ability which enables it to track objects reliably in complicated environment with low S/N [36]. Generally, tracking methods based on image correlation work in such a way: an object template is pre-stored as the basis of recognition and position, then it is matched with each sub-image of real-time images until we find out the one that best matches the template. Normalized correlation function is one of the common matching criterions [36]: m ./•) = , M 广’ M N J(z2:k,>,")f)(iz[nm,")]2) I ni=l n=l m=l n=l where T denotes the template image (whose size is M x N) and S denotes the sub-image of scene images. The best matches can be found as global maximums ofR(i,j). Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 55 It's quite difficult to implement the correlation-based tracking in real-time because of its large data and computational cost for correlation matching, which degrades the practicability greatly. The computational cost is mainly dependent on the search strategy that is adopted to find the optimal matching position. Unfortunately most of current correlation-based algorithms employ some traversal strategy, which makes il difficult to reduce the computation substantially. Regardless of the computational complexity, the tracking robustness and adaptability of correlation looks doubtful because the matching between the template and the current image is in fixed scale. It means the size of the template keeps constant throughout the tracking process, unlike other tracking methodologies such as image alignment where the template is warped. This greatly restricted the usability of correlation-based tracking in applications with moving camera and target objects since the size of target objects is varied along the image sequence. From the view of affme transformation, correlation-based tracking can only provide basic transformation in translation but not in scaling, shearing and rotation. For the applications like vehicle tracking, where target vehicles are tracked by the camera mounted on driver's car, correlation-based tracking seems not to be a good solution. There are previous works that use adaptive correlation tracking of targets with changing scale [37] where templates at different scales are provided for matching. The adaptive matching selects the best matched template so that the template scale is varied along the image sequence. We have implemented this method in OpenCV [38] for the vehicle tracking. Table 5 shows the experiment information of the adaptive correlation vehicle tracking in OpenCV. Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 56 OpenCV matching function cvMatchTemplatc Parameters CV TM CCORR NORMED (Normalized correlation) Image sequence Sequence A in Table 3 (Totally 731 frames) W/ Template image i 他‘ , gSBBiikSSSSS' mmmmf Number of template in difference size 5 Table 5: Experiment information of the adaptive correlation vehicle tracking The tracking result of the adaptive correlation-based matching is shown in Figure 33. The tracking result at the beginning of the sequence is acceptable but becomes unstable al Frame 120. The tracking even lost after 4 frames at Frame 124. The poor performance demonstrates that this approach is not suitable for vehicle tracking. There are 2 reasons for the poor performance of adaptive correlation. Comparing with image alignment, the number of template in different size is finite in adaptive correlation approach. For example, there are 5 templates in different size in our experiment. One may argue that we can increase the number of template to improve the adaptability. However, more templates means more time is consumed in matching since different templates are matched with the sub-image one by one. Usually, only a few of templates is used. The second reason is the broad ranges of vehicle illumination and contrast varies the difference between the template and the images as the target vehicle moving along the highway or city streets. Figure 34 shows the vehicle template and the sub-image just before the tracking lost occurred at Frame 123. The visual difference is quite large between these 2 images which cause the global maximum Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 57 of normalized correlation function not falling into the vehicle location. Ik ht jM Frame 20 Frame 40 ^ibSfiS ^^ mSmSSf )'’ Frame 60 Frame 80 mmmKKBm SSHH Frame 100 Frame 120 “ WEHH ^M ifc 邏 ^SSKSKSS^M li^HBHUS Frame 121 Frame 122 Frame 123 Frame 124 Figure 33: Vehicle tracking using adaptive correlation-based matching Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 58 霍I“一一 ^^ ^f iflf f “ i 1 Vehicle template Sub-image at Frame 123 Figure 34: Visual difference between the vehicle template and the sub-image 3.6.3. Continuously Adaptive Mean Shift The Continuously Adaptive Mean Shift (CAMShift) is proposed by Bradski [39] originally used for face tracking. It is a modification of the Meanshift algorithm which is a robust statistical method of finding the mode (lop) of a probability distribution. Typically the probability distribution is derived from color via a 1-D Hue histogram sampled from the object in an HSV color space version of the image. Details of CAMShift are given in [39]. There is an implementation of CAMShift in OpenCV. We follow the work done by T. Chateau [40] who has conducted an experiment of using this CAMShift implementation for vehicle tracking. Since CAMShift expects tracking colored objects,colored version of image Sequence A in Table 3 is used for experiment. The initial vehicle tracking region of interest is selected by hand. Figure 35 shows the tracking result. Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 59 Frame 1 Frame 2 Frame 3 Frame 4 Frame 5 Frame 6 Figure 35. Vehicle tracking using CAMShift Comparing with the correlation-based tracking algorithm, CAMShift tracking is capable of the translation, scaling, shearing and rotation of target objects. However, the tracking performance of CAMShift is even worse than that of adaptive correlation. The tracking begins to lose at the first frame of tracking. It is because CAMShift is originally designed for face and hand tracking, or other target objects with large hue difference from the background. In the implementation in OpenCV, a single channel (hue) is considered in the color model. This heuristic is based on the assumption that flesh color has the same value of hue. Chapter 3. 3-phase Vision-based Vehicle Tracking Framework 60 Difficulty may arise when one wishes to use CAMShift to track objects where the assumption of single hue cannot be made such as vehicles. In particular, the algorithm may fail to track multi-hued objects or objects where hue alone cannot allow the object to be distinguished from the background and other objects. Besides, the computational complexity of the implementation of the CAMShift algorithm does not make it suitable for real-time processing. Because the number of iterations necessary for convergence is not known beforehand, the average time needed for processing is quite long. In this section, we have applied 3 different methodologies for vehicle tracking. The results show that il is difficult for these methodologies to compete with the proposed 3-phase structure because of the low adaptability of the variation of the illumination, contrast and size of vehicles. By the cooperation of the 3 phases, the vehicles can be tracked continuously in outdoor scenes with high reliability. Chapter 4. Camera-to-vehicle Distance Measurement by Single Camera 61 4. Camera-to-vehicle Distance Measurement by Single Camera 4.1. The Principle of Law of Perspective The continuous distance information is provided when tracking the vehicles as well as the changes of heights in the images. In this chapter, we explain how the relative distance from the target vehicle to the camera can be found using the vehicle height by a simple formula. ..A B C '、約 ^ IQ l� j � ‘ qt^^^———^ ‘* • Z2 Figure 36. Schematic diagram of the imaging geometry of inter-vehicles The newly proposed formula is based on the formula proposed by Mobileye® [6] using the law of perspective with the height of vehicles. The diagram in Figure 36 shows a schematic pinhole camera comprised of a pinhole (P) and an imaging plane (I) placed at a focal distance (f) from the pinhole. The camera is mounted on vehicle (A) at a height (H). The rear of vehicle (B) is at a distance (Zl) from the camera. The point of contact between the vehicle and the road projects onto the image plane at a position (yl). Chapter 4. Camera-to-vehicle Distance Measurement by Single Camera 62 The Mobileye® formula is derived from the similarity of triangles: f: the focal length of camera in pixel jM H: the height of the camera from the ground Z = y: the height of vehicles in pixel (13) y Z: the relative distance from the target vehicle to the camera 4.2. Distance Measurement by Single Camera The Mobileye® formula has the assumption that the top of the vehicle is at the same level of the camera. However, different types of vehicles make this assumption invalid. B C yi,^^jK I ,,H^ H H ^o b ~n ——b-J Figure 37. Schematic diagram of the imaging geometry of vehicles with different height In order to relax this assumption, we first introduce the concept of horizontal line in the image. If the position of the horizontal line in the image is known, the similar triangle relation is still maintained with H corresponding to Z1 and yl corresponding to (H+K) in Figure 37. The value of K is the part above the horizontal line of the vehicle height. The new similar triangle relation becomes: - (14) Chapter 4. Camera-to-vehicle Distance Measurement by Single Camera 63 The proposed formula is based on the fact that the ratio of vehicle body above the horizontal line to the vehicle body below the horizontal line remains invariant in the 2D image and the 3D world: —” \ ^^Wfej .JBP Substitute K into equation ( {‘一…''^P^jP^^^ I ITorizont^lininili^ - , j^^lTtiE 见丨 H N Iter" 1; yii: ihe number ol pixel from the horizontal 少 � ."乂 line to the bottom of the vehiclc Figure 38. Newly proposed formula for computing the camera-to-vehicle distance Distance computed by Distance computed by the newly proposed formula Mobileye® formula ll^ ^In Frame jgjfjjl^^ 700 ppiMBijp HRP^PJI ^ Figure 39. The distance computed by Mobileye® formula and the newly proposed formula Chapter 4. Camera-to-vehicle Distance Measurement by Single Camera 64 r^Dis.q I a 13.5 d^ ^ P^^P Sl^ Frame 50 Frame 100 Frame 150 m^mi^HH DSM.'J DDIE J^M ^ T^iSIRmHHI jbh Frame 200 Frame 250 Frame 300 DDE2 iiii5iu^jj aTi 00^3 T ~|T ^ Frame 350 Frame 400 Frame 450 Frame 500 Frame 550 Frame 600 ‘ m3 —-•^H ^ QDE.^ gHHnJI Ufagjij 邏 pHpSi^^^SI Frame 650 Frame 700 Frame 750 Figure 40. Result of vehicle tracking images with distance information (The number: The relative distance from the target to the camera in meter; the green rectangle: The tracking region of vehicles) Chapter 4. Camera-to-vehicle Distance Measurement by Single Camera 65 The comparison between the camera-to-vehicle distance computed by the formula proposed by Mobil eye® and the new formula proposed is shown in Figure 39. The camera-to-vehicle distance computed by the Mobil eye® formula is shorter than the actual distance, leading to possibility of unnecessary warning The vehicle tracking result with camera-to-vehicle distance computed by the newly proposed formula of Sequence A is shown in Figure 40 where there are multiple cars following and overtaking our car. The distances measured vary as the cars transverse at the rear end. The precise distance measurement is based on the accurate tracking given by the 3-phase structure. In the coming chapters, we will show how the real-time multiple vehicles tracking can be realized on embedded systems. Chapter 5. Real Time Vehicle Detection 66 5. Real Time Vehicle Detection 5.1. Introduction In the vehicle detection phase, symmetrical measurement is first applied to reduce the searching area of the region of interest. Actually, symmetrical measurement is quite computational intensive and the computation of gradient image is time consuming. Vehicle detection should be further optimized so that more computational resources could be reserved for vehicle tracking and other advanced intelligent vehicle features. The optimization techniques adopted is in the algorithm and instruction level. The symmetrical measurement is optimized by the techniques of diminishing gradient image input and replacing division operations. Besides,the over-complete Haar Transform is optimized by pre-computalion. These techniques will be explained in this chapter. 5.2. Timing Analysis of Vehicle Detection We have conducted an experiment to analyze the time complexity of different steps in vehicle detection. The Intel ECX platform is used for evaluation. The Image Sequence A used in Chapter 3 is used as the input data in this experiment. There are 731 frames in this sequence and Phase 1 tries to detect as many vehicles as possible in the newly frame. Table 6 shows the time consumption of individual steps in average of 731 frames. From Table 6,symmetrical measurement occupies nearly half of the timing in vehicle detection. The complexity comes from the division operations when Chapter 5. Real Time Vehicle Detection 67 computing the symmetrical value. The optimization strategy is to minimize the number of division and replace it by multiplication. Besides, the symmetrical measurement is linearly proportional to the image width. We try to decrease the dimension of input image and symmetrical measurement with little lateral position error of vehicle center as trade-off. The step of horizontal gradient filtering also benefits from the diminished image input. The implementations of locating vehicle roof and bottom follow the original approach. Therefore, we believe the degree of improvement is limited. The over-complete Haar transform for locating vehicles' sides is another complex step. Its optimization is also necessary. Time consumed (millisecond) Horizontal gradient filtering 5.417 Symmetrical measurement 16.366 Locating vehicle roof and bottom 10.217 Over-complete Haar transform 12.813 Total time consumed in vehicle detection 44.813 Table 6. Timing information of the vehicle detection 5.3. Symmetrical Measurement Optimization 5.3.1. Diminished Gradient Image for Symmetrical Measurement Gradient images are input for symmetrical measurement to reduce the influence of illumination variation in Phase 1,Vehicle Detection. In order to highlight the measurements surrounding the vehicles, the symmetrical measurement is further enhanced using horizontal edges only as input for vehicle detection. We have shown the effectiveness of using horizontal gradient image as input in Chapter 3. Chapter 5. Real Time Vehicle Detection 68 There are many choices of gradieni filler like Sobel, Prewilt and Canny [41]. We have chosen to use Prewitt filler because of its simplicity and only consists of addition and subtraction: o/Y \ 111 Gradieni on horizontal: •’ = 0 0 0 00 /(x, dy , 111 On the other hand, Sobel and Canny filters consists division and multiplication which are relatively expensive operations on embedded systems. If large-size images are input for gradient filtering, the processing time is beyond computation. Ill order to further reduce the complexity of gradient filtering and symmetrical measurement, we have diminished the input image to — of its original size. We n have set n equal to 2 because of the ease of image resizing. For example, images ill 320x240 dimensions are captured by camera in our camera setting and they are resized to 160x120 in dimensions. The gradieni filtering and symmetrical measurement are benefited by this scheme with 3/4 of operations reduced. The steps involved in diminished gradient image for symmetrical measurement are shown in Figure 41. 广‘..她 Gradieni filtering Symmetrical measurement Adjj^ting the lateral Captunngwpdth^ghl • • (Number of operations • (Number of operations • PO 咖 nof^^aximum images by camera width/n x helght/n reduced by 1 ) reduced bv 1 ) symmetrical value by '-•jjT 1-p- ‘ multiplying a factor of n ^ Locating vehicle roof, bottom and sides Figure 41. The steps involved in diminished gradient image for symmetrical measurement Chapter 5. Real Time Vehicle Detection 69 Original-size images are used for locating vehicle roof, bottom and vehicles in order to keep the feature details. Finally, the peak found in symmetrical measurement needs to be adjusted by multiplying a factor of n to the lateral position of the original-size images. There is trade-off between the symmetrical measurement accuracy and timing improvement. It is because the resolution of symmetrical measurement is reduced by a factor of n when the number of operations in gradient filtering and symmetrical measurement is only — of the original one. Figure 42 shows the comparison of symmetrical measurement with original image input and diminished image input, —; I Uleral pwition of image I (a) Original image (320x240) for symmetrical measurement. Chapter 5. Real Time Vehicle Detection 70 • ^^^^^ I line I h^^^HHHH^H I^^HHHI jlSHHHH^I^IHI U W UU III Ul UJ lju I Latrral pMition of imagr I I (b) Diminished image (160x120) for symmetrical measurement Figure 42. The comparison of symmetrical measurement with original image input and diminished image input with n = 2. The symmetrical peak of (a) original image is found at lateral position 199 while the symmetrical peak of (b) diminished image is found at lateral position 99 (Equal to 198 after adjusting) 111 Figure 42,we can observe that the symmetrical measurement of diminished image is not as detailed as original image. From this example, the symmetrical peak found using diminished image has 1 pixel difference in lateral position from that using original image because of the low-resolution symmetrical measurement. In most cases, the lateral position difference is only 1 to 2 pixels because the horizontal edges of vehicles still dominate the symmetrical response in different image sizes. Even though there is lateral position error caused by diminished images, Phase 3 (Vehicle Template Update) will correct the error and the tracking Chapter 5. Real Time Vehicle Detection 71 region will be fitting the vehicle as shown in Figure 43. ^M m. Frame 120 (Initial vehicle detection) Frame 140 (Vehicle template update) Figure 43. Vehicle Template Update correcting the initial lateral position error caused by vehicle detection 5.3.2. Replacing Division by Multiplication Operations Division is an expensive operation. It is desirable to avoid it where possible. Sometimes expressions can be rewritten so that a division becomes a multiplication [42]. For example, (x / y) > z can be rewritten as x > (z * y) if it is known that y is positive and y * z fits in an integer. Algorithm 1 shows the pseudo code of the symmetrical measuremenl implementation. The goal of the symmetrical measuremenl is to locate the position of the peak. There is a conditional statement at line 14 to determine the maximum symmetrical measurement. Before that, the symmetrical value is calculated by a division operation at line 13 where numerator and denominator are both positive integer. This statement occupies half of the CPU liming in SymmetricalMeasurement function. Chapter 5. Real Time Vehicle Detection 72 SyninietricalMeasurement('yGradientlmage) 1: max_sym = 0; 2: for i — scale to Image Width - scale 1 do 3: numerator = 0 4: denominator = 0 5: for j — 0 to ImageHeight — 1 do 6: for k — i - scale to i 7: numerator = numerator 十 yGradientlniage(j’k)*yGradientlmage(j’k+i*2-k*2) 8: end for 9: for k — i — scale lo i 十 scale 10: denominator = denominator + yGradientlmage(j,k)* yGradientlmage(j,k) ]]: end for 12: end for 13: S = numerator / ^denominator 14: If S > max_sym 15: max_syni = S 16: max_sym_j>os = i 17: end if 18: end for Note 1: yGradientlmage is the gradient image with horizontal edges. All values are in integer. fimnerator + ^ Note 2: Since S is proportional to C, S is equal to mmeralor instead of denommator . denominator 2 Algorithm 1. Original implementation of symmetrical measurement In order to replace expensive division operation by multiplication,the SymmetricalMeasurment is re-implemented without the use of division as shown in Algorithm 2. The new implementation still works because: 1)IfS>max—sym,then 画忆偷 > 魏———隱酬tor denominator max_ sym _ denominator 2) If numerator � max_ sym — numerator denominator max_ sym — denominator then (numerator*max_sym_denominator) > (denominator*max一sym—numerator) provided that denominator, numerator, maxsymdenominator and max_sym_numerator are all positive integers. As a result, we use the multiplication operation to replace the division operation. The timing improvement will be shown in Section 5 of this chapter. Chapter 5. Real Time Vehicle Detection 73 SymmetricalMeasurement_NoDiv(yGradientImage) ]: max_sym_numeiator = 1 2: max_sym_denominator = 1000000 3: for i scale to ImageWidth 一 scale — 1 do 4: numerator = 0 5: denominator = 0 6: for j — 0 to ImageHeight - 1 do 7: for k — i - scale to i 8: numerator = numerator + yGradientImage(j ,k)*yGradientlmage(j ,k+i * 2-k* 2) 9: end for 10: for k — i - scale to i + scale 11 : denominator = denominator + yGradientlmage(j,k)* yGradientImage(j,k) 12: end for 13: end for 14: If (numerator*max_sym_denominator) > (denominator*max_sym_numerator) 15: max_sym_denominator = denominator 16: max_sym_numerator = numerator 17: max_syni_j)os = i 18: end if 19: end for Algorithm 2. Implementation of symmetrical measurement without division 5.4. Over-complete Haar Transform Optimization 5.4.1. Characteristics of Over-complete Haar Transform To achieve better spatial resolution and a richer set of features, we have applied the over-complete Haar transform for locating vehicles' sides as mentioned in Chapter 3. Besides symmetrical measurement, over-complete Haar transform is also complex and time consuming because it involves many pixel manipulation operations like averaging pixel intensity of a block of pixels. Figure 44 shows the wavelet computation part of the over-complete Haar transform. Before being input into the Haar wavelet, the intensity inside each block of pixel, called Haar block, is averaged. On the left of Figure 44, each Haar block comes from different regions of the image where overlapping occurs between block and block. The overlapping is the major source of timing wastage. Chapter 5. Real Time Vehicle Detection 74 Haar wavelet 1 I 0 , •••••• ~I I~ ——• I X1 + XI) Avorasjlntj Iho intenstty Jl* • 0 1 \ | | I | /• Averaging the IntorHlty ...... --- 本斗4一S-•…• -0 n-…”•( xi+Qx-i) 令: J —— .一 ^ • — • • • . I 一 一 • • • tT'^tf::::....^ i| lm,e ••…_ � ..…f x 1 + 门 x -1 ) around vehicle side _ o i \ I i I /, -1 Figure 44. Wavelet computation in over-complete Haar Transform 5.4.2. Pre-computation of Haar block The over-complete Haar transform is an extension of the standard Haar transform where the wavelet is shifting 1/4 of the size resulting in the quadruple density. Figure 45 focuses on the overlapping relation between Haar blocks. Each Haar block is in size of 32x32 pixels. In the standard Haar transform, the 1024 pixels inside each block are summed up and then compute the averaging intensity of the Haar block. The same step takes place in other Haar block. The idea of pre-computation of Haar block comes from the overlapping between Haar block. Each Haar block has 3/4 of area overlapping with its vertical or horizontal neighborhood and 9/16 area overlapping with its diagonal neighborhood. Since all of the Haar blocks are going to compute its averaging intensity, parts of the averaging intensity of sub-Haar block can be shared with other overlapping Haar block. For example in Figure 46,the yellow area is the Chapter 5. Real Time Vehicle Detection 75 overlapping region of the two Haar blocks. The averaging intensity of this yellow area can be pre-computed and shared by the two Haar blocks instead of duplicate computation of this area when two blocks computing their own averaging intensity separately. I 32 pixels I jrpicr I 32 pixels 1J I 32 pixels"] H 1 + ^"f^pixe^ ; • • llllllllll^ m Hii^fl Figure 45. Overlapping relation between Haar blocks I 32 pixels I M • I 32 pixels ,r I 32 pixels I H— It f _ -- • I 32 pixels I Figure 46. Average intensity of overlapping region shared by two blocks Chapter 5. Real Time Vehicle Detection 76 To general the case, we should further divide the overlapping area into smaller region to maximize the reusability of pre-compuled averaging intensity. Because of the 1/4 shifting of wavelet, the overlapping area between blocks are based on the combination of 8x8 blocks, called sub-Haar block. Therefore, we can divide the image into many sub-Haar blocks as shown in Figure 47. All of the average intensities of sub-Haar blocks are pre-computed that can be reused by different Haar blocks. B: Average intensity of Haar block sB: Average intensity of sub-Haar block M ^ ^ ^ B(blue) = ;^sB(i) + 2>B(i) + XsB(i) + £sB(i) i=53 i 二 69 4 36 « r B(red) = ZsB(i) + + ^sB® + £sB(i) / i=33 i=49 / M 128 pixels • W / • I I I个I'l中卜卜M l啊州 I / Tl ,中并酔:件牛円 mmimm 讕-.m (D | - I,�|- :jlVijlv^ivll^fji&iili^^yjv.v I / 访4林帐闲二闲::::::祠 ln^mn !^ B(green) = ^sB^ ++ f^sB^ + |;sB(i) 55 71 87 103 B(yellow) = XsB(i) + ^sB(i) + ZsB(i) + ^sB(i) i=52 i-68 i=84 i-KH) Figure 47. Division and reuse of sub-Haar blocks. 4 sample Haar blocks are given as examples which are found by the reuse of sub-Haar block Chapter 5. Real Time Vehicle Detection 77 5.5. Summary After implementing all of the optimization techniques mentioned above, another experiment is conducted to analyze the timing improvement of the optimized vehicle detection. The timing improvement of steps is listed in Table 7. Name of step Time consumed before Time consumed after Image resize (n = 2) ; 5.789 Horizontal gradient filtering 5.417 1.79i SymmetTical measurement 16.366 jU35 Locating vehicle roof and bottom 10.217 10.601 Over-complete Haar transform 12.813 11.750 Total time consumed in vchiclc detection 44.813 33.068 Table 7. Timing information of the vehicle detection after optimization The result shows that our optimization strategy achieves 25% of timing improvement in vehicle detection. The steps of horizontal gradieni filtering and symmetrical measurement have the greatest timing improvement. Although an extra step, image resize, is created and consume a proportion of lime, together with the pre-processing step, gradieni filtering, symmelrical measurement has contribute above 90% of the total timing improvement. However, over-complete Haar transform does not have any noticeable effect on liming improvement. One of the reasons is that the pre-compulalion of sub-Haar block also consumes a reasonable amount of time and covers up the contribution of sub-Haar block sharing. Nevertheless, the target of reserving more computational resources for vehicle tracking has been reached. Chapter 6. Real Time Vehicle Tracking using Simplified Image Alignment 78 6. Real Time Vehicle Tracking using Simplified Image Alignment 6.1. Introduction In Chapter 3,we have shown how the 3-phase structure achieves multiple vehicles tracking in outdoor scenes. The experiment shows that our approach is practical in real vehicle tracking application although there is large variability in vehicle shape, color, illumination and contrast. The precise tracking in vehicle roof and bottom allows the newly proposed formula giving accurate distance measurement. The next step is how to realize the real-time multiple vehicle tracking on embedded systems. We have simplified the computation in the original image alignment by minimizing the number of image alignment parameters that would be explained in this chapter. In this chapter, the image alignment mentioned before is named "original image alignment" while the newly propose image alignment is named "simplified image alignment". 6.2. Timing Analysis of Original Image Alignment We have conducted an experiment to analyze the time complexity of the original image alignment algorithm for vehicle tracking. The Intel ECX platform is used for evaluation. Image Sequence A used in Chapter 3 is used for liming analysis. Table 8 shows the information of the image sequence. During the experiment, vehicles are tracked along the image sequence based on the vehicle template described in the detection session. In the original image alignment algorithm, 6 Chapter 6. Real Time Vehicle Tracking using Simplified Image Alignment 79 parameters are used to describe the affine transformation from frame to frame. Table 9 and Table 10 show the liming information of vehicle template tracking and our original vehicle tracking methodology respectively. The lime consumed is taken as the average of 731 frames. 二 Capturing device Logitech webcam 4()()() Frame size 320 x 240 pixels Duration 49 seconds Frame rate 15 frames / sec Total frames 731 frames Maximum number of" cars 3 Size of template About 80 x 80 pixels Table 8. Information of the image sequence in the timing analysis ^tej^i^mag^li^nmen^ ^^^^^Nam^^te^^^^^ Time consumed (millisecond) Pre-compute: (1) Gradient of template 0.156 ^ Jacobian 0.368 Q) Steepest descent images 0.731 (4) Hessian matrix 0.933 Iterate: (The timing is in millisecond / iteration. There are 10 iterations in one frame) ^ Warping Image 3.587 (6) + (7) Steepest descent parameters 0.155 ^ Compute Ap 0.005 ^ Update the warp 0.014 Total time consumed for 10 iterations: 37.610 Total time consumed per frame of 1 vehicle: 39.798 Table 9. Timing information of the original vehicle template tracking per frame of one vehicle Chapter 6. Real Time Vehicle Tracking using Simplified Image Alignment 80 „ Time consumed Frame rate Step (millisecond) (frames / sec) Step 1: Vehicle Detection 33.068 - Step 2: Vehicle Template Tracking (For each vehicle) 39.798 - Total time consumed of 1 vehicle (Step 1 + Step 2): 72.866 13.723 Total time consumed of 2 vehicles (Step 1 +2x Step 2): 112.664 8.875 Total time consumed of 3 vehicles (Step 1 + 3 x Step 2): 152.462 6.559 Table 10. Timing information of the original vehicle tracking methodology per frame The embedded system is not the target platform for our original image alignment algorithm. It can be expected that the experimental frame rate cannot reach the target frame rate of 15 frames per second. Moreover, we should not assume there is only 1 vehicle following us. As a result, the original vehicle tracking methodology must be optimized for multiple vehicles real-time tracking. The liming improvement of the optimization in vehicle tracking would be more obvious than the optimization in vehicle detection because: 1) The time consumption of vehicle template tracking is proportional to the number of target vehicles being tracked. The timing improvement would be more significant when there are more target vehicles. 2) There are 10 iterations when running image alignment tracking algorithm one time. The timing improvement would be magnified when any one of the steps in the iteration is optimized. 6.3. Simplified Image Alignment 6.3.1. Reducing the Number of Parameters in Affine Transformation The idea of simplified image alignment comes from the 4 basic affine transformations: translation, scaling, shearing and rotation. These transformations can be described by 6 parameters proposed by Bergen and his colleagues as Chapter 6. Real Time Vehicle Tracking using Simplified Image Alignment 81 shown in Equation (3) in Chapter 3. Table 11 shows those parameters involved in each basic Iransformation. By observing the sample transformations in Figure 26 ill Chapter 3, it is evident that the cases of shearing and rotation are inappropriate to apply for successive images for the same vehicle moving on the road and would not assist the vehicle tracking. Therefore, we propose to exclude the shearing and rotation transformations so that only 4 parameters (po, ps, p4 & ps) instead of 6 parameters are used to describe the affine transformation. Po Pi P2 P3 P4 Pa Translation •/ Scaling ^ ^ Shearing � ^ Rotation ^ ^ ^ ^ Table 11. Parameters involved in each basic affme transformation: translation, scaling, shearing and rotation We have conducted another experiment to support our argument. The variation of parameters along the image sequence is shown in Figure 48. In this figure, the parameters po, P3, P4 and ps vary along the image sequence while pi and p2, which are used to describe the shearing and rotation transformations, fluctuate with small magnitudes only. The near-zero values of pi and p2 make the terms p, •x and 尸2 •少 in Equation (3) practically equal to zero. In such a way, we simply omit them from the computation. After the simplification, the number of multiplication in affine transformation is greatly reduced in image warping (Step 5 of image alignment). Chapter 6. Real Time Vehicle Tracking using Simplified Image Alignment 82 (a) Variation of parameter n•; {•ara meter Value I 3 - k 2 . I I :離她 -2 . W I I -3 - -5 - • -6 - -7 0 20 40 61) 80 100 121) 140 160 180 200 Frairif Numhci (b) Variation of parameter p^ Parameter Value 4 1 3 - i 2 - j -3 V ! -4 - 丨 i -5 _ -6 - Q -7 0 20 40 60 80 100 120 140 160 180 200 Frame Number Chapter 6. Real Time Vehicle Tracking using Simplified Image Alignment 83 (c) Variation of parameter p^ Parameter Value 1.2 j I 0.8 - I I 1 I 0.6 - I i 0.4 - 0.2 - a 0 I 1 1 1 1 1 1 1 • 1 ¥ 0 20 40 60 HO 100 120 140 160 180 200 Frame Number (d) Variation of parameter pn I^arameter Value I ! 0.8 - i I1 0.6 - I 1 I 0.4 - I I I j I 0.2 - I j Q 0 1 1 1 1 1 1 1 I 1 •丨 0 20 40 60 80 100 120 140 160 180 200 Frame Number Chapter 6. Real Time Vehicle Tracking using Simplified Image Alignment 84 (e) Variation of parameter Di Parameter Value i 1 0.2 - (.U - 丨 0 • ,^^ 入“M u^^u ^^ -0.1 - — j -0.2 - - a -0.3 — 0 20 40 60 80 100 120 140 160 180 200 Frame Number (f) Variation of parameter d, Parameter Value 0.3 r- 丨 I I I Q 2 - — -. -- • ——— JIj . ‘ ‘ ^ uju/^t V ��(I -0.1 - -0.2 - -0.3 0 20 40 60 80 100 120 140 160 180 200 Frame Number Figure 48. Variation of affine transformation parameters: (a) p,, (b) p4, (c) p.^, (d) po,(e) p! and (f) p2 along the image sequence Chapter 6. Real Time Vehicle Tracking using Simplified Image Alignment 85 6.3.2. Size Reduction of Image Alignment Matrixes Besides, because of the fewer number of parameters, the size of Jacobian, Steepest Descent Images, Hessian Matrix and the inverse of Hessian Matrix are reduced as shown in Table 12. This results in the decrease in the number of multiplication operation and time consumption. The timing improvement and the performance evaluation of the simplified image alignment will be given in Section 4 of this chapter. Original image alignment Simplified image alignment Step (Using 6 parameters for affine (Using 4 parameters for affine ^^^ d transformation) transformation) ‘ lyiH Steepest [”“,:�.:精零霸.pSTJ;「— _ ' Descent .- 一 Images 丨".:�.::丨?釋久?丨’.^^^^::/, :’;.•:.’.: :〒. ‘> — Hessian IXjfl J|||H 5/9 ••H IH Inverse ^^^H HH Hessian IKir Table 12. The visualization of Jacobian, Steepest Descent Images, Hessian Matrix and the inverse of Hessian Matrix in the original image alignment and the simplified image alignment 6.4. Experiments and Discussions We have conducted 2 experiments to evaluate the performance and the timing improvement of the simplified vehicle tracking methodology. The Intel ECX platform mentioned is used as the testing platform. The image sequences of the road traffic were taken by the camera mounted on the rear side of our car and following the specification mentioned in Table 8. Chapter 6. Real Time Vehicle Tracking using Simplified Image Alignment 86 6.4.1. Successful Tracking Percentage The first experiment analyzes the reliability of the simplified vehicle tracking methodology. The 4 image sequences are the same as the experimental data used in Chapter 3. By observing the number of frame when tracking became discontinuous, the reliability of the simplified methodology can be compared with the original one. Table 13 shows the information of the 4 image sequences and their successful tracking percentage. Sequence A SequenceB SequenceC SequenceD ti^ 品 i ^ 0—品. 际―"一mT" Sequence A Sequence B SequenceC SequenceD Original Simplified Original Simplified Original Simplified Original Simplified Road type City streets City streets Highway Highway Duration 49 seconds 4 seconds 68 seconds 7 seconds Total number of 731 frames 60 frames 1001 frames 101 frames frame (NF) Number of tracking 9 11 0 0 13 22 0 0 lost frame (LF) Successful tracking KTP 98..8% 98.5% 100.0% 100.0% 98.7% 97.8% 100.0% 100.0% percentage: iNf - Lr NF Table 13: Tracking percentage of simplified image alignment The vehicle tracking methodology using the original image alignment is able to track the vehicles over 98% of frames. When using the simplified image alignment, the number of 'lost' frame increases slightly in some sequences. The main reason for the increase can be attributed to the little distortion occurring Chapter 6. Real Time Vehicle Tracking using Simplified Image Alignment 87 when vehicles are moving on the winding roads. In such situation, the 4 parameters are often insufficient to describe the shearing-like distortion. In fad, the lost tracking seldom occurs successively and the simplified vehicle tracking methodology still keeps the successful rale high at 97%. Camera-to-vehicle distance (meters) 12.00 |— — 4.()() 2.00 Simplified (using 4 parameters) —Original (using 6 parameters) 0.00 ~‘‘~‘‘~‘~‘‘~‘~‘~‘~‘~‘‘~‘‘~‘~‘~‘~‘~‘‘~‘~‘~‘‘~‘~‘~I~~J 0 50 100 150 200 250 300 350 400 450 500 550 600 650 700 Frame Number Figure 49. The comparison of the calculated distance of the simplified and the original vehicle tracking methodology We compare the calculated distance of the simplified and the original vehicle tracking methodology. From Figure 49,the mean absolute difference of the calculated distance of the two methodologies is only 23 cm, a measurement which is acceptable in the on-road environment. 6.4.2. Timing Improvement The second experiment analyzes the timing improvement of the simplified vehicle tracking methodology. The experiment setup is similar with that in Table 8 but Chapter 6. Real Time Vehicle Tracking using Simplified Image Alignment 88 now 4 parameters are used. Table 14 and Table 15 show the liming information of vehicle template tracking and our simplified vehicle tracking methodology respectively. .. -, Original time Time consumed Step in image alignment Name of step , __� ,�� (millisecond) (millisecond) Pre-compute: (1) Gradient of template 0.156 0.141 (2) Jacobian 0.368 0.179 Steepest descent images 0.731 0.282 一 (4) Hessian matrix 0.933 0.401 Iterate: (The timing is in millisecond / iteration. There are 10 iterations in one frame) (5) — Warping Image 3.587 0.772 一 (6) + (7) Steepest descent parameters 0.155 0.095 Compute Ap 0.005 0.004 (9) Update the warp 0.014 0.013 Total time consumed for 10 iterations: 37.610 8.840 39.798 9.843 Table 14. Timing of the simplified vehicle template tracking per frame of one vehicle � Time consumed Frame rate Step (millisecond) (frames / sec) Step 1: Vehicle Detection 33.068 - Step 2: Vehicle Template Tracking (For each vehicle) 9.843 - Total time consumed of 1 vehicle (Step 1 + Step 2): 42.911 23.304 Total time consumed of 2 vehicles (Step 1 +2x Step 2): 52.754 - 18.955 Total time consumed of 3 vehicles (Step 1 +3 x Step 2): 62.597 15.975 Table 15. Timing of the simplified vehicle tracking methodology per frame From the results, we can see that the computation of the Jacobian, Steepest Descent Images, Hessian Matrix and its inverse and Steepest Descent parameters benefit from the size reduction using 4 parameters. The step of image warping even has 80% of timing improvement after the simplification. In particular, the vehicle template tracking for each vehicle is greatly reduced from around 40 milliseconds to below 10 milliseconds. Moreover, our methodology can track 3 vehicles simultaneously in 15 frames per second. It results in the realization of multi-vehicle tracking on embedded systems. Chapter 7. Conclusions 89 7. Conclusions In this thesis, we have described a 3-phase vehicle tracking methodology for detecting multiple moving vehicles in image sequences captured by a single camera mounted on driver's car with precise distance information for preventing automobile collisions. Our methodology differs from most previous methods which detect vehicles in each frame without correlating the successive frames. Since the image alignment is robust enough, the tracking would be continuing once the vehicles are detected at any one of the frames. While there is sudden change in illumination or contrast making large difference between vehicle template and the current image,the 3-phase structure would not be interrupted. Instead, it will update the vehicle template for best fitting. The 3-phase structure provides valuable information to accommodate changes in the viewpoint and position of the vehicles. The tracking is precise and allows thus the distances from the target vehicles to our vehicle be computed accurately using the newly proposed distance formula. Moreover, the cooperation of 3 phases highly enhances the successful tracking results. To realize the real-time tracking of multiple vehicles using embedded systems, we have optimized and simplified the image alignment tracking algorithm by minimizing the number of parameters in the affine transformation. The improved lime consumption of vehicle tracking is only 25% of the original one. Although the calculated distances are slightly different from the actual measurements, the successful tracking rate maintains over 97%. Therefore, we can claim that our methodology is able to track multiple rear vehicles with distance information Chapter 7. Conclusions 90 accurately using embedded systems in real-lime. Further enhancement can be made lo reduce the failure rate, especially when the vehicles are traversing along some winding roads. When computing the distances from rear vehicles, we assume that the horizontal line remains in position in the image. Nevertheless, running on the bumpy road may cause the assumption not valid. In the future, there should be a method to update the position of horizontal line dynamically so that the distance measurement formula can still be applied in different road situation. All image sequences used in experiments are taken by the camera mounted on a real car traveling in highway and city streets. It is lime and energy consuming. Due to the advance of computing graphics, image sequences of road traffic synthesized by computers, like the image sequences in computer racing game, would be an alternative of the source of experimental data used in simulation. Besides, because of the absence of publicly available road traffic database, benchmarking our system's performance is difficult, making it difficult to compare our results with others. Providing inter-vehicle separating distance and advance warning to a driver is just the beginning of the development of Intelligent Transportation Systems. Ultimately, we hope that the techniques developed in our system could lead lo the development of other higher level advice for driving safety like braking, steering, changing lane and vehicle trajectory prediction. Chapter 8. Bibliography 91 8. Bibliography [1] K. Hartman and J. Strasser, "Saving Lives Through Advanced Vehicle Safety Technology, “ Cambridge Systematics, Inc., MA. [Online]. Available: httt)://www.itsdocs.fhwa.dot.gov/JPODOCS/REPTS PR/14153 files/ivi.pdf [2] Z. Sun,G. Bebis and R. Miller, "On-Road Vehicle Detection Using Optical Sensors: A Review, “ IEEE Intelligent Transportation Systems Conference, Oct 2004. [3] E. H. Adelson and J. R. Bergen, "The plenoptic function and the elements of early vision, “ 111 Computational Models of Visual Processing, M. Landy and J.A. Movshon, MIT Press: Boston, MA, pp. 1-20. [4] B. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision, “ In Proceedings of the international Joint Conference on Artificial Intelligence, pages 674-679,1981. [5] G. D. Hager and P. N. Belhumeur, "Efficient region tracking with parametric models of geometry and illumination, “ IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(10): 1025-1039, 1998. [6] G. P. Stein, O. Mano and A. Shashua, "Vision-based ACC with a Single Camera: Bounds on Range and Range Rate Accuracy, “ IEEE Intelligent Vehicles Symposium, Jun. 2003,Columbus, OH [7] Q.T. Luong, J. Weber, D. Koller and J. Malik, "An integrated stereo-based approach to automatic vehicle guidance, “ Proceedings of the 5th ICCV, June 1995. [8] A. Giachetti,M. Camoani and V. Torre, "The use of optical flow for road navigation, “ IEEE Transactions on robotics and automation, vol. 14,no.l,pp. 34-48. 1998. [9] S.M. Smith, "ASSET-2: Real-time motion segmentation and shape tracking, “ In Proceedings of 5th International Conference on Computer Vision, pages 237-244, 1995. Chapter 8. Bibliography 92 [10] M. Betke, E. Haritaoglu and L. S. Davis,"Real-Time Multiple Vehicle Detection and Tracking from a Moving Vehicle, “ Journal of Machine Vision and Applications, vol. 12, no. 2,pp. 69-83,September 2000. [11] "Advance warning system ,,,http://www.mobileve.com [12] I.Gal, M.Benady and A.Shashua, "Monocular Vision Advance Warning System for the Automotive Aftermarket’” SAE World Congress & Exhibition 2005, April 2005, Detroit USA. [13] M. Bertozzi and A. Broggi, "Gold: A parallel real-time stereo vision system for generic obstacle and lane detection, “ IEEE Transaction on Image Processing, vol. 7, pp. 62-81, 1998. [14] C. Tzomakas and W. Seel en, "Vehicle detection in traffic scenes using shadows,“ Techical Report 98-06,Institut fur neuroinformatik, Ruht-universital, Bochuni, Germany,1998. [15] N. Matthews, P. An, D. Chamley and C. Harris, "Vehicle detection and recognition in grayscale imagery, “ Control Engineering Practice, vol. 4, pp. 473-479, 1996. [16] C. Goerick, N. Derlev and M. Werner, "Artificial neural networks in real-time car detection and tracking applications, “ Pattern Recognition Letters, vol. 17,pp. 335-343,1996. [17] U. Handmann, T. Kalinke, C. Tzomakas, M. Werner and W. Seeleii, "An image processing system for driver assistance, “ IEEE International Conference on Intelligent Vehicles, pp. 481-486,Stuttgart, Germany, 1998. [18] R Parodi and G. Piccioli,"A feature-based recognition scheme for traffic scenes, “ IEEE Intelligent Vehicles Symposium, pp. 229-234, 1995. [19] N. Srinivasa, "A vision-based vehicle detection and tracking method for forward collision warning. “ IEEE Intelligent vehicle symposium, pp. 626-631,2002. Chapter 8. Bibliography 93 [20] T. Bucher, C. Curio el al, "Image processing and behavior planning for intelligent vehicles, “ IEEE Transactions on Industrial Electronics, vol. 50, no. 1,,pp. 62-75,2003. [21] H. Mori and N. Charkai, "Shadow and rhythm as sign patterns of obstacle detection, “ International Symposium on industrial electronics, pp. 271-277, 1993. [22] E. Dickmanns et. al, "The seeing passenger car ‘vamors-p“ International Symposium on Intelligent Vehicles' 94,pp. 24-26, 1994. [23] N. Kiryati and Y. Gofman, "Detecting Symmetry in Grey Level Images: The Global Optimization Approach, “ International Journal of Computer Vision, Vol. 29 No.l,pp 29-45,1998. [24] C. P. Papageorgiou and T. Poggio, “A Trainable Object Detection System: Car Detection in Static Images, “ MIT Al Memo No. 180. October, 1999. [25] C. K. Chui,"An Introduction of Wavelets, “ Academic Press, San Diego, 1992. [26] C. Mulcahy, "Image Compression using Haar Wavelet Transform, “ Spelman Science and Math Journal. [27] S. G. Mallat, "A theory for multi-resolution signal decomposition: The wavelet representation, “ IEEE Transactions on Pattern Analysis and Machine Intelligence, ll(7):674-93, July 1989. [28] E. J. Stollnitz, T. D. DeRos and D. H. Salesin,"Wavelets for computer graphics: A primer, “ Technical Report 94-09-11, Department of Computer Science and Engineering, University of Washington, September 1994. [29] M. Oren, C. Papageorgiou, P. Sinha,E. Osuna, and T. Poggio, "Pedestrian detection using wavelet templates, “ In Computer Vision and Pattern Recognition, pages 193-199,1997. [30] M. Black and A. Jepson,"Eigen-tracking: Robust matching and tracking of articulated objects using a view-based representation, “ International Journal of Computer Vision, 36(2): 101-130,1998. Chapter 8. Bibliography 94 [31] S. Baker and I. Matthews, "Lucas-Kanade 20 Years On: A unifying Framework,“ International Journal of Computer Vision, Vol. 56,No. 3, March, 2004, pp. 221 — 255. [32] J.R. Bergen, P. Anandan, KJ. Hanna, and R. Hingorani, "Hierarchical model-based motion estimation, “ In Proceedings of the European Conference on Computer Vision, pages 237-252, 1992. [33] H. Y. Shum and R. Szeliski, "Construction of panoramic image mosaics with global and local alignment, “ International Journal of Computer Vision, 16(l):63-84, 2000. [34] S. Baker and I. Matthews, "Equivalence and efficiency of image alignment algorithms, “ In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 1090-1097, 2001. [35] F. Bashir and F. Porikli, "Performance Evaluation of Object Detection and Tracking Systems, “ IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS),June 2006 (PETS 2006) [36] G. Cao, J. Jiang and J. Chen, "An improved object tracking algorithm based on image correlation, “ Industrial Electronics, 2003. ISIE '03. 2003 IEEE International Symposium on,Vol.1, Iss.,9-11 June 2003. [37] U.M. Cahn von Seelen and R. Bajcsy, "Adaptive correlation tracking of targets with changing scale, “ Technical report, GRASP Laboratory Technical Report, June 1996. [38] "Open Source Computer Vision Library" (OpenCV), http://www.intel.coni/technologv/comDuting/opencv/ [39] G. R. Bradski, "Computer Vision Face Tracking For Use In Perceptual User Interface, “ In Intel Technology Journal, pages 123-140, 1998. [40] T. Chateau, V. Gay-Belille, F. Chausse and J. Lapreste, "Real-Time Tracking with Classifiers, “ Workshop on Dynamical Vision, European Conference on Computer Vision 2006. Chapter 8. Bibliography 95 [41] J. Canny, "A computational approach to edge detection, “ IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, Issue 6 (Nov. 1986), 679-698. [42] "Writing Efficient Cfor ARM, “ Application Note, Advanced RISC Machines Ltd (ARM), 1998 . ” � .• • . • . . , - ‘ ‘ •; ...... > ,.• . .. ..^ • --C'-l •. � ..�‘A�,.-..,‘"- ' ‘ 、...〜. :“ , . - . •、 - ‘ •* ... •‘:., . . » ...... t, •,.-..,...... V • •. V- ::'� :‘,:• . • ., = C • •� ‘. • “ . .� .. ‘‘ ,,:..:•�• ’. -‘•‘• , ‘‘ .�. .‘..V:,.0:,••";,�. •、.\:.‘ , • • • , •识;、:’:.....、. . • . •、.i' v..厂、:. .. . r » - • ‘ w • • . . .、. ...,.:• , • “‘ • - ‘ • - •‘ . .-_.. .. L .... • .“ - -…. • • • • ” »* . • V. • -‘. . . 、 : • ‘ - - ... ….”:.L ..,•乂; ::...々'..., ....,.。•_;:: — CUHK Libraries i酬^^^ 004439967