Biologically Motivated Modeling and Imitating the Chameleon’s Vision System

Ofir Avni

Biologically Motivated Modeling and Imitating the Chameleon’s Vision System

Research Thesis

Submitted in partial fulfillment of the requirements

for the degree of Master of Science in Computer Science

Ofir Avni

Submitted to the Senate of the Technion—Israel Institute of Technology

HESHVAN 5767 HAIFA NOVEMBER 2006

The research was done under the supervision of Assoc. Prof. Ehud Rivlin in the Department of Computer Science.

I wish to thank my advisor, Assoc. Prof. Ehud Rivlin, for his guidance and support throughout my long research. I would also like to thank Prof. Gadi Katzir from the Department of Biology at the University of Haifa for his guidance in the biological aspects of this research. The support and guidance of Dr. Hector Rotstein and Dr. Francesco Borrelli on issues regarding control theory were also very valuable. I would also like to thank Shay Ohayon, who provided help on pose estimation, as well as the source code for his tracking system, which served as a basis for mine.

The generous financial help of the Technion is gratefully acknowledged.

Contents

Abstract 1

1 Introduction 3

1.1 Biologically Motivated Approach ...... 3

1.2 Tracking the Eyes of the Chameleon ...... 4

1.3 The Chameleon as a Model ...... 4

1.4 Thesis Outline ...... 8

2 The Visual System of the Chameleon 11

3 Related Work 15

3.1 3D Pose Estimation ...... 15

3.2 Environment Visual Scanning ...... 18

3.3 Active Target Tracking ...... 20

4 Tracking the Eyes of the Chameleon 23

4.1 Tracking the Chameleon Head ...... 23

4.2 Tracking the Eyes ...... 34

4.3 Calibration of Multiple Cameras and Mirrors ...... 38

4.4 Experiments ...... 44

5 Scanning the Environment 47

6 Low Level Control - Smooth Pursuit and Saccade 53

6.1 Active Vision Model ...... 53

6.2 Smooth Pursuit Control ...... 56

6.3 Saccade Controller ...... 57

6.4 Switching between Smooth Pursuit and Saccade ...... 60

7 Simulations and Experiments 61

7.1 System Architecture ...... 61

7.2 Scanning Method Simulations ...... 62

7.3 Smooth Pursuit and Saccade Controllers: Experimental Validation . . . . 66

8 Conclusions 69

References 75

Hebrew Abstract ix

List of Figures

1.1 The robotic head and a chameleon head ...... 6

1.2 System architecture ...... 7

2.1 A map of the ganglion cells in the chameleon eye ...... 12

2.2 Comparison of Chameleon Eye with a Typical Eye ...... 13

2.3 Focus/De-focus Movements of the Chameleon Eyes ...... 14

4.1 Body, world and camera coordinates systems ...... 24

4.2 A chameleon with the artificial features attached to it ...... 31

4.3 The parameters of the appearance of a feature ...... 32

4.4 Pattern of a feature ...... 33

4.5 Eye Model ...... 34

4.6 Finding the direction of the eye by line-sphere intersection ...... 36

4.7 Difference image reveals the location of the eyelid ...... 37

4.8 Mirror as a virtual camera ...... 40

4.9 Experiments setup ...... 45

4.10 Results of tracking the eyes ...... 46

6.1 Pan-tilt model and coordinate systems ...... 54

6.2 System model ...... 56

6.3 Smooth Pursuit min-max MPC Controller ...... 58

6.4 Saccade controller for different horizons N ...... 59

7.1 Simulation setup for the 1D case ...... 63

7.2 Simulation results for the search method in 1D ...... 63

7.3 Simulation results for search method using four cameras ...... 65

7.4 Simulation results for search method in 2D ...... 65

7.5 Experimental results for tracking sinusoidal signal ...... 67

7.6 Experimental results for tracking a target with constant acceleration ...... 67

Abstract

This thesis takes the biologically motivated approach with regard to the visual system of the chameleon. The chameleon has a unique visual system, in which the two eyes scan the environment independently. In the first part of this work, a complex and innovative computer vision system, which uses cameras and mirrors, is designed and implemented in order to track the direction of the eyes of chameleons. In this part the problem of pose estimation with multiple cameras and mirrors is formulated and solved. The problem is formulated as a minimization problem over the three-dimensional geometric errors, given the location of features found in the image and their three-dimensional model. The direction of the eyes is found by locating the eyelid in the image and using a geometric model of the eye. Preliminary results from this part, which includes biological experiments, indicate that chameleons scan the environment using a “negative correlation” strategy. That is, when one eye scans forward, the other, with high probability, scans backwards.

In the second part, a robotic system based on the chameleon visual system is constructed. Based on the observation from the first part, a new method for visual scanning with a set of independent pan-tilt cameras is presented. The method combines information about the environment and a model for the motion of the target to perform optimal scanning based on stochastic dynamic programming. For the implementation, a model-based control strategy is developed that performs target tracking. A switching control, combining smooth pursuit and saccades, is proposed. Robust and Minimum-time Model Predictive Control (MPC) theory is used for the design of the control law. Finally, simulative and experimental validation of the approach is presented. The scanning algorithm was simulated in MATLAB, and the resulting scanning pattern has a remarkable resemblance to the scanning behavior of chameleons. The target tracking method was implemented in a real-time system of two cameras mounted on top of pan-tilt heads. Experimental results of tracking a target are presented.


Chapter 1

Introduction

1.1 Biologically Motivated Approach

In Biologically Motivated Robotics, biologists and scientists work together to create robotic systems that mimic biological behaviors. The benefits from this kind of work are mutual. Biologists, for instance, can test models of biological behaviors on a real, usually simplified, system. As models are tested on real systems, new ideas and corrections to the models may arise, improving the understanding of biological behaviors. Scientists, on the other hand, may get from the biological behaviors new ideas on how to solve engineering problems. These new ideas have the advantage of being a “working solution”, as they are tested and improved on a daily basis in the course of evolution.

In this dissertation, we follow the biologically motivated approach to investigate the visual system of the chameleon, or more specifically, eye movements in the chameleon. The chameleon has interesting eye movements, since the laterally positioned eyes are able to move independently, each over a wide range. The eyes perform saccadic movements - i.e. large and rapid reorientations of the eye from one direction to another - thus enabling the chameleon to quickly scan its surroundings. This arrangement of the visual system raises questions such as what is the correct way to scan the environment, and how it should be carried out. In this dissertation we will try to answer those questions.

This work explores the several fields which compose biologically motivated research. In the first part, the advanced abilities of computer vision are used to conduct biological research. A computer-vision system is designed and implemented to track the eye movements of chameleons. Using this system, experiments, which are biological in nature, are



performed. In the second part, based on inputs from the biological experiments, a model that describes the biological behavior is formed. Finally, a robotic system that mimics the biological behavior is implemented.

1.2 Tracking the Eyes of the Chameleon

Tracking the eyes of a chameleon can be considered as the estimation of the 3D pose of an articulated body, where the head of the chameleon is considered a rigid body and each of the two eyes has two additional degrees of freedom. There are numerous works dealing with estimating the pose of an articulated body, many of them in the context of tracking the human body. In [Drouin et al., 2003], for example, the authors track and estimate a skeletal model of a human body. The tracked person is dressed in a black outfit, and distinguishable markers are placed on the joints, which enables easier tracking of the joints. In [Nickels and Hutchinson, 2001] the authors present a model-based approach to track articulated bodies, where the kinematic model is known. In our system, the chameleon’s head is marked with distinguishable features which help to identify the head’s pose. The direction of each eye is found relative to the head. In order to find the direction of the eye, the eyelid needs to be found. This introduces a difficulty, since the range of motion of each eye is large, and the eyelid will not always be visible using one camera. Therefore, several views around the chameleon are needed. In order to reduce the computational load of using a large number of cameras, a solution combining two cameras and two mirrors is used. Each planar mirror that is seen in the image can be considered as an additional virtual camera. The use of a combination of one camera and a mirror was presented in [Mitsumoto et al., 1992], where the authors reconstructed 3D structure using plane symmetry. The theory of stereo using planar mirrors was presented in [Gluckman and Nayar, 2001], where two mirrors on a common screw axis are viewed using one camera. In [Baker and Nayar, 1998] the theory of catadioptric image formation is presented, which includes planar and non-planar mirrors.

1.3 The Chameleon as a Model

Environment visual scanning (EVS) is a critical task for living creatures and artificial systems alike. EVS refers to the process of visual exploration of the environment, used


to identify relevant events and detect threats or targets. In nature, solutions to EVS are diverse and range from: (i) non-moving eyes on top of a movable neck as in the barn owl, through (ii) coupled movements of the eyes as in humans, to (iii) independent eye movements as seen in chameleons. Overall, one can identify three models which represent different levels of eye vs. neck movements and correspond to the three examples mentioned above: the owl-like, the human-like and the chameleon-like models. In the world of robotics, common solutions are based on the first two models. For example, a number of systems have been implemented using two cameras that are either fixed or verged [Bernardino and Santos-Victor, 1999, Vijayakumar et al., 2001, Asada et al., 2000]. These two models have a number of important advantages in terms of facilitating visual calculations such as estimating depth from stereo. Yet, by constraining the relative motion of the cameras they sacrifice search efficiency due to a large overlap between the views of the two cameras.

Chameleons are probably the best-known example in nature of independent eye movements. Eye movements in chameleons are considered unique. The eyes are laterally positioned and can move independently, each over a wide range, while scanning different sections of the environment. Independent eye movements are seen also in fish. A well-studied example is that of the sandlance, which has an optical system and eye movements similar to the chameleon [Pettigrew et al., 1999]. Scanning in chameleons is performed by saccades - i.e. large, independent and rapid movements of the eye from one location to another [Land, 1999a]. The range of movement of each eye is almost 180° in the pan axis and about 90° in the tilt axis. Preliminary results from our research indicate that the global scanning strategy of the chameleon is based on a “negative correlation” principle. More specifically, if one eye scans forward, then with a high probability, the other will point backwards.

Motivated by this fact and the observations mentioned above, the objective of our research is to develop, model and control a robot head based on the principles that govern eye movements in chameleons, without imposing a priori constraints on the relative motion of the cameras. Figure 1.1 depicts our system as compared to a chameleon, and Figure 1.2 depicts a block diagram of the system architecture. Based on such a system architecture we will present a novel algorithm for environment scanning and a control strategy for target tracking. The scanning and control algorithms are not limited to a two-camera architecture. A short introduction to the scanning and control problems follows in subsections 1.3.1, 1.3.2 and 1.3.3.


Figure 1.1: The robotic head and a chameleon head

1.3.1 Oculomotor Tasks

In this thesis, the vision system has two main goals:

1. Scanning the environment autonomously while searching for targets or threats.

2. Acquiring and tracking targets or threats once they are detected.

It is worth mentioning that in real-life systems the oculomotor system has additional objectives related to the two mentioned above; an obvious example is range estimation to the prey, which is required for a successful capture. We will focus only on the two aforementioned goals and define three main tasks needed to complete them:

Environment Visual Scanning (EVS). EVS refers to the act of moving optical devices, be they eyes in chameleons or cameras in a robotic system, in order to explore the surrounding environment. EVS requires computing a strategy to guarantee that part or all of the 360 degrees of the environment is covered according to some specified priorities. The strategy is then transformed into commands to the optical device.

Image Processing (IP). IP is a computational task that extracts information and features from the optical images that are relevant for the operation of the system. The IP activities relevant for the work considered in this thesis are target detection and tracking. Here target tracking refers to the image processing task of tracking and not to the task of actually moving the camera.

Target Tracking (TT). TT refers to the act of moving the optical devices in order to keep a detected target within the field of view or a subset of it.


Figure 1.2: System architecture (block diagram showing the high-level scanning algorithm, the smooth pursuit and saccade controllers, the image processing module, and the two pan-tilt controllers)

This dissertation will not dwell on the IP task and will instead concentrate on EVS and TT, which involve electro-mechanical motion.

1.3.2 Environment Visual Scanning

The EVS task includes a decision phase in which the region of the environment to be visited is selected, and a low-level motion phase in which the decision is executed by the system in a controlled manner. The low-level control is considered as a special case of target tracking, as will be explained later. The decision phase of the EVS task can be considered as a special case of Search Theory, originally developed during World War II in the context of antisubmarine warfare. Much of the early work on search theory was summarized by L. Stone in his classic book [Stone, 1992]. The topic has received considerable attention in robotics, given its relevance for planning search and rescue missions. For instance, [Eagle, 1984, Eagle and Yee, 1990] studied the problem of searching for a moving target when the path of the searcher is constrained, and [Lau et al., 2005] presented a search strategy for targets in an indoor environment assuming stationary targets only. The approach followed in this thesis builds on the results of [Wong et al., 2005], where the authors deal with the problem of coordinating the effort of several autonomous unmanned vehicles in the context of search and rescue. The scanning algorithm is designed so as to maximize the


probability of discovering any existing target in the environment surrounding the robot. A Markov process is used to model the probability of appearance and the motion of a target in the environment within the detection radius of the robot. The resulting problem is formulated and solved as a Dynamic Stochastic Programming Problem (see, e.g., [Ross, 1983, Bertsekas, 2000]).
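As a rough illustration of this formulation (and not the algorithm developed in Chapter 5), the sketch below scores candidate viewing sequences for a single camera over a short horizon, assuming a discretized environment, a known Markov transition matrix for the target and a fixed per-look detection probability; all function names and numbers are illustrative.

    import numpy as np
    from itertools import product

    def detection_probability(belief, plan, P, p_d):
        """Probability of detecting the target within the plan, for a target that
        moves between cells according to the row-stochastic matrix P and a
        per-look detection probability p_d.  q[i] is Pr(target in cell i and
        not detected yet)."""
        q = belief.astype(float).copy()
        detected = 0.0
        for cell in plan:
            detected += p_d * q[cell]   # detected on this look
            q[cell] *= 1.0 - p_d        # survives this look undetected
            q = P.T @ q                 # target moves one step
        return detected

    def best_plan(belief, P, p_d, horizon=3):
        """Brute-force enumeration of viewing sequences for one camera (tractable
        only for tiny examples); the thesis instead formulates the analogous
        problem as a stochastic dynamic program."""
        cells = range(len(belief))
        return max(product(cells, repeat=horizon),
                   key=lambda plan: detection_probability(belief, plan, P, p_d))

    # toy example: four cells, the target tends to drift to the next cell
    P = np.array([[0.7, 0.3, 0.0, 0.0],
                  [0.0, 0.7, 0.3, 0.0],
                  [0.0, 0.0, 0.7, 0.3],
                  [0.3, 0.0, 0.0, 0.7]])
    belief = np.full(4, 0.25)             # uniform prior on the target cell
    print(best_plan(belief, P, p_d=0.8))

A dynamic-programming solution exploits the same recursion over the undetected-target distribution instead of enumerating every sequence.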

1.3.3 Target Tracking

Active target tracking is one of the fundamental tasks in active vision. In general, good active target tracking can be achieved by combining smooth pursuit and saccade movements [Rivlin and Rotstein, 2000, Sutherland et al., 2001]. In smooth pursuit, the target is tracked continuously, with the camera motion controlled so that the image of the target remains inside a predefined window. Saccades are triggered either by large tracking errors (including the case where the image of the target exits the tracking window) or by a request to move the camera to a new region of attention as part of the EVS task. Usually, the purpose of saccades is to quickly reorient the cameras in order to allow smooth pursuit. The smooth pursuit/saccade scheme is a natural consequence of the foveated structure of biological eyes found in many creatures in nature, including the chameleon. As shown by [Rivlin and Rotstein, 2000, Sutherland et al., 2001], foveated vision, and hence smooth pursuit/saccades, can be explained in terms of optimal control theory. As a continuation of that work, Model Predictive Control (MPC) is proposed in this thesis as a better-suited approach for designing each control loop and their interaction. MPC is a control technique that, until recently, was limited to relatively slow processes, as found in the chemical industry. The solution proposed here uses recent developments in MPC that moved most of the computational work offline and enabled the use of this technique in fast processes such as actively tracking a target in real time.
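To illustrate only the receding-horizon principle (not the robust and minimum-time controllers designed in Chapter 6), the following minimal sketch applies an unconstrained finite-horizon MPC step to a discrete double-integrator model of one pan axis; the model, horizon and weights are assumptions made for the example.

    import numpy as np

    dt = 0.02                                        # sample time of the loop
    A = np.array([[1.0, dt], [0.0, 1.0]])            # double integrator: position, velocity
    B = np.array([[0.5 * dt**2], [dt]])

    def mpc_step(x, x_ref, N=25, q=np.array([1.0, 0.1]), r=0.01):
        """One receding-horizon step: stack the predictions x_k = F x0 + G u over
        N future moves, minimize a quadratic cost on the state error and the
        input (unconstrained, so ordinary least squares), and return only the
        first move of the optimal sequence."""
        F = np.vstack([np.linalg.matrix_power(A, k + 1) for k in range(N)])
        G = np.zeros((2 * N, N))
        for k in range(N):
            for j in range(k + 1):
                G[2*k:2*k+2, j:j+1] = np.linalg.matrix_power(A, k - j) @ B
        W = np.diag(np.sqrt(np.tile(q, N)))          # stage weights on the states
        e0 = np.tile(x_ref, N).reshape(-1, 1) - F @ x
        lhs = np.vstack([W @ G, np.sqrt(r) * np.eye(N)])
        rhs = np.vstack([W @ e0, np.zeros((N, 1))])
        u = np.linalg.lstsq(lhs, rhs, rcond=None)[0]
        return u[0, 0]                               # apply only the first move

    x = np.array([[0.0], [0.0]])                     # start at rest, away from the target
    x_ref = np.array([1.0, 0.0])                     # desired position and velocity
    for _ in range(300):                             # closed-loop simulation
        x = A @ x + B * mpc_step(x, x_ref)
    print(x.ravel())                                 # should end close to [1.0, 0.0]

Only the first move of each optimized sequence is applied and the optimization is repeated at the next sample, which is what produces the feedback behavior described above.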

1.4 Thesis Outline

In Chapter 2 the visual system of the chameleon is presented in detail. Chapter 3 summarizes related work on the different aspects of this dissertation. In Chapter 4 the system for tracking the eyes of chameleons is presented. In Chapter 5 the high-level scanning algorithm is formulated and solved. Chapter 6 presents the target tracking algorithm, which


includes the smooth pursuit controller and the saccade controller. Chapter 7 contains simulation and experimental results of the proposed approach for scanning and target tracking. Finally, conclusions and future work are presented in Chapter 8.

Chapter 2

The Visual System of the Chameleon

Chameleons are arboreal lizards that eat mostly insects. They are rather slow and have developed, as compensation, a unique visual system combined with a specialized tongue in order to catch prey. The chameleon’s eyes are positioned laterally on the head and are able to move independently over a wide range. This enables the chameleon to quickly scan the environment while improving its camouflage, since only its eyes are moving. Once prey is detected, the chameleon directs its head and two eyes toward the prey, and shoots its long, sticky tongue at it. The tongue of the chameleon has a special design for catching the prey. Usually, a tongue prehension mechanism is based on surface phenomena. In the chameleon, however, the prehension mechanism includes a suction force which enables it to catch larger prey items [Herrel et al., 2000]. The tongue is projected ballistically with accelerations of up to 500 m/s². Regular muscle is not able to produce such accelerations; hence, the chameleon developed a sophisticated energy-storage-and-release (catapult-like) mechanism [de Groot and van Leeuwen, 2004]. According to the authors, the tongue contains sheaths of elastic material. Muscles contract and load the sheaths with elastic energy, which is then released to produce the high acceleration needed for the projection. The tongue has to be projected to the exact distance of the prey. Undershooting will result in the prey not being caught, and overshooting is undesirable for several reasons: the tongue might be damaged if it hits a hard surface positioned behind the prey, the prey may be pushed out of reach by the tongue, and, moreover, shooting the tongue involves spending considerable energy. In [Harkness, 1977], the author found that chameleons estimate the distance of the target by accommodation cues



Figure 2.1: A map of the ganglion cells in the chameleon eye. The fovea is marked with a star and the horizontal streak is visible. Image taken from [Hassni et al., 1997]

(although both eyes are pointed towards the prey, and a stereo method might also be used). In the experiments, negative and positive lenses were placed in front of the eyes of chameleons. The resulting tongue shots landed closer or farther than the position of the prey, according to the dioptric power of the lenses. Chameleons were also able to catch prey with one eye covered, although with lower accuracy. The chameleon’s eye has the basic structure of a simple chambered eye common to all vertebrates. The retina is composed solely of cone photoreceptors [JA et al., 1988] (mentioned in [Bowmaker et al., 2005]) and has a fovea. Figure 2.1 presents a map of the ganglion cells on a chameleon retina. The fovea is seen in the center of the retina, while a horizontal area with a high density of cells is also seen. This area, usually called the “horizontal streak”, is common to many lizards. In [Bowmaker et al., 2005] it was found that chameleons (at least the four species that were examined) can see four different colors: green, light-blue, dark-blue and purple-UV. The authors of [Bowmaker et al., 2005] pointed out that all four species had similar photoreceptors although they live in different areas and conditions. The lens in the chameleon’s eye, unlike in other vertebrates, has a negative power [Ott and Schaeffel, 1995]. This creates a magnification of the image on the retina, which enables



Figure 2.2: Comparison of the refractive power contributed by the lens and the cornea in a chameleon eye and a typical lizard eye. The ray path ending on the retina is due to the combined refractive power; the other ray path is due to the refractive power of the cornea alone. (a) Typical lizard: the lens contributes half of the total refractive power. (b) Chameleon: the lens is negative and so diverges the light, which is overfocused by the cornea. Images taken from [Land, 1999b].

a more accurate measurement of image focus and probably supports more accurate distance measurement from accommodation cues. See figure 2.2 for a comparison of the refractive power of the combined lens and cornea between a chameleon eye and a typical lizard eye. Before shooting the tongue, the eyes alternate between states of in-focus and out-of-focus [Ott et al., 1998], which may serve to perform distance estimation by focus/defocus algorithms. Figure 2.3 illustrates this situation. As mentioned in the introduction, in order to scan the environment, chameleons perform large, independent saccadic movements of the eyes. Once a target is detected, the head axis is directed towards it and both eyes fixate it. If the target starts to move, the chameleon tracks it using a combination of head movements and eye movements [Ott, 2001, Flanders, 1985]. In [Ott, 2001] the author highlights that while the saccadic movements are independent during the search for prey, they are synchronous during the tracking of the prey. The chameleon is the only vertebrate known to switch from independent to synchronous saccades. Of special relevance to the scanning strategy is the Field-of-View (FOV) of the eye.


Figure 2.3: Focus/de-focus movements of the chameleon eyes. The x-axis is time in seconds, the y-axis is the refractive power of the eye in diopters (D). The target is located at a distance of −25 D. First, the right eye was out of focus and the left eye was in focus. Then, the left eye was coupled with the right eye and both were out of focus. The left eye then returned to its original focus, while the right eye was quickly coupled with it to a focused position. Image taken from [Ott et al., 1998].

However, to the best of our knowledge no studies have so far addressed this subject. Although we cannot state the exact FOV, we can make the observation that the expected FOV should be smaller than in other vertebrates, based on the fact that chameleons have a negatively powered lens. In addition, the protruding eyes are covered by a protective skin (merged eyelids) within which only a small aperture is opened for vision. This inevitably restricts the FOV. Preliminary results from our research regarding eye movements in chameleons indicate that chameleons use a “negative correlation” between the positions of the eyes while scanning the environment. That is, when one eye is pointing in the forward direction, with high probability the other eye will point in the backward direction, and vice versa.

Chapter 3

Related Work

3.1 3D Pose Estimation

For analyzing the direction of the gaze of the chameleon’s eyes, we use two different methods, both of which are usually denoted 3D Pose Estimation. First, the 3D pose of the head of the chameleon should be determined. This pose is the position and orientation of a rigid body in some reference coordinate system, and is described by six parameters, three for the position and three for the orientation. Given the pose of the head, the direction of gaze of the eyes is determined relative to the head. The direction of each eye is described by two additional parameters: the two angles which describe a direction in the 3D world. This can be thought of as determining the pose of an articulated body, that is, estimating the parameters of every degree of freedom that defines the pose of an articulated body in the world. We will refer to the first task as estimating the pose of a rigid body, and to the second as estimating the pose of an articulated body. The goal of estimating the pose of a rigid body can be achieved in several different manners. One approach is to use eigenspaces of the body appearance, as in [Srinivasan and Boyer, 2002, Darrell et al., 1996], which deal with the pose of a human face. The pose is found by finding the closest eigenface to the image or by interpolating between the eigenfaces. Some of the methods use only regular intensity images, and some combine depth-based images [Morency et al., 2003]. Sometimes the pose estimation is combined with detection or recognition, as in [Cootes et al., 2000]. Other appearance-based methods can use SVMs (Support Vector Machines) [Kwong and Gong, 2002] or a graph-based approach [Fleuret and Geman, 2002]. On the other hand, there are the feature-based



methods. In these methods the pose is calculated using correspondences between a set of 2D points in the image and a set of 3D points in the object. They can be divided into analytical, closed-form solutions and iterative solutions. Among the closed-form solutions, [Haralick et al., 1991, Dhome et al., 1989] give the basic solutions to the three-point pose estimation problem. For a larger number of points, usually some method to eliminate outliers, such as RANSAC [Fischler and Bolles, 1981], is used. Other works extended the closed-form solution to a general number of points [Quan and Lan, 1999], or to a combination of lines and points [Ansar and Daniilidis, 2003]. Other geometric-based methods use iterative techniques to minimize the error between the model points and the points identified in the image. These methods use nonlinear optimization algorithms such as Gauss-Newton or Levenberg-Marquardt [Haralick et al., 1989]. Usually the representation of the pose parameters is minimal, using for example Euler angles, but sometimes the minimization is done using other representations such as quaternions [Ude, 1998]. Most nonlinear methods may converge to local minima and hence need some sort of initialization in order to start the optimization process close enough to the solution. One exception is presented in [Lu et al., 2000]. Estimating the pose of a rigid body from multiple cameras was also investigated. Usually, the multiple cameras are used as a stereo device and the 3D location of each model point is determined using a stereo algorithm. Then, based on the 3D locations, the pose of the body is determined [Delamarre and Faugeras, 1999, Tonko and Nagel, 2000]. In [Chang and Chen, 2004], the authors combine determining the 3D location of points which are visible in more than one image with minimizing the three-dimensional error of points which appear in only one image. A few works use global minimization of the error in all images; one such example is presented in [Martin and Horaud, 2001]. In our work, a linear method is used only for the first frame, in order to get the initial solution for the nonlinear method. In all other frames, a Gauss-Newton iteration is used, initialized with the pose from the last image. The nonlinear optimization method was chosen to obtain maximum accuracy. Our pose estimation algorithm works in a multiple-view environment, composed of cameras and mirrors. The minimization is global and considers all the features in all the views. The task of estimating the pose of an articulated body can also be accomplished using different methods. Again, some methods are view-based [Black and Jepson, 1996] and use eigenspace representation. Some methods use silhouettes [Agarwal and Triggs,


2004, Fan and Wang, 2004] and some use geometric characteristics of the joints. The three-dimensional model of the body can be assembled from volumetric structures such as cylinders [Delamarre and Faugeras, 1999], or it can be just a skeletal model of the limb lengths [Drouin et al., 2003]. In the first case, the method is usually appearance-based, while in the latter the method is usually based on finding features which are located on the joints. These features are sometimes artificial [Nickels and Hutchinson, 2001]. The model of the body can be learned or updated during the tracking using various methods such as Kalman filtering, as in [Nickels and Hutchinson, 2001], where the limb lengths are updated from image to image. In our system, the articulated pose problem is reduced to that of finding the direction of the two eyes. A geometric approach is used, and the model consists only of the center of the eye and its radius to the eyelid. The direction of the eye is found by finding the eyelid and using the geometric model. In the system discussed in this thesis, mirrors are used as a replacement for additional cameras in order to achieve a complete coverage of views of the chameleon. The use of mirrors in order to view occluded parts of an object was presented in [Mitsumoto et al., 1992], where the authors deal with 3D reconstruction from a perspective 2D image using mirrors. In their paper, the mirrors are used for forming symmetrical relations between the direct image and the mirror images. By finding correspondences between the direct image and the mirror image, the 3D shape is constructed by means of a plane-symmetry recovering method, using a vanishing point. The 3D reconstruction is applied to polyhedral objects. In [Baker and Nayar, 1998] the authors show how the mirrors in a catadioptric sensor should be arranged in order for the sensor to have a single viewpoint. A catadioptric sensor uses a combination of lenses and mirrors placed in a carefully arranged configuration to capture a much wider field of view. When designing a catadioptric sensor, the shape of the mirror(s) should ideally be selected to ensure that the complete catadioptric system has a single effective viewpoint. The authors derive the complete class of single-lens single-mirror catadioptric sensors which have a single viewpoint, and an expression for the spatial resolution of a catadioptric sensor in terms of the resolution of the camera used to construct it. In [Gluckman and Nayar, 2001] the authors analyze the geometry and calibration of catadioptric stereo with two planar mirrors. They show that the relative orientation of a catadioptric stereo rig is restricted to the class of planar motions, thus reducing the number of external calibration parameters from 6 to 5. They also derive the epipolar


geometry for catadioptric stereo and show that it has 6 degrees of freedom rather than 7 as in traditional stereo. Furthermore, they show how the focal length can be recovered from a single catadioptric image solely from a set of stereo correspondences. In a recent paper [Sturm and Bonfort, 2006], the authors consider the task of computing the pose of an object relative to a camera for the case where the camera has no direct view of the object. They use a setup which consists of a camera and a calibration grid, without the camera having a direct view of the grid. Planar mirrors are placed such that the camera sees the calibration grid’s reflection. In our work, mirrors are used as a replacement for additional cameras, where each mirror viewed in an image is considered a virtual camera. The geometry and calibration of this virtual camera are presented. Pose estimation is, of course, a problem related to extrinsic calibration, and here too a global minimization procedure is used.
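To make the virtual-camera idea concrete, here is a minimal sketch (not the calibration procedure of Section 4.3) of reflecting a 3D point across a planar mirror: a camera observing the mirror sees the scene as if viewed by a virtual camera whose center is the reflection of the real camera center across the mirror plane. The plane parameters and points are illustrative.

    import numpy as np

    def reflect_point(x, n, d):
        """Reflect a 3D point x across the plane {p : n.p + d = 0} with unit
        normal n.  The mirror image seen by a real camera is equivalent to the
        direct view of a virtual camera placed at the reflected camera center."""
        n = n / np.linalg.norm(n)
        return x - 2.0 * (n @ x + d) * n

    # illustrative mirror: the plane z = 2 (unit normal along +z, offset d = -2)
    n, d = np.array([0.0, 0.0, 1.0]), -2.0
    camera_center = np.array([0.0, 0.0, 0.0])
    print(reflect_point(camera_center, n, d))        # virtual camera center at z = 4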

3.2 Environment Visual Scanning

In robotics, the use of multiple cameras on a robot is usually done in a coupled or fixed manner. In [Asada et al., 2000], for example, the authors use binocular stereo vision to present an extension of adaptive visual servoing for unknown moving object tracking. Their method does not need knowledge of the camera parameters. Their only assumption is that the system needs stationary references in both images, by which the system can predict the motion of unknown moving objects. Another example can be found in [Bernardino and Santos-Victor, 1999], where the authors present an active binocular tracking system. They use a space-variant sensor geometry implementing a focus of attention in the center of the visual field. They show that the kinematic relations in their system are decoupled and that the system dynamics can be expressed in image feature space. In [Vijayakumar et al., 2001] the authors present a biologically inspired artificial oculomotor system on an anthropomorphic robot. They investigate the computational mechanisms for visual attention, where stimuli in the environment excite a dynamical neural network that implements a saliency map. The presented system computes new targets for the shift of gaze, executed by the head-eye system of the robot. A work closer to our approach was presented in [Horaud et al., 2004], where the authors use a combination of a wide-angle, static camera and a narrow-angle, active


camera. The static camera maintains a large field of view, enabling it to monitor a large area at low resolution, while the active camera can capture details at high resolution. The authors address the problem of how the active and static cameras should cooperate. No similar system was found that takes advantage of multiple actively controlled cameras for the purpose of detecting events in the environment. However, a related field is that of “search theory”, which was established during World War II in the context of anti-submarine warfare. This field is now used for path planning and searching, as can be seen for example in [Wong et al., 2005], where the authors present a Bayesian approach to the problem of searching for multiple lost targets in a dynamic environment by a team of autonomous sensor platforms. In their paper, the probability density function (PDF) for each individual target location is maintained by an independent instance of a general Bayesian filter. The team utility for the search vehicles’ trajectories is given by the sum of the “cumulative” probability of detection for each target. A dual-objective switching function is also introduced to direct the search towards the mode of the nearest target PDF when the utility becomes too low in a region to distinguish between trajectories. Another example can be found in [Eagle and Yee, 1990, Eagle, 1984], where the authors deal with the case of a target moving in discrete time among a finite number of cells according to a known Markov process. The searcher must choose one cell in which to search in any time period. The set of cells available for the search depends upon the cell chosen in the last time period. They solve the problem of finding the searcher path that maximizes the probability of detecting the target. In [Eagle, 1984] the problem is formulated as a partially observable Markov decision process and a finite-horizon solution is presented, whereas in [Eagle and Yee, 1990] the problem is solved using a branch-and-bound procedure. In [Lau et al., 2005] the authors present an algorithm for autonomous search that minimizes the expected time for detecting multiple targets present in a known indoor environment. Their technique makes use of the probability distribution of the target(s) in the environment, thereby making it feasible to incorporate any additional information, known a priori or acquired while the search is taking place, into the search strategy. The environment is divided into a set of distinct regions and an adjacency matrix is used to describe the connections between them. The costs of searching any of the regions as well as the cost of travel between them can be arbitrarily specified. The search strategy is derived using a dynamic programming algorithm. The algorithm is illustrated using an example based on the search of an office environment.


3.3 Active Target Tracking

Several characteristics of the human oculomotor system have been suggested to be useful also for active vision mechanisms. Among others, foveal vision and a tracking scheme based on two different modes, called smooth pursuit and saccade, have often been postulated or implemented. In [Rivlin and Rotstein, 2000] the authors formulate a setup in which the benefit of implementing these schemes can be evaluated in a systematic manner, based on control considerations but incorporating image processing constraints. The advantage of using foveal vision is evaluated by computing the size of the foveal window which will allow tracking of the largest possible class of signals. By using linear optimal control theory, this problem can be formulated as a one-variable maximization. Also, foveal vision leads naturally to smooth pursuit, defined as the performance that can be achieved by the controller resulting in the optimal size of the foveal window. This controller is relatively simple (i.e., linear, time-invariant), as is to be expected for this control loop. When smooth pursuit fails, a corrective action must be performed to re-center the target on the fovea. Smooth pursuit and saccades are also used for tracking in [Sutherland et al., 2001], where the authors present the mechanical hardware and control software of an active vision system, which uses a simple and compact controller for real-time tracking applications. The controller consists of two behavioral subgroups, saccade and smooth pursuit, and is optimized by using a single trapezoidal-profile motion algorithm. In [Milios et al., 1993, Christensen, 1993] the smooth pursuit and saccades are considered as separate mechanisms. The smooth pursuit uses PIDs and some predictors, while the saccade controller is based on open-loop control and linear prediction. The initiation of saccades is based on a positional error larger than some threshold. Some of the works use Kalman filtering in order to compensate for the delays introduced by the systems [Crétual and Chaumette, 2001]. The work in this thesis on target tracking can be seen as a continuation of the work presented in [Rivlin and Rotstein, 2000]; here, offline Model Predictive Control is used for both the smooth pursuit and saccade controllers. Model Predictive Control (MPC) is a control technique that uses the model of the plant to predict the future evolution of the system [Mayne et al., 2000]. Based on this prediction, at each time step t a certain performance index is optimized under operating constraints with respect to a sequence of future input moves. The first of such optimal moves is the control action applied to


the plant at time t. At time t + 1, a new optimization is solved over a shifted prediction horizon. This leads to a feedback control policy. In the past, MPC was limited to relatively slow processes, since solving an optimization problem in real time is computationally expensive. Recent developments have shown that it is possible to solve the optimization problem offline as a function of all feasible states. In particular, for linear and piecewise-linear systems, the feedback solution derived from an MPC scheme with a linear or quadratic performance index and linear constraints is shown to be a piecewise linear function of the states [Borrelli et al., 2005, Bemporad et al., 2002]. Therefore, the online MPC implementation reduces to a simple lookup-table evaluation, which allows the use of the MPC scheme in real-time systems such as the one considered in this manuscript. Two of the works that allowed moving most of the computation offline are [Bemporad et al., 2002, Bemporad et al., 2000]. In those papers the authors considered discrete-time linear time-invariant systems with constraints on inputs and states. They developed an algorithm to determine explicitly the state feedback control law which minimizes a performance criterion, where the performance index can be either a 1-norm, an ∞-norm, or quadratic. The control law was shown to be piecewise linear and continuous for the finite-horizon problem. Thus, the online control computation reduces to the simple evaluation of an explicitly defined piecewise linear function. Their technique is attractive for problems where the computational complexity of online optimization is prohibitive. The interested reader is referred to the manual of the Multi-Parametric Toolbox [Kvasnica et al., 2004] for a quick introduction to the topic. A detailed description of the latest results on the topic can be found in [Borrelli, 2003].
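The following sketch shows only the online side of such an explicit controller: evaluating a piecewise-affine law u = F_i x + g_i over polyhedral regions {x : H_i x ≤ k_i} by direct point location. The regions and gains below are invented for illustration; in practice they would be computed offline, e.g. with the Multi-Parametric Toolbox cited above.

    import numpy as np

    class ExplicitMPC:
        """Online evaluation of a piecewise-affine control law u = F_i x + g_i,
        valid on polyhedral regions {x : H_i x <= k_i}.  The region data would be
        produced offline by a multi-parametric solver; here it is illustrative."""
        def __init__(self, regions):
            self.regions = regions                   # list of (H, k, F, g) tuples

        def __call__(self, x):
            for H, k, F, g in self.regions:
                if np.all(H @ x <= k + 1e-9):        # point location by direct search
                    return F @ x + g
            raise ValueError("state outside the explored state space")

    # toy 1D law: a proportional controller saturated at |u| <= 1 (made-up gains)
    regions = [
        (np.array([[1.0], [-1.0]]), np.array([0.5, 0.5]), np.array([[-2.0]]), np.array([0.0])),
        (np.array([[-1.0]]),        np.array([-0.5]),     np.array([[0.0]]),  np.array([-1.0])),
        (np.array([[1.0]]),         np.array([-0.5]),     np.array([[0.0]]),  np.array([1.0])),
    ]
    controller = ExplicitMPC(regions)
    print(controller(np.array([0.2])), controller(np.array([2.0])))   # [-0.4] [-1.0]

In a real-time loop this lookup replaces the online optimization, which is what makes the MPC scheme feasible at camera frame rates.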

Chapter 4

Tracking the Eyes of the Chameleon

In this chapter, the method to track the chameleon’s eyes is presented. Since the eyes’ direction of gaze is determined relative to the head, the method to find the head’s pose is first presented in section 4.1. Next, the method of finding the direction of the eyes relative to the head is presented in section 4.2. Since planar mirrors are used, the geometry and calibration of planar mirrors is presented in section 4.3. Finally, some results of the experiments are presented in section 4.4.

4.1 Tracking the Chameleon Head

4.1.1 Rigid Body Pose

For any rigid body positioned in the world, there is a pose which describes the location and orientation of the body coordinate system relative to the world coordinate system. The pose is defined by six parameters: three for the location and three for the orientation. The location of the rigid body frame¹ is described by the column vector

T = [x_T, y_T, z_T]^T, which is the location of the origin of the rigid body frame in the world frame. The orientation of the rigid body frame is composed of the rotation angles of the body around the three axes of the world frame, sometimes denoted yaw, pitch, and roll. This representation can be converted to a [3 × 3] rotation matrix R. Note that although the rotation matrix has nine elements, it depends only on the three rotation parameters mentioned above. There are several other representations for the orientation,

¹Frame will be used as a synonym for coordinate system.



Figure 4.1: Body, world and camera coordinate systems (the body frame O_B, the camera frame O_C and the world frame O_W, related by the transformations (R, T) and (R_c, T_c))

such as quaternions, Euler angles and the axis-angle representation. However, all of them are interchangeable and depend only on three parameters. We will usually refer to the pose as the pair (R, T). Given that the body frame is located at (R, T) with respect to the world frame, and given a point X = [x, y, z]^T in the body frame, it is seen in the world frame as

X_w = R X + T.   (4.1)

Note that the point did not move; it is just described in a new coordinate system. As a side note, the reasoning for treating the chameleon’s head as a rigid body is presented. When referring to the head of chameleons as a rigid body, we will limit the discussion to the upper part of the head. The upper part is structured around a stiff bone (the scalp) and the skin is firmly attached to it. The eyes of chameleons are located on the upper part of their head, and so choosing to track the head’s pose is natural. Note that the bottom part of the head, which is the lower jaw, is ignored and is not considered part of the head, since it can move with respect to the rest of the head.

4.1.2 Camera Pose and Model

The camera itself has a pose with regard to the world frame. Given a point X_w in the 3D world frame, and given that the camera pose is (R_c, T_c), the point is seen as X_c = R_c X_w + T_c in the camera frame. Note that the pose (R_c, T_c) describes the pose of the world frame in the camera frame, and not the opposite. See figure 4.1 for an illustration. In our system, the camera frame and the world frame are stationary, while the body frame is moving. Our goal is to find the transformation (R, T) for each image in the video.


A camera model describes how a point in the world is seen in the image coordinate system. The camera model used is the CCD camera model [Hartley and Zisserman, 2004], with three additional parameters for radial distortion and two parameters for tangential distortion. For simplicity, the dependency on the distortion parameters will not be presented. Although the distortion parameters introduce additional computational difficulty, they do not present conceptually new problems. In the CCD camera model, the intrinsic camera matrix is

K = \begin{bmatrix} \alpha_x & 0 & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}   (4.2)

where \alpha_x and \alpha_y determine the focal length of the camera in pixel units, x_0 and y_0 form the principal point of the image in pixel units, and the remaining elements are zero. A point X_c = [x, y, z]^T in the camera coordinate system is projected to a homogeneous image point by

X_h = K X_c = [u', v', s]^T = s [u, v, 1]^T   (4.3)

where u = u'/s and v = v'/s. This homogeneous point defines the pixel point

x = [u, v]^T   (4.4)

given in the image coordinates.
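As a minimal illustration of equations (4.1)-(4.4) (with the distortion parameters ignored, as in the text), the following sketch projects a model point given in the body frame to pixel coordinates; the poses and intrinsic values are illustrative.

    import numpy as np

    def project(X_body, R, T, Rc, Tc, K):
        """Project a 3D model point given in the body frame to pixel coordinates:
        body -> world (eq. 4.1), world -> camera, then the CCD model of
        eqs. (4.2)-(4.4).  Lens distortion is ignored in this sketch."""
        Xw = R @ X_body + T          # body frame to world frame
        Xc = Rc @ Xw + Tc            # world frame to camera frame
        Xh = K @ Xc                  # homogeneous image point (eq. 4.3)
        return Xh[:2] / Xh[2]        # pixel coordinates [u, v] (eq. 4.4)

    # illustrative numbers only
    K = np.array([[800.0,   0.0, 320.0],
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])
    R, T = np.eye(3), np.array([0.0, 0.0, 0.5])
    Rc, Tc = np.eye(3), np.array([0.0, 0.0, 1.0])
    print(project(np.array([0.01, 0.02, 0.0]), R, T, Rc, Tc, K))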

4.1.3 3D Pose From 2D Data

Our goal is to find the transformation (R, T), which describes the pose of the head, for each image in the movie. Let {X^1, X^2, . . . , X^n} be a set of n points in the rigid body frame. These points are the model. The task of finding the pose of the model from the projection of its points into images has been widely studied. Some of the methods are linear, which produce fast but less accurate results. Some are non-linear, which produce more accurate results but are more computationally expensive. Another disadvantage of the non-linear methods is that they can be trapped in local minima and not converge to the correct solution. In our setup, a linear method is used only to initialize the head pose in the first image, and from then on only non-linear methods are used. Hence, only the non-linear method in use will be presented. In a specific image there is a set of m visible points {x̂^1, x̂^2, . . . , x̂^m} out of the n points in the model. Given a pose (R, T) of the rigid body in the world frame, and by combining


the transformations of the body and of the camera, the model point X^i is transferred into the point X_c^i in the camera frame by

X_c^i = R_c (R X^i + T) + T_c.   (4.5)

Given a point x̂^i in the image, it is transferred to the normalized image point X̂_n^i by

X̂_n^i = K^{-1} x̂^i.   (4.6)

Note that here x̂^i is treated as a homogeneous vector; that is, 1 is added as the third element of the 2D vector x̂^i. This convention will be repeated whenever a [3 × 3] matrix multiplies a 2D vector.

In this thesis the pose of the body is found by minimizing the geometric error in the 3D camera frame. The image point x̂^i defines the back-projecting ray L(t) = X̂_n^i t, where t is a scalar. The distance between the back-projecting ray and the model point X_c^i is defined by the point on the ray which is closest to X_c^i. For brevity, we define U ≜ X̂_n^i, so the ray will be denoted by L(t) = U t. For the next calculation we denote by X the point in the camera frame, X ≜ X_c^i. A scalar t_* that minimizes ||U t − X||_2^2 is found by taking the derivative:

d/dt ||U t − X||_2^2 = 0  ⇒
d/dt [(U t − X)^T (U t − X)] = 0  ⇒
2 (U t − X)^T U = 0  ⇒
U^T U t − X^T U = 0  ⇒
t_* = (X^T U) / (U^T U)   (4.7)

Having found t_*, it defines the point X_*^i on the ray which is closest to X_c^i,

X_*^i = U t_* = U ((X_c^i)^T U) / (U^T U).   (4.8)

With X_*^i, the error vector of the i-th model point is defined by

D^i = X_*^i − X_c^i.   (4.9)

Note that D^i is a column vector of 3 elements.
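A minimal sketch of equations (4.7)-(4.9), computing the geometric error between the back-projected ray of an observed pixel and a model point expressed in the camera frame; the intrinsic matrix and the points are illustrative.

    import numpy as np

    def ray_point_error(x_pix, K, Xc):
        """Error vector of eq. (4.9): the ray of the observed pixel is
        L(t) = U t with U = K^{-1} [u, v, 1]^T, the closest point on the ray is
        given by eqs. (4.7)-(4.8), and the error is its 3D offset from Xc."""
        U = np.linalg.inv(K) @ np.append(x_pix, 1.0)   # normalized ray direction
        t_star = (Xc @ U) / (U @ U)                    # eq. (4.7)
        X_star = t_star * U                            # closest point on the ray, eq. (4.8)
        return X_star - Xc                             # eq. (4.9)

    K = np.array([[800.0,   0.0, 320.0],
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])
    print(ray_point_error(np.array([330.0, 250.0]), K, np.array([0.01, 0.02, 1.5])))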

Technion - Computer Science Department - M.Sc. Thesis MSC-2006-27 - 2006 CHAPTER 4. TRACKING THE EYES OF THE CHAMELEON 27

Let d be a column vector of 3m elements, which holds the errors for all visible points. Let p denote a column vector of the six parameters of the pose. We use the axis-angle representation of the orientation, as in [Bouguet, 2005]. The axis-angle representation is a minimal representation in which the three parameters are treated as a column vector: the direction of the vector is the axis of rotation and its length is the angle of rotation around that axis. Converting the three parameters to the rotation matrix is done using the Rodrigues formula, which will not be presented here. The 3D pose of the object will be found by minimizing the errors in the least-squares sense. The optimization function is thus defined by

f(p) = (1/2) d^T d.   (4.10)

Such a minimization can be done by the well-known Gauss-Newton method, which will be presented next for completeness. The Taylor expansion of f around p is

f(p + ∆p) ≈ f(p) + ∇f(p)^T ∆p + (1/2) ∆p^T ∇²f(p) ∆p + · · ·   (4.11)

where ∇f(p) is the gradient of f and ∇²f(p) is the Hessian. Expressing the gradient in terms of d gives

∇f(p) = [∂f/∂p_1, ∂f/∂p_2, . . . , ∂f/∂p_6]^T = J(d)^T d   (4.12)

where J(d) is the Jacobian matrix of d.

J(d) = \begin{bmatrix} ∂d_1/∂p_1 & ∂d_1/∂p_2 & \cdots & ∂d_1/∂p_6 \\ ∂d_2/∂p_1 & ∂d_2/∂p_2 & \cdots & ∂d_2/∂p_6 \\ \vdots & \vdots & \ddots & \vdots \\ ∂d_{3m}/∂p_1 & ∂d_{3m}/∂p_2 & \cdots & ∂d_{3m}/∂p_6 \end{bmatrix}   (4.13)

The Hessian can be approximated using J(d):

∇²f(p) ≈ J(d)^T J(d).   (4.14)

We look for a solution p̂ that minimizes f(p̂). Such a solution will satisfy two conditions. The first is that the gradient should be zero,

∇f(p̂) = 0   (4.15)


and the second is that the Hessian should be positive semi-definite, so that the solution is a local minimum and not a local maximum,

∇²f(p̂) ≥ 0.   (4.16)

Denoting p̂ = p + ∆p and expressing the condition on the gradient using the Taylor expansion gives

∇f(p + ∆p) ≈ ∇f(p) + ∇²f(p) ∆p = 0.   (4.17)

Solving for ∆p

∆p = −[∇²f(p)]^{−1} ∇f(p) = −[J(d)^T J(d)]^{−1} J(d)^T d   (4.18)

which determines the search direction. The solution is just an approximation, and so the procedure is repeated in an iterative manner until convergence. When the model points are projected into several cameras, the extension of the optimization problem is as follows. Given that there are c cameras, and that the vector of errors of the j-th camera is d^j, the new vector of all errors d is just the concatenation of all the error vectors. The calculation of the Jacobian matrix of each camera is not changed either, and the global Jacobian is again a concatenation of the Jacobians of the individual cameras.

d = \begin{bmatrix} d^1 \\ d^2 \\ \vdots \\ d^c \end{bmatrix},   J(d) = \begin{bmatrix} J(d^1) \\ J(d^2) \\ \vdots \\ J(d^c) \end{bmatrix}   (4.19)

The overall optimization procedure does not change.
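The following sketch shows a generic Gauss-Newton loop implementing the update of equation (4.18) on a stacked residual vector as in (4.19). It is illustrative only: the Jacobian is approximated by finite differences rather than by the analytic expressions derived in Section 4.1.4, and the toy residual stands in for the reprojection errors of the real system.

    import numpy as np

    def gauss_newton(residual_fn, p0, n_iter=20, eps=1e-6):
        """Gauss-Newton iteration: at each step solve the least-squares problem
        whose normal equations are (J^T J) delta = -J^T d, i.e. eq. (4.18).
        residual_fn(p) should return the stacked error vector d of all visible
        features in all views (eq. 4.19)."""
        p = np.asarray(p0, dtype=float)
        for _ in range(n_iter):
            d = residual_fn(p)
            J = np.empty((d.size, p.size))
            for j in range(p.size):                  # finite-difference Jacobian
                dp = np.zeros_like(p)
                dp[j] = eps
                J[:, j] = (residual_fn(p + dp) - d) / eps
            step = np.linalg.lstsq(J, -d, rcond=None)[0]
            p = p + step
            if np.linalg.norm(step) < 1e-10:
                break
        return p

    # toy usage: recover a 2D translation that aligns model points with observations
    model = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    obs = model + np.array([0.3, -0.2])
    residual = lambda p: (model + p - obs).ravel()
    print(gauss_newton(residual, np.zeros(2)))       # converges to [0.3, -0.2]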

4.1.4 Minimization Details

In this section some parts of the calculation of the Jacobian for the minimization will be presented. Let ∂(a)/∂b denote the Jacobian matrix of a with respect to b. To calculate the Jacobian matrix it is sufficient to show the derivative of each element of the error vector of one model point with respect to all parameters. Recall that the error vector of the i-th model point was presented in equation (4.9) and is D^i = X_*^i − X_c^i. For brevity, the point


index i will be dropped: D = X_* − X_c. The calculation is based on the chain rule for derivatives.

D depends on the pose parameters through both X_* and X_c. The first step of the

calculation is to take the derivative of D with respect to X_c. Recall from equation (4.8) that X_* = U (X_c^T U)/(U^T U), where U depends on the image point x̂ and on the internal calibration

of the camera. Hence, the Jacobian, or derivative, of D with respect to X_c is

∂(D)/∂X_c = (U U^T)/(U^T U) − I   (4.20)

where I is the 3 × 3 identity matrix.

Next, the Jacobian matrix of X_c with respect to the pose parameters p will be presented. Recall that p defines the pair (R, T). The Jacobian can be written explicitly as

∂(X_c)/∂p = ∂(R_c (R X + T) + T_c)/∂p = R_c ∂(R X + T)/∂p = R_c [∂(R X)/∂p + ∂(T)/∂p]   (4.21)

The vector of parameters p is separated into the vector p_R, holding the three rotation

parameters, and the vector p_T, holding the translation parameters: p = [p_R^T, p_T^T]^T. Since each column in the Jacobian holds the derivatives with respect to one parameter, the

Jacobians with respect to p_R and to p_T can be calculated and then combined, column-wise, to create the full Jacobian:

∂(·)/∂p = [ ∂(·)/∂p_R   ∂(·)/∂p_T ].   (4.22)

Therefore, if needed, the Jacobians with respect to specific parameters can be presented instead of

the full Jacobian. Since RX is not dependent on pT, and T is not dependent on pR, ∂(RX) ∂(T) ∂(T) I ∂(RX) only the Jacobian ∂pR and ∂pT are needed. Clearly, ∂pT = . ∂pR can be written ∂(RX) ∂(RX) ∂(R) ∂(R) as ∂pR = ∂R ∂pR , where ∂pR is given from the Rodrigues formula, and will not be ∂(RX) presented here. ∂R is XT ∂ (RX) =  XT  (4.23) ∂R T  X     


given that the derivatives are taken row-wise from R (R is stacked row by row). From the results in the last paragraph and ∂(D)/∂X_c given in equation (4.20), the Jacobian

matrices of D with respect to p_R and p_T are

∂(D)/∂p_R = ∂(D)/∂X_c · ∂(X_c)/∂p_R    (4.24)
∂(D)/∂p_T = ∂(D)/∂X_c · ∂(X_c)/∂p_T.    (4.25)
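The building block ∂(D)/∂X_c in equation (4.20) is easy to verify numerically. The short Python/NumPy check below uses an arbitrary ray U and camera-frame point X_c chosen purely for illustration:

import numpy as np

def error_vec(Xc, U):
    """D = X_* - X_c, with X_* the closest point on the ray spanned by U (eq. 4.8)."""
    return U * (Xc @ U) / (U @ U) - Xc

U = np.array([0.3, -0.2, 1.0])
Xc = np.array([0.5, 0.4, 2.0])

J_analytic = np.outer(U, U) / (U @ U) - np.eye(3)      # eq. (4.20)

eps = 1e-7                                             # finite-difference comparison
J_numeric = np.column_stack([
    (error_vec(Xc + eps * e, U) - error_vec(Xc, U)) / eps for e in np.eye(3)
])

print(np.max(np.abs(J_analytic - J_numeric)))          # agrees up to rounding error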

4.1.5 Tracking Features

4.1.5.1 Feature Validity and Appearance

In order to make the tracking of the chameleon's head easier, artificial features were attached to it. The artificial features are stickers of small white pieces of paper with a black rectangle in the middle of each sticker. See figure 4.2 for a chameleon with stickers attached to it. The rectangles were chosen since their four corners provide four distinguishable and accurate features. A key element of the tracking algorithm is the use of the expected appearance of a feature in the image, both to determine if the feature is expected to be visible in the image, and to locate it if it is visible. For each model point X^i, an ordered pair of two neighboring model points (X^{n1}, X^{n2}) is defined. This ordered pair creates the black corner which is the feature. See figure 4.3 for an illustration. Given the pose of the object and the camera model, the model points are reprojected to the image. The model point X^i is projected to the point x^i in the image, and the neighboring points are projected to the ordered pair (x^{n1}, x^{n2}). Based on the locations of the three points in the image, the vectors of the sides of the corner are calculated by

v_1 = x^{n1} − x^i,    v_2 = x^{n2} − x^i    (4.26)

and their lengths are denoted l1 and l2 respectively. Based on these vectors, the angle of the corner is calculated using:

cos(θ) = v_1^T v_2 / (l_1 l_2),    sin(θ) = (v_1 × v_2) / (l_1 l_2)    (4.27)

where × denotes the (scalar) cross product of the two image vectors. Note that the order of the points in the pair is important, since a reverse order of the cross product gives a negative result. This also


Figure 4.2: A chameleon with the artificial features attached to it

helps to ignore occluded features. When the back side of the black rectangle is facing the camera, the feature can not be detected. Since θ will be negative, the feature will not be considered as visible either. For the feature to be valid, three conditions should be met:

1. θmin ≤ θ ≤ θmax

2. lmin ≤ l1

3. lmin ≤ l2

where θ_min, θ_max and l_min are determined so the corner will be detectable by the corner detector in use. Valid features are the features that are expected to be visible in the image. Based on the reprojected points, the appearance of the corner can be determined. Let I_w be an image window of size (2n + 1, 2n + 1). The center pixel of the window, [n, n]^T, corresponds to the reprojection of X. A pixel [i, j]^T creates an angle θ_p with v_1 and a

length l_p from the center of the window. If the conditions on the angle and the length are met, the pixel is set to black; if not, it is set to white. This creates an image window with a black corner which will be used to find the corner in the movie image. The resulting pattern of the corner is not very realistic, hence it is smoothed using a Gaussian filter. The pattern is also re-centered so that a corner detector (which will be presented



Figure 4.3: The parameters of the appearance of a feature.

next) will find the corner in the pattern exactly in the middle of the window. See figure 4.4. The tracking is started by manually marking the features in the first frame of the movie, hence the problem of initialization is solved. In each frame, tracking the features is divided into three phases:

1. Tracking Visible Features

2. Removing Bad Features

3. Recovering New Features

Each of them will be explained in the following sections.
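Before turning to these phases, the construction of the expected corner pattern described above can be sketched as follows (Python/NumPy). The window size, the assumption that v_1 coincides with the +x axis of the window, the smoothing width and the omission of the re-centering step are illustrative simplifications:

import numpy as np
from scipy.ndimage import gaussian_filter

def corner_pattern(theta, n=10, sigma=1.0):
    """Synthetic corner window: pixels whose angle from v1 lies in [0, theta]
    (and which are far enough from the centre) are set to black, the rest to white,
    and the hard edges are then smoothed with a Gaussian filter."""
    size = 2 * n + 1
    win = np.ones((size, size))                     # white background
    ys, xs = np.mgrid[0:size, 0:size]
    dy, dx = ys - n, xs - n
    ang = np.mod(np.arctan2(dy, dx), 2.0 * np.pi)   # angle of each pixel measured from v1
    dist = np.hypot(dy, dx)
    win[(ang <= theta) & (dist >= 1.0)] = 0.0       # black corner wedge
    return gaussian_filter(win, sigma)

pattern = corner_pattern(theta=np.deg2rad(80.0))    # e.g. an 80 degree corner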

4.1.5.2 Tracking Visible Features

For all the features detected in the last image, a correlation map between the expected appearance of the feature and the current image is calculated. For each feature, the correlation map is centered at the location of the feature from the last image. Its size is wide enough so that the feature will be inside it. The size of the correlation map depends on the characteristics of the movements of the chameleon, and on the frame rate of the video. The correlation map is calculated using the normalized cross correlation measure. After the correlation map is calculated, the maximum of the map is found. The maximum location [x_c, y_c]^T serves as the basis for a refinement of the feature point by using


(a) Raw corner pattern (b) Smoothed pattern (c) Re-centered pattern

Figure 4.4: The pattern of a feature - (a) Raw pattern, (b) Pattern smoothed using a gaussian filter, and (c) The final, re-centered pattern

a subpixel Harris corner detector, which returns the point [x_h, y_h]^T. To eliminate errors in the Harris corner detector, the value of the correlation map at [x_h, y_h]^T is checked; if it is lower than some threshold, or if the distance between [x_h, y_h]^T and [x_c, y_c]^T is larger than a threshold, the feature is considered "lost" and is not used.
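A minimal sketch of this step is given below, using OpenCV's normalized cross correlation and cv2.cornerSubPix as a stand-in for the subpixel Harris refinement; the patch sizes, thresholds and function names are assumptions for illustration (image and pattern are single-channel uint8 arrays):

import cv2
import numpy as np

def track_feature(image, pattern, prev_xy, search_radius=20,
                  corr_thresh=0.8, dist_thresh=3.0):
    """Search for `pattern` around the previous feature location and refine it.
    Returns the refined (x, y) or None if the feature is considered lost."""
    x0, y0 = int(prev_xy[0]), int(prev_xy[1])
    h, w = pattern.shape
    x1, y1 = max(x0 - search_radius, 0), max(y0 - search_radius, 0)
    roi = image[y1:y1 + 2 * search_radius + h, x1:x1 + 2 * search_radius + w]
    corr = cv2.matchTemplate(roi, pattern, cv2.TM_CCORR_NORMED)   # correlation map
    _, max_val, _, max_loc = cv2.minMaxLoc(corr)
    if max_val < corr_thresh:
        return None                                               # feature lost
    xc, yc = x1 + max_loc[0] + w // 2, y1 + max_loc[1] + h // 2   # correlation maximum
    pt = np.array([[[xc, yc]]], dtype=np.float32)
    term = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 0.01)
    refined = cv2.cornerSubPix(image, pt, (3, 3), (-1, -1), term)[0, 0]  # subpixel refinement
    if np.hypot(refined[0] - xc, refined[1] - yc) > dist_thresh:
        return None                                               # refinement drifted too far
    return float(refined[0]), float(refined[1])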

4.1.5.3 Removing Bad Features

Based on the new locations of the features in the current image, the pose of the object (the chameleon's head) is calculated as presented in section 4.1.3, where the pose in the last image is used to initialize the minimization process. Given the pose, the length of the geometric 3D error is calculated for each visible feature. If the error's length is larger than a threshold, the feature is considered an outlier and is removed. The calculation of pose and the removal of bad features is then repeated until no features are removed. In our setup, features were considered outliers only from time to time, and usually only one feature was removed in this process.

4.1.5.4 Recovering New Features

The validity of all the non-visible features is calculated as explained in section 4.1.5.1. For any feature that is found to be valid, the tracking algorithm in section 4.1.5.2 is used to search for the feature starting from its expected location (the reprojection point). If a feature is found, it is not automatically added to the visible features; first the pose of the



Figure 4.5: Eye Model - Xec is the center of rotation of the eye in the head frame. leye is the radius of the eye from the center to the eyelid.

object is recalculated, and the feature is considered visible only if the least squares error from the minimization process does not grow by more than a predefined factor.

4.1.6 Tracking the Chameleon Head - Summary

The algorithm for tracking the chameleon head was presented. The chameleon head is treated as a rigid body, artificial features are attached to it and their 3D model in the head frame is learned. The 3D pose is found by the Gauss-Newton non-linear minimization of the geometric 3D errors considering all the views of the head. By using the model-based approach, the appearance and validity of the features in the images can be calculated which improves the features tracking performance. For tracking the features, a strategy of invalidating and validating the features based on their expected appearance is used. Due to this strategy there is always a set of trackable features to calculate the pose from.

4.2 Tracking the Eyes

4.2.1 Calculating Eye’s Direction

Our goal is to find the direction of the eye relative to the head. A model-based approach is used. In our simplified model, an eye of a chameleon has a spherical shape and is rotated around the center of the eye inside the orbital cavity. The model of the eye consists of

the center of the eye X_ec in the head frame, and the radius of the eye l_eye from the center of the eye to the eyelid. See figure 4.5 for an illustration.


Given that the eyelid is found at location x̃_el in the image, the back-projected ray of the eyelid is defined by K^{-1} x̃_el. Given that the head's pose is (R, T) and that the eyelid is found in a camera positioned at (R_c, T_c), the back-projected ray as seen in the head frame is described by the line equation

L(t) = L0 + Nt (4.28)

where L0 is the location of the camera center in the head frame, which is a point on the line, and N is the direction of the ray in the head frame. They are calculated by

L_0 = R^{-1}( R_c^{-1}(0 − T_c) − T )    (4.29)
N = R^{-1}( R_c^{-1}(K^{-1} x̃_el − T_c) − T ) − L_0    (4.30)

The eyelid must lie on the line L(t) in the head frame, since it is seen in the location

x˜el in the image. On the other hand, given the model of the eye, the eyelid must lie on

a sphere centered at X_ec with radius l_eye. Hence, the intersection between the line and the sphere is the location of the eyelid. See figure 4.6 for an illustration. The intersection between a line and a sphere is either two points, one point or empty. If the intersection is only one point then the line is tangent to the sphere. Due to the appearance of the eyelid, this situation is not feasible, since the eyelid would not be visible in the image in such a case: the ellipse of the eye would be degenerate, seen as a line, and hard to identify. When the intersection is empty, the line does not cross the sphere, which is of course not a feasible situation in our setup. When the intersection contains two points, the correct solution is the one that is closer to the camera. The other solution is not feasible, since when the eyelid is farther from the camera it is hidden by the eye itself. A point X is on the sphere if it satisfies the sphere equation

‖X − X_ec‖²_2 = (X − X_ec)^T (X − X_ec) = l²_eye.    (4.31)

Substituting X with the line equation (4.28), we get

(L_0 + Nt − X_ec)^T (L_0 + Nt − X_ec) = l²_eye.    (4.32)

After rearrangement it becomes

N^T N t² + 2(N^T L_0 − N^T X_ec) t + L_0^T L_0 + X_ec^T X_ec − 2 L_0^T X_ec − l²_eye = 0    (4.33)

which is a quadratic equation in t. The equation is solved in the standard way to get the two

solutions t1 and t2. The eyelid location is found by substituting each solution in equation (4.28) and choosing the one that is closer to the center of the camera.
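The computation reduces to solving one scalar quadratic per frame. A minimal Python/NumPy sketch (variable names are ours) is:

import numpy as np

def eyelid_from_ray(L0, N, X_ec, l_eye):
    """Intersect the back-projected ray L(t) = L0 + N t with the eye sphere
    (centre X_ec, radius l_eye) and return the intersection closer to the camera."""
    a = N @ N
    b = 2.0 * (N @ (L0 - X_ec))
    c = (L0 - X_ec) @ (L0 - X_ec) - l_eye ** 2          # coefficients of eq. (4.33)
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None                                     # no intersection: not feasible in this setup
    t1 = (-b - np.sqrt(disc)) / (2.0 * a)
    t2 = (-b + np.sqrt(disc)) / (2.0 * a)
    p1, p2 = L0 + t1 * N, L0 + t2 * N
    # keep the solution closer to the camera centre L0; the far one is occluded by the eye
    return p1 if np.linalg.norm(p1 - L0) <= np.linalg.norm(p2 - L0) else p2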



Figure 4.6: The direction of the eye is found by the intersection between the back-projected ray of the eyelid and the eyelid sphere

4.2.2 Tracking the Eyelid

Finding the eyelid in the image is the key to calculating the direction of the eye. For each image of the video, the location of the eyelid is known for the previous image, where for the first frame, the location is determined manually. Given the location in the last image is known, there are two options for the current image: (i) the eye didn’t move, or (ii) the eye started a saccade and moved from its last position. The process of tracking is divided into two steps: in the first step a first estimate of the position of the eye is determined, whether the eye moved or not. In the second step the location is refined.

4.2.2.1 Finding If and Where the Eyelid Has Moved

In order to identify if the eyelid moved or not, a patch I_p1 around the location of the eyelid from the previous image is compared to the expected location of the eyelid in the current image. It should be noted that the expected location in the current image is not the same as the location in the previous image even if the eye did not move, since the head of the chameleon is moving. Therefore, given the location of the eyelid in the previous

image, x˜el, the eye’s direction is determined as explained in section 4.2.1. Given the eye’s

direction and the model of the eye, the 3D location of the eyelid X_el is known. For the current image, given the new parameters of the head's pose, (R′, T′), the 3D location of the eyelid X_el is projected into the image to location x′_el. This location is the expected location of the eyelid in the current frame, and serves as a starting point to search for


(a) Previous Image (b) Current Image (c) Aligned Difference Image

Figure 4.7: Difference image reveals the location of the eyelid

the eyelid. The first step is to perform a normalized cross correlation between the patch I_p1, centered around x̃_el, and a patch I_p2, centered around x′_el. If the correlation is above some predefined threshold, the eye did not move. The normalized cross correlation is also checked in a small neighborhood around the expected location of the eyelid, since the eye might perform small movements, for example, when a saccade has just started. If the eye has moved, the new location of the eyelid needs to be determined. The eyelid of the chameleon in our movies is usually a dark circle surrounded by a brighter ring; therefore, a difference image between the previous and current images can help to find the new location. Since the chameleon's head is moving, a difference image calculated without aligning the images will hold mostly information on the movement of the head. The method to align the images is similar to the one presented in the previous section

(4.2.2.1), and is based on the location of the center of the eye X_ec in the previous and current images, given the different pose parameters of the head. This process can be seen as the registration of the current image to the previous one using only a translation transformation. In figure 4.7, the difference image between the aligned images is presented. The new location of the eyelid is revealed in the difference image as a local maximum (a bright area). In other situations, the eyelid might not be that clear. Hence, all the local maxima in the difference image are considered as suspected points for the location of the eyelid. Another quick filtering of the suspected points is to remove points in which the current image is not dark enough. The last step to identify the correct point is to compare each


point with the expected appearance of the eyelid, given the model of the eye. For each suspected point, a pattern of the eyelid appearance is created. The pattern takes into account the model of the eye and additional parameters regarding the diameter of the eyelid. The normalized cross correlation of the pattern and the suspected point is then calculated and the best match is considered as the new location of the eyelid.
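A rough sketch of this candidate-generation step is given below (Python/NumPy). The integer translation used for the alignment, the maximum-filter peak detection and the darkness threshold are illustrative choices rather than the exact procedure used here:

import numpy as np
from scipy.ndimage import maximum_filter, shift

def eyelid_candidates(prev_img, curr_img, center_shift, dark_thresh=60, nms_size=9):
    """Align the current image to the previous one by the predicted eye-centre
    translation, then return local maxima of the difference image that are also
    dark enough in the aligned image (candidate eyelid locations, (x, y))."""
    aligned = shift(curr_img.astype(float), center_shift)     # center_shift = (dy, dx)
    diff = np.abs(aligned - prev_img.astype(float))
    peaks = (diff == maximum_filter(diff, size=nms_size)) & (diff > 0)
    dark = aligned < dark_thresh                               # the eyelid is a dark blob
    ys, xs = np.nonzero(peaks & dark)
    return list(zip(xs, ys))

Each candidate would then be compared, via normalized cross correlation, with the expected eyelid pattern as described above.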

4.2.2.2 Refining the Location of the Eye

The second and last step of finding the location of the eyelid is to refine the location found in section 4.2.2.1. This is needed, first, in order to produce subpixel accuracy of the eyelid's location. Another reason is that the location of the eyelid might drift due to accumulated errors in the correlation process. The refinement of the location is based on the appearance of the eyelid. Connected components are marked in an image patch around the eyelid's location. Since the eyelid is a dark circle surrounded by a bright ring, it forms a connected component. The center of this connected component is determined as the eyelid's location.

4.3 Calibration of Multiple Cameras and Mirrors

Our experiments setup includes cameras and planar mirrors, where a mirror serves as an additional virtual camera. In this section we present how a planar mirror is treated as a virtual camera and how the system of cameras and mirrors is calibrated.

4.3.1 Planar Mirror as a Virtual Camera

Given a planar mirror, the plane of the mirror is described by the Hessian normal form equation

n̂^T X = −a    (4.34)

where n̂ is the unit normal vector to the plane, a is the distance of the plane from the origin, and X is a 3D point on the plane if it satisfies the equation. Note that the minimal number of parameters that describe the plane is three: two angles for the direction of the normal vector and a. Given the two angles θ and ψ, the unit normal vector is

n̂ = [ cos(θ)cos(ψ), cos(θ)sin(ψ), sin(θ) ]^T.    (4.35)


The parameters vector describing a plane will be compactly written as

pm = [ψ, θ, a] . (4.36)

A planar mirror creates a reflection

X′ = X − 2(n̂^T X + a) n̂    (4.37)

where X is the original point and X′ is the reflected point. Given a planar mirror, the reflection from the mirror is treated as an additional virtual camera. For this virtual camera,

a pair (R_m, T_m) is calculated that transfers a point X_c from the camera frame to the point

Xv in the virtual camera frame

Xv = RmXc + Tm.

Note however, that since the virtual camera is a reflection, Rm is not a rotation matrix but

rather an improper rotation, or a rotation with a flip. The determinant of R_m equals −1 and not 1, as the determinant of a rotation matrix does, and the "handedness" of the coordinate

system after applying Rm is opposite to the original one. That is, if the original coordinate system is right-handed, the coordinate system of the virtual camera is left-handed (and vice-versa).

The calculation of Rm and Tm is based on the general method of calculating rigid

transformation. Tm is the location of the camera origin as seen from the virtual camera,

and the columns of Rm are respectively the X-axis, Y-axis, and Z-axis of the original camera

as seen from the virtual camera. Assuming a camera positioned at the origin, that is Rc = I

and Tc = 0, (Rm, Tm) are given by,

R_m = I − 2 n̂ n̂^T    (4.38)
T_m = 0 − 2(n̂^T 0 + a) n̂ = −2a n̂    (4.39)

Note that a reflection is symmetric, hence the reflection equation from the virtual camera to the real camera is identical to the reflection from the camera to the virtual camera. Therefore, R_m and T_m are also the same in the two directions, hence by definition they satisfy R_m^{-1} = R_m and −R_m^{-1} T_m = −R_m T_m = T_m. When the camera frame does not coincide with the world frame, an additional step is

required. Given the mirror parameters, pm = [ψ, θ, a], in the world coordinate system,



Figure 4.8: Mirror as a virtual camera - The distance of the mirror plane from the origin in the camera frame

the mirror parameters in the camera coordinate system, p′_m = [ψ′, θ′, a′], are calculated. First, the unit normal vector n̂ is calculated using equation (4.35). Then the unit normal vector in the camera frame is calculated by

n̂′ = [ n̂′_x, n̂′_y, n̂′_z ]^T = R_c n̂,    (4.40)

from which the normal vector angles in the camera frame are calculated by

ψ′ = tan^{-1}( n̂′_y / n̂′_x ),    θ′ = tan^{-1}( n̂′_z / (n̂′²_x + n̂′²_y)^{1/2} ).    (4.41)

The last parameter a′, which is the distance of the plane from the origin in the camera frame, is calculated by

a′ = a + n̂^T( −R_c^{-1} T_c )    (4.42)

which is the distance in the world frame, plus the vector from the camera frame to the world frame, projected on the normal of the plane. See Figure 4.8 for an illustration. The parameters p′_m are used to calculate the improper transformation from the camera to the virtual camera and vice-versa.
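The construction of the virtual camera can be collected into a few lines (a Python/NumPy sketch of equations (4.35)-(4.42); the function and variable names are ours):

import numpy as np

def mirror_virtual_camera(psi, theta, a, R_c, T_c):
    """Return the improper transformation (R_m, T_m) of a planar mirror with
    world-frame parameters [psi, theta, a], seen from a camera at (R_c, T_c)."""
    n_world = np.array([np.cos(theta) * np.cos(psi),
                        np.cos(theta) * np.sin(psi),
                        np.sin(theta)])                        # eq. (4.35)
    n = R_c @ n_world                                          # normal in the camera frame, eq. (4.40)
    a_cam = a + n_world @ (-np.linalg.inv(R_c) @ T_c)          # distance in the camera frame, eq. (4.42)
    R_m = np.eye(3) - 2.0 * np.outer(n, n)                     # improper rotation, det(R_m) = -1
    T_m = -2.0 * a_cam * n                                     # reflection of the camera centre
    return R_m, T_m

# sanity check of the symmetry property: R_m @ R_m ≈ I and -R_m @ T_m ≈ T_m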


4.3.2 Pose Estimation and Tracking the Eyes With Mirrors

Note that both in section 4.1.3 and in section 4.2, the process to identify the head’s pose and the one to identify the direction of gaze of the eyes, are not dependent on the source

of the camera pose (R, T). This means that given (Rm, Tm) that are calculated using the method described above, both algorithms can continue to work in exactly the same manner they were described.

4.3.3 The Calibration Process

The purpose of the calibration process is to identify all the parameters that describe the cameras' and mirrors' locations. In the setup there are two cameras, front and rear, and two mirrors, left and right. For the world coordinate system, the coordinate system of the front camera is chosen, hence the following parameters need to be found: the pose of the rear camera (six parameters), the parameters of the left planar mirror (three parameters), and those of the right mirror (three parameters). A total of twelve parameters fully describes the pose of all the cameras, real or virtual, relative to each other. The calibration process is based on the one used in the Matlab Calibration Toolbox [Bouguet, 2005] for stereo calibration. A calibration rig is positioned in M different locations and its image is taken in each location from all the cameras. The calibration rig has n distinguishable corners located at 3D positions [X^1, . . . , X^n] in the rig coordinate system. The rig pose in the j-th location is described by the six parameters p^j or, in transformation form, as (R_j, T^j). Since the pose parameters of the rig are not known, they are added to the unknown parameters that need to be found. Hence, the number of parameters is 12 + 6M. Given a point X^k, where the calibration rig is located at (R_j, T^j), its position in the i-th view is

X_v^k = R_v^i ( R_j X^k + T^j ) + T_v^i,    (4.43)

where the pose of a view (R_v, T_v) can be the pose of a real camera or the pose of a virtual camera (that is, a mirror). This point is projected to the homogeneous image point by the camera model X_h^k = K X_v^k.


4.3.3.1 Minimization Details

The calibration process is based on the same Gauss-Newton minimization presented in section 4.1.3. In a similar manner, the calculation of the Jacobian is based on the chain rule of derivatives. However, since the number of parameters is larger, the Jacobian needs to be calculated for more parameters. As in section 4.1.4, each row in the Jacobian is associated with one appearance of a feature point of the calibration board in an image or mirror. Each column in the Jacobian is associated with one parameter. Thus, the Jacobian row will be presented for each parameter. The calibration is based on the same 3D

geometric error as in section 4.1.3, which is defined as D = X∗ − Xc. The Jacobian

calculation is presented for a general point viewed in a general view pose (R_v, T_v), given that the board is located in the j-th position (R_j, T^j). The view pose can be the pose of a virtual camera, where the real camera that views the mirror is not located at the origin. This is the most complicated case for the calculation of the Jacobian. Hence, it will be presented first, and the modifications for the other cases will be presented later. To emphasize the dependence of the view pose on both the parameters of the mirror and the parameters of the camera that views the mirror, we write explicitly,

R_v = R_v(p_{R_c}, p_{T_c}, p_m)    (4.44)
T_v = T_v(p_{R_c}, p_{T_c}, p_m).    (4.45)

The Jacobian, or derivative, is again calculated using the chain rule. The first step is to calculate the derivative of the error vector D with respect to X_v. This derivative, ∂(D)/∂X_v,

was given in section 4.1.4. The next step is to compute the derivative of X_v with respect to the pose parameters. The pose parameters are divided between the pose parameters of the cameras and mirrors, and the pose parameters of the calibration board for each position p^j. The board pose is the same as a rigid body pose. Hence, when viewing the board in position j, the derivatives with respect to p^j are calculated in the same manner as in section 4.1.4. The derivatives with respect to p^i, where i ≠ j, are zero matrices of the appropriate size.

Next we introduce the Jacobian of Xv with regard to the pose parameters of the camera

and mirror, pRc , pTc , and pm. As a first step, the Jacobian with regard to the view pose


is calculated by,

              [ (R_j X + T^j)^T        0                 0         ]
∂(X_v)/∂R_v = [        0         (R_j X + T^j)^T         0         ]    (4.46)
              [        0               0          (R_j X + T^j)^T  ]

∂(X_v)/∂T_v = I.    (4.47)

All that is left to show is the Jacobian of R_v and of T_v with respect to the parameters. It is given by

∂(R_v)/∂p_{R_c} = [ ∂(R_v)/∂n̂′ · ∂(n̂′)/∂R_c + ∂(R_v)/∂a′ · ∂(a′)/∂R_c ] · ∂(R_c)/∂p_{R_c}    (4.48)
∂(T_v)/∂p_{R_c} = [ ∂(T_v)/∂n̂′ · ∂(n̂′)/∂R_c + ∂(T_v)/∂a′ · ∂(a′)/∂R_c ] · ∂(R_c)/∂p_{R_c}    (4.49)
∂(R_v)/∂p_{T_c} = [ ∂(R_v)/∂n̂′ · ∂(n̂′)/∂T_c + ∂(R_v)/∂a′ · ∂(a′)/∂T_c ] · ∂(T_c)/∂p_{T_c}    (4.50)
∂(T_v)/∂p_{T_c} = [ ∂(T_v)/∂n̂′ · ∂(n̂′)/∂T_c + ∂(T_v)/∂a′ · ∂(a′)/∂T_c ] · ∂(T_c)/∂p_{T_c}    (4.51)
∂(R_v)/∂p_m = ∂(R_v)/∂n̂′ · ∂(n̂′)/∂p_m + ∂(R_v)/∂a′ · ∂(a′)/∂p_m    (4.52)
∂(T_v)/∂p_m = ∂(T_v)/∂n̂′ · ∂(n̂′)/∂p_m + ∂(T_v)/∂a′ · ∂(a′)/∂p_m,    (4.53)

where n̂′ and a′ are the unit normal vector and the distance from the origin of the mirror in the camera frame. They are given in equations (4.40) and (4.42), and are

n̂′ = R_c n̂    (4.54)
a′ = a − n̂^T R_c^T T_c.    (4.55)

The pose of the view is given using equations (4.38) and (4.39), but now based on the parameters n̂′ and a′,

R_v = I − 2 n̂′ n̂′^T    (4.56)
T_v = −2a′ n̂′.    (4.57)

From equations (4.54)–(4.57) the following derivatives can be calculated: ∂(R_v)/∂n̂′, ∂(R_v)/∂a′, ∂(T_v)/∂n̂′, ∂(T_v)/∂a′, ∂(n̂′)/∂R_c, ∂(a′)/∂R_c, ∂(n̂′)/∂T_c, and ∂(a′)/∂T_c. The derivative ∂(R_c)/∂p_{R_c} is given by the Rodrigues


formula, and the derivative ∂(T_c)/∂p_{T_c} is just the identity matrix. The last derivatives, ∂(n̂′)/∂p_m and ∂(a′)/∂p_m, are calculated from equations (4.54), (4.55), (4.35), and (4.36). The details of the actual calculation will not be presented here. If the view pose is that of a mirror that is viewed by a camera located at the origin, there is no need to change the calculations. The only difference is that the calculation of n̂′ and a′ simplifies to n̂′ = n̂ and a′ = a. This causes all the derivatives of n̂′ and a′ with respect to R_c and T_c to become zero matrices. Hence the derivatives with respect to

pRc and pTc will become zero, which is correct since in this case the camera pose is known to be the origin, and is not updated by the calibration process. If the view is not a view

of a mirror but rather that of a real camera, then Rv and Tv should be replaced directly

with R_c and T_c, and the appropriate derivatives were presented earlier.

4.4 Experiments

As presented in section 4.2, in order to find the direction of the eyes, the eyelid should be found in the image. However, due to the wide range of motion of the eyes in chameleons, the usage of one camera is not sufficient. The eyes can move forward, backward, up or down, hence the usage of several cameras is needed. The cameras should be arranged in such a manner that the eyelid will be visible in at least one image, regardless of its direction. However, the number of cameras that can be used is also limited. The main reason is the ability to capture, synchronize and record all the cameras together. Another reason is the cost of high-end cameras. The solution chosen was to use planar mirrors instead of additional cameras. The setup includes two cameras and two planar mirrors. Since each planar mirror can be seen as a virtual camera, this setup corresponds to a setup of six cameras. In the experiments the chameleon is located on a stick, the two cameras are positioned above it in front and rear locations, and the two mirrors are positioned below it. See figure 4.9(a) for a picture of the actual setup and figure 4.9(b) for an illustration of the setup. The experimental equipment included two identical black-and-white Dragonfly cameras from PointGrey, with a resolution of 640 × 480 and a frame rate of 30 fps. The cameras are connected using a FireWire link to a PC and are automatically synchronized. The streams are captured on the PC and encoded with the Microsoft MPEG4-V2 encoder, using the Fire-i application. The analysis of head and eye movements was done



Figure 4.9: Experiment setup - (a) A picture of the setup. The chameleon is located on a stick above two mirrors, and is videoed using two cameras above it: a front camera and a rear camera. (b) Illustration of the setup - each real camera sees the two mirrors, hence creating two additional virtual cameras.


(a) Chameleon in the experiment setup (b) Chameleon with Eyes’ directions marked

Figure 4.10: Tracking eye movements of the chameleon - the chameleon is placed on a stick above two mirrors. The left figure shows the chameleon in the experiment setup. In the right image, extracted data is plotted on the image. White crosses mark the location of tracked features and large white cones indicate the line-of-sight of each eye. Note that the direction of the left eye was found by using the rear camera.

offline using Matlab scripts and functions. In figure 4.10 an image of a chameleon in the experiment setup is presented. Figure 4.10(a) presents a raw image of the chameleon, while in figure 4.10(b) the same image is presented with data from the tracking system plotted on top of it. Preliminary results from our experiments indicate that chameleons use a "negative correlation" for the overall strategy of scanning the environment. That is, when one eye is directed forward, the other, with high probability, will be directed backwards.

Chapter 5

Scanning the Environment

In this chapter the optimal scanning algorithm is presented. The scanning method is part of the Environment Visual Scanning (EVS) task of the oculomotor system. In this part the system needs to decide on a strategy to cover the environment in a manner that will optimize the system's goals. This decision phase is formulated in the context of "search theory", and an optimization problem is solved in order to maximize the probability of finding the target and to minimize the cost of moving the cameras. In order to simplify the computational load, the environment is partitioned into M regions, with each region representing a location that the system can explore. Time is also discretized into time steps which are assumed to be long enough to guarantee the completion of a search step. A search step comprises the actions of moving the camera to a new location and inspecting this new location by means of visual processing. Obviously the visual processing algorithm might not be able to identify a potential target, even if one exists in the region being explored. Next we present a probabilistic model where the target can move from one region to another, or even appear in and disappear from the range of detection of the system. An additional region, denoted by region 0, is introduced to represent the set of points outside the range of detection. In this model, a target appears in the system's range of detection when it moves from region 0 to a region different from 0, and disappears when it moves from a region different from 0 to region 0. A Markovian model is used to model the movement of the target between regions. The Markovian model is defined by a transition probability

matrix A, whose element in row i and column j is denoted by aij. Let xk be a random integer variable denoting the location of the target at time step



k. The elements of A are a_ij = p(x^{k+1} = i | x^k = j), i.e., the conditional probability for a target to move from region j to region i, assuming that the target is in region j. In addition, a Probability Distribution Function (PDF) p(x^k) is defined for time step k, representing the probability of the target being in each region j at time step k. The PDF is updated according to the Markovian model, p(x^{k+1}) = p(x^{k+1} | x^k) p(x^k), which can be compactly rewritten as:

p(x^{k+1}) = A p(x^k).    (5.1)

Even if the target is located in a certain region j, the camera might not find the target when looking in that region. The chance of finding the target, or detection rate, is a characteristic of each region, and depends on several factors. For example, in an area with uniform color the detection rate is expected to be high, while in areas with a large variety of colors and dense texture the detection rate will be lower. The estimation of the detection rate for each region is complex and influenced by several parameters. We assume that it can be computed by means of computer vision algorithms and denote by d_i the detection rate of region i. Note that we assume only false-negative errors and no false-positive ones. That is, if no target is located in region j, the inspection of that region will always result in a "no-detection" answer. Let y_i^k denote the result of inspecting region i at time step k. In our model, the result y_i^k of inspecting a region is restricted to the binary set {D, D̄}, which denotes the "detection" (D) and "no-detection" (D̄) events. Based on the assumption of only false-negative errors, the probabilities p(y^k | x^k) are given by:

p(y_i^k = D | x^k = j) = { 0      if j ≠ i
                           d_i    if j = i }    (5.2)

and

p(y_i^k = D̄ | x^k = j) = { 1          if j ≠ i
                            1 − d_i    if j = i }    (5.3)

At time step k the camera inspects region j and returns an output y^k = y_j^k. A PDF p(x^k | y^{[1..k−1]}) denotes the probability of the target being in region x^k = j for each j, at time step k, given all the past observations y^{[1..k−1]} = {y^1, y^2, . . . , y^{k−1}}. Given a new observation y^k, the updated PDF is

p(x^k | y^{[1..k]}) = K p(x^k | y^{[1..k−1]}) p(y^k | x^k)    (5.4)


where K is a normalization factor that keeps the PDF summing to 1,

K = [ Σ_{j∈[1..M]} p(x^k = j | y^{[1..k−1]}) p(y^k | x^k = j) ]^{-1}.    (5.5)

To simplify the notation, let P^k ≡ p(x^k) and P_i^k ≡ p(x^k = i). Consider a PDF at time step k. If the system inspects region i but fails to find a target there, that is, y_i^k = D̄, then by using equations (5.2) and (5.3) we can formulate the update process in equation (5.4) compactly as:

(T_i(P^k))_j = { P_j^k / (1 − d_i P_i^k)             if j ≠ i
                 (1 − d_i) P_i^k / (1 − d_i P_i^k)    if j = i }

where (T_i(P^k))_j ≡ p(x^k = j | y^{[1..k]}). If the system inspects region i, the updated posterior distribution at time step k + 1 is given by the combination of the posterior update and the Markov process:

P^{k+1} = A T_i(P^k).    (5.6)

Note that if the result of inspecting region i is D (a detection of a target), then by using the update process in equation (5.4) and our definitions in equations (5.2) and (5.3) the resulting PDF is:

P_j = { 1    if j = i
        0    if j ≠ i }    (5.7)

which means the target is in region i. During EVS, the motion of the cameras between regions has an associated cost in terms of time and energy spent during the motion, while the scanning of a region j has an associated cost in terms of the time required for performing the image processing task. The EVS is transformed into an optimization problem with a cost function c(i, j) defined for each pair (i, j) of regions. This cost reflects the effort required to drive a camera from region i to region j plus the effort to scan region j. A cost of ∞ is set if the camera cannot move between i and j. For example, in the case where each camera is constrained to guard its own hemisphere, the cost of moving the right camera to the left side will be ∞. On the other hand, the system gains a reward R if a target is captured. The value of this reward relative to the cost function will determine the system behavior, and it is a tuning parameter.
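The propagation and update steps (5.1)-(5.7) are compactly implementable; the Python/NumPy sketch below uses a small arbitrary example with three regions plus region 0 (all numbers are illustrative):

import numpy as np

def bayes_update(P, i, d, detected):
    """Posterior update after inspecting region i with detection rate d[i] (eqs. 5.2-5.5, 5.7)."""
    P = P.copy()
    if detected:                       # eq. (5.7): the target is certainly in region i
        P[:] = 0.0
        P[i] = 1.0
        return P
    P[i] *= (1.0 - d[i])               # only false negatives are possible
    return P / P.sum()                 # renormalize; this is exactly T_i(P)

def search_step(P, A, i, d, detected):
    """One search step: Bayesian update T_i followed by the Markov motion model (eq. 5.6)."""
    return A @ bayes_update(P, i, d, detected)

# example: region 0 is "outside the range of detection"; columns of A sum to 1,
# with a_ij = p(x^{k+1} = i | x^k = j)
A = np.array([[0.7, 0.3, 0.3, 0.3],
              [0.1, 0.6, 0.1, 0.0],
              [0.1, 0.1, 0.6, 0.1],
              [0.1, 0.0, 0.0, 0.6]])
d = np.array([0.0, 0.9, 0.9, 0.9])     # detection rates (region 0 is never inspected)
P = np.full(4, 0.25)                   # initial belief
P = search_step(P, A, i=2, d=d, detected=False)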


With this formalism, a dynamic programming problem can be formulated in order to minimize a weighted sum of costs and rewards. The state vector at a given time step k is denoted by (Pk, i), where i is the region visited by the system at time k − 1 and Pk is the current posterior probabilities after visiting region i. The state update function is defined by equation (5.6). The following dynamic programming recursion [Eagle, 1984,Ross, 1983, Bertsekas, 2000] captures both the reward for acquiring a target and the effort needed to perform the scan:

V_n(P^n, i) = min_j [ c(i, j) + (R − d_j P_j^n R) + (1 − d_j P_j^n) V_{n+1}(A T_j(P^n), j) ],    (5.8)

where V_n(P^n, i) is the value function (or cost-to-go) at time n. At each time step n the dynamic programming recursion in (5.8) is solved backward in

time, starting from Vn+N = 0 in order to compute the optimal cost Vn over the horizon N and the optimal search policy. At the next time step the process is repeated for the same horizon N. During the search, the system acquires data on the different regions. Such data might be used to update A in order to better fit the new set of data. The procedure illustrated in this chapter can be extended to more than one camera. The state vector will store the last location of all the cameras and the corresponding posterior probabilities matrices. A cost function will be associated to each camera and the optimization problem will involve all the cameras and the union of the regions which can be reached by all the cameras. Each camera can scan only a subset of the regions (the intersection of the subsets is not empty). Given that the last regions inspected by

the cameras were [i_1, i_2, . . . , i_C], where C is the number of cameras, the update process corresponding to equation (5.6) is

P^{k+1} = A T_{i_1}( T_{i_2}( . . . T_{i_C}(P^k) . . . ) ).    (5.9)

This is equivalent to the "independent opinion pool" mentioned in [Wong et al., 2005].

Let Pf denote the probability of failure to find the target by all the cameras. Pf can be easily calculated from the update process in equation (5.9). Then, the stochastic dynamic programming for multiple-cameras can be compactly formulated as:

V_n(P^k, i_1, . . . , i_C) = min_{j_1,...,j_C} [ Σ_{l∈[1..C]} c_l(i_l, j_l) + P_f R + P_f V_{n+1}( A T_{j_1}( . . . T_{j_C}(P^k) . . . ), j_1, . . . , j_C ) ].    (5.10)


Simulations for this chapter will be presented in chapter 7. In the next chapter, the methods to perform the target tracking, which is the second main task of the system, are presented.

Chapter 6

Low Level Control - Smooth Pursuit and Saccade

In this chapter we describe how the cameras are controlled in order to actively track a target or to move the camera to scan a different region. The task of moving the cameras is performed using a switching mechanism between a smooth pursuit controller and a saccade controller. We introduce the smooth pursuit controller, the saccade controller, and the method by which they are combined. The smooth pursuit controller is a robust controller and relies on the assumption that the target model is excited by an unknown and bounded disturbance which brings it away from the center of a given tracking window. The saccade controller is a minimum-time optimal controller that takes the opposite approach: it assumes that the target movement is known, without any disturbances, and tries to center the target in the tracking window as fast as it can. A saccade is also performed when the camera is reoriented in order to scan a different region. This case can be seen as a saccade toward a virtual target, where the location of the virtual target is the next region to be scanned, given by the high-level scanning algorithm, and its speed and acceleration are zero. We start by presenting the dynamical model of a single pan-tilt head.

6.1 Active Vision Model

Next we introduce the dynamical model of a single pan-tilt head (Figure 6.1a). We denote by ψ and θ the pan angle and the tilt angle, respectively, and by ψ˙ and θ˙ the pan and tilt speeds, respectively. By collecting the model variables, the head state vector is denoted by




Figure 6.1: Pan-tilt model and coordinate systems - The axes marked with w define the world coordinate system. The axes marked with c define the coordinate system in the camera frame- work. The camera is pointing toward its X axis (and not to the standard Z axis). The axes corresponding to the image hight and width are the Z and Y axes respectively. The pan and tilt axes are shown in sub-figure (a). The measured tracking errors δψ and δθ are shown in sub-figure (b).

x_p = [ψ, ψ̇, θ, θ̇]^T. We use a discrete-time linear time-invariant (LTI) model with sampling time equal to the sampling time of the camera. With abuse of notation, the same notation is used for the corresponding discrete-time variables, x_p = [ψ, ψ̇, θ, θ̇]^T. The head dynamics are described by the model

xp(k + 1) = Apxp(k) + Bpu(k)

where A_p is a 4 × 4 matrix, B_p is a 4 × 2 matrix and u is the 2 × 1 input vector to the local pan-tilt controllers. The target dynamics are modeled as a double integrator in the radial coordinate system. If A_t and B_t define the dynamics of a double integrator, and x_t = [ψ_t, ψ̇_t, θ_t, θ̇_t]^T is the target state, then the target dynamics can be written as

xt(k + 1) = Atxt(k) + Btv(k)

where v(k) is the acceleration of the target which is unknown to the cameras. We define

the global tracking error as ∆x = x_t − x_p, and we separate it into the global tracking error in the pan axis, ∆Ψ = Ψ_t − Ψ_p, and the global tracking error in the tilt axis, ∆Θ = Θ_t − Θ_p, where Ψ = [ψ, ψ̇]^T and Θ = [θ, θ̇]^T. We will denote by δψ and δθ the measured tracking errors. It should be noticed that the measured tracking errors are given in the camera coordinate system (Figure 6.1b), which is


different from the global coordinate system in our configuration. It should also be noticed that only the position tracking error is measured and that the tracking error speed is estimated from it. Given the measured tracking errors δψ and δθ and the current position of the camera ψ and θ, the target position in the world coordinate system can be calculated as

ψ_t = arctan( ^wT_y / ^wT_x )
θ_t = − arcsin( ^wT_z / ((^wT_x)² + (^wT_y)²)^{1/2} )

where ^wT = [^wT_x, ^wT_y, ^wT_z] is a unit vector pointing toward the target in the world coordinate system. ^wT can be calculated from the line of sight in the camera coordinate system, ^cT, through the equation:

^wT = R_y(θ) · R_z(ψ) · ^cT,

where R_O(α) is the rotation matrix around the axis O by α radians. The vector ^cT is calculated in a similar way using δψ and δθ:

^cT = R_y(δθ) · R_z(δψ) · [1, 0, 0]^T.

The global tracking error can then be easily calculated by subtracting the current camera

position, ∆ψ = ψ_t − ψ. An identical procedure can be written for the tilt axis. Clearly, the global tracking errors ∆ψ and ∆θ do not depend linearly on the measured errors δψ and δθ. In order to make use of a linear MPC, we preprocess the measured errors with a nonlinear static inversion. In particular, after measuring δψ and δθ, the global tracking errors ∆ψ and ∆θ are computed by a non-linear function (denoted IK). The MPC controller receives as inputs the latter tracking errors. Since a minimization of ∆θ implies the minimization of δθ, we obtain a linear MPC with the same final objective. Figure 6.2 presents the system model. The large left block performs the control. The large block to the right is the plant. In the control block, the IK non-linear function computes the tracking errors ∆ψ and ∆θ from the measured tracking errors δψ and δθ

and the current head position ψ_c and θ_c. In the plant, the block marked as P represents the local pan-tilt controllers and the pan-tilt engines. It receives the input from the MPC controllers and considers the target movements as a disturbance. The outputs of P are the tracking errors ∆ψ, ∆θ and the head position. However, through image processing one can

measure only δψ and δθ, outputs of the static non-linear function Kim.
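A sketch of the IK block is given below (Python/NumPy). The angle extraction follows our reading of the conventions of Figure 6.1 and of the equations above, so the exact signs and axis ordering should be taken as an assumption:

import numpy as np

def Rz(a):
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a), np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])

def Ry(a):
    return np.array([[np.cos(a), 0.0, np.sin(a)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(a), 0.0, np.cos(a)]])

def global_tracking_error(d_psi, d_theta, psi, theta):
    """IK: map the measured camera-frame errors (d_psi, d_theta) and the current head
    position (psi, theta) to the global tracking errors (delta_psi, delta_theta)."""
    cT = Ry(d_theta) @ Rz(d_psi) @ np.array([1.0, 0.0, 0.0])  # line of sight in the camera frame
    wT = Ry(theta) @ Rz(psi) @ cT                             # rotate into the world frame
    psi_t = np.arctan2(wT[1], wT[0])                          # target pan angle
    theta_t = -np.arcsin(wT[2] / np.hypot(wT[0], wT[1]))      # target tilt angle
    return psi_t - psi, theta_t - theta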



Figure 6.2: System model - The large block on the left is the control. The large block to the right is the plant.

Besides the linearity of the problem, by using the global tracking errors rather than the measured ones, we gain the separability of the control problem. The pan and tilt controllers can now be separated since they are dynamically decoupled.

6.2 Smooth Pursuit Control

We combine the camera states and target states into the following LTI dynamical model:

[ x_p(k + 1) ]   [ A_p   0  ] [ x_p(k) ]   [ B_p ]          [  0  ]
[ x_t(k + 1) ] = [  0   A_t ] [ x_t(k) ] + [  0  ] u(k) +   [ B_t ] v(k)

We apply a linear state transformation by replacing the target states with the tracking

error states: x = [x_p, ẋ_p, ∆x, ∆ẋ], where ∆x = x_t − x_p and ∆ẋ = ẋ_t − ẋ_p. The resulting model is compactly written as

x(k + 1) = Ax(k) + Bu(k) + Ev(k) (6.1)
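For a single axis, the augmented model (6.1) can be assembled as follows (Python/NumPy sketch). Modeling the head axis itself as a double integrator with the camera sampling time is an illustrative assumption; with the identified pan-tilt dynamics A_p ≠ A_t an additional (A_t − A_p) block would couple the error states to the head states:

import numpy as np

Ts = 1.0 / 30.0                                   # camera sampling time (30 Hz)

# double integrator (position, speed), used here for both the head axis and the target
Ai = np.array([[1.0, Ts], [0.0, 1.0]])
Bi = np.array([[0.5 * Ts ** 2], [Ts]])
Ap, Bp = Ai, Bi                                   # head dynamics for one axis (assumption)
At, Bt = Ai, Bi                                   # target dynamics, driven by the acceleration v

# error-state model x = [x_p, Delta_x], Delta_x = x_t - x_p:  x(k+1) = A x(k) + B u(k) + E v(k)
A = np.block([[Ap, np.zeros((2, 2))],
              [np.zeros((2, 2)), At]])
B = np.vstack([Bp, -Bp])                          # the input moves the head, so it enters Delta_x with a minus sign
E = np.vstack([np.zeros((2, 1)), Bt])             # the target acceleration only drives the error states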

The goal of the smooth pursuit controller is to maintain the target inside a predefined tracking window. The acceleration v(k) that drives the target can be seen as a disturbance that competes against the controller goal. The smooth pursuit controller is designed by solving a min-max game where the disturbance is the opponent and the controller tries to minimize the worst-case performance index while satisfying the constraints for all the possible disturbances. State constraints will include the presence of the target within a pre-specified window and bounds on position, variation and acceleration of the head angles.


We design a min-max MPC controller by solving the following optimization problem

min_{u_k ∈ U} [ ‖R u_k‖_p + max_{v_k ∈ V} ( ‖Q x_{k+1|k}‖_p + · · ·

    + min_{u_{k+N−1} ∈ U} [ ‖R u_{k+N−1}‖_p + max_{v_{k+N−1} ∈ V} ‖Q x_{k+N|k}‖_p ] · · · ) ]    (6.2)

s.t.  x_{k+j|k} ∈ X  ∀ v_k, . . . , v_{k+j−1} ∈ V,  j = 1, . . . , N

where N is the prediction horizon, Q ∈ R^{n×n} defines the weights on the states and R ∈ R^{m×m} defines the weights on the control inputs. The vector x_{k+j|k} denotes the predicted

state at time k + j starting from state xk|k = x(k) and applying the input sequence

u_k, . . . , u_{k+j−1}, and the disturbance sequence v_k, . . . , v_{k+j−1}, to the model (6.1). The sets X, V, and U are polyhedral sets that define constraints on the states, disturbances and inputs. In particular, the set X constrains the target position to be in a certain window and the set V defines the maximum amplitude of the target acceleration. Solving this optimization problem (6.2) in real-time might be challenging. In the case of a piecewise-linear performance index (p = 1, ∞), an offline solution to problem (6.2) can be computed as a function of the current state vector x [Bemporad et al., 2003], i.e., U*(k) = f(x(k)), with U* = {u*_k, . . . , u*_{k+N−1}}. In [Bemporad et al., 2003] it is shown that the state-feedback solution f to problem (6.2) is a piecewise linear function of the states. That is, the state space is partitioned into polyhedra and in each polyhedron the optimal controller is a linear function of the states. Hence, at each time step, the online solution of (6.2) is reduced to the evaluation of f. This consists of searching for the polyhedron containing x(k) and computing the corresponding linear function in order to obtain U*. Once U* is obtained, the first input signal u*_k is applied, and the procedure is repeated over a shifted horizon. Note that f is time-invariant and it is computed offline only once.

6.3 Saccade Controller

When the smooth pursuit controller fails, or when one prefers to move the camera and scan another section, the saccade controller is used. In both cases we want to complete



Figure 6.3: Smooth-pursuit min-max MPC controller. State-Space partition: the horizontal axis is the tracking error and the vertical axis is the tracking error speed. Axes units are steps and steps per second. Each step is roughly 0.05◦.

the task as fast as possible, hence the saccadic motion. For this reason, the saccade controller, in general, requires a more accurate model of the target motion, in contrast to the smooth pursuit controller, which is a robust controller. The need for a more accurate model is translated into the assumption that the disturbance acting on the target is known and constant. We design a saccade controller by computing the state-feedback closed-loop solution to the following constrained finite-time optimal control problem [Borrelli, 2003]:

min_{U = u_k,...,u_{k+N−1}}  J = Σ_{j=1}^{N} ( ‖Q x_{k+j|k}‖_p + ‖R u_{k+j−1}‖_p )    (6.3)

s.t.
x_{k+j+1|k} = A x_{k+j|k} + B u_{k+j} + E v̄
x_{k+j|k} ∈ X,    j = 1, . . . , N
y_{k+j|k} ∈ Y,    j = 0, . . . , N − 1
u_{k+j|k} ∈ U,    j = 0, . . . , N − 1
x_{k+N|k} ∈ T_set

where the disturbance is assumed constant over the horizon and with a terminal state

constraint T_set. The choice of v̄ is discussed in section 6.4. The offline solution is piecewise affine [Bemporad et al., 2002], i.e., the set of feasible states (states that can be brought


(a) N=2 (b) N=3 (c) N=4

Figure 6.4: Saccade controllers. State-space partition for controllers fN (x(k)) for different pre- diction horizons N. The horizontal axis is the tracking position error and the vertical axis is the tracking speed error in the pan or tilt axis. The partitions depicts a cut of the state-space where the head position, head speed and target acceleration are zero. As the prediction horizon increases, the feasible area of the controller increases. The axes units are steps and steps per second. Each step is roughly 0.05◦.

to Tset in N steps) is partitioned into polyhedra and in each polyhedron the optimal control is a linear function of the states. In order to obtain the state-feedback solution

to problem (6.3), we solve (6.3) offline for all horizons N ≤ Nmax for a given Nmax. This

procedure yields Nmax controllers, one for each horizon:

u*(k)_N = f_N(x(k)),    N = 1, . . . , N_max.

These controllers will be referred to as “explicit controllers”. The feasible area of the

explicit controllers f_N(x(k)) will increase with N, as can be seen in Figure 6.4, since as the horizon gets longer, state vectors which are farther from the terminal set can be driven into the terminal set.

We implement a feedback controller that brings x to the terminal set Tset in minimum

time by using the explicit controllers fN (x(k)) in a combined way. At each time step,

for a given current state x(k), we look for a feasible controller fN (x(k)) with the smallest horizon N. Once this has been found we implement the corresponding optimal control policy. It should be noted that checking whether or not the state vector is inside a feasible region of a controller is simple: the feasible region of a linear MPC controller is

a polyhedron. We remark that in the proposed scheme, the system's state reaches T_set in minimum time, under no model mismatch. There is a trade-off between the real-time computational complexity and the choice of

Nmax. The use of a long horizon Nmax allows a wider tracking capability (due to the increase


of the feasible area). On the other hand, a long prediction horizon requires higher real-time computational capabilities, since such a controller has a large number of polyhedral regions.

In our approach Nmax is chosen to be the largest possible given the computational limits of our system. If x(k) is outside the feasible area of all the explicit controllers, we implement a different saccade controller (6.3) without terminal set and tuned with a high weight on the position error compared to the other weights. This type of controller, tries to minimize the position errors as fast as it can. When the errors becomes smaller the explicit controllers

fN (x(k)) become feasible and are used until the state vector enters the terminal set. As an alternative solution to reduce the complexity of the saccade controller (6.3) the method in [Grieder et al., 2003] could be applied. In [Grieder et al., 2003] the authors propose a minimum-time low complexity controller. While this approach provides a wider tracking area with a lower controller complexity, the overall performance degrades compared to our approach. In summary, at each time step, the proposed strategy for the saccade controller consist in implementing the explicit controller with minimum N. If no explicit controller is feasible, then a saccade controller with no terminal set is applied.

6.4 Switching between Smooth Pursuit and Saccade

Consider the case where the smooth pursuit controller is active and it becomes infeasible (i.e., the target exits the tracking window). Since the smooth pursuit controller is robust to disturbances on the target motion, an acceleration larger than modeled must be acting on the target. In order to apply the saccade controller we estimate the acceleration using the last and current states. With this estimate as input to the target model, we implement the saccade controller as described in the previous section. During the operation of the saccade controller, no image processing is performed, and the control is based on the estimated states of the target. This is due to the fact that it is hard to find the target in the image when the relative speed between the camera and the target is high; any attempt at image processing would fail. This also preserves the ballistic nature of saccades seen in primates. When the saccade is completed, based on the estimated movement of the target, the system starts to look for the target in the center of the image. If the estimates were correct, the target will be found. If the target is not found, the system switches back to EVS mode and starts to scan the expected area of the target.

Chapter 7

Simulations and Experiments

In this chapter we first present simulations of the high-level scanning algorithm and then the experimental validation of the smooth pursuit and saccade controllers. The experiments are carried out on a robotic head, which is briefly described next.

7.1 System Architecture

Inspired by the arrangement of the chameleon visual system, the robotic head is composed of two cameras, each installed on top of a pan-tilt unit. The two cameras can be moved independently, covering large sections of the environment thanks to the wide range of motion of the pan-tilt units. At the high level, an algorithm that scans the environment runs continuously. This algorithm is a heuristic algorithm based on the principles that govern global scanning in the chameleon. The whole system runs on a standard PC. The cameras send images to the computer through a FireWire link. On the PC, the image processing module is responsible for finding the target in the images. The smooth pursuit controller and the saccade controller are also hosted on the PC: they receive information from the high-level scanning algorithm during scan operation and directly from the image processing and target location module during target tracking operation. They send references to the pan-tilt unit controllers, which are located outside the PC and are responsible for performing the movements (see Figure 1.2). The pan-tilt units are model PTU-D46-17.5 from Directed Perception. Two Flea cameras from PointGrey are mounted on top of the pan-tilt units and work with a frame rate

61

Technion - Computer Science Department - M.Sc. Thesis MSC-2006-27 - 2006 62 CHAPTER 7. SIMULATIONS AND EXPERIMENTS

of 30Hz. The cameras are connected to a PC computer using a firewire link and the controllers of the pan-tilt units are connected to the computer using a serial RS232 cable.
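As a rough illustration of the data flow just described (and not of the actual implementation), the following sketch assumes invented camera and pan-tilt interfaces and loops at the 30 Hz frame rate.

    import time

    def control_loop(cameras, locate_target, controller, ptu_units, dt=1.0 / 30.0):
        """Minimal sketch, with invented interfaces, of the main loop: grab a
        frame from each camera, locate the target in the image, compute a
        reference with the active controller and send it to the pan-tilt unit."""
        while True:
            t0 = time.time()
            for cam, ptu in zip(cameras, ptu_units):
                frame = cam.grab()                       # image over FireWire
                target = locate_target(frame)            # image processing module
                pan_ref, tilt_ref = controller(target)   # pursuit / saccade / EVS
                ptu.send_reference(pan_ref, tilt_ref)    # over the serial link
            time.sleep(max(0.0, dt - (time.time() - t0)))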

7.2 Scanning Method Simulations

In this section, simulation results for the algorithm introduced in Chapter 5 are presented. In order to facilitate the understanding of the results, we first present simulations for scanning a 1D circle and then 2D simulations.

7.2.1 1D Simulations

The environment considered here is a 2D world where the system is in the middle of a circle and the cameras can be directed at angles of 0°-360°. The circle is divided into 10 regions, where each camera can scan its own side as well as the forward and backward regions, as detailed in Figure 7.1. The horizon of the dynamic programming in (5.8) was set to three. The matrix A is set to represent an appearance probability of 0.3 and a probability of 0.63 of staying in the same region. All regions have an exiting probability of 0.3 and a transition probability that is divided uniformly between the neighbors. The detection rate of all regions (with the exception of the "no-target" region) is set to 1. The cost function was set to c(i, j) = (i − j)² + 1 if the movement is possible, and c(i, j) = ∞ if the movement of the camera from i to j is not possible. The cost to move to and from the "no-target" region was also set to infinity, which means our system will always continue to search for targets. The results are shown in Figure 7.2(a). The y-axis represents the angle of each camera from the forward direction. The right camera has positive angles and the left camera has negative angles. An angle of 180° in the right camera is of course the same as an angle of −180° in the left camera. We highlight that the positions of the cameras are negatively correlated, that is, when one camera is looking forward the other is usually in a backward position and vice versa. This behavior is the same as the one we observed in chameleons. In Figure 7.2(b) we present simulation results where the front regions have a larger probability of containing a target. In this case the intuitive optimal solution is to keep one camera (for example, the left camera) looking forward while the other scans its side (the right side in our example). Next, the cameras should switch: the right camera should look forward while the left camera is free to scan the left side.
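The following Python fragment sketches, under simplifying assumptions, the two ingredients used in this simulation: the per-step update of the target-location PDF and a cost-minus-reward selection of the next camera positions. The thesis solves the latter by dynamic programming over a horizon of three; for brevity the sketch uses a one-step horizon, and all function and argument names are invented.

    import itertools
    import numpy as np

    def cost(i, j, allowed):
        """Camera movement cost used in the simulation: (i - j)^2 + 1 when the
        camera may move from region i to region j, and infinity otherwise."""
        return (i - j) ** 2 + 1 if allowed(i, j) else np.inf

    def update_pdf(p, A, inspected, detection_rate=1.0):
        """One EVS time step on the target-location PDF p (one entry per region,
        including the "no-target" region): Markov prediction with the
        row-stochastic transition matrix A, then the measurement update for an
        inspection of region `inspected` that did not detect the target."""
        p = A.T @ p                              # prediction step
        p[inspected] *= (1.0 - detection_rate)   # nothing was seen there
        return p / p.sum()                       # renormalize

    def choose_moves(p, cam_positions, candidates, A, allowed, reward=1.0):
        """Greedy (one-step) stand-in for the limited-horizon optimization in
        (5.8): pick the combination of camera regions minimizing movement cost
        minus the expected reward of detecting the target there."""
        q = A.T @ p                              # predicted PDF for the next step
        best_moves, best_val = None, np.inf
        for moves in itertools.product(*candidates):
            c = sum(cost(i, j, allowed) for i, j in zip(cam_positions, moves))
            val = c - reward * sum(q[j] for j in set(moves))
            if val < best_val:
                best_moves, best_val = moves, val
        return best_moves

With a detection rate of 1, an inspected region that yields no detection is simply zeroed out before renormalization, which matches the setting used for all regions except the "no-target" region above.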



Figure 7.1: Simulation setup for the 1D case - The world is a circle with 10 regions. Region 0 represents the system. The right camera can be directed to regions [1-6]; the left camera can be directed to regions [1, 6-10]. The wide arrow represents the forward looking direction of the system. The two thinner arrows represent possible locations of the right and left cameras.


Figure 7.2: Simulation results for the search method in 1D - the right camera has positive angles and the left camera has negative angles. In subfigure 7.2(a) all the regions are identical and target movement is uniformly distributed between them. The cameras start at the same location, directed forward, but quickly develop a negative correlation. In subfigure 7.2(b), the front regions have a larger probability of containing the target, and they are covered by one of the cameras most of the time.


This is also a behavior observed in chameleons by the authors. As can be seen in Figure 7.2(b), the scanning pattern created by our model in this case is similar to the strategy mentioned above. In the last simulation the control of four cameras is demonstrated. The setup is similar to the previous simulations. The 1D circle is now divided into twelve regions, denoted 1 to 12 as depicted in Figure 7.3(a). Each camera inspects a quarter of the circle, with an overlap of one region between neighboring cameras. The cameras are denoted by their main direction: front-left, front-right, rear-left or rear-right. We choose the matrix A so as to model a preferred path for the target. The path is located at the front and goes through regions 2, 1, 12, 11 and 10. The target will, with high probability, enter and exit the environment in regions 2 or 10, and will travel along this path. The target cannot exit the environment while it is on the path, nor can it enter the path in regions other than 2 and 10. In the other regions, the target may enter or exit the environment, but with lower probability. In Figure 7.3(b) the simulation results are presented. As seen in the figure, the front-left and front-right cameras guard regions 2 and 10, where the target has a high probability of entering the environment. From time to time one of the front cameras inspects the other regions of the path, in case the target was not detected and is still in one of them. The rear cameras cover the remaining regions 3 to 9.
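As an illustration of how a preferred path can be encoded in the transition matrix, the toy construction below biases the transitions of a circular world toward the next region on a given path. The probabilities are arbitrary, and the handling of entry/exit and of the "no-target" state used in the actual simulation is omitted.

    import numpy as np

    def transition_matrix_with_path(n_regions, path, p_stay=0.6, p_path=0.3):
        """Toy construction of a transition matrix that prefers a traveling
        path: from a region on `path` most of the remaining probability mass
        goes to the next region on the path, elsewhere it is split uniformly
        between the two neighbors on the circle.  Indices are 0-based and the
        numbers are illustrative only."""
        A = np.zeros((n_regions, n_regions))
        next_on_path = {a: b for a, b in zip(path[:-1], path[1:])}
        for i in range(n_regions):
            A[i, i] = p_stay
            left, right = (i - 1) % n_regions, (i + 1) % n_regions
            if i in next_on_path:
                A[i, next_on_path[i]] += p_path
                rest = 1.0 - p_stay - p_path
            else:
                rest = 1.0 - p_stay
            A[i, left] += rest / 2.0
            A[i, right] += rest / 2.0
        return A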

7.2.2 2D Simulations

The environment considered in this section is three-dimensional. The scanning area is two-dimensional, since it is described by two parameters: the pan and tilt angles of a camera. A system of two cameras scans the upper hemisphere (in a realistic setup, the lower hemisphere is usually below the ground surface). Each camera can scan half of the hemisphere, where the front and back regions can be scanned by both cameras. The simulation settings were similar to those in the 1D case. The results for the pan axis can be seen in Figure 7.4(a) and for the tilt axis in Figure 7.4(b). A negative correlation is clear from Figure 7.4(c), which depicts the angle between the cameras during the search. This angle can range between 0° and 180°. An angle of 0° indicates that the cameras point in the same direction, while an angle of 180° indicates that the cameras are directed in opposite directions. As can be seen, most of the time the angle between the cameras is larger than 90°, which indicates a negative correlation between them.
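The angle between the cameras reported in Figure 7.4(c) can be computed from the pan and tilt angles by comparing the corresponding viewing directions; a small sketch is given below. The pan/tilt sign conventions are assumptions made for the sketch, not taken from the thesis.

    import numpy as np

    def camera_direction(pan_deg, tilt_deg):
        """Unit viewing direction for given pan/tilt angles in degrees (pan
        measured around the vertical axis, tilt above the horizontal plane;
        the exact sign convention of the robotic head may differ)."""
        pan, tilt = np.radians(pan_deg), np.radians(tilt_deg)
        return np.array([np.cos(tilt) * np.cos(pan),
                         np.cos(tilt) * np.sin(pan),
                         np.sin(tilt)])

    def angle_between_cameras(pan1, tilt1, pan2, tilt2):
        """Angle in degrees (0-180) between the two viewing directions, as
        plotted in Figure 7.4(c)."""
        d1 = camera_direction(pan1, tilt1)
        d2 = camera_direction(pan2, tilt2)
        return float(np.degrees(np.arccos(np.clip(d1 @ d2, -1.0, 1.0))))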



(a) Setup for 1D - four cameras (b) Simulation results - four cameras

Figure 7.3: Simulation results for the search method with four cameras - (a) Setup of the four-camera simulation. Large arrows mark the regions with a high probability of the target entering or exiting the environment. (b) Plots for each camera. The plots from top to bottom represent the rear-left, front-left, front-right, and rear-right cameras, respectively. The two front cameras guard the two regions where the target has a high probability of entering the environment, while the rear cameras scan the rear regions.


(a) Pan axis (b) Tilt axis (c) Angle between cameras

Figure 7.4: Simulation results for the search method in 2D - (a) Pan axis of the two cameras. The right camera has positive angles and the left camera has negative angles. (b) Tilt axis. The upper plot is for the right camera and the lower plot is for the left camera. (c) Angle between the cameras. The angle is usually larger than 90°, which indicates a negative correlation pattern.


7.3 Smooth Pursuit and Saccade Controllers: Experimental Validation

The smooth pursuit and saccade controllers were first simulated in MATLAB and then experimentally validated on the robotic head. Below we present the results of the experiments on the robotic head. Delay is one of the main concerns when dealing with visually guided active tracking. In our case the image arrives with a delay of almost one time step. During tracking, the target is located using normalized cross-correlation over a predefined window. Once the target is located, the tracking error and tracking error speed are estimated, and the control signal is computed by finding the polyhedral region of the controller corresponding to the current state. This process is fast (about 1 msec for the control computation); therefore the total delay in the system is considered to be one time step. Next we present two simplified tracking tasks in which the system successfully tracks a target represented as a black circle. The results for tracking a target moving along a sinusoidal path with constant amplitude and different frequencies can be seen in Figure 7.5. When the frequency is low, the target acceleration is low and the smooth pursuit controller performs good tracking. When the frequency is high, the saccade controller is activated due to the high acceleration during the direction change, and drives the target back into the smooth-pursuit tracking window (Figure 7.5(d)). Figure 7.6 depicts experimental results when the target moves with a constant acceleration. Each experiment begins with an initialization phase that brings the target to a base speed. The base speed is maintained for 0.25 seconds; this corresponds to the phase of zero acceleration in the acceleration plots of Figure 7.6 (plots (a)-(d)). After this phase, the test acceleration is applied until the target exits the experiment area. As the acceleration increases, the rate of saccades needed to track the target increases as well. With accelerations larger than the ones reported in Figure 7.6, after the first saccade the system could no longer find the target.
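For concreteness, the sketch below shows a plain (and deliberately unoptimized) normalized cross-correlation search over the tracking window, together with the construction of the tracking-error state used by the controllers. Variable names and interfaces are invented, and a real system would use an optimized correlation routine.

    import numpy as np

    def locate_target_ncc(window, template):
        """Brute-force normalized cross-correlation of the target template over
        the predefined search window; returns the (row, col) offset of the best
        match and its correlation score."""
        H, W = window.shape
        h, w = template.shape
        t = (template - template.mean()) / (template.std() + 1e-9)
        best_score, best_pos = -np.inf, (0, 0)
        for y in range(H - h + 1):
            for x in range(W - w + 1):
                patch = window[y:y + h, x:x + w]
                z = (patch - patch.mean()) / (patch.std() + 1e-9)
                score = float((z * t).mean())
                if score > best_score:
                    best_score, best_pos = score, (y, x)
        return best_pos, best_score

    def tracking_state(err_prev, err_now, dt):
        """Tracking error and tracking-error speed forming the controller state;
        since the image arrives roughly one time step late, the controller
        treats this state as describing the previous sample."""
        return np.array([err_now, (err_now - err_prev) / dt])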



(a) ω = 0.7Hz (b) ω = 0.8Hz (c) ω = 0.9Hz (d) ω = 1Hz

Figure 7.5: Experimental results for a sinusoidal signal - Each plot shows the target and camera position (in degrees) vs. time (in seconds) for a different frequency ω of the signal. The target is represented as a dashed line and the camera position as a solid line. Saccades are indicated with a thicker line. The start of a saccade is marked with a triangle directly below or above the line. In sub-figure 7.5(d) the system uses the saccade controller to bring the target back into the tracking window.


(a) v = 60°/s² (b) v = 105°/s² (c) v = 120°/s² (d) v = 150°/s²


(e) v = 60°/s² (f) v = 105°/s² (g) v = 120°/s² (h) v = 150°/s²

Figure 7.6: Experimental results for constant acceleration - plots (a) to (d) show the acceleration profile of the target (in degrees/sec²) vs. time (in seconds). Plots (e) to (h) show the position of the target and the camera (in degrees) vs. time, for the corresponding acceleration profile in the upper plots. The target is represented as a dashed line and the camera as a solid line. Saccades are marked as in Figure 7.5. The final constant acceleration is applied after the target is brought to a base speed, which is maintained for 0.25 seconds. The value of the constant acceleration v is given in the plot labels. As the acceleration increases, more saccades are needed to track the target.

Chapter 8

Conclusions

In this thesis we presented biologically motivated research on the chameleon visual system. The thesis spans the several fields that make up this kind of research: from biological research, which required a complex computer vision system to analyze eye movements in chameleons, through the development of a new method to scan the environment, to a real-time implementation of a system which scans the environment and actively tracks targets.

In the first part, where computer vision aids biological research, a complex computer-vision system was constructed in order to analyze eye movements in chameleons. The system includes two cameras and two mirrors. Tracking the head uses a model-based approach. The head's pose is determined by minimizing the 3D geometric errors between the model points and the back-projected rays induced by the locations of the features found in the image. The pose estimation was performed using multiple cameras and mirrors, where each feature found in each view contributes to the final pose. This is the first time this kind of pose estimation has been presented for a system consisting of cameras and mirrors. In order to maintain a good set of features to track, the validity of each feature is defined, and features are added to or removed from the set of tracked features based on their validity. The expected appearance of the features is used to get a first estimate of their location, which reduces the number of outliers to a minimum. Tracking the eyes is performed using the geometric model of the eye, which includes the eye's center and the radius from the center to the eyelid. The eyelid is identified in one of the images or in the mirrors, and the direction of the eye is calculated based on this geometric model and the pose of the head. The procedure to identify the eyelid includes computing a correlation between the eyelid as it appeared in the previous image and the current image.



If the eyelid has moved, its new location is estimated using the difference image, which gives several candidates for the new location. The final location is determined with the help of the eyelid's expected appearance. The geometry of a planar mirror was also presented, and the calibration of a system that includes cameras and mirrors was formulated and performed. The calibration is done using an enhanced version of the Camera Calibration Toolbox for Matlab. This process, like the pose estimation process, minimizes the 3D geometric errors, and the cost function includes all the errors in all the views together.

In the second part, where biological systems inspire robotic systems, a novel method for visual scanning and autonomous target tracking was presented. We employed a systematic, optimization-based approach to the problem, both for the high-level scanning algorithm and for the low-level tracking control design. Both algorithms use a model-based approach, combining a robotic head model and a target model to optimally solve the scanning and tracking problems. Our new high-level Environment Visual Scanning algorithm is formulated in the context of "search theory". The environment is divided into discrete regions and the algorithm works in discrete time steps. A PDF is maintained, which describes the probability that the target is in each region. The PDF is updated in each time step based on a Markovian model for the movement of the target and the result of inspecting the current region. The problem is formulated as a stochastic dynamic programming problem. A cost function combines the cost of the camera movements and the reward from finding the target, and is solved over a limited horizon in order to find the best movement of the cameras. Our new scanning algorithm has been simulated with a setting of two cameras that scan the hemisphere, a setup similar to that of chameleons. The optimal scanning pattern shows a "negative correlation" between the cameras. This scanning pattern has a distinguishable resemblance to the scanning patterns seen in chameleons in nature, which may suggest that the principles of our scanning strategy are similar to those that governed the evolution of the chameleon's scanning strategy. Moreover, the system was simulated under different conditions and demonstrated other behaviors, such as a "cover-your-back" behavior which was also identified in nature. The scheme is not limited to two cameras, and a simulation demonstrated its behavior for four cameras. As exemplified, the visual scanning scheme is also flexible: information about roads or traveling paths in the environment can be incorporated into the model, changing the scanning and tracking behavior accordingly.


For the low-level tracking controller we designed and implemented linear and robust MPC algorithms. The MPC technique has several advantages: (i) it is model based; (ii) it allows the system and target physical constraints to be included in the control design; (iii) it is robust to abrupt, bounded changes in target direction. Thanks to recent developments in MPC theory, we computed and implemented the piecewise affine state-feedback solution to the MPC problem. This solution is computed offline, in the design phase of the system. During operation, implementing the control law for both the smooth-pursuit controller and the saccade controller requires only searching a lookup table for the region containing the current state. This makes the implementation real-time capable and faster than traditional systems. For example, in [Rivlin and Rotstein, 2000] the saccade controller optimization was solved only once and was then applied in an open-loop manner. Moreover, solving the optimization problem of the saccade controller took a significant amount of time, spanning several time steps. This relatively long time degrades the performance of the system, since the saccade depends on an accurate model of the target, a model which degrades over time. In contrast, in our system the optimization problem for the saccade is solved offline, and the online search for the correct region takes less than half a time step. This allows the saccade to be applied immediately and, moreover, to be applied in a closed-loop manner on the predicted states of the target. Another innovation in our method is the way smooth pursuit and saccades are combined. Usually, a saccade is triggered when the positional error is larger than some threshold, meaning the target has exited some predefined window in the center of the image. In our system a saccade is initiated when the smooth-pursuit controller has failed. This does not necessarily mean that the target exited the tracking window; rather, it means that the smooth-pursuit controller is no longer feasible. That is, for the given state, under our assumptions of bounded disturbances, the smooth-pursuit controller cannot keep the target in the tracking window for all the states in its prediction horizon. This enables faster initiation of the saccade controller, which evidently results in shorter and more accurate saccades.

The work in this thesis can be expanded in the following directions. In the eye tracking system, a method to automate the initialization of the tracking would improve the usability of the system. Another improvement would be to compute the direction of the eyes based on a complete 3D appearance model of the eye, rather than a geometric model of the eyelid.


This enhancement can also help to relax the dependency on the center of the eye in the model. An interesting improvement to the EVS algorithm is to solve it offline, in a manner similar to the offline MPC method. This direction is not trivial, since the state update function in the EVS problem is not linear, while linearity is required by the offline MPC methods. Another future direction is to formulate the problem so that it can search for multiple targets, or a combination of targets and threats. A last possible improvement is to change the PDF update function so that it can deal with a wider range of inputs. Currently, the input, which is the result of inspecting some region, is limited to "detection" or "no detection". Allowing a wider range of inputs, for example a continuous input that represents the probability of a target being in the inspected region, would describe real-world scenarios more closely and would enable the system to handle them correctly. In the low-level target tracking control algorithms, an interesting expansion is the use of image processing during the saccade. Although saccades in primates have a ballistic nature, we do not have to bind our system to such an assumption. Performing image processing in the usual way, based on correlation between the target pattern and the image, would not give good results, due to the motion blur induced by the high speed of the target in the image. Yet, analyzing this motion blur can provide feedback on the progress of the saccade while it is performed, and would enable the handling of estimation errors or of changes in the movement of the target. Another issue is the relevance of our work to describing saccades seen in nature. The results of our smooth-pursuit and saccade tracking are similar to the general tracking behavior of primates. It would be interesting to see whether our model can be adapted to fit more closely the phenomena described in recent works. The biological part of the work includes only preliminary results. We would like to continue the analysis of eye movements in chameleons, in order to obtain stronger evidence of the negatively correlated nature of scanning in chameleons. In the complete biological research the scanning pattern will be analyzed with and without a prey item. Another aspect of the work is to analyze how the transition from environment scanning to target tracking with two eyes is performed in chameleons. The last issue is to analyze the saccade events themselves (as opposed to the scanning pattern as a whole), and to see whether inter-eye or intra-eye saccades are correlated in terms of size, direction or temporal pattern. To conclude, this thesis spans a number of fields, contributing to each of them:


(i) using a combination of cameras and mirrors for pose estimation; (ii) introducing a new method for environment visual scanning based on stochastic dynamic programming; (iii) showing the resemblance of this environment visual scanning to the scanning behavior seen in the chameleon in nature; and (iv) designing and implementing a new control method for target tracking that combines smooth pursuit and saccades using offline MPC controllers.

References

[Agarwal and Triggs, 2004] Agarwal, A. and Triggs, B. (2004). 3d human pose from silhouettes by relevance vector regression. In CVPR ’04: Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition-Volume 2, pages 882–888, Washington, DC, USA. IEEE Computer Society.

[Ansar and Daniilidis, 2003] Ansar, A. and Daniilidis, K. (2003). Linear pose estimation from points or lines. IEEE Trans. Pattern Anal. Mach. Intell., 25(5):578–589.

[Asada et al., 2000] Asada, M., Tanaka, T., and Hosoda, K. (2000). Adaptive binocular visual servoing for independently moving target tracking. In Proc. of IEEE Intl. Conf. on Robotics and Automation (ICRA’00), volume 3, pages 2076–2081.

[Baker and Nayar, 1998] Baker, S. and Nayar, S. K. (1998). A theory of catadioptric image formation. In ICCV ’98: Proceedings of the Sixth International Conference on Computer Vision, pages 35–42, Washington, DC, USA. IEEE Computer Society.

[Bemporad et al., 2000] Bemporad, A., Borrelli, F., and Morari, M. (2000). The Explicit Solution of Constrained LP-Based Receding Horizon Control. In IEEE Conference on Decision and Control, Sydney, Australia.

[Bemporad et al., 2003] Bemporad, A., Borrelli, F., and Morari, M. (2003). Min-max Control of Constrained Uncertain Discrete-Time Linear Systems. IEEE Transactions on Automatic Control, 48(9):1600–1606.

[Bemporad et al., 2002] Bemporad, A., Morari, M., Dua, V., and Pistikopoulos, E. (2002). The Explicit Linear Quadratic Regulator for Constrained Systems. Automatica, 38(1):3–20.



[Bernardino and Santos-Victor, 1999] Bernardino, A. and Santos-Victor, J. (1999). Binocular visual tracking: Integration of perception and control. IEEE Transactions on Robotics and Automation, 15(6):1937–1958.

[Bertsekas, 2000] Bertsekas, D. P. (2000). Dynamic Programming and Optimal Control, volume I. Athena Scientific, Belmont, Massachusetts, 2nd edition.

[Black and Jepson, 1996] Black, M. J. and Jepson, A. D. (1996). Eigentracking: Robust matching and tracking of articulated objects using a view-based representation. In ECCV ’96: Proceedings of the 4th European Conference on Computer Vision-Volume I, pages 329–342, London, UK. Springer-Verlag.

[Borrelli, 2003] Borrelli, F. (2003). Constrained Optimal Control of Linear & Hybrid Systems, volume 290. Springer Verlag.

[Borrelli et al., 2005] Borrelli, F., Baotic, M., Bemporad, A., and Morari, M. (2005). Dynamic programming for constrained optimal control of discrete-time hybrid systems. Automatica, 41:1709–1721.

[Bouguet, 2005] Bouguet, J.-Y. (2005). Camera calibration toolbox for matlab.

[Bowmaker et al., 2005] Bowmaker, J. K., Loew, E. R., and Ott, M. (2005). The cone photoreceptors and visual pigments of chameleons. J Comp Physiol A, 191(10):925–932.

[Chang and Chen, 2004] Chang, W.-Y. and Chen, C.-S. (2004). Pose estimation for multiple camera systems. In ICPR ’04: Proceedings of the 17th International Conference on Pattern Recognition-Volume 3, pages 262–265, Washington, DC, USA. IEEE Computer Society.

[Christensen, 1993] Christensen, H. I. (1993). A low-cost robot camera head. International Journal of Pattern Recognition and Artificial Intelligence, 7(1):69–87.

[Cootes et al., 2000] Cootes, T. F., Walker, K., and Taylor, C. J. (2000). View-based active appearance models. In FG ’00: Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition 2000, pages 227–232, Washington, DC, USA. IEEE Computer Society.


[Crétual and Chaumette, 2001] Crétual, A. and Chaumette, F. (2001). Application of motion-based visual servoing to target tracking. Int. Journal of Robotics Research, 20(11):878–890.

[Darrell et al., 1996] Darrell, T., Moghaddam, B., and Pentland, A. P. (1996). Active face tracking and pose estimation in an interactive room. In CVPR ’96: Proceedings of the 1996 Conference on Computer Vision and Pattern Recognition (CVPR ’96), pages 67–72, Washington, DC, USA. IEEE Computer Society.

[de Groot and van Leeuwen, 2004] de Groot, J. H. and van Leeuwen, J. L. (2004). Evidence for an elastic projection mechanism in the chameleon tongue. In Proc. of the R. Soc. London B, volume 271, pages 761–770.

[Delamarre and Faugeras, 1999] Delamarre, Q. and Faugeras, O. (1999). 3d articulated models and multi-view tracking with silhouette. In Proceedings of the 16th International Conference on Vision Interface, pages 716–721.

[Dhome et al., 1989] Dhome, M., Richetin, M., and Lapreste, J.-T. (1989). Determination of the attitude of 3d objects from a single perspective view. IEEE Trans. Pattern Anal. Mach. Intell., 11(12):1265–1278.

[Drouin et al., 2003] Drouin, S., Hébert, P., and Parizeau, M. (2003). Simultaneous tracking and estimation of a skeletal model for monitoring human motion. In 16th International Conference on Vision Interface, pages 81–88.

[Eagle, 1984] Eagle, J. N. (1984). An optimal search for a moving target when the search path is constrained. Oper. Res., 32(5):1107–1115.

[Eagle and Yee, 1990] Eagle, J. N. and Yee, J. R. (1990). An optimal branch-and-bound procedure for the constrained path, moving target search problem. Oper. Res., 38(1):110–114.

[Fan and Wang, 2004] Fan, B. and Wang, Z.-F. (2004). Pose estimation of human body based on silhouette images. In ICIA ’04: Proceedings of the 2004 International Conference on Information Acquisition, pages 296–300.


[Fischler and Bolles, 1981] Fischler, M. A. and Bolles, R. C. (1981). Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24(6):381–395.

[Flanders, 1985] Flanders, M. (1985). Visually guided head movement in the African chameleon. Vision Research, 25(7):935–942.

[Fleuret and Geman, 2002] Fleuret, F. and Geman, D. (2002). Fast face detection with precise pose estimation. In ICPR ’02: Proceedings of the 16th International Conference on Pattern Recognition (ICPR’02) Volume 1, volume 01, pages 235–238, Los Alamitos, CA, USA. IEEE Computer Society.

[Gluckman and Nayar, 2001] Gluckman, J. and Nayar, S. K. (2001). Catadioptric stereo using planar mirrors. Int. J. Comput. Vision, 44(1):65–79.

[Grieder et al., 2003] Grieder, P., Parrilo, P., and Morari, M. (2003). Robust Receding Horizon Control - Analysis & Synthesis. In IEEE Conference on Decision and Control, pages 941–946, Maui, Hawaii.

[Haralick et al., 1989] Haralick, R., Joo, H., Lee, C., Zhuang, X., Vaidya, V., and Kim, M. (1989). Pose estimation from corresponding point data. IEEE Trans. Systems, Man, and Cybernetics, 19(6):1426–1445.

[Haralick et al., 1991] Haralick, R. M., Lee, C., Ottenberg, K., and Nolle, M. (1991). Analysis and solutions of the three point perspective pose estimation problem. Technical report, Hamburg, Germany.

[Harkness, 1977] Harkness, L. (1977). Chameleons use accommodation cues to judge distance. Nature, 267:346–349.

[Hartley and Zisserman, 2004] Hartley, R. I. and Zisserman, A. (2004). Multiple View Geometry in Computer Vision. Cambridge University Press, ISBN: 0521540518, second edition.

[Hassni et al., 1997] Hassni, M. E., M’Hamed, S. B., Repérant, J., and Bennis, M. (1997). Quantitative and topographical study of retinal ganglion cells in the chameleon (Chamaeleo chamaeleon). Brain Research Bulletin, 44(5):621–625.


[Herrel et al., 2000] Herrel, A., Meyers, J. J., Aerts, P., and Nishikawa, K. C. (2000). The mechanics of prey prehension in chameleons. The Journal of Experimental Biology, 21(203):3255–3263.

[Horaud et al., 2004] Horaud, R., Knossow, D., and Michaelis, M. (2004). Camera cooperation for achieving visual attention. Technical Report RR-5216, INRIA, INRIA Rhône-Alpes, Montbonnot. Submitted to Machine Vision and Applications Journal.

[JA et al., 1988] JA, A., F, P., J, A., and JM, G.-G. (1988). The photoreceptors of the chameleon retina (Chamaeleo chamaeleo). A Golgi study. J Hirnforsch, 29(4):403–409.

[Kvasnica et al., 2004] Kvasnica, M., Grieder, P., and Baotić, M. (2004). Multi-Parametric Toolbox (MPT). http://control.ee.ethz.ch/mpt/.

[Kwong and Gong, 2002] Kwong, J. N. S. and Gong, S. (2002). Composite support vector machines for detection of faces across views and pose estimation. Image Vision Comput., 20(5-6):359–368.

[Land, 1999a] Land, M. F. (1999a). Motion and vision: why animals move their eyes. J Comp Physiol A, 185(4):341–352.

[Land, 1999b] Land, M. F. (1999b). Visual optics: The sandlance eye breaks all the rules. Current Biology, 9(8):286–288.

[Lau et al., 2005] Lau, H., Huang, S., and Dissanayake, G. (2005). Optimal search for multiple targets in a built environment. In Proc. of IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), pages 228–233.

[Lu et al., 2000] Lu, C.-P., Hager, G. D., and Mjolsness, E. (2000). Fast and globally convergent pose estimation from video images. IEEE Trans. Pattern Anal. Mach. Intell., 22(6):610–622.

[Martin and Horaud, 2001] Martin, F. and Horaud, R. (2001). Multiple-camera tracking of rigid objects. Technical Report RR-4268, INRIA.


[Mayne et al., 2000] Mayne, D., Rawlings, J., Rao, C., and Scokaert, P. (2000). Constrained model predictive control: Stability and optimality. Automatica, 36(6):789–814.

[Milios et al., 1993] Milios, E. E., Jenkin, M. R. M., and Tsotsos, J. K. (1993). Design and performance of TRISH, a binocular robot head with torsional eye movements. International Journal of Pattern Recognition and Artificial Intelligence, 7(1):51–68.

[Mitsumoto et al., 1992] Mitsumoto, H., Tamura, S., Okazaki, K., Kajimi, N., and Fukui, Y. (1992). 3-d reconstruction using mirror images based on a plane symmetry recovering method. IEEE Trans. Pattern Anal. Mach. Intell., 14(9):941–946.

[Morency et al., 2003] Morency, L.-P., Sundberg, P., and Darrell, T. (2003). Pose estimation using 3d view-based eigenspaces. In AMFG ’03: Proceedings of the IEEE International Workshop on Analysis and Modeling of Faces and Gestures, pages 45–52, Washington, DC, USA. IEEE Computer Society.

[Nickels and Hutchinson, 2001] Nickels, K. and Hutchinson, S. (2001). Model-based tracking of complex articulated objects. IEEE Transactions on Robotics and Automation, 17(1):28–36.

[Ott, 2001] Ott, M. (2001). Chameleons have independent eye movements but synchronise both eyes during saccadic prey tracking. Experimental Brain Research, 139(2):173–179.

[Ott and Schaeffel, 1995] Ott, M. and Schaeffel, F. (1995). A negatively powered lens in the chameleon. Nature, 373:692–694.

[Ott et al., 1998] Ott, M., Schaeffel, F., and Kirmse, W. (1998). Binocular vision and accommodation in prey-catching chameleons. Journal of Comparative Physiology A: Sensory, Neural, and Behavioral Physiology, 182(3):319–330.

[Pettigrew et al., 1999] Pettigrew, J. D., Collin, S. P., and Ott, M. (1999). Convergence of specialised behaviour, eye movements and visual optics in the sandlance (teleostei) and the chameleon (reptilia). Current Biology, 9(8):421–424.


[Quan and Lan, 1999] Quan, L. and Lan, Z. (1999). Linear n-point camera pose determination. IEEE Trans. Pattern Anal. Mach. Intell., 21(8):774–780.

[Rivlin and Rotstein, 2000] Rivlin, E. and Rotstein, H. (2000). Control of a camera for active vision: Foveal vision, smooth tracking and saccade. Int. J. Comput. Vision, 39(2):81–96.

[Ross, 1983] Ross, S. M. (1983). Introduction to Stochastic Dynamic Programming. Academic Press, Inc., Orlando, FL, USA.

[Srinivasan and Boyer, 2002] Srinivasan, S. and Boyer, K. L. (2002). Head pose estimation using view based eigenspaces. In ICPR ’02: Proceedings of the 16th International Conference on Pattern Recognition (ICPR’02) Volume 4, pages 302–305, Washington, DC, USA. IEEE Computer Society.

[Stone, 1992] Stone, L. D. (1992). Theory of Optimal Search. Military Applications Section, Operations Research Society of America, 2nd edition.

[Sturm and Bonfort, 2006] Sturm, P. and Bonfort, T. (2006). How to compute the pose of an object without a direct view? In Proceedings of the Asian Conference on Computer Vision, Hyderabad, India, volume II, pages 21–31.

[Sutherland et al., 2001] Sutherland, O., Truong, H., Rougeaux, S., and Zelinsky, A. (2001). Advancing active vision systems by improved design and control. In ISER ’00: Experimental Robotics VII, pages 71–80, London, UK. Springer-Verlag.

[Tonko and Nagel, 2000] Tonko, M. and Nagel, H.-H. (2000). Model-based stereo-tracking of non-polyhedral objects for automatic disassembly experiments. Int. J. Comput. Vision, 37(1):99–118.

[Ude, 1998] Ude, A. (1998). Nonlinear least squares optimisation of unit quaternion functions for pose estimation from corresponding features. In ICPR ’98: Proceedings of the 14th International Conference on Pattern Recognition-Volume 1, pages 425–427, Washington, DC, USA. IEEE Computer Society.

[Vijayakumar et al., 2001] Vijayakumar, S., Conradt, J., Shibata, T., and Schaal, S. (2001). Overt visual attention for a humanoid robot. In Proc. of IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, volume 4, pages 2332–2337.


[Wong et al., 2005] Wong, E.-M., Bourgault, F., and Furukawa, T. (2005). Multi-vehicle Bayesian search for multiple lost targets. In Proc. of IEEE Intl. Conf. on Robotics and Automation (ICRA05), pages 3169–3174.

Hebrew Abstract