
Mobile Navigation Using Visual Servoing

T. TEPE DC 2010.018


DYNAMICS & CONTROL TECHNOLOGY GROUP

MOBILE NAVIGATION USING VISUAL SERVOING M.Sc. INTERNSHIP

Supervisor : Prof. Dr. Henk NIJMEIJER

Coach: Dr. Dragan KOSTIĆ

Student: Tufan TEPE

Student ID: 0666323


ABSTRACT

Equipping robots with vision systems increases their versatility but also the complexity of their control. Despite the increasing complexity, vision remains an attractive sensory modality for navigation since it provides rich information about the robot's environment.

In this work, a problem of visual servoing based on a fixed monocular camera mounted on a mobile robot is investigated. A homography based control method is used for autonomous navigation of a mobile robot with nonholonomic motion constraints. The visual control task uses the idea of homing. With this approach, an image is taken previously at the desired position. Then, the robot is driven from an initial position towards the desired position by using the information extracted from the target image and the images taken during movement of the robot.


Table of Contents

1 INTRODUCTION
2 DESIGN ISSUES
2.1 Camera configuration
2.2 Servoing architectures
3 AN INSIGHT INTO VISUAL SERVOING METHODS
3.1 The geometry of image formation
3.2 Analysis of visual servoing methods
4 PROJECT DESCRIPTION
5 HOMOGRAPHY BASED VISUAL SERVOING of a NONHOLONOMIC MOBILE ROBOT
5.1 Homography and its estimation
5.1.1 Geometric transformations
5.1.2 Situations in which solving a homography arises
5.1.3 How to find the homography?
5.2 Motion model of the mobile robot
5.3 Input-output linearization and control law
5.3.1 Input-output linearization
5.3.2 Control law
5.3.3 Desired trajectories of the homography elements
5.4 Stability analysis
6 SIMULATIONS
7 EXPERIMENTAL ARRANGEMENTS
8 CONCLUSIONS
APPENDIX A
APPENDIX B


1-INTRODUCTION

Robots are electro-mechanical machines designed to interact with their environment. In order to realize that interaction in a desired manner, they must be equipped with appropriate sensory modalities. So far, most robotic applications have taken place in known environments or in environments arranged to suit the robots; until recently, robots were rarely used in work environments that cannot be fully controlled or about which little information is available. The main reason for this limitation is the insufficient sensory capability of the robots. To compensate for the lack of information obtained from the surroundings, the integration of different sensors has become one of the crucial steps in robot design, and vision is recognized as very important for increasing the versatility of robots. In the last couple of decades, a lot of successful work has been carried out in the area of robotic vision [1], [2], [3]. Increased computing power and improved pixel processing hardware enable the analysis of images at a rate sufficient to guide robotic manipulators without touching the objects [1]. With the use of vision devices and the information obtained from them in robotic applications, the term "visual servoing" or "visual servo control" has come into use. Visual servo control refers to the closed loop control of the pose of a robot by utilizing the information extracted from vision sensors, and it relies on results and techniques from several fundamental areas such as image processing, kinematics, dynamics and control theory.

2-DESIGN ISSUES

While designing a vision-based control system, one can raise many questions: which type of camera and lens to use, how many cameras and where to place them, which kind of image features to utilize, and whether to derive a three dimensional description of the scene, to use two dimensional image data directly, or to combine both. Since vision has a broad application area and new techniques and solutions are being developed continuously, this list of questions can easily be extended. To stay within the bounds of this project, only two crucial issues in the design of vision based control systems are explained here; detailed information on the other aspects can be found in numerous academic sources.

2.1. Camera Configuration

One main issue when constructing a vision based control system is where to position the camera. There are two main options: the camera can be placed at a fixed location, where it does not move, or it can be mounted on the robot. These configurations are called the "fixed camera" and "eye-in-hand" configurations, respectively.

If a fixed camera configuration is used, the camera is placed at a location from which it can observe the task space and the robot/manipulator. Since the camera is not exposed to any motion, the geometric link between the task space and the camera does not change. However, the camera's clear view of the task space can be obstructed by the motion of the manipulator, and such occlusions can cause severe performance degradation or even instability.

With an eye-in-hand system, the camera is mounted on the robot/manipulator. This configuration enables the camera to see the task space without occlusions while the robot travels around the work space. As opposed to the fixed camera configuration, the geometric relationship between the task space and the camera changes as the robot moves. On the other hand, the scene seen by the camera can change drastically when the camera attachment point undergoes large and fast movements. This drawback is encountered especially with multiple-link robotic manipulators and can have undesired performance consequences.

2.2. Servoing Architectures

Different classifications of servoing architectures are offered in the literature, but the most widely used one is based on the question: "Is the error signal or the task function defined in three dimensional work space coordinates or directly in terms of the image features?" The answer to this question leads to a taxonomy in which the error signal is defined in 3D workspace coordinates, directly in terms of image features, or as a combination of the two.

2.2.1. Image Based Visual Servoing

This approach uses the image data directly to control the robot motion: the task function is defined in the image, so there is no need to explicitly estimate the pose error in Cartesian space. The image measurements used to determine the task/error function are the pixel coordinates of a set of image features such as interest points, and the task function is isomorphic to the camera pose. A control law is constructed to map the image error directly to robot motion. The system can use either a fixed camera or an eye-in-hand configuration; in either case, the motion of the robot results in changes of the image provided by the vision system. Hence, the definition of an image based visual servoing task requires an appropriate definition of an error e such that, when the task is accomplished, the error becomes zero.

2.2.2. Position Based Visual Servoing

The vision data are used to build the 3D representation of the scene with this approach, that is, the task/error function is expressed in Cartesian space. Features extracted from the image and/or 3D model of the object are used to find out the position and orientation of the target with respect to the camera. Using this information, an error between the current pose and the desired pose of the robot is defined in the work space and suitable coordinates can be provided as set points to the controller.


2.2.3. 2D ½ Visual Servoing(Hybrid Visual Servoing)

The task function is expressed both in Cartesian space and in the image: the rotation error is estimated explicitly in Cartesian space and the translational error is expressed in the image. 2D 1/2 visual servoing is based on the estimation of the partial camera displacement from the current to the desired camera pose at each iteration of the control law. Contrary to position based approaches, it does not need a 3D model of the object, and contrary to image based methods, it can avoid some stability problems over the whole task space [4].

3-AN INSIGHT INTO VISUAL SERVOING METHODS

In order for the reader to get an insight into visual servoing methods, an analysis of these methods is carried out and related references are given in this section. Before going on with this analysis, the geometry of image formation is explained as a preliminary subject.

3.1. The geometry of Image formation

A digital image is a data structure representing a generally rectangular grid of pixels. The word pixel is a contraction of pix ("pictures") and el (for "element"). Pixels are normally arranged in a 2-dimensional grid and are often represented using dots or squares. The image is formed by directing the light onto a two dimensional array of sensing elements; each pixel has a value corresponding to the intensity of the light focused on a particular sensing element [5]. The light is focused onto the sensing elements by the lens, and the sensing elements are charge coupled device sensors. A charge-coupled device (CCD) is a device for the movement of electrical charge, usually from within the device to an area where the charge can be manipulated, for example converted into a digital value [6].

3.1.1. The Camera Coordinate Frame

The image plane is the plane that contains the sensing elements, and the camera coordinate frame is assigned as follows: i) the z axis is chosen perpendicular to the image plane, along the optical axis of the lens; ii) the origin of the camera coordinate frame lies a distance λ (the focal length of the camera) behind the image plane; iii) the x and y axes are assigned according to the right hand rule and are parallel to the horizontal and vertical axes of the image plane, respectively.

The origin of the camera coordinate frame is called center of projection and the point where the optical axis crosses the image plane is the principal point. An illustration of the coordinate frame is given in Figure 3.1. Any point on the image plane can be represented by the coordinates of (u,v, 휆) with respect to the camera coordinate frame.


Figure 3.1. The Camera Coordinate Frame

The point P whose coordinates with respect to the camera coordinate frame are (x, y, z) is projected onto the image plane with coordinates (u, v, λ). The relation between these coordinates, with an unknown positive constant k, is given by

k \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} u \\ v \\ \lambda \end{bmatrix}

From this equality, the following equations can be obtained easily:

k = \frac{\lambda}{z}, \qquad u = \lambda\frac{x}{z}, \qquad v = \lambda\frac{y}{z} \qquad (3.1)

This relation defines the perspective projection, which is a widely used camera projection model. There are also other camera projection methods, such as scaled orthographic projection and affine projection, discussed by S. Hutchinson [2]. The analysis of visual servo control methods in this report is based on the perspective projection model.

3.1.2. The Image Plane and the Sensor Array

The row and column indices for a pixel are denoted by pixel coordinates (r,c). In order to establish a relation between the coordinates of image points and their corresponding 3D world coordinates, the image plane coordinates(u,v) and the pixel coordinates(r,c) must be related.

Let the pixel coordinates of the principal point be denoted by (or, oc) and let the origin of the pixel array be attached to the corner of the image. The horizontal and vertical dimensions of a pixel are given by sx and sy respectively. sx and sy are the scale factors relating pixels to distance. Also, the vertical and horizontal axes of the pixel coordinate system usually point in opposite directions from the horizontal and vertical axes of the camera frame [5]. Therefore,

combining all the information above reveals equation (3.2), which relates the image plane coordinates and the pixel coordinates:

-\frac{u}{s_x} = r - o_r, \qquad -\frac{v}{s_y} = c - o_c \qquad (3.2)
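To make equations (3.1) and (3.2) concrete, the following minimal sketch (not part of the original report) projects a point expressed in the camera frame onto the image plane and then converts the result into pixel indices; all parameter values are assumed purely for illustration.

import numpy as np

# Illustrative camera parameters (assumed values, not taken from the report)
lam = 0.008          # focal length lambda [m]
sx, sy = 1e-5, 1e-5  # horizontal and vertical pixel dimensions [m/pixel]
o_r, o_c = 240, 320  # principal point in pixel coordinates (o_r, o_c)

# A 3D point expressed in the camera coordinate frame
x, y, z = 0.05, -0.02, 1.5

# Equation (3.1): perspective projection onto the image plane
u = lam * x / z
v = lam * y / z

# Equation (3.2): image plane coordinates -> pixel (row, column) coordinates
r = o_r - u / sx
c = o_c - v / sy

print(u, v)  # image plane coordinates
print(r, c)  # row and column pixel indices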

3.2. Analysis of Visual Servoing Methods

As stated before, there are mainly three classes of visual servoing methods, and the explanation of each method is generally specific to the application. A lot of work keeps being added to the visual servoing literature, and each method deserves its own tutorial, so it is not possible to cover all available methods here. Thus, only the classical image based visual servoing method is considered, in order to gain some basic insight, and appropriate references are pointed out for the other methods.

The aim of all vision based control schemes is to minimize an error usually defined by

e(t) = s(t) - s^*.

Here s(t) denotes a vector of image feature values that are tracked during motion and s^* contains the desired values of those features. If a single point is used as an image feature, then s(t) can be defined in terms of the image plane coordinates of that point:

s(t) = \begin{bmatrix} u(t) \\ v(t) \end{bmatrix}.

The time derivative of s(t) is called the image feature velocity and it is linearly related to the camera velocity. If the camera velocity is represented by \xi = \begin{bmatrix} v \\ \omega \end{bmatrix}, in which v stands for the linear velocity of the origin of the camera and \omega stands for the angular velocity of the camera expressed in the camera coordinate frame, then the relationship between the image feature velocity and the camera velocity becomes

\dot{s} = L(s, q)\,\xi. \qquad (3.3)

The matrix L is called the image Jacobian matrix or interaction matrix, and it is a function of the image features and the position of the robot. In order to derive the interaction matrix, which relates the velocity of the camera (\xi) to the time derivatives of the coordinates of the projection of a fixed 3D point P in the image (\dot{s}), it is necessary to find an expression for the velocity of point P with respect to the moving camera. Using homogeneous transformation equations, the relation between the coordinates of point P with respect to the world frame and with respect to the moving camera can be established as P_o = R(t) P_c(t) + o(t). In this equation, P_o stands for the coordinates of P with respect to the world coordinate frame and P_c for the coordinates of P relative to the moving camera frame. Also, R(t) and o(t) are the rotation matrix and the translation vector, respectively, between the world frame and the camera coordinate frame. Thus, the coordinates of P relative to the camera frame can be obtained as in the following equation


P_c(t) = R^T(t)\,[P_o - o(t)] \qquad (3.4)

since R^T(t) = R^{-1}(t). By taking the time derivative of equation (3.4), we get

\dot{P}_c(t) = \dot{R}^T(t)\,[P_o - o(t)] - R^T(t)\,\dot{o}(t) \qquad (3.5)

since P_o is invariant in time. Plugging \dot{R} = S(\omega)R and \dot{R}^T = R^T S(\omega)^T = R^T S(-\omega) into equation (3.5) and after some manipulations, the following equation is obtained [5].

\dot{P}_c(t) = -\omega_c(t) \times P_c(t) - \dot{o}_c(t) \qquad (3.6)

Here, \omega_c and \dot{o}_c are the angular velocity and the linear velocity of the camera, respectively, expressed in the camera coordinate frame. If the arguments in equation (3.6) are written explicitly and the cross product and subtraction operations are carried out, a system of three independent equations is obtained:

P_c(t) = \begin{bmatrix} x(t) \\ y(t) \\ z(t) \end{bmatrix}, \quad \dot{P}_c(t) = \begin{bmatrix} \dot{x}(t) \\ \dot{y}(t) \\ \dot{z}(t) \end{bmatrix}, \quad \omega_c(t) = \begin{bmatrix} \omega_x(t) \\ \omega_y(t) \\ \omega_z(t) \end{bmatrix}, \quad \dot{o}_c(t) = \begin{bmatrix} v_x(t) \\ v_y(t) \\ v_z(t) \end{bmatrix}

The coordinates of point 푷 relative to the moving camera as well as the angular and linear velocities of the camera with respect to the camera coordinate frame are time dependent. However, the explicit time dependence will not be shown in the following equations for the sake of simplicity of the notation.

\begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{z} \end{bmatrix} = -\begin{bmatrix} \omega_x \\ \omega_y \\ \omega_z \end{bmatrix} \times \begin{bmatrix} x \\ y \\ z \end{bmatrix} - \begin{bmatrix} v_x \\ v_y \\ v_z \end{bmatrix} \qquad (3.7)

Equating the right hand side and the left hand side of equation (3.7) results in a system of three equations, (3.8)-(3.10):

\dot{x} = y\omega_z - z\omega_y - v_x \qquad (3.8)

\dot{y} = z\omega_x - x\omega_z - v_y \qquad (3.9)

\dot{z} = x\omega_y - y\omega_x - v_z \qquad (3.10)

Combining these equations with equation (3.1) gives equations (3.11)-(3.13):

\dot{x} = \frac{vz}{\lambda}\omega_z - z\omega_y - v_x \qquad (3.11)

\dot{y} = z\omega_x - \frac{uz}{\lambda}\omega_z - v_y \qquad (3.12)

\dot{z} = \frac{uz}{\lambda}\omega_y - \frac{vz}{\lambda}\omega_x - v_z \qquad (3.13)


It is also necessary to find the time derivative of the image plane coordinates. While taking the time derivative of image plane coordinates, equations (3.11)-(3.13) are used wherever necessary.

\dot{u} = \lambda\,\frac{\dot{x}z - x\dot{z}}{z^2} = -\frac{\lambda}{z}v_x + \frac{u}{z}v_z + \frac{uv}{\lambda}\omega_x - \frac{\lambda^2 + u^2}{\lambda}\omega_y + v\,\omega_z \qquad (3.14)

\dot{v} = \lambda\,\frac{\dot{y}z - y\dot{z}}{z^2} = -\frac{\lambda}{z}v_y + \frac{v}{z}v_z + \frac{\lambda^2 + v^2}{\lambda}\omega_x - \frac{uv}{\lambda}\omega_y - u\,\omega_z \qquad (3.15)

Equations (3.14) and (3.15) can be represented in matrix form [5]:

\begin{bmatrix} \dot{u} \\ \dot{v} \end{bmatrix} = \begin{bmatrix} -\frac{\lambda}{z} & 0 & \frac{u}{z} & \frac{uv}{\lambda} & -\frac{\lambda^2 + u^2}{\lambda} & v \\ 0 & -\frac{\lambda}{z} & \frac{v}{z} & \frac{\lambda^2 + v^2}{\lambda} & -\frac{uv}{\lambda} & -u \end{bmatrix} \begin{bmatrix} v_x \\ v_y \\ v_z \\ \omega_x \\ \omega_y \\ \omega_z \end{bmatrix} \qquad (3.16)

The first three columns depend on the image plane coordinates (u, v) and on the depth z of the 3D point relative to the camera frame. Therefore, any control scheme using this form of the interaction matrix must estimate or approximate the value of z. This depth information can come from stereo cameras, multiple cameras, a single camera with multiple views, or suitable range sensors/finders; z can be estimated, for instance, by triangulation from at least two views of the scene. As can be seen, the part of the interaction matrix which involves the depth value is related to the translational motion, while the rotational part depends only on the image plane coordinates.

When more than one point is tracked in the image, the interaction matrices for each point can be stacked in one general interaction matrix in order to find the camera movement.

\begin{bmatrix} \dot{u}_1 \\ \dot{v}_1 \\ \vdots \\ \dot{u}_n \\ \dot{v}_n \end{bmatrix} = \begin{bmatrix} -\frac{\lambda}{z_1} & 0 & \frac{u_1}{z_1} & \frac{u_1 v_1}{\lambda} & -\frac{\lambda^2 + u_1^2}{\lambda} & v_1 \\ 0 & -\frac{\lambda}{z_1} & \frac{v_1}{z_1} & \frac{\lambda^2 + v_1^2}{\lambda} & -\frac{u_1 v_1}{\lambda} & -u_1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ -\frac{\lambda}{z_n} & 0 & \frac{u_n}{z_n} & \frac{u_n v_n}{\lambda} & -\frac{\lambda^2 + u_n^2}{\lambda} & v_n \\ 0 & -\frac{\lambda}{z_n} & \frac{v_n}{z_n} & \frac{\lambda^2 + v_n^2}{\lambda} & -\frac{u_n v_n}{\lambda} & -u_n \end{bmatrix} \begin{bmatrix} v_x \\ v_y \\ v_z \\ \omega_x \\ \omega_y \\ \omega_z \end{bmatrix}

Thus L ∈ R^{2n×6}, so three points are sufficient to solve for \xi given the image measurements \dot{s}, and the computed camera velocity \xi can be used as the control input. In order to find \xi, the interaction matrix must be inverted directly if possible; otherwise, a pseudoinverse (Moore-Penrose inverse) must be used. If k features are tracked in the image, the camera velocity consists of m components and rank(L) = min(k, m), i.e., L has full rank, then there are three possibilities for the inversion of the interaction matrix.

i) If k = m, \xi = L^{-1}\dot{s}. Enough features are observed.
ii) If k < m, \xi = L^{+}\dot{s} with L^{+} = L^T(LL^T)^{-1}. Fewer features than necessary are observed.
iii) If k > m, \xi = L^{+}\dot{s} with L^{+} = (L^T L)^{-1}L^T. More than sufficient features are observed.

Proof of the stability can be done with the help of a suitable Lyapunov function for the error system.

V(t) = \frac{1}{2}\,\|e(t)\|^2

The Lyapunov candidate must be positive definite everywhere except at the origin of the error system, and its time derivative \dot{V}(t) = e^T\dot{e} must be negative definite excluding the origin. Stability of the system is proven if \dot{e} is chosen as \dot{e} = -\kappa e (\kappa being a positive constant). The choice of the time derivative of the error for the vision system can be made as follows:

e(t) = s(t) - s^*

\dot{e}(t) = \dot{s}(t) = L\,\xi

Since \dot{e} = -\kappa e must be satisfied, L\xi = -\kappa e must also be satisfied.

If k = m and rank(L) = min(k, m), then the exact inverse of the interaction matrix exists, so we can use \xi = -\kappa L^{-1} e(t) as the control signal. The time derivative of the Lyapunov function stated above then becomes

\dot{V} = e^T\dot{e} = e^T L\xi = -\kappa\, e^T L L^{-1} e = -\kappa\, e^T e < 0,

which proves asymptotic stability.

If k > m or k < m (with rank(L) = min(k, m)), the pseudoinverse must be used, so the control signal becomes \xi = -\kappa L^{+} e(t) and the time derivative of the Lyapunov function becomes

\dot{V} = e^T\dot{e} = e^T L\xi = -\kappa\, e^T L L^{+} e \le 0,

since L L^{+} is positive semidefinite. Therefore, stability of the system can be proven, but this argument does not establish asymptotic stability.
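To make the above concrete, the following is a minimal numerical sketch (not from the report) of the classical image based control law \xi = -\kappa L^{+} e for n tracked points. It assumes that the depths z_i and the focal length \lambda are known, which is exactly the requirement discussed above; all numerical values are illustrative.

import numpy as np

def interaction_matrix(u, v, z, lam):
    # One 2x6 block of the interaction matrix, as in equation (3.16)
    return np.array([
        [-lam / z, 0.0, u / z, u * v / lam, -(lam**2 + u**2) / lam, v],
        [0.0, -lam / z, v / z, (lam**2 + v**2) / lam, -u * v / lam, -u],
    ])

def ibvs_control(s, s_des, depths, lam, kappa=0.5):
    # s, s_des: (n, 2) arrays of current and desired image plane coordinates
    e = (s - s_des).reshape(-1)                        # stacked error vector, length 2n
    L = np.vstack([interaction_matrix(u, v, z, lam)    # stacked 2n x 6 interaction matrix
                   for (u, v), z in zip(s, depths)])
    xi = -kappa * np.linalg.pinv(L) @ e                # camera velocity [vx, vy, vz, wx, wy, wz]
    return xi

# Example with three tracked points (assumed values)
lam = 0.008
s = np.array([[0.0010, 0.0020], [-0.0020, 0.0010], [0.0015, -0.0010]])
s_des = np.array([[0.0, 0.0], [-0.0010, 0.0], [0.0010, 0.0]])
depths = np.array([1.5, 1.2, 2.0])
print(ibvs_control(s, s_des, depths, lam))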

The analysis of position based visual servoing and hybrid approaches may vary from one application to another; several important basic works are [4], [7], [8]. For these kinds of visual servoing methods the main aim is also to minimize the error e(t) = s(t) - s^*, as in the classical image based control method, but this time the ingredients of s change depending on the available information, the set-up and the aim of the application.


4-PROJECT DESCRIPTION

Having introduced visual servoing, the classical methods and the way the control law is constructed, the description of the project and the rest of the work can now be built on top of these basics. In this assignment, a problem of visual servoing based on a fixed monocular camera mounted on a mobile robot is investigated. The objective is to design a control law for autonomous navigation of the robot with non-holonomic motion constraints. The visual control task uses the idea of homing: an image is taken beforehand at the desired position, and the control law then drives the mobile robot from an initial pose towards the desired pose by processing the information extracted from the target image and the current images taken during the movement of the robot. In contrast to the classical methods, a homography based visual servoing method is adopted in order to achieve this task without the need for depth estimation or any measurements of the scene. With this approach, the controller is obtained by an exact input-output linearization of the geometric model, in which homography elements are chosen as the outputs of the system [9].

5-HOMOGRAPHY BASED VISUAL SERVOING of a NONHOLONOMIC MOBILE ROBOT

In this chapter, detailed analysis of homography based visual servoing is carried out. Section 5.1 describes the homography and develops an understanding of it and section 5.2 derives the motion model of the mobile robot. In section 5.3, input-output linearization of the system is done through the homography and control law is constructed based upon that linearization scheme. Then, in section 5.4, stability analysis of the system is conducted.

5.1. Homography and Its Estimation

A two dimensional point X^{2D} = (x, y) which lies on a plane can also be represented by a three dimensional vector X^{3D} = (x_1, x_2, x_3). Here, X^{2D} is the version of X^{3D} scaled by its third element, i.e., x = x_1/x_3 and y = x_2/x_3. When points on a projective plane are represented with respect to a coordinate frame whose x and y axes lie in that same projective plane, all points possess the same depth value, so the "z" coordinate does not carry much information. Therefore, all points are scaled by the third element and "z" becomes 1 for all points. This kind of representation (X^{3D}) is used in homography analysis and is called the homogeneous representation of a point lying on the projective plane P^2. A homography can then be defined as a mapping of these points from one projective plane to another, and it has the property of invertibility. Synonyms of homography are projectivity, planar projective transformation and collineation. According to [10], a homography is an invertible mapping from P^2 to itself such that three points lie on the same line if and only if their mapped points are also collinear. Its algebraic definition is as follows: a mapping from P^2 → P^2 is a projectivity if and only if there exists a nonsingular 3x3 matrix H such that for any point in P^2 represented by a vector x, its mapped point is equal to Hx.


5.1.1. Geometric Transformations

There are several types of geometric transformations, each with properties peculiar to it, and homographies are one of them. Homographies are better understood when they are explained in a context that includes the other types of geometric transformations. A detailed description of all geometric transformations can be found in [10].

i)Isometries

Isometries(Iso=same, metric=measure) are transformations of the plane 푃2 that preserve Euclidean distance. An isometry can be described by equation 5.1.

\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} \epsilon\cos\theta & -\sin\theta & t_x \\ \epsilon\sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \qquad (5.1)

where \epsilon = \pm 1. If \epsilon is 1, the isometry preserves orientation and is a Euclidean transformation; if \epsilon is -1, it reverses orientation. Euclidean transformations represent rigid body motion. An isometry consists of a planar rotation and a translation: if the rotation matrix is the identity, the points are only translated in 2D, and if the translation vector is zero, the points undergo a pure 2D rotation. A planar Euclidean transformation has three degrees of freedom: one for the rotation (\theta) and two for the translation (t_x and t_y). The distance between two points is preserved when they are mapped by an isometry, and so are the angle between two lines and the area.

ii)Similarity Transformations

A similarity transformation (or a similarity) is an isometry combined with an isotropic scaling; its representation is given in equation (5.2):

\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} sR & t \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \qquad (5.2)

where the isotropic scaling s is direction invariant. The scale s adds one more degree of freedom to the isometry, so a similarity has four degrees of freedom. A similarity no longer preserves the distance between points when s ≠ ±1; however, it keeps the ratio of distances and the angles between lines invariant, so it preserves shape. An example is shown in Figure 5.1 [11].


Figure 5.1 Similarity Transformation

iii)Affine Transformations

An affine transformation(or an affinity) is a non-singular linear transformation followed by a translation [10]. It is like a similarity but it has two rotations and two non-isotropic scalings. It is represented by

\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} A & t \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \qquad (5.3)

It has six degrees of freedom corresponding to 푎11 , 푎12 , 푎21 , 푎22 , 푡푥 , 푡푦 . The affine matrix 푨 can be decomposed as

A = R(\theta)\, R(-\phi)\, D\, R(\phi) \qquad \text{where} \qquad D = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}.

Therefore, what the affine matrix A does is a rotation by \phi, a scaling by \lambda_1 in the x direction and by \lambda_2 in the y direction, a rotation by -\phi, and finally another rotation by \theta. An affinity has two more degrees of freedom than a similarity; these correspond to the angle \phi, which gives the direction of scaling, and to the ratio of the scaling parameters \lambda_1/\lambda_2. Figure 5.2 shows the interpretation of the action of the affine matrix A.

Figure 5.2 Effect of Affine Transformation

If the affine matrix is considered in two parts, A = R(\theta)\,[R(-\phi) D R(\phi)], then R(\theta) corresponds to a rotation that preserves the shape, while the R(-\phi) D R(\phi) part corresponds to a deformation of the shape along the axis defined by \phi and along the axis perpendicular to it; the amount of distortion depends on the scaling factors \lambda_1 and \lambda_2. Figure 5.3 [12] shows some examples of affine transformations.

Figure 5.3 Visual examples of affinity transformations

The distances between points and the angles between lines are not preserved by affine transformations. However, there are some invariants: parallel lines in one image remain parallel in the mapped image, and the ratios of lengths of parallel line segments and the ratios of areas are kept unchanged.
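As a small numerical check of this decomposition (not part of the original report), the rotation part R(θ) and the deformation part R(−φ)DR(φ) of an affine matrix can be recovered from its singular value decomposition; the parameter values below are illustrative and det(A) > 0 is assumed.

import numpy as np

def rot(a):
    # 2x2 planar rotation matrix
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

# Build an affine matrix A = R(theta) R(-phi) D R(phi) from assumed parameters
theta, phi = 0.4, 0.3
D = np.diag([2.0, 0.5])
A = rot(theta) @ rot(-phi) @ D @ rot(phi)

# SVD of A: A = U S V^T. Since det(A) > 0 here, U V^T is a proper rotation.
U, S, Vt = np.linalg.svd(A)
R_theta = U @ Vt                        # the rotation part R(theta)
deformation = Vt.T @ np.diag(S) @ Vt    # the anisotropic scaling part R(-phi) D R(phi)

print(np.allclose(A, R_theta @ deformation))  # True
print(np.allclose(R_theta, rot(theta)))       # True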

iv)Perspective Projection

Perspective projection is the projection of three dimensional points in Cartesian space onto two dimensional points. It is a widely used projection method and describes the mapping of points in space onto the image plane when images are taken by a camera. A perspective projection can be described by

x = PX, where P is the 3x4 projection matrix, x is an image point represented by a homogeneous 3-vector and X is a point in space represented by a homogeneous 4-vector [13]. The projection matrix has 12 elements, but they are defined up to a scale constant, i.e., only the ratios of the elements are significant, so it has 11 degrees of freedom. These 11 degrees of freedom come from the internal and external camera matrices: the internal (intrinsic) camera matrix, or camera calibration matrix, provides 5 degrees of freedom and the external (extrinsic) camera matrix provides 6 degrees of freedom. Perspective projection can be split into two phases. First, it finds the coordinates of the point in 3D space with respect to the camera frame with the help of a homogeneous transformation matrix. Then, it projects those coordinates, relative to the camera frame, onto the image plane, and this is done by using the intrinsic camera matrix.

Extrinsic camera matrix can be defined as [R|t] which accounts for the rotation matrix and the translation vector between camera and world frames, so six external parameters relate the camera orientation to the world coordinate system. Those six parameters are 3 rotations expressed by 3x3 rotation matrix "R" and three translations denoted by 3x1 vector "t".

Intrinsic camera matrix "K" can be defined as

K = \begin{bmatrix} \alpha_x & s & x_o \\ 0 & \alpha_y & y_o \\ 0 & 0 & 1 \end{bmatrix}.

In this matrix, \alpha_x and \alpha_y are the focal lengths of the camera in terms of pixel dimensions in the x and y directions, respectively, x_o and y_o are the coordinates of the principal point in pixels, and s is the skew parameter, which describes the deviation of the pixels from orthogonality (the perpendicularity of the pixel sides). s = cot(\varsigma), where \varsigma is the angle between the sides of a pixel. Generally pixels are rectangular, so \varsigma = 90° and s = cot(\varsigma) = 0. Hence, the intrinsic camera matrix K accounts for 5 internal parameters (\alpha_x, \alpha_y, x_o, y_o, s).

Thus, the projection matrix can be represented by the combination of extrinsic and intrinsic camera matrices 푷 = 푲[푹|풕].
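A short sketch (not from the report) of composing P = K[R|t] and projecting a homogeneous world point with it; the calibration and pose values are assumed for illustration only.

import numpy as np

# Intrinsic matrix K with assumed pixel focal lengths, principal point and zero skew
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Extrinsic parameters: a rotation about the y axis and a translation (assumed values)
phi = np.deg2rad(10.0)
R = np.array([[ np.cos(phi), 0.0, np.sin(phi)],
              [         0.0, 1.0,         0.0],
              [-np.sin(phi), 0.0, np.cos(phi)]])
t = np.array([[0.1], [0.0], [0.5]])

P = K @ np.hstack((R, t))              # 3x4 projection matrix P = K [R | t]

X = np.array([0.2, -0.1, 2.0, 1.0])    # homogeneous world point
x = P @ X
x = x / x[2]                           # normalize to get pixel coordinates (u, v, 1)
print(x)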

An example of perspective projection is given in Figure 5.4 [14].

Figure 5.4 Perspective Projection


All 3D world points are mapped into 2D image points as illustrated in figure 5.4. The perspective projection gives the most realistic impression of depth, although the exact depth information cannot be recovered from a single image. A perspective projection produces a view similar to the way the human eye perceives its environment. Remark: if you close one eye fully and try to touch something around you, you will notice that you are not as accurate as when both eyes are open; you cannot reach the object in a relaxed and comfortable manner with one eye closed. This shows that perspective projection gives a realistic but imperfect impression of depth. With both eyes open, however, the exact depth of the point is known; in the human sense of depth this is called stereopsis.

v)Projective Transformation

A planar projective transformation or a homography is a transformation on homogeneous 3-vectors represented by a nonsingular 3x3 matrix H such that 풙′ = 푯풙. The matrix H can be changed by multiplying it by a nonzero scale factor without altering the projective transformation. Hence, H is called a homogeneous matrix since only the ratios of the matrix elements are important. There are 8 independent ratios so homographies have 8 degrees of freedom. None of the invariants of affine transformation is valid for homographies. However, as it is mentioned at the beginning of this chapter, if three points are on the same line in one image, they will also be on the same line when they are mapped to another image. A projective transformation can be written as

H = \begin{bmatrix} A & t \\ V & v \end{bmatrix} \qquad \text{where } V = (V_1, V_2).

An important difference between projective transformations and affinities is the vector V, which is the source of the nonlinearities of projective transformations. Besides, as opposed to affinities, the scalings included in A vary depending on the position in the image. Similarly, the orientation of a transformed line also depends on the position and orientation of the source line.

5.1.2. Situations in which solving a homography arises

There are many situations where the use of homographies is required. In this part, the applications which use homographies are discussed [13].

i)Camera Calibration

Camera calibration is a key step in many vision applications, as it lets the system determine the relation between what appears in the image and where it is located in the 3D world. In order to compensate for undesired properties of the lens such as radial distortion, the camera calibration matrix must be known. Two important works on finding the camera calibration matrix using homography estimation are [15] and [16]. In these works, images of the same planar pattern, such as a checkerboard, are taken from different perspectives, and a homography is estimated between those images to find the calibration matrix.


ii) 3D Reconstruction and Visual Metrology

3D reconstruction is a problem in computer vision where the goal is to obtain the scene configurations and camera positions from images of the scene. In medical imaging, multiple images of some body parts are taken and 3D model of that part is analyzed. Additionally, the distances between the objects and the size of the objects are estimated by utilizing homographies in visual metrology.

iii) Stereo Vision

Two cameras separated by a distance take pictures of the same scene. The images are shifted over top of each other to find the parts that match; the shifted amount is called the disparity. A key step is to find the point correspondences in the images, and these points are searched along a line called the epipolar line. Rectifying homographies applied to the images make the epipolar lines axis-aligned and parallel, which makes the search for corresponding points very efficient [13].

Some more applications can be added to the ones mentioned above. The homography between two views plays an important role in the geometry of multiple views. Homographies are also used in tracking applications with multiple cameras and/or with one camera giving multiple views of the scene, as well as in building projector-camera systems. The homography relation between two views can be used to obtain the transformation between planes: even when the target is partially or fully occluded by an unknown object, the tracker can follow the target as long as it is visible from another view [13]; no complicated inference scheme is used and no 3D information is recovered explicitly [17]. Additionally, homographies are used in military applications, for instance to obtain the altitude map of an unknown environment from photographs taken by airplanes, so that risks to the soldiers can be eliminated in advance.

5.1.3. How to find the homography?

Finding the homography between two images is essential for constructing the control law in this project. The ways of finding the homography are analyzed in two subsections: the first answers the question "How can the homography be found in a simulation environment?", and the second explains in detail the method of estimating the homography from two real images for the real experiments.

*5.1.3.1. Theory of Homography and Homography in Simulation Environments

In this project, the aim is to bring the current camera frame ℱ to the target(reference) camera frame ℱ∗. It is supposed that only the images 픗∗ and 픗 of the scenes at the target position and at the current position respectively are available to us. This is illustrated in Figure 5.5.


Figure 5.5 Illustration of the configuration and Homography between two images of a plane

Let P be a point in 3D space whose coordinates are represented by \chi^* = [X^*, Y^*, Z^*]^T in the reference frame ℱ*. \chi^* is mapped onto a virtual plane which is perpendicular to the optical axis and lies a distance \lambda (the focal length) from the center of projection 풪*. Its mapped coordinates are denoted by m^* = [u^*, v^*, \lambda]^T with respect to the reference camera frame, so the relationship between \chi^* and m^* is m^* = \frac{\lambda}{Z^*}\chi^*. After that, m^* is projected onto the reference image plane 픗* as p^* = [r^*, c^*, 1]^T, which has the pixel coordinates r^* and c^*, with the help of the intrinsic camera matrix:

p^* = K m^* \qquad \text{where} \qquad K = \begin{bmatrix} \alpha_x & s & x_o \\ 0 & \alpha_y & y_o \\ 0 & 0 & 1 \end{bmatrix}.

In the intrinsic camera matrix, 훼푥 and 훼푦 are the focal lengths of the camera in terms of pixel dimensions in the x and y directions respectively. 푥표 and 푦표 are the coordinates of the principal point in pixels in the image and 푠 is the skew parameter as explained in perspective projection section.

The 3D point P is represented by 훘 = [푋, 푌, 푍]푇 relative to the current camera coordinate frame ℱ. If the same procedure is applied to the point P but this time with respect to the current camera frame, the following equations are obtained:

m = \frac{\lambda}{Z}\chi, \qquad \text{where } m = [u, v, \lambda]^T,

and then m is projected onto the current image plane as the point p = [r, c, 1]^T with the help of p = K m.

The rotation matrix and the translation vector between the frames ℱ* and ℱ are R ∈ SO(3) and c ∈ ℝ³, respectively. Besides, if the point P is supposed to belong to the plane π, n^* = [n_x, n_y, n_z]^T is the normal to the plane π expressed relative to the reference camera frame, and d^* is the distance between the plane π and the origin of the reference frame, then the relation between p^* and p is defined by a projective transformation H in such a way that p = H p^*. The homography H can be related to the camera motion as seen in equation (5.4).

H = K R \left( I + \frac{c\, n^{*T}}{d^*} \right) K^{-1} \qquad (5.4)

In the simulations there is no real robot that travels through the work space and takes images of the scene, so no real images are available in the computer. Therefore, the initial and target positions and orientations of the robot must be known in order to emulate the real motion of the robot in the simulation environment. With that knowledge, the rotation matrix and the translation vector between the current frame and the target frame can be found. With the knowledge of the intrinsic camera matrix [18], the homography can then be computed from equation (5.4). For n^* and d^*, arbitrary but appropriate values can be chosen by inspection; although they have an effect on the performance, they do not affect the convergence of the system at all. Thus, plugging all of these into equation (5.4), a 3x3 homography is obtained and its elements can be used in the determination of the control signal.
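As a sketch of how such a simulated homography can be generated from equation (5.4) (the calibration, pose and plane parameters below are assumed for illustration and are not the values used later in the report's simulations):

import numpy as np

def homography_from_motion(K, R, c, n_star, d_star):
    # Equation (5.4): H = K R (I + c n*^T / d*) K^{-1}
    return K @ R @ (np.eye(3) + np.outer(c, n_star) / d_star) @ np.linalg.inv(K)

# Assumed calibration and relative pose between the current and target frames
K = np.array([[800.0,   0.0, 0.0],
              [  0.0, 800.0, 0.0],
              [  0.0,   0.0, 1.0]])       # principal point at (0,0), no skew
phi = np.deg2rad(15.0)                    # planar rotation about the y axis
R = np.array([[ np.cos(phi), 0.0, np.sin(phi)],
              [         0.0, 1.0,         0.0],
              [-np.sin(phi), 0.0, np.cos(phi)]])
c = np.array([0.3, 0.0, 1.0])             # translation: the robot moves in the x-z plane
n_star = np.array([0.0, 0.0, 1.0])        # plane normal expressed in the target frame
d_star = 2.0                              # distance from the plane to the target frame

H = homography_from_motion(K, R, c, n_star, d_star)
print(H)   # note that h21 = h23 = 0 and h22 = 1 for this planar configuration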

*5.1.3.2. Homography estimation from two real images

In the real experiments, we have the image of the scene taken at the desired position as a reference image and the current images taken during the robot's motion. This means that we have nothing else as additional information other than two images(current one and the reference one), so the rotation matrix and the translation vector are not known a priori. Therefore, all required information must be extracted from the images in order to find out the control signal. In order to do so, two steps must be completed.

STEP 1: First, features that can be used to find reliable matches between the views of the scene must be extracted from the images. There are several methods in the literature for finding features in images, such as the Harris corner detector, Canny edges, the entropy operator, SIFT, etc. If the detected features are highly distinctive and invariant to image scaling and rotation, a more robust estimation of the homography is possible. Among these, the Scale Invariant Feature Transform (SIFT), an algorithm in computer vision that detects and describes local features in images, is employed in this project. SIFT and the most common algorithms search for points as image features, while lines and conics may also be utilized as image features by other algorithms. There are four main cascaded steps for determining the set of image features in the SIFT algorithm; detailed information about the algorithm can be found in [19] and [20].

1. Scale-space extrema detection: The first stage of computation searches over all scales and image locations, i.e., the first stage of keypoint detection is to identify locations and scales which can be used under various views of the same scene. It is implemented efficiently by using a difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation. The image is convolved with Gaussian filters at

different scales, and then the difference of successive Gaussian-blurred images is taken. Specifically, the difference-of-Gaussian image is given by

D(x, y, \sigma) = L(x, y, k_i\sigma) - L(x, y, k_j\sigma),

where L(x, y, k\sigma) is the convolution of the original image I(x, y) with the Gaussian blur G(x, y, k\sigma) at scale k\sigma, such that

L(x, y, k\sigma) = G(x, y, k\sigma) * I(x, y).

Thus, a difference-of-Gaussian image between scales k_i\sigma and k_j\sigma is just the difference of the Gaussian-blurred images at scales k_i\sigma and k_j\sigma. The image is first convolved with Gaussian blurs at different scales. The convolved images are grouped by octave (an octave corresponds to doubling the value of \sigma), and the value of k is selected so that a fixed number of convolved images per octave is obtained. Then, the difference-of-Gaussian images are computed from adjacent Gaussian-blurred images within each octave. An illustration of the difference of Gaussians is given in figure 5.6 [19].

Figure 5.6 Illustration of difference of Gaussian
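A minimal sketch of this first stage (Gaussian blurring at successive scales and differencing of adjacent blurred images), using scipy's gaussian_filter as a stand-in for the convolution G * I; the σ value, the factor k and the number of scales are illustrative, not the exact settings of [19].

import numpy as np
from scipy.ndimage import gaussian_filter

def difference_of_gaussians(image, sigma=1.6, k=2 ** 0.5, num_scales=5):
    # Blur the image at the successive scales sigma, k*sigma, k^2*sigma, ...
    blurred = [gaussian_filter(image.astype(float), sigma * k ** i)
               for i in range(num_scales)]
    # Each DoG image is the difference of two adjacent Gaussian-blurred images
    return [blurred[i + 1] - blurred[i] for i in range(num_scales - 1)]

# Example with a random array standing in for a real grayscale photograph
image = np.random.rand(128, 128)
dog = difference_of_gaussians(image)
print(len(dog), dog[0].shape)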

2. Keypoint localization: Once difference-of-Gaussian images have been obtained, keypoints are then taken as maxima/minima of the Difference of Gaussians (DoG) images across scales[19], [21]. This is done by comparing each pixel in the difference-of-Gaussian images to its eight neighbors at the same scale and nine corresponding neighboring pixels in each of the neighboring scales. Figure 5.7 [19] shows the search region.


Figure 5.7 Search region to find a keypoint candidate

The pixel marked with an X is investigated to decide whether it is a keypoint candidate. It has 8 neighbors at its own scale, 9 corresponding pixels at the scale above and 9 at the scale below, shown by the green circles. If the X pixel has the minimum or maximum intensity value among these 26 pixels, it is added to the list of keypoint candidates. This procedure produces many keypoint candidates. However, some of them are not stable: they may be located on an edge of the image or in a low contrast region, so in the presence of image noise it could be hard to distinguish such a pixel from its neighbors and it may no longer be recognized as a keypoint. Algorithms have been developed for discarding low contrast candidates and eliminating edge responses [21]; they are not explained here in order to keep within the bounds of the main subject. After elimination of the inappropriate keypoint candidates, one more thing is left to do: the determination of the keypoint location. For each candidate keypoint, interpolation of nearby data is used to accurately determine its position. Calculating the interpolated location of the extremum improves matching and stability compared with locating each keypoint at the location and scale of the candidate keypoint. A simple example: assume there are two neighboring pixels, one totally white and the other of a different color, and that the white pixel is considered a keypoint. Normally, the coordinates of the center of the white pixel would be given as the keypoint coordinates. However, the point in the middle of the line connecting the centers of the two pixels has higher contrast because it lies in the transition region, so it is easier to detect at other images of the same scene taken from different perspectives. The interpolations are therefore carried out to find more suitable coordinates; they are done using the quadratic Taylor expansion of the difference-of-Gaussian scale-space function with the candidate keypoint as the origin. This is also why the software used to find keypoints generally returns floating point values for the keypoint coordinates rather than the integer indices of pixel locations in a matrix.

3. Orientation assignment: Orientations are assigned to each pixel around the keypoint location based on local image gradient directions. Firstly, the Gaussian-smoothed

image L(x, y, \sigma) at the keypoint's scale \sigma is taken, so that all computations are performed in a scale-invariant manner. For an image sample L(x, y) at scale \sigma, the gradient magnitude m(x, y) and the orientation \theta(x, y) are computed using pixel differences [21]:

m(x, y) = \sqrt{\left(L(x+1, y) - L(x-1, y)\right)^2 + \left(L(x, y+1) - L(x, y-1)\right)^2}

\theta(x, y) = \tan^{-1}\left( \frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)} \right)

The magnitude and direction calculations for the gradient are repeated for every pixel in a neighboring region around the keypoint in the Gaussian-blurred image L(x, y, \sigma). The result of this procedure is illustrated in figure 5.8 for an 8x8 array of pixels in the neighborhood of the keypoint location.

Figure 5.8 Gradients of pixels around the keypoint location

An orientation histogram with 36 bins covering 360 degree range of orientation is formed, with each bin covering 10 degrees. Each sample in the neighboring window added to a histogram bin is weighted by its gradient magnitude and by a Gaussian-weighted circular window with σ that is 1.5 times that of the scale of the keypoint [21]. The peaks in this histogram correspond to dominant orientations. Once the histogram is filled, the orientations corresponding to the highest peak and local peaks that are within 80% of the highest peak are assigned to the keypoint. In the case of multiple orientations being assigned, an additional keypoint is created having the same location and scale as the original keypoint for each additional orientation [19].

4. Keypoint descriptor: Previous steps found keypoint locations at particular scales and assigned orientations to them and this ensures invariance to image location, scale and rotation. At this step, a descriptor vector for each keypoint is computed such that the descriptor is highly distinctive and partially invariant to the remaining variations such as

illumination. Generally, the magnitude and orientation values of the samples in a 16x16 region around the keypoint are calculated. Then, for each 4x4 subregion of this neighborhood, the samples are accumulated into an orientation histogram with 8 bins corresponding to 8 directions, so in total (16x16)/(4x4) = 16 histograms are created. The magnitudes of the gradients are further weighted by a Gaussian function with \sigma equal to 1.5 times the scale of the keypoint. The descriptor then becomes a vector of all the values of these histograms [21]. Since 16 histograms with 8 bins are created, there are 16x8 = 128 entries in the keypoint descriptor vector.

After applying these four steps to both images, the keypoint descriptors of both images are obtained. Then, the keypoints of the two images must be checked for matches. In order to find the matches, one keypoint is taken from the first image and compared with all keypoints of the other image one by one; then the second keypoint is picked from the first image and again compared with all keypoints of the other image. This loop continues until all keypoints have been compared with each other. The criterion for accepting two keypoints as a matched pair is the following: each keypoint has its feature (descriptor) vector, and when two keypoints are compared, the angle between their feature vectors is found with the help of the dot product. If that angle is smaller than a threshold, they are accepted as a matched pair.

\vec{F}_1 \cdot \vec{F}_2 = |\vec{F}_1|\,|\vec{F}_2|\cos(\alpha),

where \vec{F}_1 and \vec{F}_2 are the feature vectors and \alpha is the angle between them. The smaller \alpha is, the more similar the feature vectors are; when \alpha gets below a certain threshold, the keypoints are assumed to match. An example of points matched by the SIFT program is illustrated in Figure 5.9.


Figure 5.9 An example of point matches

There are 1021 and 579 keypoints found in the left and right images respectively and 19 of them are matched.
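The matching criterion described above can be sketched as follows (not from the report); the descriptor arrays are random stand-ins for real SIFT output and the angle threshold is an assumed value.

import numpy as np

def match_keypoints(desc1, desc2, angle_threshold=0.3):
    # desc1: (n1, 128) and desc2: (n2, 128) arrays of keypoint descriptor vectors
    d1 = desc1 / np.linalg.norm(desc1, axis=1, keepdims=True)
    d2 = desc2 / np.linalg.norm(desc2, axis=1, keepdims=True)
    matches = []
    for i, f1 in enumerate(d1):
        cosines = d2 @ f1                       # cos(alpha) against every descriptor in image 2
        j = int(np.argmax(cosines))             # best candidate = smallest angle
        alpha = np.arccos(np.clip(cosines[j], -1.0, 1.0))
        if alpha < angle_threshold:             # accept the pair if the angle is below the threshold
            matches.append((i, j))
    return matches

# Random descriptors standing in for the 1021 and 579 real keypoints mentioned above
rng = np.random.default_rng(0)
desc1, desc2 = rng.random((1021, 128)), rng.random((579, 128))
print(len(match_keypoints(desc1, desc2)))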


STEP 2: After the matched points are found, it is now possible to determine the homography between the two images. One of the widely used methods for homography estimation is the Direct Linear Transformation algorithm. In order to find a homography between two images, there should be at least 4 matched point pairs: as stated in the projective transformation section, a homography has 9 elements but only the ratios of the elements matter, so it has 8 degrees of freedom, and one matched pair of keypoints constrains 2 degrees of freedom, so 4 matched pairs are necessary to define the homography fully. The homography relates a point x_i in one image to the point x_i' in the other image in such a way that x_i' = H x_i. In this representation, the homogeneous 3-vectors x_i' and H x_i may not be equal in magnitude, since H is defined up to scale, but they have the same direction. In order to ease the analysis, it is therefore more appropriate to use x_i' × H x_i = 0.

If the j-th row of the matrix H is represented by h^{jT}, then H x_i can be written as

H x_i = \begin{bmatrix} h^{1T} x_i \\ h^{2T} x_i \\ h^{3T} x_i \end{bmatrix}.

If x_i' = (x_i', y_i', w_i')^T, then the cross product becomes

x_i' \times H x_i = \begin{bmatrix} y_i'\, h^{3T} x_i - w_i'\, h^{2T} x_i \\ w_i'\, h^{1T} x_i - x_i'\, h^{3T} x_i \\ x_i'\, h^{2T} x_i - y_i'\, h^{1T} x_i \end{bmatrix} = 0.

Since h^{jT} x_i is a scalar (1x1), it is equal to its transpose, so h^{jT} x_i = x_i^T h^j for j = 1, 2, 3. A set of three equations can thus be obtained, represented by equation (5.5).

\begin{bmatrix} 0^T & -w_i'\, x_i^T & y_i'\, x_i^T \\ w_i'\, x_i^T & 0^T & -x_i'\, x_i^T \\ -y_i'\, x_i^T & x_i'\, x_i^T & 0^T \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ h_3 \\ h_4 \\ h_5 \\ h_6 \\ h_7 \\ h_8 \\ h_9 \end{bmatrix} = 0 \qquad (5.5)

Now the equations are in the form A_i h = 0, where A_i is a 3x9 matrix and h is a 9-vector consisting of the elements of the homography. Therefore, if h is found, then H is also determined.


h = \begin{bmatrix} h_1 \\ h_2 \\ h_3 \\ h_4 \\ h_5 \\ h_6 \\ h_7 \\ h_8 \\ h_9 \end{bmatrix}, \qquad H = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix} \qquad (5.6)

Even though there are three equations in (5.5), the third row is dependent on the other two rows: the third row is the sum of x_i' times the first row and y_i' times the second row.

x_i' times the first row: \begin{bmatrix} 0^T & -x_i' w_i'\, x_i^T & x_i' y_i'\, x_i^T \end{bmatrix}

y_i' times the second row: \begin{bmatrix} y_i' w_i'\, x_i^T & 0^T & -y_i' x_i'\, x_i^T \end{bmatrix}

Sum: \begin{bmatrix} y_i' w_i'\, x_i^T & -x_i' w_i'\, x_i^T & 0^T \end{bmatrix}

If -w_i' is factored out from the sum, the third row of equation (5.5) is obtained. Therefore, equation (5.5) can be reduced to equation (5.7).

A_i h = \begin{bmatrix} 0^T & -w_i'\, x_i^T & y_i'\, x_i^T \\ w_i'\, x_i^T & 0^T & -x_i'\, x_i^T \end{bmatrix} h = 0 \qquad (5.7)

The solution of equation (5.7) gives the homography. The summary of the Direct Linear Transformation algorithm [10] is as follows.

i) For each matched pair of points x_i ↔ x_i', form the 2x9 matrix A_i.
ii) Stack all n matrices A_i for the n correspondences into the 2n x 9 matrix A.
iii) Obtain the singular value decomposition of A. The unit singular vector corresponding to the smallest singular value is the solution h: if A = UDV^T, then h is the last column of V.
iv) Then, using equation (5.6), H can be constructed from h.
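A sketch of these steps in Python (the report's own implementation, mentioned next, is in Matlab); the usual point normalization before the SVD is omitted for brevity, and the correspondences in the usage example are assumed values.

import numpy as np

def homography_dlt(pts1, pts2):
    # pts1, pts2: (n, 3) arrays of matched homogeneous points with x2_i ~ H x1_i, n >= 4
    A = []
    for x1, x2 in zip(pts1, pts2):
        xp, yp, wp = x2
        zero = np.zeros(3)
        A.append(np.hstack([zero, -wp * x1,  yp * x1]))   # first row of equation (5.7)
        A.append(np.hstack([wp * x1,  zero, -xp * x1]))   # second row of equation (5.7)
    A = np.array(A)                                       # the stacked 2n x 9 matrix
    _, _, Vt = np.linalg.svd(A)
    h = Vt[-1]                     # unit singular vector of the smallest singular value
    return h.reshape(3, 3)         # equation (5.6)

# Usage sketch with four assumed correspondences (real points come from the SIFT matching)
pts1 = np.array([[0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1]], dtype=float)
pts2 = np.array([[2, 1, 1], [4, 1, 1], [4, 3, 1], [2, 3, 1]], dtype=float)
H = homography_dlt(pts1, pts2)
print(H / H[2, 2])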

This algorithm is implemented in Matlab and it gives the following 3x3 homography for the images in figure 5.9

H = \begin{bmatrix} -0.0009 & -0.0021 & 0.6174 \\ 0.0030 & -0.0013 & -0.7867 \\ 0.0000 & -0.0000 & -0.0020 \end{bmatrix}.

The correctness of this homography matrix can be verified in the following way:

* Pick a specific point in the left image and find its coordinates (x_i),

* Find the same point in the right image and find its coordinates (x_i'),

* If they are related by the obtained homography, then the software is working correctly.

Let's examine the correctness of the homography this way. Find the coordinates of the upper right corner of the letter "I" in the word "BASMATI" in the left image. Then this time, find again the upper right corner of the letter "I" in the word "BASMATI" in the right image. This is illustrated in figures 5.10 and 5.11 .

X: 336 Y: 213 Index: 65 RGB: 0.259, 0.259, 0.259

Figure 5.10 A specific point in the left image

X: 97 Y: 34 Index: 113 RGB: 0.471, 0.471, 0.471

Figure 5.11 Same specific point in the right image


The coordinates of that specific point in the left image are x_i = [336, 213, 1]^T and in the right image x_i' = [97, 34, 1]^T, and

H x_i = \begin{bmatrix} -0.0009 & -0.0021 & 0.6174 \\ 0.0030 & -0.0013 & -0.7867 \\ 0.0000 & -0.0000 & -0.0020 \end{bmatrix} \begin{bmatrix} 336 \\ 213 \\ 1 \end{bmatrix} \cong \begin{bmatrix} 97.0847 \\ 34.4144 \\ 1 \end{bmatrix} \cong x_i',

so the homography is estimated correctly. Please note that the elements of H are rounded off here, so a hand calculation of H x_i is not exactly the same as the Matlab result.

5.2. Motion Model of a Mobile Robot

The system that is to be controlled is a mobile robot with nonholonomic motion constraints. Nonholonomic constraints occur due to the presence of the wheels such that the mobile robot can not move sideways as shown in figure 5.12.

Figure 5.12 Nonholonomic constraint for a mobile robot

The nonholonomic constraints allow rolling but not slipping. In general, a nonholonomic mechanical system cannot move arbitrarily in its configuration space. Holonomic constraints can be written as equations independent of \dot{q}, of the form f(q, t) = 0, where q stands for the generalized coordinates. Nonholonomic constraints, however, cannot be written in terms of the generalized coordinates only, as they also depend on the time derivatives of the generalized coordinates; this means that nonholonomic constraints are not integrable. A nonholonomic mobile robot model can be represented by the following state and output equations:

\dot{x} = f(x, u) \qquad (5.8)


y = h(x) \qquad (5.9)

where x denotes the state vector, u denotes the input vector and y is the output vector. The inputs consist of the forward velocity (v) and the angular velocity (w).

The coordinate system used is shown in figure 5.13.

Figure 5.13 Coordinate System

There are two coordinate frames that should be specified in order to remove possible ambiguities: the coordinate frame attached to the mobile robot and the world coordinate frame. When the robot reaches its target pose, the coordinate frame attached to the robot may differ from the world coordinate frame; however, without loss of generality, the world coordinate frame can be chosen to coincide with the robot coordinate frame at its target pose.

The state vector can be defined as x = [x z \phi]^T, since the robot moves in the x-z plane. x and z give the position of the mobile robot with respect to the world coordinate frame, and \phi represents the orientation of the robot: the angle between the z axis of the coordinate frame attached to the mobile robot and the z axis of the world frame. According to the information provided above, it can be said without loss of generality that when the mobile robot reaches the target position, all state variables become zero, since the world coordinate frame coincides with the robot coordinate frame at the target pose. The state equations can now be written explicitly as

\begin{bmatrix} \dot{x} \\ \dot{z} \\ \dot{\phi} \end{bmatrix} = \begin{bmatrix} -\sin\phi \\ \cos\phi \\ 0 \end{bmatrix} v + \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} w \qquad (5.10)
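As a sketch (not from the report), equation (5.10) can be integrated numerically with a simple forward Euler step; the inputs, time step and initial pose below are arbitrary illustrative values.

import numpy as np

def unicycle_step(state, v, w, dt):
    # state = [x, z, phi]; one forward Euler step of equation (5.10)
    x, z, phi = state
    return np.array([x - np.sin(phi) * v * dt,
                     z + np.cos(phi) * v * dt,
                     phi + w * dt])

# Drive with constant inputs for 5 s from a nonzero initial pose (assumed values)
state = np.array([1.0, -2.0, np.deg2rad(30.0)])
v, w, dt = 0.2, 0.1, 0.01
for _ in range(500):
    state = unicycle_step(state, v, w, dt)
print(state)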


In order to define the output vector, a homography between two images(current and target images) must be found since outputs of the system are chosen among homography elements. A homography is related to camera motion as

H = K R \left( I + \frac{c\, n^{*T}}{d^*} \right) K^{-1}. \qquad (5.11)

R and c are the rotation matrix and the translation vector between the current and target poses, and K is the internal camera calibration matrix. In practice, some assumptions are made: the robot moves on a planar surface without irregularities, the principal point coordinates are (0,0) and there is no pixel skew.

The rotation matrix can be derived by conveying the origin of the target frame to the origin of the current frame and examining the relationship between those two coordinate frames.

Figure 5.14 Target frame(x, y, z) and Current Frame(x′ , y′ , z′ )

The target frame and the current frame are shown in figure 5.14. The y and y' axes are not shown in the figure because they are orthogonal to the page plane according to the right hand rule. In order for the current frame to coincide with the target frame, it must rotate by -\phi, i.e., in the clockwise direction (according to the convention used, counterclockwise rotations are positive, as shown in figure 5.13). Thus, the following equations define the relationship between the current and target coordinate frames when their origins are coincident.

x = -z'\sin(-\phi) + x'\cos(-\phi) = x'\cos\phi + z'\sin\phi

y = y'

z = z'\cos(-\phi) + x'\sin(-\phi) = -x'\sin\phi + z'\cos\phi

These equations can be put into matrix form, and the rotation matrix R is obtained as in equation (5.12).


\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{bmatrix}\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \mathbf{R}\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}   (5.12)

The translation vector between the target and current frames is

\mathbf{c} = \begin{bmatrix} x \\ 0 \\ z \end{bmatrix}.   (5.13)

The y coordinate is always zero because the robot moves in the x-z plane. Using equation (5.11), the homography between the target and current images can be obtained as

\mathbf{H} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}

where

h_{11} = \cos\phi + \frac{n_x}{d_\pi}\left[x\cos\phi + z\sin\phi\right]

h_{12} = \frac{\alpha_x}{\alpha_y}\frac{n_y}{d_\pi}\left[x\cos\phi + z\sin\phi\right]

h_{13} = \alpha_x\left[\sin\phi + \frac{n_z}{d_\pi}\left(x\cos\phi + z\sin\phi\right)\right]

h_{21} = 0, \qquad h_{22} = 1, \qquad h_{23} = 0

h_{31} = \frac{1}{\alpha_x}\left[-\sin\phi + \frac{n_x}{d_\pi}\left(-x\sin\phi + z\cos\phi\right)\right]

h_{32} = \frac{1}{\alpha_y}\frac{n_y}{d_\pi}\left(-x\sin\phi + z\cos\phi\right)

h_{33} = \cos\phi + \frac{n_z}{d_\pi}\left(-x\sin\phi + z\cos\phi\right).

h21, h22 and h23 do not provide any information, since they are constant due to the planar-motion constraint. The elements h31 and h32 are discarded because their magnitudes are small (αx and αy appear in their denominators) and they are more sensitive to noise than the other homography elements. In monocular systems, planes in front of the camera with a dominant nz are detected more easily [9], so h13 and h33 are chosen from the remaining elements, since they depend on nz. Therefore, the output vector is defined as \mathbf{y} = \begin{bmatrix} h_{13} \\ h_{33} \end{bmatrix}.
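
Since only h13 and h33 are used as outputs, they can be evaluated directly from the closed-form expressions above. The helper below is an illustrative sketch; the ratio nz/dπ is passed as a single parameter, matching how this term is treated later in the control law.

```python
import numpy as np

def outputs_h13_h33(x, z, phi, alpha_x, nz_over_d):
    """Outputs y = [h13, h33] of the planar homography derived above."""
    h13 = alpha_x * (np.sin(phi) + nz_over_d * (x * np.cos(phi) + z * np.sin(phi)))
    h33 = np.cos(phi) + nz_over_d * (-x * np.sin(phi) + z * np.cos(phi))
    return h13, h33

# At the target pose the outputs reach their final values (0, 1).
print(outputs_h13_h33(0.0, 0.0, 0.0, alpha_x=800.0, nz_over_d=-0.1))
```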


5.3. Input-Output Linearization and Control Law

The approach employed here navigates the mobile robot by controlling the elements of the homography. This means that the visual servo control problem is converted into a tracking problem, i.e., the actual homography elements should follow the desired trajectories of the homography elements during the motion. The geometric model relating the inputs and outputs of this system is nonlinear. A linearization is carried out by differentiating the homography elements until the control inputs appear. Before going on with the input-output linearization and the derivation of the control law, we show that the system is controllable.

The state dynamics of the mobile robot allow the system to be written in affine form as

\dot{\mathbf{x}} = f(\mathbf{x}) + \sum_{i=1}^{m} g_i(\mathbf{x})\,u_i, \qquad (5.14)

where the u_i are the inputs.

The state dynamics of the mobile robot is given by equation (5.10). If the equations (5.14) and (5.10) are equated, the following result is obtained.

f(\mathbf{x}) = \mathbf{0}, \qquad m = 2,

u_1 = v \text{ and } \mathbf{g}_1 = \begin{bmatrix} -\sin\phi \\ \cos\phi \\ 0 \end{bmatrix}, \qquad u_2 = w \text{ and } \mathbf{g}_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.

Since m = 2, the accessibility distribution (C) becomes

\mathbf{C} = \left\{ \mathbf{g}_1,\; \mathbf{g}_2,\; [\mathbf{g}_1, \mathbf{g}_2] \right\}.

[\mathbf{g}_1, \mathbf{g}_2] is the Lie bracket operation and its definition is the following:

[\mathbf{g}_1, \mathbf{g}_2] \equiv \frac{\partial \mathbf{g}_1}{\partial \mathbf{x}}\mathbf{g}_2 - \frac{\partial \mathbf{g}_2}{\partial \mathbf{x}}\mathbf{g}_1, \qquad \text{where } \mathbf{x} = \begin{bmatrix} x \\ z \\ \phi \end{bmatrix} \text{ is the state vector.}

Accessibility distribution is obtained as:

\frac{\partial \mathbf{g}_1}{\partial \mathbf{x}}\mathbf{g}_2 = \begin{bmatrix} 0 & 0 & -\cos\phi \\ 0 & 0 & -\sin\phi \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -\cos\phi \\ -\sin\phi \\ 0 \end{bmatrix}

\frac{\partial \mathbf{g}_2}{\partial \mathbf{x}}\mathbf{g}_1 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} -\sin\phi \\ \cos\phi \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}


[\mathbf{g}_1, \mathbf{g}_2] = \begin{bmatrix} -\cos\phi \\ -\sin\phi \\ 0 \end{bmatrix}

\mathbf{C} = \begin{bmatrix} -\sin\phi & 0 & -\cos\phi \\ \cos\phi & 0 & -\sin\phi \\ 0 & 1 & 0 \end{bmatrix}.

Since rank(C) is equal to 3, which is the number of states, the system is controllable [22].
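
The rank argument can be reproduced symbolically. The sketch below uses SymPy (an assumption; the report's own scripts are in Matlab) and follows the same bracket convention as the text.

```python
import sympy as sp

x, z, phi = sp.symbols('x z phi')
state = sp.Matrix([x, z, phi])

g1 = sp.Matrix([-sp.sin(phi), sp.cos(phi), 0])   # vector field multiplying v
g2 = sp.Matrix([0, 0, 1])                        # vector field multiplying w

# Lie bracket with the convention used above: [g1, g2] = (dg1/dx) g2 - (dg2/dx) g1
bracket = g1.jacobian(state) * g2 - g2.jacobian(state) * g1

C = sp.Matrix.hstack(g1, g2, bracket)            # accessibility distribution
print(sp.simplify(bracket.T))                    # [-cos(phi), -sin(phi), 0]
print(C.rank())                                  # 3, so the system is controllable
```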

5.3.1. Input-Output Linearization

Linearization is a common way of designing nonlinear control systems. In this section, the outputs are differentiated until they become linearly dependent on the inputs. Note that the normal vector (n) of the plane which creates the homography and the distance (dπ) between that plane and the origin of the target frame are constant, so their time derivatives are zero.

Time derivative of h13:

h_{13} = \alpha_x\left[\sin\phi + \frac{n_z}{d_\pi}\left(x\cos\phi + z\sin\phi\right)\right]

\dot{h}_{13} = \alpha_x\left[\cos\phi\,\dot{\phi} + \frac{n_z}{d_\pi}\left(\dot{x}\cos\phi + \dot{z}\sin\phi - x\sin\phi\,\dot{\phi} + z\cos\phi\,\dot{\phi}\right)\right]

With the help of the state equations, the expression above can be simplified.

\dot{x} = -\sin\phi\,v \;\Rightarrow\; \dot{x}\cos\phi = -\sin\phi\cos\phi\,v

\dot{z} = \cos\phi\,v \;\Rightarrow\; \dot{z}\sin\phi = \sin\phi\cos\phi\,v

\dot{x}\cos\phi + \dot{z}\sin\phi = -\sin\phi\cos\phi\,v + \sin\phi\cos\phi\,v = 0

Therefore,

\dot{h}_{13} = \alpha_x\left[\cos\phi\,\dot{\phi} + \frac{n_z}{d_\pi}\left(-x\sin\phi + z\cos\phi\right)\dot{\phi}\right]

= \alpha_x\dot{\phi}\left[\cos\phi + \frac{n_z}{d_\pi}\left(-x\sin\phi + z\cos\phi\right)\right] = \alpha_x h_{33}\,w, \qquad \text{since } w = \dot{\phi}.

The first time derivative of h13 is already linearly dependent on the inputs, so the relative degree of this output is 1 and no further differentiation of h13 is needed.

Time derivative of h33:

h_{33} = \cos\phi + \frac{n_z}{d_\pi}\left(-x\sin\phi + z\cos\phi\right)

\dot{h}_{33} = -\sin\phi\,\dot{\phi} + \frac{n_z}{d_\pi}\left(-\dot{x}\sin\phi + \dot{z}\cos\phi - x\cos\phi\,\dot{\phi} - z\sin\phi\,\dot{\phi}\right)

With the help of the state equations, the expression above can be simplified.


\dot{x} = -\sin\phi\,v \;\Rightarrow\; -\dot{x}\sin\phi = \sin^2\phi\,v

\dot{z} = \cos\phi\,v \;\Rightarrow\; \dot{z}\cos\phi = \cos^2\phi\,v

-\dot{x}\sin\phi + \dot{z}\cos\phi = \sin^2\phi\,v + \cos^2\phi\,v = v

Therefore,

\dot{h}_{33} = \frac{n_z}{d_\pi}v - w\left[\sin\phi + \frac{n_z}{d_\pi}\left(x\cos\phi + z\sin\phi\right)\right] = \frac{n_z}{d_\pi}v - \frac{h_{13}}{\alpha_x}w.

The first time derivative of h33 is also linearly dependent on the inputs, so the relative degree of this output is 1 as well, and no further differentiation of h33 is needed.

5.3.2. Control Law

After taking the first time derivatives of the outputs, a linear relationship between outputs and inputs is obtained. This relationship can be written in matrix form, and the decoupling matrix (L) is obtained as

\begin{bmatrix} \dot{h}_{13} \\ \dot{h}_{33} \end{bmatrix} = \begin{bmatrix} 0 & \alpha_x h_{33} \\ \dfrac{n_z}{d_\pi} & -\dfrac{h_{13}}{\alpha_x} \end{bmatrix}\begin{bmatrix} v \\ w \end{bmatrix} = \mathbf{L}\begin{bmatrix} v \\ w \end{bmatrix}.   (5.15)

The error system should be in such a form that both the tracking error and its derivative converge to zero. For example, the error differential equation of a tracking problem should be of the form \dot{e} + ke = 0, which has a left-half-plane pole for positive k, so the error and its time derivative decay to zero exponentially. In order to achieve this, the following arrangements are made (the superscript 'd' stands for 'desired').

\mathbf{e} = \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} = \begin{bmatrix} h_{13}^{d} - h_{13} \\ h_{33}^{d} - h_{33} \end{bmatrix}, \qquad \dot{\mathbf{e}} = \begin{bmatrix} \dot{e}_1 \\ \dot{e}_2 \end{bmatrix} = \begin{bmatrix} \dot{h}_{13}^{d} - \dot{h}_{13} \\ \dot{h}_{33}^{d} - \dot{h}_{33} \end{bmatrix} \quad \text{and} \quad \mathbf{k} = \begin{bmatrix} k_{13} & 0 \\ 0 & k_{33} \end{bmatrix}   (5.16)

\begin{bmatrix} \dot{h}_{13}^{d} - \dot{h}_{13} \\ \dot{h}_{33}^{d} - \dot{h}_{33} \end{bmatrix} + \begin{bmatrix} k_{13} & 0 \\ 0 & k_{33} \end{bmatrix}\begin{bmatrix} h_{13}^{d} - h_{13} \\ h_{33}^{d} - h_{33} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}   (5.17)

After some manipulations on equation (5.17), equation (5.18) is obtained.

\begin{bmatrix} \dot{h}_{13} \\ \dot{h}_{33} \end{bmatrix} = \begin{bmatrix} \dot{h}_{13}^{d} + k_{13}\,(h_{13}^{d} - h_{13}) \\ \dot{h}_{33}^{d} + k_{33}\,(h_{33}^{d} - h_{33}) \end{bmatrix}   (5.18)

k13 and k33 are positive control gains. Equating the right-hand sides of equations (5.15) and (5.18) allows for the solution of the control signal.

\mathbf{L}\begin{bmatrix} v \\ w \end{bmatrix} = \begin{bmatrix} \dot{h}_{13}^{d} + k_{13}\,(h_{13}^{d} - h_{13}) \\ \dot{h}_{33}^{d} + k_{33}\,(h_{33}^{d} - h_{33}) \end{bmatrix}   (5.19)


Multiplying both sides of equation (5.19) by L^{-1} gives the control signal.

\begin{bmatrix} v \\ w \end{bmatrix} = \mathbf{L}^{-1}\begin{bmatrix} \dot{h}_{13}^{d} + k_{13}\,(h_{13}^{d} - h_{13}) \\ \dot{h}_{33}^{d} + k_{33}\,(h_{33}^{d} - h_{33}) \end{bmatrix} = \begin{bmatrix} \dfrac{h_{13}\,d_\pi}{\alpha_x^{2} h_{33} n_z} & \dfrac{d_\pi}{n_z} \\ \dfrac{1}{\alpha_x h_{33}} & 0 \end{bmatrix}\begin{bmatrix} \dot{h}_{13}^{d} + k_{13}\,(h_{13}^{d} - h_{13}) \\ \dot{h}_{33}^{d} + k_{33}\,(h_{33}^{d} - h_{33}) \end{bmatrix}   (5.20)

In order to have a nonsingular control signal, the decoupling matrix must be invertible, i.e. det(L) ≠ 0. In order to investigate the situations that could create a singularity, the determinant of the decoupling matrix should be analyzed:

\det(\mathbf{L}) = -\alpha_x\frac{n_z}{d_\pi}h_{33}   (5.21)

Here, αx denotes the focal length in pixel dimensions in the x direction, so it is not zero. Since the plane that generates the homography is at a finite distance from the target position, dπ ≠ ∞. Also, the plane must be seen by the camera, which makes nz ≠ 0. Then there is only one possibility left that could make the determinant of the decoupling matrix zero, namely h33 = 0. h33 is given by

h_{33} = \cos\phi + \frac{n_z}{d_\pi}\left(-x\sin\phi + z\cos\phi\right).

It should be shown that h33 never becomes zero in order to prevent a singularity in the control law. The target is in front of the mobile robot, so z < 0 until the robot reaches the target pose, according to the assigned target coordinate frame. At the moment the robot is at the desired pose, x, z and φ become zero and h33 becomes one. There are also constraints on the orientation of the robot: in order for the robot to see the target scene fully or partially, -π/2 < φ < π/2 must be satisfied. Otherwise, the robot would see a scene unrelated to the target scene, and it would not be possible to construct a meaningful control signal. The constraint -π/2 < φ < π/2 ensures that cos(φ) > 0. Besides, nz must be negative with respect to the target coordinate frame, since the plane that produces the homography is visible to the camera. Therefore, z cos(φ) nz/dπ is greater than zero, and it follows that if cos(φ) + z cos(φ) nz/dπ > |−x sin(φ) nz/dπ|, then h33 > 0. This inequality requires that the lateral distance to compensate is smaller than the depth error. In other words, the inequality holds when the depth error is larger than the lateral error, which is the case due to the camera field-of-view constraint, i.e. |z cos(φ)| > |x sin(φ)|. As a result, it is concluded that the determinant of the decoupling matrix is never zero in the workspace, and the control signal can be constructed without facing any singularity.
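
A minimal sketch of the control law (5.20) is given below, assuming an estimate of nz/dπ and the gains are supplied by the caller; the names and the singularity threshold are illustrative. Solving the linear system is numerically equivalent to multiplying by L^{-1}.

```python
import numpy as np

def control_signal(h13, h33, h13_d, h13_d_dot, h33_d, h33_d_dot,
                   alpha_x, nz_over_d_hat, k13=1.0, k33=1.0):
    """Input-output linearizing control (5.20): [v, w] = L^{-1} (D_dot + k e)."""
    L = np.array([[0.0,           alpha_x * h33],
                  [nz_over_d_hat, -h13 / alpha_x]])        # decoupling matrix (5.15)
    rhs = np.array([h13_d_dot + k13 * (h13_d - h13),
                    h33_d_dot + k33 * (h33_d - h33)])
    if abs(np.linalg.det(L)) < 1e-9:                       # det(L) = -alpha_x * h33 * nz/d
        raise ValueError("decoupling matrix close to singular (h33 near zero)")
    v, w = np.linalg.solve(L, rhs)
    return v, w

# Illustrative call with made-up homography values.
print(control_signal(h13=50.0, h33=1.2, h13_d=40.0, h13_d_dot=-1.0,
                     h33_d=1.1, h33_d_dot=0.01, alpha_x=800.0, nz_over_d_hat=-0.1))
```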

5.3.3. Desired Trajectories of the Homography Elements

The control law needs the definition of the desired trajectories of the homography elements, as can be seen from equation (5.20). The motion performed by the robot obviously depends on


the selection of the desired trajectories of the homography elements (h13^d, h33^d). When the robot reaches the target pose, [x z φ]^T = [0 0 0]^T and the homography becomes the identity matrix. This dictates that the final values of h13 and h33 must be 0 and 1, respectively. There are several proposals for the desired trajectories of the homography elements in the literature; two of the most important ones are offered by [9] and [23]. The suggestion of [23] for the desired trajectories is adopted in this project.

*Desired Trajectories:

The desired trajectory of h13 is selected in such a way that it corrects the lateral and orientation errors simultaneously, while the chosen desired trajectory for h33 is a sinusoid, a smooth function converging to 1 that ensures the depth error is removed.

Desired trajectory of h13: There is a condition regarding the initial configuration of the robot that should be checked before deciding on the desired trajectory of h13. That condition is related to the current and target epipoles at the start of the motion: the sign of the product of the x coordinates of the current epipole (e_cx) and the target epipole (e_tx) at the beginning of the motion must be examined. Please refer to Appendix A for information about epipolar geometry and its relationship with mobile robot navigation. The desired trajectory of h13 is analyzed in two cases.

Case 1: If e_cx(0) · e_tx(0) ≤ 0, the desired trajectory can be defined in two steps.

h_{13}^{d}(0 \le t \le T_2) = h_{13}(0)\,\frac{\psi(t)}{\psi(0)}

h_{13}^{d}(T_2 < t < \infty) = 0.

Case 2: If e_cx(0) · e_tx(0) > 0, the desired trajectory is defined in three steps. The first step drives the robot to a proper orientation; thereafter, a smooth motion towards the target can be realized. The second and third steps are defined like the first and second steps of Case 1.

h_{13}^{d}(0 \le t \le T_1) = \frac{h_{13}(0) + h_{13}^{d}(T_1)}{2} + \frac{h_{13}(0) - h_{13}^{d}(T_1)}{2}\cos\!\left(\frac{\pi t}{T_1}\right)

h_{13}^{d}(T_1 < t \le T_2) = h_{13}(T_1)\,\frac{\psi(t)}{\psi(T_1)}

h_{13}^{d}(T_2 < t < \infty) = 0, \qquad \text{where } h_{13}^{d}(T_1) = -\tfrac{2}{3}h_{13}(0) \text{ and } T_1 < T_2.

The first step is an intermediate step that should be completed by T1. ψ is the angle of the straight line connecting the current position of the robot with the target position, defined in the target frame as shown in figure 5.13. h13^d is proposed in relation to ψ since it is desired to correct the lateral and orientation errors together.


Desired trajectory of h33: The desired trajectory of h33 is realized in two steps.

h_{33}^{d}(0 \le t \le T_2) = \frac{h_{33}(0) + 1}{2} + \frac{h_{33}(0) - 1}{2}\cos\!\left(\frac{\pi t}{T_2}\right)

h_{33}^{d}(T_2 < t < \infty) = 1.

The desired homography values should be reached by T2. The desired trajectories depend on the homography and on the initial position. As the robot moves, the control law makes the realized homography elements track the desired trajectories defined above, guaranteeing convergence to the target.
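
The piecewise trajectories above translate directly into code. The sketch below is an illustrative implementation; ψ is supplied as a callable, and in Case 2 the desired intermediate value h13^d(T1) = −(2/3) h13(0) is used in the second phase in place of the realized h13(T1), which is an assumption of this sketch.

```python
import numpy as np

def h13_desired(t, h13_0, psi, T1, T2, three_step):
    """Desired trajectory of h13; psi(t) returns the angle psi at time t."""
    if not three_step:                           # Case 1: e_cx(0) * e_tx(0) <= 0
        return h13_0 * psi(t) / psi(0.0) if t <= T2 else 0.0
    h13_T1 = -2.0 / 3.0 * h13_0                  # Case 2 intermediate value
    if t <= T1:                                  # cosine blend towards h13_T1
        return (h13_0 + h13_T1) / 2.0 + (h13_0 - h13_T1) / 2.0 * np.cos(np.pi * t / T1)
    if t <= T2:                                  # follow psi, rescaled at T1
        return h13_T1 * psi(t) / psi(T1)
    return 0.0

def h33_desired(t, h33_0, T2):
    """Desired trajectory of h33: smooth sinusoid converging to 1."""
    return (h33_0 + 1.0) / 2.0 + (h33_0 - 1.0) / 2.0 * np.cos(np.pi * t / T2) if t <= T2 else 1.0
```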

5.4. Stability Analysis

A candidate Lyapunov function for the error system is chosen as

V(\mathbf{x}, t) = \tfrac{1}{2}\,\|\mathbf{e}\|^{2}, \qquad \text{where } \mathbf{e} = \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} = \begin{bmatrix} h_{13}^{d} - h_{13} \\ h_{33}^{d} - h_{33} \end{bmatrix}.   (5.22)

The Lyapunov candidate is positive definite everywhere except at the origin of the error space. Now it must be shown that the time derivative of the Lyapunov function is zero at the origin of the error space and negative elsewhere. The following definitions are made for this analysis.

\mathbf{D} = \begin{bmatrix} h_{13}^{d} \\ h_{33}^{d} \end{bmatrix}, \qquad \dot{\mathbf{D}} = \begin{bmatrix} \dot{h}_{13}^{d} \\ \dot{h}_{33}^{d} \end{bmatrix} \quad \text{and} \quad \mathbf{k} = \begin{bmatrix} k_{13} & 0 \\ 0 & k_{33} \end{bmatrix}

\dot{V}(\mathbf{x}, t) = \mathbf{e}^{T}\dot{\mathbf{e}} = \mathbf{e}^{T}\begin{bmatrix} \dot{h}_{13}^{d} - \dot{h}_{13} \\ \dot{h}_{33}^{d} - \dot{h}_{33} \end{bmatrix} = \mathbf{e}^{T}\left(\dot{\mathbf{D}} - \mathbf{L}\begin{bmatrix} v \\ w \end{bmatrix}\right) = \mathbf{e}^{T}\left(\dot{\mathbf{D}} - \mathbf{L}\mathbf{L}^{-1}(\dot{\mathbf{D}} + \mathbf{k}\mathbf{e})\right)

= \mathbf{e}^{T}\left(\mathbf{I} - \mathbf{L}\mathbf{L}^{-1}\right)\dot{\mathbf{D}} - \mathbf{e}^{T}\mathbf{L}\mathbf{L}^{-1}\mathbf{k}\,\mathbf{e}   (5.23)

Equation (5.23) shows that the time derivative of the Lyapunov candidate is negative definite in the error space except at the origin, so asymptotic stability is guaranteed. Since L L^{-1} is equal to the 2x2 identity matrix in theory, the first term of equation (5.23) drops and \dot{V}(\mathbf{x}, t) = -\mathbf{e}^{T}\mathbf{k}\,\mathbf{e}, which, with a positive definite and diagonal gain matrix k, satisfies the asymptotic stability conditions. In practice, the estimate of L^{-1} may not be exact, so L L^{-1} may not be exactly the identity matrix. However, if the estimate of L^{-1} is not too coarse, asymptotic stability of the system is still achieved [3]. The region of stability is the workspace of the mobile robot subject to the camera field-of-view limitations [23].

Now it has been proven that h13 converges to h13^d and h33 converges to h33^d, since e goes to zero (the system is asymptotically stable). After time T2, h13^d becomes 0 and h33^d becomes 1, as can be seen from the proposed desired trajectory sets. If figure 5.13 is examined, it is seen that ψ = −arctan(x/z), since x = −ρ sin(ψ) and z = ρ cos(ψ) in all quadrants. In order for h13^d (and so h13) to converge to zero, ψ must eventually go to zero, and this is realized when x becomes equal to zero. Therefore, x = 0 is reached at the end of the motion. Now, the final

values of the other state variables (z and φ) must be found. The values of these state variables are found with the help of the homography equations for h13 and h33.

h_{13} = \alpha_x\left[\sin\phi + \frac{n_z}{d_\pi}\left(x\cos\phi + z\sin\phi\right)\right]   (5.24)

h_{33} = \cos\phi + \frac{n_z}{d_\pi}\left(-x\sin\phi + z\cos\phi\right)   (5.25)

The variable z is eliminated from equations (5.24) and (5.25) by the following procedure: (i) multiply equation (5.24) by cos(φ); (ii) multiply equation (5.25) by −αx sin(φ); (iii) add the results of (i) and (ii).

\cos\phi\,h_{13} = \alpha_x\sin\phi\cos\phi + \alpha_x\frac{n_z}{d_\pi}x\cos^{2}\phi + \alpha_x\frac{n_z}{d_\pi}z\sin\phi\cos\phi

-\alpha_x\sin\phi\,h_{33} = -\alpha_x\sin\phi\cos\phi + \alpha_x\frac{n_z}{d_\pi}x\sin^{2}\phi - \alpha_x\frac{n_z}{d_\pi}z\sin\phi\cos\phi

Adding the equations above and plugging the final values of h13 and h33 into the result gives equation (5.26):

\sin\phi = -\frac{n_z}{d_\pi}x   (5.26)

Since x becomes equal to zero at the end of the motion, φ must also become zero, as understood from equation (5.26). Plugging x = 0 and φ = 0 into equation (5.25) shows that z = 0. This analysis proves that the only equilibrium state of the system is [x z φ]^T = [0 0 0]^T, and when the equilibrium state is reached the homography becomes the 3x3 identity matrix (H = I), which indicates that the camera sees the target scene and the goal is accomplished.

6-SIMULATIONS

Simulations are carried out in order to show the validity of the proposed approach. The performance of the system is investigated with and without noise and calibration errors. In the simulations, the only a priori knowledge needed is the initial and target configurations. The control algorithm tries to drive the robot from the initial configuration to the target configuration. The control loop is illustrated in figure 6.1.


Figure 6.1 Diagram of the control loop

The rotation matrix and the translation vector between the current and target configurations can be found, since the current and target positions and orientations are known as inputs. With the knowledge of the intrinsic camera matrix, the theoretical formula of the homography (equation (5.11)) can be used to find the 3x3 homography matrix between the current and target virtual scenes. The intrinsic camera matrix is formed using the information in [10] and [18]. The virtual image is assumed to have a 640x480 pixel resolution. The focal length is taken as f = 6 mm, and its real value is varied to see the effect on the final errors. Besides, the effect of the principal point coordinates on the final errors is also analyzed. The control gains used are k13 = 1 and k33 = 1, with T1 = 40 s and T2 = 80 s. The total simulation time is chosen to be T_total = 100 s.
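
The control loop of figure 6.1 can be sketched as follows; it reuses the hypothetical helpers introduced in the earlier sketches (unicycle_step, outputs_h13_h33, h33_desired, control_signal), computes ψ from the known pose (method 1 discussed at the end of this chapter), handles Case 1 only, and differentiates the desired values numerically. It illustrates the data flow only and is not the report's Matlab implementation.

```python
import numpy as np

def simulate(initial_state, alpha_x=800.0, nz_over_d=-0.1,
             k13=1.0, k33=1.0, T2=80.0, T_total=100.0, dt=0.05):
    """Illustrative Case-1 closed-loop run of the control loop in figure 6.1."""
    state = np.array(initial_state, dtype=float)
    psi = lambda s: -np.arctan(s[0] / s[2]) if abs(s[2]) > 1e-9 else 0.0
    psi0 = psi(state)                                        # assumes a nonzero initial psi
    h13_0, h33_0 = outputs_h13_h33(state[0], state[1], state[2], alpha_x, nz_over_d)

    def desired(t, s):
        h13d = h13_0 * psi(s) / psi0 if t <= T2 else 0.0     # Case 1 trajectory for h13
        return h13d, h33_desired(t, h33_0, T2)

    prev, t = desired(0.0, state), 0.0
    while t < T_total:
        h13, h33 = outputs_h13_h33(state[0], state[1], state[2], alpha_x, nz_over_d)
        h13d, h33d = desired(t, state)
        h13d_dot, h33d_dot = (h13d - prev[0]) / dt, (h33d - prev[1]) / dt
        v, w = control_signal(h13, h33, h13d, h13d_dot, h33d, h33d_dot,
                              alpha_x, nz_over_d, k13, k33)
        state = unicycle_step(state, v, w, dt)               # kinematic model (5.10)
        prev, t = (h13d, h33d), t + dt
    return state

print(simulate([-5.0, -15.0, np.deg2rad(5.0)]))              # initial configuration (i)
```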

The simulations are carried out for several initial configurations and a target configuration of (x = 0, z = 0, φ = 0°). Since the mobile robot moves on a horizontal plane (the x-z plane), the y coordinate with respect to both the robot-attached coordinate frame and the world reference frame is zero. Furthermore, the roll and pitch angles do not change in time, whereas the yaw angle (φ) is a variable. The outcomes of the simulations are shown in figures 6.2-6.22.

i) Results for initial configuration of (x = −5, z = −15, φ = 5°):

[Plots: lateral position X, depth Z and orientation φ vs. time; control outputs v and w vs. time; followed path X vs. Z]

Figure 6.2 Evolution of position and orientation parameters and control signals

[Plots: homography elements H11–H33 vs. time, with realized and desired H13 and H33]

Figure 6.3 Evolution of homography elements


[Plots: errors in lateral position X, depth Z and orientation φ vs. time]

Figure 6.4 Evolution of error in position and orientation parameters

ii) Results for initial configuration of (x = −8, z = −20, φ = −45°):

[Plots: lateral position X, depth Z and orientation φ vs. time; control outputs v and w vs. time; followed path X vs. Z]

Figure 6.5 Evolution of position and orientation parameters and control signals


[Plots: homography elements H11–H33 vs. time, with realized and desired H13 and H33]

Figure 6.6 Evolution of homography elements

[Plots: errors in lateral position X, depth Z and orientation φ vs. time]

Figure 6.7 Evolution of error in position and orientation parameters

iii) Results for initial configuration of (x = 10, z = −35, φ = −25°):

[Plots: lateral position X, depth Z and orientation φ vs. time; control outputs v and w vs. time; followed path X vs. Z]

Figure 6.8 Evolution of position and orientation parameters and control signals

[Plots: homography elements H11–H33 vs. time, with realized and desired H13 and H33]

Figure 6.9 Evolution of homography elements


[Plots: errors in lateral position X, depth Z and orientation φ vs. time]

Figure 6.10 Evolution of error in position and orientation parameters

iv) Results for initial configuration of (x = 10, z = −25, φ = −35°):

[Plots: lateral position X, depth Z and orientation φ vs. time; control outputs v and w vs. time; followed path X vs. Z]

Figure 6.11 Evolution of position and orientation parameters and control signals


[Plots: homography elements H11–H33 vs. time, with realized and desired H13 and H33]

Figure 6.12 Evolution of homography elements

[Plots: errors in lateral position X, depth Z and orientation φ vs. time]

Figure 6.13 Evolution of error in position and orientation parameters

v) Results for initial configuration of (x = −3, z = −20, φ = 30°):

[Plots: lateral position X, depth Z and orientation φ vs. time; control outputs v and w vs. time; followed path X vs. Z]

Figure 6.14 Evolution of position and orientation parameters and control signals

[Plots: homography elements H11–H33 vs. time, with realized and desired H13 and H33]

Figure 6.15 Evolution of homography elements


[Plots: errors in lateral position X, depth Z and orientation φ vs. time]

Figure 6.16 Evolution of error in position and orientation parameters

vi) Results for initial configuration of (x = −0.25, z = −1.2, φ = −20°):

[Plots: lateral position X, depth Z and orientation φ vs. time; control outputs v and w vs. time; followed path X vs. Z]

Figure 6.17 Evolution of position and orientation parameters and control signals


[Plots: homography elements H11–H33 vs. time, with realized and desired H13 and H33]

Figure 6.18 Evolution of homography elements

[Plots: errors in lateral position X, depth Z and orientation φ vs. time]

Figure 6.19 Evolution of error in position and orientation parameters

vii) Results for initial configuration of (x = 12, z = −40, φ = 45°) and, this time, a target configuration of (x = −8, z = −5, φ = −20°):

[Plots: lateral position X, depth Z and orientation φ vs. time; control outputs v and w vs. time; followed path X vs. Z]

Figure 6.20 Evolution of position and orientation parameters and control signals

[Plots: homography elements H11–H33 vs. time, with realized and desired H13 and H33]

Figure 6.21 Evolution of homography elements


[Plots: errors in lateral position X, depth Z and orientation φ vs. time]

Figure 6.22 Evolution of error in position and orientation parameters

Since the homography decomposition is not necessary and is not performed in this control approach, the normal vector (n) of the plane that generates the homography and the distance (dπ) between that plane and the origin of the target frame are not known, so the term nz/dπ used in the control and homography calculations is not known exactly either. Therefore, the value of nz/dπ must be estimated. The effect of the uncertainty in nz and dπ on the performance is checked by using fixed values in the computation of the homography and varying those values in the control law. Figures 6.23 and 6.24 show the effect of this uncertainty on the final pose error.


[Plot: final lateral (x) error, depth (z) error and orientation (φ) error vs. dπ]

Figure 6.23 Final pose error for different dπ values

[Plot: final lateral (x) error, depth (z) error and orientation (φ) error vs. nz]

Figure 6.24 Final pose error for different nz values

As can be understood from the results shown in figures 6.23 and 6.24, the convergence of the approach is not affected by this uncertainty and small final pose errors are obtained. Another important issue in most visual servoing systems is the calibration of the camera. Since the elements of the intrinsic camera matrix appear in the control law and in the computation of the homography, it is necessary to investigate their impact on the performance. The simulation results presented before are obtained by taking the focal length of the camera as 6 millimeters, as mentioned before, and the principal point is assumed to be at the centre of the image (x0 = 0, y0 = 0). The final pose errors of the robot are shown in figures 6.25-6.27 for a range of focal lengths and principal point coordinates.

[Plot: final lateral (x), depth (z) and orientation (φ) errors vs. focal length f (mm)]

Figure 6.25 Final pose error varying the focal length

[Plot: final lateral (x), depth (z) and orientation (φ) errors vs. x0 (pixels)]

Figure 6.26 Final pose error varying the x coordinate of the principal point


[Plot: final lateral (x), depth (z) and orientation (φ) errors vs. y0 (pixels)]

Figure 6.27 Final pose error varying the y coordinate of the principal point

Results indicate that the method is able to compensate for the calibration errors. In other words, a rough calibration is sufficient to ensure the convergence of the system.

Also, the performance of the system is analyzed when noise is applied directly to the homography elements. The results of driving the robot from (x, z, φ) = (−5, −15, 5°) to (0, 0, 0°) with white noise of standard deviation (σ) equal to 0.3 are presented in figures 6.28 and 6.29.

[Plots: lateral position X, depth Z and orientation φ vs. time; followed path X vs. Z, with noise]

Figure 6.28 Evolution of pose parameters with noise


[Plots: H13 and H33 vs. time, with noise]

Figure 6.29 Evolution of homography elements with noise

Besides, the final pose error under the effect of white noise with increasing standard deviation (σ) is given in figure 6.30.

[Plot: final lateral (x), depth (z) and orientation (φ) errors vs. noise standard deviation σ]

Figure 6.30 Final pose error varying noise on homography

It can be inferred from the plots above that the convergence of the system is achieved in spite of the noise. Unsurprisingly, the higher the standard deviation of the noise, the larger the deviation from the target configuration. Lateral and depth errors are compensated better than the orientation error when noise with a high standard deviation affects the system.

As a final explanatory remark in this chapter, the ways of finding ψ are discussed. Since the definition of the desired trajectories is necessary to carry out the simulations and since the desired trajectory of h13 includes ψ, its value must be known during the simulations. There are two methods that can be employed to find ψ in simulations.

1-) The initial pose of the mobile robot is provided as an input to the simulation algorithm, so the initial values of x, z and φ are known at the beginning. If the target pose is x_t = [x_t z_t φ_t]^T, then the value of ψ can be computed using the relation ψ = −arctan((x − x_t)/(z − z_t)) − φ_t, which can be inferred by examining figure 5.13. If the target pose is x_t = [0 0 0]^T, then simply ψ = −arctan(x/z). After the computation of ψ, the construction of the control law can be completed and the robot can be driven by the control signal to its next position (x, z) and orientation (φ), which are also known. Applying ψ = −arctan(x/z) again, the control signal can be recalculated and the robot driven again. This loop continues until the robot reaches the target pose.

2-) The second way of finding ψ is related to the target epipole. Please refer to Appendix A for information about epipolar geometry. The relation between the target epipole and ψ is explained with the help of figures 6.31 and 6.32.

Figure 6.31 Epipoles in the current and target poses


If the target epipole is zoomed in, figure 6.32 is obtained.

Figure 6.32 Target epipole

The triangle in figure 6.32 reveals equation (6.1), which gives the value of ψ:

\tan(180^\circ - \psi) = -\tan\psi = \frac{e_{tx}}{\alpha_x} \;\Rightarrow\; \psi = -\arctan\!\left(\frac{e_{tx}}{\alpha_x}\right)   (6.1)

In equation (6.1), αx is the focal length of the camera in pixel dimensions, so e_tx must also be in pixel dimensions in order to make the argument of the arctangent function unitless. In order to find ψ from equation (6.1), the x coordinate of the target epipole in pixel dimensions must be known. This is done by projecting the focal center of the camera at the current pose, C_c, onto the image plane of the camera at the target pose. When figure 6.31 is analyzed, it is seen that the ray emanating from C_c and going towards C_t creates the target epipole. The relationship between a 3D homogeneous point X = [X Y Z 1]^T expressed in the fixed world frame and its projection x = [x y 1]^T in the image plane of the camera is:

x = P X = K [R t] X, where (R, t) are the extrinsic parameters (the rotation and translation between the fixed world frame and the camera frame) and K is the intrinsic camera matrix, as explained in the perspective projection section. Therefore, there are two steps to calculate the target epipole in pixel dimensions.

i) Compute the 3x4 projection matrix P for the target pose.

ii) Project the focal center of the camera at the current pose onto the image plane of the camera at the target pose by e_t = P X_Cc. Here, e_t is the 3x1 vector standing for the target epipole, P is the 3x4 projection matrix of the target pose found in step (i), and X_Cc is the 4x1 vector of homogeneous coordinates of the focal center of the camera at the current pose, expressed in the world coordinate frame.

After the calculation of e_t, the x coordinate of the target epipole, e_tx, can easily be extracted and used in equation (6.1) to obtain ψ. Then the construction of the control law can be finished. Also, note that the time derivatives of the desired trajectories are needed to compute the control signal. Since numerical values of ψ are available, the time derivative of ψ is found by numerical differentiation, \dot{\psi}(t) = \lim_{\Delta t \to 0}\frac{\psi(t+\Delta t) - \psi(t)}{\Delta t}.
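
A small numerical sketch of the two-step procedure, under the report's assumption that the world frame coincides with the target camera frame (so the target projection matrix is simply P = K[I | 0]); the calibration values are illustrative.

```python
import numpy as np

def target_epipole(K, current_center_world):
    """Project the current camera center onto the target image: e_t = P X_Cc."""
    P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # step (i): 3x4 projection matrix
    X_Cc = np.append(current_center_world, 1.0)        # homogeneous 4-vector
    e_t = P @ X_Cc                                     # step (ii)
    return e_t / e_t[2]                                # normalize to pixel coordinates

K = np.diag([800.0, 800.0, 1.0])                       # illustrative calibration, principal point at (0, 0)
e_t = target_epipole(K, np.array([-5.0, 0.0, -15.0]))  # robot 5 m left of and 15 m behind the target
psi = -np.arctan(e_t[0] / 800.0)                       # equation (6.1)
print(np.rad2deg(psi))                                 # matches -arctan(x/z) from method 1
```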


7-EXPERIMENTAL ARRANGEMENTS

In an experiment, the only inputs are real images from the camera. The control algorithm needs two images: the image taken at the desired pose and the current image. It tries to drive the robot from the initial configuration towards the target pose by comparing the image taken at the desired pose with the current images captured during the motion. The control loop for an experiment is shown in figure 7.1.

Figure 7.1 Diagram of the control loop for an experiment

Feature extraction from images and matching of image points are carried out by SIFT. SIFT (Scale Invariant Feature Transform) is an interest point detector and descriptor which is invariant to scale and rotation, as explained in section 5.1.3.2. The information obtained from SIFT is used for the estimation of the homography and the extraction of ψ. Estimation of the homography is done by the direct linear transformation method, as elucidated in section 5.1.3.2, and the extraction of ψ is achieved through the relation ψ = −arctan(e_tx/αx), so the x coordinate of the target epipole must be found from the real images. The algorithm proposed by [10] is used in order to find the fundamental matrix and then the epipoles. Please refer to Appendix B for information about the derivation of the fundamental matrix and the epipoles. After the extraction of ψ and the computation of the 3x3 homography matrix, the construction of the control law is complete. Then the control signal, which contains the angular and linear velocities, can be applied to the robot. Thus, all algorithms required to carry out an experiment, and an understanding of them, are explained in this report. Although all necessary Matlab scripts were prepared to conduct an experiment on top of the simulation code, there was not enough time in this three-month internship project to perform an experiment.
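
Under the assumption that OpenCV is used instead of the report's Matlab scripts, the pipeline described above could look roughly as follows. The 0.7 ratio-test threshold and the RANSAC parameters are conventional choices, not values from the report, and image coordinates are assumed to be already centered on the principal point.

```python
import cv2
import numpy as np

def homography_and_psi(target_img, current_img, alpha_x):
    """Illustrative SIFT + DLT pipeline: homography estimate and psi from the target epipole."""
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(target_img, None)
    k2, d2 = sift.detectAndCompute(current_img, None)

    matches = cv2.BFMatcher().knnMatch(d1, d2, k=2)
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]   # ratio test [19]
    pts_t = np.float32([k1[m.queryIdx].pt for m in good])
    pts_c = np.float32([k2[m.trainIdx].pt for m in good])

    H, _ = cv2.findHomography(pts_t, pts_c, cv2.RANSAC, 3.0)          # maps target points to current points
    F, _ = cv2.findFundamentalMat(pts_t, pts_c, cv2.FM_RANSAC)        # x_c^T F x_t = 0
    _, _, Vt = np.linalg.svd(F)
    e_t = Vt[-1] / Vt[-1, 2]                                          # right null vector: target epipole
    psi = -np.arctan(e_t[0] / alpha_x)                                # equation (6.1)
    return H, psi
```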


It takes about 1.25 seconds (0.8 Hz) to calculate the control signal from two real images, and the completion time of one cycle of the control loop depends on the communication speed between the computer and the robot. It has been verified experimentally in [23] that the stability of the system is achieved even if the control loop runs at a frequency of 0.75 Hz. Thus, if the communication between the robot and the computer is sufficiently fast, the proposed algorithm should perform well with a guarantee of stability.

8-CONCLUSIONS

In this project, research on mobile robot navigation using visual servo control methods is carried out. A homography-based visual servoing method is chosen and applied to a nonholonomic mobile robot. A control law is constructed based on the input-output linearization of the system. The outputs of the system are chosen among the homography elements and a set of desired trajectories for those outputs is defined. Therefore, the visual servo control problem is transformed into a tracking problem. The visual control method needs neither homography decomposition, nor depth estimation, nor any 3D measure of the scene. Simulations show that the control algorithm is robust and that the convergence of the system is achieved in the presence of noise, calibration errors and uncertainty in the control parameters.

The performance of the system obviously depends on the desired trajectories of the homography elements, since the problem is a tracking problem. In the literature there are several proposed sets of desired trajectories of the homography elements, and one of them is used in this project. The chosen set of desired trajectories makes the robot converge towards the target in a smooth manner, avoiding abrupt motions. However, the mobile robot cannot always converge to the target with zero pose error within a specified duration. This is mainly because the desired homography trajectories dictate a path which cannot be achieved with the present robot capabilities. Therefore, the mapping from homography trajectories to the Cartesian path should be investigated further as future work, taking the abilities of the robot into account. Then, more appropriate and realizable desired homography trajectories could be derived.

Also, there is a drawback common to all homography-based control methods used in applications and offered in the literature. Homography-based control methods may fail or give insufficient results if no plane is detected in the scene or if the detected plane has nz = 0, i.e., the plane is horizontal. In order to overcome this disadvantage, some switching-based control schemes have been proposed, such that when there is no appropriate plane with which to employ homography-based visual control, another control method takes over control of the system. If the other control method faces a singularity, the homography-based control method takes charge again. As future work, the addition of another control method, such as an epipole-based controller, to the present work would increase the versatility and robustness of the robot on which the switching control algorithm is used.


APPENDIX A

When two cameras view a 3D scene from two distinct positions, or when a single camera takes pictures of the same 3D scene from different positions, there are a number of geometric relations between the 3D points and their projections onto the 2D images that lead to constraints between the image points. Figure A.1 shows two cameras looking at a point X, which is the point of interest to the cameras. O_L and O_R are the centers of projection (focal points) of the cameras. X_L and X_R are the projections of the 3D point X onto the image planes. Each camera captures a 2D image of the 3D world, and the transformation from 3D to 2D is carried out by perspective projection.

Figure A.1 Epipolar Geometry

The centers of projection of the cameras are distinct, so each center of projection is projected onto a distinct point in the other camera's image plane (projection manifold) [24]. These two points on the image planes are denoted by e_L and e_R and are called epipoles. The centers of projection and the epipoles of the cameras lie on the same 3D line. The line O_L − X is viewed by the left camera as a point, because that line is the projection ray directly in line with the left camera's center of projection. On the other hand, the very same line is seen as a line by the right camera, and the projection of that line onto the image plane of the right camera is called an epipolar line (e_R − X_R). In the same manner, the line O_R − X, which is seen as a point by the right camera, is viewed as an epipolar line (e_L − X_L) by the left camera. Additionally, the plane formed by O_L, O_R and X is called the epipolar plane. This plane intersects each camera's image plane, and that intersection is the epipolar line. All epipolar planes and lines pass through the epipoles regardless of the location of X. Additionally, the vector w originating from O_L and pointing towards O_R is called the positive epipolar ray, while the vector −w originating from O_L and pointing in the opposite direction is called the negative epipolar ray.


Knowledge of the signs of the epipoles at the beginning of the motion is required for the determination of the desired trajectory of h13. If the robot is not at a suitable orientation at the beginning of the motion, there is an extra step that should be taken in order to drive the robot to a proper orientation for a smooth motion towards the target. The decision about this extra step depends on the signs of the x coordinates of the epipoles with respect to the robot-attached coordinate frame. In the framework of this project the mobile robot performs planar motion, so only the x coordinates of the epipoles change in time. Therefore, the x coordinates of the epipoles are the decisive factors. This is explained with the help of figure A.2.

Figure A.2 (a), (b) Geometric relations of the epipoles in the current image and the target image

The x coordinate of the target epipole is always positive when the initial position of the robot is in the third quadrant (x < 0 and z < 0) of the target frame, as in the cases illustrated in figure A.2. To explain, the ray emanating from C_c crosses the projection manifold of the target scene in the first quadrant of the target frame, so the epipole lies in the first quadrant and has a positive x coordinate. By a similar argument, if the robot is initially in the fourth quadrant, the target epipole will always be in the second quadrant and will have a negative x coordinate. If the current epipoles are analyzed, it is seen that the x coordinate of the current epipole in case (a) is positive and the x coordinate of the current epipole in case (b) is negative with respect to the robot-attached coordinate frames. Therefore, the desired trajectory of h13 is defined in three phases for case (a), whereas it is defined in two phases for case (b), skipping the extra step.


APPENDIX B

The epipolar geometry explained in Appendix A is the intrinsic projective geometry between two views and independent of scene structure. It only depends on the cameras' internal parameters and relative pose. The fundamental matrix F encapsulates this intrinsic geometry [10]. In other words, it is the algebraic representation of epipolar geometry.

A point X in three-dimensional space is projected onto two images, as x in the first image and x′ in the second image. The fundamental matrix shows the relation between these two image points. The image points x and x′, the space point X, and the camera centers are coplanar, as shown in figure B.1; this plane is called the epipolar plane and is denoted by π.

Figure B.1 3D Point X and its image points x and x′

The image point x back-projects to a ray in 3D space defined by the camera center C and x, which are collinear. This ray is seen as the line l′ in the second image.

Figure B.2 The ray emanating from C and passing through x is seen as line l′ (epipolar line for x) in the second image


As can be seen in figure B.2, for each point x in one image there is a corresponding epipolar line l′ in the other image, and the matched point x′ of x must lie on l′. The fundamental matrix defines the mapping from a point in one image to its corresponding epipolar line in the other image (x → l′), and it satisfies the condition that for any pair of corresponding points x ↔ x′ in the two images

\mathbf{x}'^{T}\mathbf{F}\mathbf{x} = 0   (B.1)

If the points x and x′ are matching points, then x′ must lie on the epipolar line l′ = Fx. Since x′ is on l′, the equation x′^T l′ = 0 must be satisfied. Plugging l′ = Fx into x′^T l′ = 0 results in x′^T F x = 0. If the fundamental matrix, denoted by

\mathbf{F} = \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix},

is written as

\mathbf{f} = [f_{11}\; f_{12}\; f_{13}\; f_{21}\; f_{22}\; f_{23}\; f_{31}\; f_{32}\; f_{33}]^{T}, and \mathbf{x} = [x\; y\; 1]^{T} and \mathbf{x}' = [x'\; y'\; 1]^{T}, then each point match results in one linear equation in terms of the unknown entries of the fundamental matrix, as shown in equation (B.2) below:

x'x\,f_{11} + x'y\,f_{12} + x'f_{13} + y'x\,f_{21} + y'y\,f_{22} + y'f_{23} + x\,f_{31} + y\,f_{32} + f_{33} = 0, or

\begin{bmatrix} x'x & x'y & x' & y'x & y'y & y' & x & y & 1 \end{bmatrix}\mathbf{f} = 0   (B.2)

For a set of n point matches, a set of linear equations is obtained.

\begin{bmatrix} x_1'x_1 & x_1'y_1 & x_1' & y_1'x_1 & y_1'y_1 & y_1' & x_1 & y_1 & 1 \\ \vdots & & & & & & & & \vdots \\ x_n'x_n & x_n'y_n & x_n' & y_n'x_n & y_n'y_n & y_n' & x_n & y_n & 1 \end{bmatrix}\mathbf{f} = \mathbf{A}\mathbf{f} = \mathbf{0}   (B.3)

Equation (B.3) is a homogeneous set of equations, so f can only be determined up to scale [10]. In order to obtain a solution for f, the matrix A must have rank 8 at most; if its rank is exactly 8, a unique solution (up to scale) exists. However, if the data is noisy, the rank may be higher than 8. In that case a least-squares solution is used to find f. The least-squares solution for f is the singular vector corresponding to the smallest singular value of A, that is, the last column of V in SVD(A) = UDV^T.

An important property of the fundamental matrix is that it is not of full rank, that is, it is not an invertible mapping. An image point x in one image defines a line l′ in the other image, which is the epipolar line of x. In the same manner, the image point x′ in the second image also defines a line l in the first image, which is the epipolar line of x′. Then, any point x on l is mapped to the same line l′. Therefore, there is no inverse mapping, since the location of the inverse-mapped point of the line l′ cannot be known exactly, i.e., it can be anywhere on the epipolar line l. This makes the fundamental matrix rank deficient: it has rank 2. Another consequence of the singularity of the fundamental matrix is that the epipole location does not vary for different points. The physical interpretation of the singularity of the fundamental matrix is explained with the help of figure B.3 [10].

Figure B.3 (a) Full-rank fundamental matrix (b) Rank-deficient fundamental matrix

The lines seen in figure B.3 are the epipolar lines calculated using l′ = Fx for different points x. There is no common epipole in (a), but in (b) all epipolar lines intersect at the same point, which is the epipole.

The fundamental matrix obtained by solving the linear equations in (B.3) may not be of rank 2 because of data contaminated by noise. In such a case, an additional step should be applied to force the fundamental matrix to be singular. This is done by singular value decomposition. If singular value decomposition is applied to the F found from equation (B.3), the following result is obtained:

SVD(\mathbf{F}) = \mathbf{U}\mathbf{D}\mathbf{V}^{T}, where \mathbf{D} = \mathrm{diag}(a, b, c) and a \ge b \ge c. Then the reconstruction of the fundamental matrix is done by setting the smallest singular value to zero, such that a \ge b \ge c = 0. Hence \mathbf{F} = \mathbf{U}\,\mathrm{diag}(a, b, 0)\,\mathbf{V}^{T}, and it has rank 2. Besides, the epipoles in the two images are the left and right null spaces of F, i.e., the last columns of U and V, respectively.
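
A direct NumPy transcription of equations (B.2)-(B.3), the SVD least-squares solution and the rank-2 enforcement described above (without the coordinate normalization that [10] additionally recommends); this is an illustrative sketch only.

```python
import numpy as np

def fundamental_matrix(x, x_prime):
    """Estimate F from n >= 8 correspondences x <-> x' (rows [u, v]), so that x'^T F x = 0."""
    A = np.zeros((x.shape[0], 9))
    for i, ((u, v), (up, vp)) in enumerate(zip(x, x_prime)):
        A[i] = [up * u, up * v, up, vp * u, vp * v, vp, u, v, 1.0]   # one row of (B.3)

    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)              # least-squares solution: last right singular vector

    U, D, Vt = np.linalg.svd(F)
    F = U @ np.diag([D[0], D[1], 0.0]) @ Vt                          # force rank 2 (c = 0)

    e = Vt[-1] / Vt[-1, 2]                # right null space: epipole in the first image
    e_prime = U[:, -1] / U[2, -1]         # left null space: epipole in the second image
    return F, e, e_prime
```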


REFERENCES

[1] G. N. DeSouza and A. C. Kak, “Vision for mobile robot navigation: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 2, pp. 237–267, 2002.

[2] S. Hutchinson, G. D. Hager, and P. I. Corke, “A tutorial on visual servo control”, IEEE Transactions on Robotics and Automation, vol. 12, no. 5, pp. 651–670, 1996.

[3] Francois Chaumette and Seth Hutchinson, “Visual Servo Control Part 1: Basic Approaches and Part 2: Advanced Approaches” , IEEE Robotics & Automation Magazine, December 2006.

[4] E. Malis, F. Chaumette, S. Boudet, 2 ½ D Visual servoing. IEEE Transactions on Robotics and Automation, 1999.

[5] M. W. Spong, S. Hutchinson, and M. Vidyasagar, Robot Modeling and Control, John Wiley & Sons, Inc., USA.

[6] http://en.wikipedia.org, Charge Coupled Devices. Obtained on 10th of December, 2009.

[7] B. Thuilot, P. Martinet, L.Cordesses, J. Gallice, “Position based visual servoing: Keeping the object in the field of vision”, in Proc. IEEE Int. Conf. Robot Automat., pp. 1624-1629, May 2002.

[8] W.Wilson, C.Hulls, G. Bell, “Relative end effector control using cartesian position based visual servoing”, IEEE Trans. Robot. Automat. vol. 12, pp. 684-696, Oct. 1996.

[9] C.Sagues, G. Lopez-Nicolas, J.J.Guerrero, “Homography based visual control of nonholonomic vehicles”, IEEE Int. Conference on Robotics and Automation, pages 1703- 1708, Rome- Italy, April 2007

[10] R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press: Cambridge, UK.

[11] http://mathworld.wolfram.com, Dilation. Obtained on 10th of December, 2009.

[12] https://www.e-education.psu.edu/natureofgeoinfo/c2_p18.html, Nature of Geographic Information , Plane Coordinate Transformations. Obtained on 15th of October, 2009

[13] Elan Dubrofsky, “Homography Estimation: A Master's essay submitted in partial fulfillment of the requirements for the degree of master of science in faculty of graduate studies”, University of British Columbia, March 2009.

[14] http://www.svgopen.org/2008/papers/86-Achieving_3D_Effects_with_SVG, Achieving 3D Effects with SVG For the SVG Open 2008 conference. Obtained on 5th of December.


[15] Z. Chuan, T.D. Long, Z. Feng and D.Z. Li, “A planar homography estimation method for camera calibration”, Computational Intelligence in Robotics and Automation, 2003 and IEEE International Symposium on, 1:424-429, 2003.

[16] Z. Zhang., “A flexible new technique for camera calibration”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1330–1334, 2000.

[17] Anubhav Agarwal, C. V. Jawahar, and P. J. Narayanan, “A Survey of Planar Homography Estimation Techniques”,Tech. Rep. IIIT/TR/2005/12, 2005.

[18] http://en.wikipedia.org, Camera resectioning. Obtained on 10th of October,2009.

[19] David G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal of Computer Vision, January 2004.

[20] David G. Lowe, “Object Recognition from Local Scale-Invariant Features”, Proc. of International Conference on Computer Vision, Corfu, September 1999.

[21] http://en.wikipedia.org, Scale invariant feature transform. Obtained on 1st of December,2009.

[22] J.-J. E. Slotine and W. Li, Applied Nonlinear Control, Prentice-Hall.

[23] C. Sagues, G. Lopez-Nicolas, J.J. Guerrero, “Visual Control of Vehicles Using Two View Geometry”, sent to the journal “Mechatronics”,2009.

[24] http://en.wikipedia.org, Epipolar Geometry. Obtained on 15th November,2009.
