6D Image-based Visual Servoing for Manipulators with Uncalibrated Stereo Cameras

Caixia Cai1, Emmanuel Dean-León2, Nikhil Somani1, Alois Knoll1

Abstract— This paper introduces 6 new image features to provide a solution to the open problem of uncalibrated 6D image-based visual servoing for robot manipulators, where the goal is to control the 3D position and orientation of the robot end-effector using visual feedback. One of the main contributions of this article is a novel stereo camera model which employs virtual orthogonal cameras to map 6D Cartesian poses defined in the Task space to 6D visual poses defined in a Virtual Visual space (Image space). This new model is used to compute a full-rank square Image Jacobian matrix (Jimg), which solves several common problems exhibited by the classical image Jacobians, e.g., Image space singularities and local minima. This Jacobian is fundamental for the image-based controller design, where a chattering-free adaptive second order sliding mode is employed to track 6D visual motions with a robot manipulator. Exponential convergence of the errors in both spaces without local minima is demonstrated. The complete control system is experimentally evaluated on a real industrial robot. The robustness of the control scheme is evaluated for cases where the extrinsic parameters of the uncalibrated stereo camera system are changed on-line and remain unknown, e.g., when the stereo system is manually moved to obtain a clearer view of the task.

Authors' affiliation: 1 Technische Universität München, Fakultät für Informatik. 2 Technische Universität München, Fakultät für Elektrotechnik und Informationstechnik. Email: {caica,somani,knoll}@in.tum.de

I. INTRODUCTION

Visual servoing control (VSC) is an approach to control the motion of a robot manipulator using visual feedback from a vision system. It has been one of the most active topics in robotics since the early 1990s [1]. We are concerned with Image-Based Visual Servoing (IBVS), where the error function is defined directly in terms of image features.

This work builds upon the concepts of image-based visual servoing and attempts to address some of the most common problems plaguing conventional approaches by introducing additional features and behaviors. As pointed out in [1] and [2], convergence and stability problems may occur in IBVS. Local minima in the trajectories and singularities in the Image Jacobian (also known as the Interaction Matrix) can severely affect the visual servoing task. In an image-based control approach, the ideal case is to find a particular visual feature for which the interaction matrix has neither local minima nor singularities. During the last decade, several authors have worked on solving these problems. In the following subsection, we briefly describe the most important approaches used in IBVS and discuss their properties and limitations.

A. Related work

An IBVS scheme usually employs the image Jacobian matrix (Jimg) to relate end-effector velocities in the manipulator's Task space to the feature parameter velocities in the feature (image) space. A full and comprehensive survey on Visual Servoing and image Jacobian definitions can be found in [1], [3], [4] and, more recently, in [5]. In general, the classical image Jacobian is defined using a set of image feature measurements (usually denoted by s) and describes how the image features change when the robot manipulator pose changes: ṡ = Jimg v. In Visual Servoing, the image Jacobian needs to be calculated or estimated. Its inverse is used to map the image feature velocities ṡ into a meaningful state variable required by the control law (usually the generalized joint velocities q̇).

In general, the image Jacobian can be computed using direct depth information (depth-dependent Jacobian) [6], [7], by approximation via on-line estimation of the depth of the features (depth-estimation Jacobian) [3], [5], [8], or using a depth-independent image Jacobian matrix [9], [10]. Additionally, many papers directly estimate the complete image Jacobian on-line in different ways [11], [12], [13]. However, all these methods use redundant image point coordinates to define (as a general rule) a non-square image Jacobian, which is a differentiable mapping from SE(3) to s ∈ R^{2p} (with p the number of feature points). Then, a generalized inverse of the image Jacobian needs to be computed, which leads to well-known problems such as Image space singularities and local minima.

In our earlier work [14], we introduced a stereo camera model based on a virtual composite camera system that provides a 3D visual position vector. This visual position vector generates a 3×3 full-rank Image Jacobian that maps velocities from the Task space to the Virtual Visual space. Using this image Jacobian, a visual servo control was implemented to drive 3 DOF of a real robot to trace time-varying desired trajectories defined in an uncalibrated Image space. However, that approach is limited to controlling only 3D positions, and its image Jacobian is only suitable for 3 DOF.

In this paper, we extend the visual features to control 6D visual poses in a Virtual Visual space (Image space). This visual pose is composed of a 3D visual position and a 3D visual orientation, and it is used to obtain a 6×6 full-rank square Image Jacobian that maps velocities from the Task space to the Virtual Visual space. Therefore, this work offers a general solution to control the position and orientation of a robot end-effector using uncalibrated visual information with a new full-rank square image Jacobian (Jimg).

B. Organization

This paper is organized as follows: In Section II we highlight the problems of classical image-based visual servoing approaches and state the core issues which we tackle in this work. In Section III we introduce a new Camera Model and describe how it is used to construct a Virtual Visual space. This model is used to define the full-rank 6D Visual Jacobian, which is then used in Section IV to design an adaptive image-based 6D visual servoing. Section V presents two real-world experiments (as illustrated in Fig. 1) and shows the results obtained in a dynamic environment. Finally, Section VI draws the conclusions and presents directions for future work.

Fig. 1. Description of the robotic experimental setup: a) General view, b) Top view. It comprises the uncalibrated stereo camera system (2 cameras, 30 ms cycle), the Visual Stereo Tracker, the low-level robot control (4 ms, connected via TCP/IP), and a distributed 3D visualization system showing the virtual cameras, the target, and a virtual obstacle.

II. PROBLEM FORMULATION

A. Classical Image-Based Visual Servoing

Suppose that the robot end-effector is moving with angular velocity ωc = [ωx, ωy, ωz] and translational velocity vc = [vx, vy, vz], both w.r.t. the camera frame in a fixed camera system (eye-to-hand configuration). Let Pc = [x, y, z]^T be a point rigidly attached to the end-effector. The velocity of the point Pc, expressed relative to the camera frame, is given by

Ṗc = ωc × Pc + vc    (1)

where Ṗc = [vx, vy, vz, ωx, ωy, ωz]^T is the relative velocity of the point in the camera frame.

As described in [1], we can relate the image-plane velocity of a point to the relative velocity of the point with respect to the camera as

ṡ = Lx Ṗc    (2)

in which s = [u, v]^T is the vector of image feature parameters, and the interaction matrix Lx is defined as

Lx = [ f/z    0     −u/z    −uv/f           (f² + u²)/f    −v
        0     f/z   −v/z    −(f² + v²)/f     uv/f            u ]    (3)

where f is the focal length of the camera lens.

To control a 6 DOF robot, at least three points are necessary (i.e., we require k ≥ 6, where k represents the total number of feature measurements). If we have a vector Wb = [Xb, θb]^T = [xb, yb, zb, αb, βb, γb]^T ∈ R^{6×1}, which is the pose of the end-effector in the robot base frame (in this case we choose Euler angles to represent the orientation of the end-effector), and a vector s = [u1, v1, ..., up, vp]^T ∈ R^{2p×1}, which contains p = k/2 image points, then the relation between ṡ and Ẇb is given by

ṡ = Jx Ẇb    (4)

with ṡ ∈ R^{2p×1}, Jx ∈ R^{2p×6} and Ẇb ∈ R^{6×1}, where Jx = [Lx1, ..., Lxp]^T · M ∈ R^{2p×6} is known as the image Jacobian, M ∈ R^{6×6} is the mapping that transforms velocities expressed in the camera frame to the robot base frame, and Lxi is given by (3).

B. The problems of Classical IBVS

If we consider ∆Wb as the input to a robot controller, then we need to compute the inverse mapping of ṡ as

∆Wb = Jx⁺ ∆s    (5)

where ∆∗ is an error function defined in the ∗ space and Jx⁺ ∈ R^{6×2p} is chosen as the Moore-Penrose pseudoinverse of Jx. This leads to the two characteristic problems of the IBVS method: the feature (image) space singularities and the local minima. For most IBVS approaches we have 2p > 6. In this case, the image Jacobian is singular when rank(Jx) < 6, while the image local minima are defined as the set of image locations Ωs = { s | ∆s ≠ 0, Ẇb = 0, ∀ s ∈ R^{2p×1} } when using redundant image features. (A local minimum is reached since the velocity of the end-effector is zero while the final end-effector position is far from its desired one.) Examples of the problems generated by the local minima conditions are illustrated in [2] and [3].

C. Contribution of this Work

In this work, we take a step further towards a general solution of the IBVS problem by introducing an intermediate mapping from the classical image features s to a new visual representation defined as Ws = [Xs, θs]^T = [xs, ys, zs, αs, βs, γs]^T ∈ R^{6×1}. Here, Ws is a 6D visual pose vector defined in a 3D Image space (we call this space the Virtual Visual space). This visual pose is measured in pixels and is composed of a 3D visual position and a 3D visual orientation.

Fig. 2. Algorithm framework: The figure shows the philosophy behind the algorithm proposed in this work (a new image feature representation Ws = Ji·s and the mapping Ẇs = Jimg·Ẇb, orange box) and its practical implementation using the virtual orthogonal stereo camera system (green box), in contrast to the classical IBVS mapping ṡ = Jx·Ẇb.
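For concreteness, the following Python sketch implements the classical pipeline of Section II: it stacks the point-feature interaction matrices of (3) and applies the pseudo-inverse update of (5). It is only an illustration under simplifying assumptions (pixel coordinates already centered at the principal point, a known depth z per point, and an identity camera-to-base twist transform M); the function and variable names are ours, not from the paper.

import numpy as np

def interaction_matrix(u, v, z, f):
    """Point-feature interaction matrix Lx of Eq. (3) (2x6), fixed-camera convention."""
    return np.array([
        [f / z, 0.0,   -u / z, -u * v / f,         (f**2 + u**2) / f, -v],
        [0.0,   f / z, -v / z, -(f**2 + v**2) / f,  u * v / f,          u],
    ])

def classical_ibvs_step(points_uvz, s_des, s_cur, f, M=np.eye(6)):
    """One classical IBVS step: stack Lx for all points and apply Eq. (5).

    points_uvz   : list of (u, v, z) for the p tracked feature points
    s_des, s_cur : desired and current stacked pixel features, shape (2p,)
    M            : 6x6 twist transform from camera frame to robot base frame
    Returns the pose correction dW_b (6,) via the Moore-Penrose pseudoinverse.
    """
    Jx = np.vstack([interaction_matrix(u, v, z, f) for (u, v, z) in points_uvz]) @ M
    ds = s_des - s_cur
    # rank(Jx) < 6, or a local minimum (ds != 0 while pinv(Jx) @ ds ~ 0), are the
    # failure modes discussed in Section II-B.
    return np.linalg.pinv(Jx) @ ds

# Example: 4 coplanar points at depth 0.8 m seen by a camera with f = 800 px.
pts = [(50.0, 40.0, 0.8), (-60.0, 45.0, 0.8), (-55.0, -50.0, 0.8), (60.0, -45.0, 0.8)]
s_cur = np.array([p for (u, v, _) in pts for p in (u, v)])
s_des = s_cur + 5.0
print(classical_ibvs_step(pts, s_des, s_cur, f=800.0))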

Fig. 3. Image Projections: a) The figure depicts the different coordinate frames used to obtain a general 3D visual camera model. Xb ∈ R^{3×1} is the position in meters [m] of an Object with respect to the world coordinate frame (wcf), denoted by Ob. R^b_Cl ∈ SO(3) represents the orientation of the wcf w.r.t. the left camera. OV is a reference frame for the virtual orthogonal cameras Ov1,2, where R^Cl_V ∈ SO(3) is the orientation of frame OCl with respect to OV; OV is located at the same position as OCl. The vectors pl, pr ∈ R^{2×1} are the projections of the point Xb in the left and right cameras, and pvi ∈ R^{2×1} represents the projection of the Object in the virtual cameras Ovi. b) Second virtual camera system, moving with the end-effector: its reference coordinate frame OV′ is fixed on the robot end-effector and has the same rotation as the frame Oef; its orientation with respect to the first virtual camera system OV is R^V′_V ∈ SO(3).

The visual representation Ws is related to the feature points vector s as

Ẇs = Ji ṡ    (6)

with Ẇs ∈ R^{6×1}, Ji ∈ R^{6×2p} and ṡ ∈ R^{2p×1}. In other words, Ws ∈ R^{6×1} can be seen as the decomposition of the vector s ∈ R^{2p×1} into its principal components, in such a way that all the elements of Ws are independent and orthogonal to each other.

Substituting (4) into (6) yields a new mapping

Ẇs = (Ji Jx) Ẇb    (7)

[Ẋs ; θ̇s] = Jimg [Ẋb ; θ̇b]    (8)

The advantage of this intermediate mapping is that a full-rank square image Jacobian matrix (Jimg ∈ R^{6×6}) can be obtained, which is nonsingular and without local minima. This is the core design of our algorithm, and all that remains is to define the form of Jimg. Fig. 2 depicts our algorithm framework.

III. 6D IMAGE JACOBIAN

This section shows how we construct the Virtual Visual space using information generated by the stereo vision system. It also explains in detail how to implement the algorithm to obtain the full-rank image Jacobian matrix (Jimg ∈ R^{6×6}). In this work, the key idea of the 3D visual camera model is to combine the stereo camera model with a virtual composite camera model. Fig. 3 (a) depicts our new visual camera model and the image projections.

A. Image Jacobian for 3D position Jimgpos

We first define the mapping for 3D positions (Jimgpos ∈ R^{3×3}), which gives the relationship between the 3D Cartesian position (meters) and the 3D visual position (pixels). The key idea is to combine the stereo camera model with a virtual composite camera model to obtain a full-rank square image Jacobian that maps velocities of a target object (Ẋb) to velocities of the image features (in our case, pixel velocities in the 3D visual space, denoted here by Ẋs), see Fig. 3.

As presented in our earlier paper [14], the relationship between the pixel velocities (Ẋs) and the object position velocities (Ẋb) in the robot base frame can be written as

Ẋs = (Jα R^Cl_V R^b_Cl) Ẋb = Jimgpos Ẋb    (9)

where the Jacobian Jimgpos ∈ R^{3×3} is defined as the position image Jacobian and Jα consists of user-defined virtual camera parameters. R^b_Cl ∈ SO(3) represents the orientation of the wcf w.r.t. the left camera and R^Cl_V ∈ SO(3) is the orientation of frame OCl with respect to OV, see Fig. 3.
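The composition in (9) is simple to evaluate once the two rotations and the virtual camera parameters are known. The sketch below is only illustrative: the exact form of Jα is derived in [14] and is replaced here by an assumed diagonal pixel-scaling matrix, and the two rotation matrices are arbitrary placeholders.

import numpy as np
from scipy.spatial.transform import Rotation

def position_image_jacobian(J_alpha, R_V_Cl, R_Cl_b):
    """Eq. (9): Jimg_pos = J_alpha * R^Cl_V * R^b_Cl (all 3x3).

    J_alpha : user-defined virtual camera parameters (exact form given in [14];
              here we only assume it is an invertible 3x3 matrix).
    R_V_Cl  : orientation of the left camera frame O_Cl w.r.t. the virtual frame O_V.
    R_Cl_b  : orientation of the world/base frame w.r.t. the left camera.
    """
    return J_alpha @ R_V_Cl @ R_Cl_b

# Illustrative values only (not the paper's): a diagonal pixel scaling for J_alpha
# and arbitrary valid rotations for the two frames.
J_alpha = np.diag([500.0, 500.0, 500.0])                 # assumed pixels-per-meter gain
R_V_Cl = Rotation.from_euler("xyz", [0.0, np.pi / 2, 0.0]).as_matrix()
R_Cl_b = Rotation.from_euler("xyz", [0.1, -0.2, 0.3]).as_matrix()

J_pos = position_image_jacobian(J_alpha, R_V_Cl, R_Cl_b)
Xb_dot = np.array([0.01, 0.0, -0.02])                    # object velocity in m/s
Xs_dot = J_pos @ Xb_dot                                  # pixel velocity in the Virtual Visual space
print(Xs_dot, np.linalg.matrix_rank(J_pos))              # rank 3 whenever J_alpha is invertible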

B. Image Jacobian for 3D Orientation Jimgrot

In order to define orientations in the Virtual Visual space, we need 4 different points rigidly attached to the robot end-effector, which can be used to represent the orientation of the end-effector in a base frame. The 4 points, expressed in the end-effector frame Oef, are the origin and the canonical basis of a 3D Euclidean space:

origin point: Xe = [0, 0, 0]^T,  x axis: Xe1 = [1, 0, 0]^T,  y axis: Xe2 = [0, 1, 0]^T,  z axis: Xe3 = [0, 0, 1]^T    (10)

1) Orientation Definition in Virtual Visual Space: In order to specify 3 orthogonal vectors in the Virtual Visual space, which can be used to represent visual orientations, we define a second virtual orthogonal camera system, see Fig. 3 (b). This virtual coordinate frame OV′ is fixed to the robot end-effector coordinate frame Oef. All parameters of the second virtual camera system are the same as those of the first one, with the exception of the optical center offsets O1 = O2 = [0, 0]^T. Then, the mapping from a 3D Cartesian position w.r.t. the frame OV′ to the 3D image position (cf. Eq. (10) in [14]) in the new 3D Virtual Visual space Os′ is written as

Xs′ = diag(fβ) [ xV′/(−yV′ + λ) ;  yV′/(xV′ + λ) ;  zV′/(xV′ + λ) ] + [cx ; cx ; cy]    (11)

where XV′ = [xV′, yV′, zV′]^T is the 3D position w.r.t. OV′. Therefore, the 4 points defined in (10) can be represented in Os′ as

XV′ef = Xe  = [0, 0, 0]^T  ⇒  Xs′ef = [cx, cx, cy]^T
XV′1  = Xe1 = [1, 0, 0]^T  ⇒  Xs′1  = [cx + fβ/λ, cx, cy]^T
XV′2  = Xe2 = [0, 1, 0]^T  ⇒  Xs′2  = [cx, cx + fβ/λ, cy]^T
XV′3  = Xe3 = [0, 0, 1]^T  ⇒  Xs′3  = [cx, cx, cy + fβ/λ]^T    (12)

In the same form, using (12), we define a basis of the 3D Virtual Visual space (expressed in pixels), Vi = Xs′i − Xs′ef:

V1 = [fβ/λ, 0, 0]^T,  V2 = [0, fβ/λ, 0]^T,  V3 = [0, 0, fβ/λ]^T    (13)

From (13), it is evident that the rotation of the robot end-effector in the new 3D Virtual Visual space is represented as R^ef_s′ = [V1/||V1||, V2/||V2||, V3/||V3||] = I.

In Section III-A, we mapped the 3D positions to the first virtual orthogonal camera system, which is attached to the stereo vision system. Therefore, the visual orientation also needs to be mapped to the same frame Os. The rotation of the end-effector with respect to Os can be computed as

R^ef_s = R^s′_s R^ef_s′ = R^V_s R^V′_V R^s′_V′ = R^V′_V = R^ef_V    (14)
       = (R^Cl_V R^b_Cl) R^ef_b    (15)

where R^s_V = I and R^s′_V′ = I.

2) Orientation Mapping Jimgrot: Given a rotation matrix R, the angular velocity of the rotating frame can be represented as S(ω) = Ṙ R^T, in which S(ω) is a skew-symmetric matrix and ω is the angular velocity.

Therefore, the angular velocity of the end-effector frame with respect to the robot base frame (ωb) is given by

S(ωb) = Ṙ^ef_b (R^ef_b)^T    (16)

and, using (15) and the skew-symmetric matrix property R S(α) R^T = S(Rα) with R ∈ SO(3) and α ∈ R^3, the angular velocity of the end-effector frame with respect to Os (ωs) is given by

S(ωs) = Ṙ^ef_s (R^ef_s)^T = (R^Cl_V R^b_Cl) Ṙ^ef_b (R^ef_b)^T (R^Cl_V R^b_Cl)^T    (17)
      = (R^Cl_V R^b_Cl) S(ωb) (R^Cl_V R^b_Cl)^T    (18)
      = S((R^Cl_V R^b_Cl) ωb)    (19)

From (19) we can obtain the visual angular velocity

ωs = (R^Cl_V R^b_Cl) ωb    (20)

Now, let θ = [α, β, γ]^T be a vector of Euler angles, which is a minimal representation of the orientation of the end-effector frame relative to the robot base frame. Then, the angular velocity ω is given by [15]

ω = T(θ) θ̇    (21)

If R^ef = Rz,γ Ry,β Rx,α is the Euler angle transformation, then

T(θ) = [ cos(γ)cos(β)   −sin(γ)   0
         sin(γ)cos(β)    cos(γ)   0
         −sin(β)          0       1 ]    (22)

Singularities of the matrix T(θ) are called representational singularities. It can easily be shown that T(θ) is invertible provided cos(β) ≠ 0.

Substituting (21) into (20) we obtain

T(θs) θ̇s = (R^Cl_V R^b_Cl) T(θb) θ̇b    (23)

Furthermore, this expression can be written as

θ̇s = [α̇s, β̇s, γ̇s]^T = T(θs)^(−1) (R^Cl_V R^b_Cl) T(θb) θ̇b = Jimgrot θ̇b    (24)

where the Jacobian Jimgrot ∈ R^{3×3} is defined as the orientation image Jacobian.

C. Visual Jacobian

In the previous sections, we defined the mappings for the 3D visual position and orientation separately as the position image Jacobian Jimgpos and the orientation image Jacobian Jimgrot. Combining (9) and (24) we obtain the full expression

Ẇs = [Ẋs ; θ̇s] = [ Jimgpos  0 ; 0  Jimgrot ] [Ẋb ; θ̇b] = Jimg Ẇb    (25)

where the Jacobian Jimg ∈ R^{6×6} is defined as the Image Jacobian.

Substituting the robot differential kinematics Ẇb = Ja(q) q̇, (25) can be rewritten in the form

Ẇs = Jimg Ja(q) q̇ = Js q̇    (26)

where Ja(q) ∈ R^{6×6} is the analytical Jacobian matrix of the robot manipulator and the Jacobian matrix Js ∈ R^{6×6} is defined as the Visual Jacobian. Then the inverse differential kinematics relating the generalized joint velocities q̇ and the 6D visual velocities Ẇs is given by

q̇ = Js^(−1) Ẇs = Ja(q)^(−1) Jimg^(−1) Ẇs    (27)

Remark 1: Singularity-free Jimg. From (25), we can see that det(Jimg) = det(Jimgpos) det(Jimgrot); therefore the set of singular configurations of Jimg is the union of the set of position configurations satisfying det(Jimgpos) = 0 and the set of orientation configurations satisfying det(Jimgrot) = 0. From [14], we know that a non-singular Jimgpos can be obtained, and Jimgrot is non-singular provided that T(θ) is invertible. Therefore, the singularities of Js in (27) are defined only by the singularities of Ja(q). Hence, to guarantee a non-singular visual mapping, an approach to avoid robot singularities must be implemented. In Section V, we discuss this issue and propose a solution.
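As a numerical sanity check of (22)-(27), the following sketch assembles Jimgrot, forms the block-diagonal Jimg, and resolves a 6D visual velocity into joint velocities. It assumes the two rotation matrices and the analytical robot Jacobian Ja(q) are provided by other modules; all names are ours.

import numpy as np

def T_euler(theta):
    """Euler-rate matrix T(theta) of Eq. (22) for R = Rz,gamma * Ry,beta * Rx,alpha."""
    _, beta, gamma = theta
    return np.array([
        [np.cos(gamma) * np.cos(beta), -np.sin(gamma), 0.0],
        [np.sin(gamma) * np.cos(beta),  np.cos(gamma), 0.0],
        [-np.sin(beta),                 0.0,           1.0],
    ])

def orientation_image_jacobian(theta_s, theta_b, R_V_Cl, R_Cl_b):
    """Eq. (24): Jimg_rot = T(theta_s)^-1 * (R^Cl_V R^b_Cl) * T(theta_b).
    Valid only away from the representational singularity cos(beta) = 0."""
    return np.linalg.inv(T_euler(theta_s)) @ (R_V_Cl @ R_Cl_b) @ T_euler(theta_b)

def image_jacobian(J_pos, J_rot):
    """Eq. (25): block-diagonal 6x6 Image Jacobian built from the two 3x3 blocks."""
    J = np.zeros((6, 6))
    J[:3, :3] = J_pos
    J[3:, 3:] = J_rot
    return J

def joint_velocity(J_img, J_a, W_s_dot):
    """Eq. (27): q_dot = Ja(q)^-1 * Jimg^-1 * W_s_dot (away from robot singularities)."""
    return np.linalg.solve(J_a, np.linalg.solve(J_img, W_s_dot))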

IV. CONTROL LAW

In this section, we describe the design of an adaptive image-based dynamic control (a second order sliding mode control) which includes the robot dynamics model in its passivity proof. The proposed second order sliding mode control is chattering-free.

The robot dynamic model and the joint velocity nominal reference are the same as described in our earlier paper [14]. Here the 6D visual nominal reference Ẇsr ∈ R^{6×1} is used instead of Ẋsr ∈ R^{3×1}, i.e.,

q̇r = Js^(−1) Ẇsr    (28)

Details of the control law are given in [14].

Remark 2: Convergence of ∆Wb without local minima. Given that Jimg is square and full-rank, from (25) it can be seen that ∆Ws = 0 → ∆Wb = 0 without local minima. This is the most important impact of designing a full-rank image Jacobian, which, in general, is not obtained with classical methods.

Fig. 4. Snapshot of the 6D visual tracking: (a) Teaching interface (left and right camera views), (b) Automatic execution.
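The kinematic core of the controller, (28), is a resolved-rate law driven by a visual nominal reference. The adaptive second order sliding mode definition of Ẇsr is given in [14] and is not reproduced here; the sketch below uses a simple proportional-plus-feedforward reference only as an illustrative stand-in.

import numpy as np

def visual_nominal_reference(W_s_des, W_s, W_s_des_dot, k=1.0):
    """A simple 6D visual nominal reference: desired visual velocity feed-forward
    plus a proportional term on the visual pose error. (The paper's adaptive
    second order sliding mode reference is defined in [14]; this form is only
    an illustrative stand-in.)"""
    return W_s_des_dot + k * (W_s_des - W_s)

def nominal_joint_velocity(J_s, W_sr_dot):
    """Eq. (28): q_r_dot = Js^-1 * W_sr_dot, valid away from robot singularities."""
    return np.linalg.solve(J_s, W_sr_dot)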
V. EXPERIMENTS

Two experiments were performed to validate and evaluate this work on a standard industrial robot. The first experiment focuses on the 6D visual servoing algorithm as an application of a teaching interface, where the user defines the pose of the end-effector using a visual marker and this information is later used to define the desired visual trajectory. The second experiment uses the 6D visual servoing algorithm for real-time tracking of a moving target in a Human-Robot interaction scenario. In order to make this a practically useful application, several other features such as singularity avoidance, self-collision avoidance, and obstacle detection and avoidance are implemented, which ensure the safety of the robot and the human. This experiment also shows how the orientation matrix is estimated on-line, enabling an uncalibrated visual servoing system in which occlusions due to camera placement can be handled in a natural and intuitive way by simply moving the camera manually to a better position.

The system consists of 3 sub-systems: a) the Visual Stereo Tracker, b) the Robot Control System, and c) the 3D Visualization System, see Fig. 1.

A. Experiment 1: 6D Visual Tracking

This scheme has been implemented on a robotic platform with six degrees of freedom and an eye-to-hand configuration. 2D visual (image) features are extracted from a stereo vision system with AR markers. We use the ArUco library² (based on OpenCV) to detect markers. Every marker provides 2D image features for its 4 corner points. The 3D position and rotation of the robot end-effector with respect to the camera frame are obtained from the image features of these 4 points. This experiment consists of two phases: teaching and execution.

² ArUco: http://www.uco.es/investiga/grupos/ava/node/26

1) Teaching Interface: In this phase, we provide a teaching interface for the user, see Fig. 4 (a), where the user holds an AR marker which is detected by the stereo camera system that provides the 2D image features. A red square and a marker ID (displayed in cyan) on the image indicate the detection. The user moves the marker, creating visual trajectories such as two orthogonal straight lines on the table and two smooth curves on the surface of the globe. These trajectories include both translation and rotation motions. During the movement, the 2D features of the 4 points of the target in the camera frames are recorded and saved. At some points, when the marker is lost or cannot be detected, the last available data is saved, which guarantees that the desired pose can be reached and is safe for robot execution.

2) Automatic Execution: After the teaching phase, the robot can automatically execute the recorded visual trajectories. Another AR marker of the same size is attached to the robot end-effector to detect the pose of the robot during execution. In this case, the inputs to the visual servoing are the 2D image features which were recorded in the teaching phase. From these features we extract our new 6D visual pose vector Ws in the Virtual Visual space, which is used to create the error function, see Section III.

Experimental results are depicted in Fig. 5. The plots in the first row (a) show the 3D linear and 3D angular visual tracking in our Virtual Visual space, while the second row (b) shows the target trajectory tracking in the 3D Task space. The red lines in the plots are the target trajectories, which exhibit some noise and chattering due to the unsteady movement of the user. However, the blue lines, which illustrate the trajectories of the robot end-effector, are smooth and chattering-free. This experiment shows that when the errors in the Virtual Visual space converge to zero, the errors in the Task space converge to zero without local minima. Therefore, during execution, the AR marker on the robot end-effector shows the same linear and angular motions as instructed in the teaching part.

Fig. 5. Experiment results for 6D visual tracking: (a) 3D position and orientation angles (around the x, y, and z axes) in the Virtual Visual Space, (b) 3D position and orientation angles in the Task Space. Red: target trajectories, blue: end-effector trajectories.
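A minimal sketch of the 2D feature extraction step: detecting the 4 corner points of an AR marker with OpenCV's ArUco module. The dictionary choice and the (pre-4.7) module-level aruco API are assumptions on our side; the paper only states that the ArUco library is used.

import cv2
import numpy as np

# Assumed marker dictionary; the actual one used in the experiments is not stated.
aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_6X6_250)

def marker_corners(image, marker_id):
    """Return the 4 corner pixels (4x2) of the requested marker, or None if not seen."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict)
    if ids is None:
        return None
    for c, i in zip(corners, ids.flatten()):
        if i == marker_id:
            return c.reshape(4, 2)   # the 4 image points used as 2D features
    return None

# Corner sets from the left and right images feed the virtual camera model of Sec. III.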

B. Experiment 2: 6D Uncalibrated Image-based Visual Servoing in a Human-Robot Interaction Scenario

Fig. 6. 6D adaptive visual servoing including environment constraints (block diagram: the 6D camera model — stereo vision model and orthogonal camera model providing Xs and Rs from the 4 image points pl, pr — the inverse visual mapping q̇r = Js^(−1) Ẇsr, the adaptive control law, and the environment constraint force F).

In this experiment, we integrate the visual servoing system into a scenario where environment constraints such as robot singularity avoidance and (self-/obstacle) collision avoidance must be included to generate a safe and singularity-free trajectory for the robot. We model the environment constraints as a total force F, which includes a singularity avoidance force Fr, a self-collision avoidance force Fc, an obstacle-collision avoidance force Fo, etc. Fig. 6 shows the integration of the control with these environment constraints.

In this task, we demonstrate real-time tracking using AR markers for identifying the target pose and the current pose of the robot end-effector. The target is carried by a human, and the control goal is to make the robot end-effector follow the target placed in the human's hand.

1) Interaction results: This experiment demonstrates real-time tracking of both the 3D visual position and the 3D orientation (Fig. 7 (a)). The system proves to be stable and safe even in situations where the target is lost (due to occlusions by the robot or the human), Fig. 7 (b). In this case, the robot pauses and the visual tracking is resumed as soon as the target is visible again. A video with more details of all these experimental results can be seen at: http://youtu.be/arNFrbJ0Lj4

To demonstrate stability, we test our system under several environment constraints. Fig. 7 (c) illustrates the results of singularity avoidance, where the robot does not reach the singular condition (q3 = 0), even when the user tries to force it. Fig. 7 (d) depicts table avoidance, where the motion of the robot is constrained in the z-axis by the height of the table (the end-effector is not allowed to go under the table) but it can still move in the x and y axes, and Fig. 7 (e) shows how the robot handles self-collisions. Fig. 7 (f) shows obstacle avoidance while continuing to track the target.

Fig. 7. System behaviors: (a) Position and orientation tracking, (b) Case when the target is lost, (c) Case with singularity avoidance, (d) Case with table collision avoidance, (e) Case with self-collision avoidance, (f) Obstacle avoidance, and (g) Target occlusion, where the user manually moves the stereo system.

One of the key contributions of this paper is the possibility of handling situations where the target object is occluded and the stereo system can be moved to maintain the target in the field of view. This feature is analyzed in the next subsection.

2) On-line Orientation Matrix Estimation: In order to compute the Visual Jacobian, a coarse on-line estimation of the orientation matrix is computed using the real-time information generated by the robot. As shown in our previous work [14], SVD (Singular Value Decomposition) on two sets of position points defined in each coordinate frame, Ob and OCl, is used to estimate the orientation matrix. The estimation errors of the complete Jacobian Js can be handled by the controller to some extent.

Object occlusion occurs in Fig. 7 (g), where the stereo system can then be moved physically to maintain the target in the field of view. The camera motion is detected by the system and a process for the coarse estimation of the orientation matrix between the stereo system and the robot base frame is started. The robot performs a small motion and a set of points is collected, shown in Fig. 8 (d). We compare the actual end-effector position with the recovered end-effector position using the estimated rotation. Fig. 8 (a) and (c) depict these two position trajectories. It can be clearly observed that the error between them is close to zero after the on-line rotation matrix estimation, as illustrated in Fig. 8 (b).

Fig. 8. Results of the on-line orientation matrix estimation: (a) and (c) show the 3D position of the end-effector in the base frame and the recovered end-effector position using the estimated rotation, (b) illustrates the position errors, and (d) shows the points selected for the estimation of the orientation.
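The coarse orientation estimation reduces to the classical SVD-based alignment of two corresponding point sets (the exact procedure is given in [14]). A minimal sketch, assuming the end-effector positions have already been collected in both the robot base frame Ob and the left camera frame OCl:

import numpy as np

def estimate_rotation(P_base, P_cam):
    """SVD-based estimation of the rotation aligning two corresponding point sets
    (end-effector positions expressed in O_b and in O_Cl, one row per sample).
    Returns R such that (P_cam - centroid_cam) ~ R @ (P_base - centroid_base)."""
    A = P_base - P_base.mean(axis=0)
    B = P_cam - P_cam.mean(axis=0)
    U, _, Vt = np.linalg.svd(B.T @ A)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])   # guard against reflections
    return U @ D @ Vt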
robot errors the industrial trajectory of real 6D a the on using designed motions adaptive the is in visual defined an control 6D servo Then, track visual Image to minima. mode sliding as local order such second and Servoing singularities Visual space image-based new in pose a problems visual obtain Jacobian we 6D Image pose a image visual square define 6D 4 this we Using from which vector. in virtual servoing with cameras, system visual vision orthogonal stereo uncalibrated for an using pixels) points in (measured the shows orientation. the (b) of rotation, estimation (c) the estimated and for points (a) the selected recovered estimation: the the using illustrates and matrix end-effector frame (d) base orientation errors, the the on-line in for end-effector the the position of of position Results 3D the show 8. Fig. table with Case (d) avoidance, system. singularity stereo the with moves Case manually (c) user lost, the is and target occlusion the Target (g) when avoidance. Case Obstacle (b) (f) tracking, and orientation avoidance and self-collision Position with Case (a) (e) behaviors: avoidance, System collision 7. Fig. −0.5 −0.5 Position(m) 0.5 0.5

Z (m) −1 −1 xeietlrslso ooispafr hwthe show platform robotics a on results Experimental new a proposed we paper, this In 0 0 0

(a) End−effectorPositionandRecoverEF

−0.5 Right Camera Left Camera 10 X (m) I C VI. (c) 3DEnd−effectorPosition 20 0 (a) ita iulspace Visual Virtual 30 Time (s) NLSOSAND ONCLUSIONS 0.5 40 −0.6 −0.4 50 −0.2 hc a vi h well-known the avoid can which , 60 Y (m) RecEF EF RecEF EF RecEF EF 0 y x z Xef Xef z y x 70

rec 0.2 (b)

−0.2 −0.1 ∆ Position(m) 0.1 0.2 0.3 0.4 0.5 0.6 −0.2

Z (m) 0 0 0 1 2

h oto a evaluated was control The .

F 10 X (m) 0 UTURE (d) ThepositionforEstimationRotation (b) ErrorofEnd−effectorPosition 0.2 ita iulspace Visual Virtual 20 0.4 30 W Time (s) 0.6 (c) ORK 40 −0.05 0 50 full-rank 0.05 Xef Xef Y (m) 60 0.1 stereo ∆ ∆ ∆

Z Y X 0.15 70

(d) 1]M .Sog .Hthno,adM Vidyasagar, M. and Hutchinson, S. Spong, W. M. [15] “Uncal- Knoll, A. and Somani, N. Mendoza, D. Dean-Leon, E. Cai, C. “Robust [14] Jagersand, M. and Amir-Massoud, Farahmand, Azad, dynamic S. “Uncalibrated [13] Lipkin, H. and McMurray, G. Piepmeier, J. knowl- without [12] servoing visual “Versatile Asada, M. and Hosoda K. Ma- [11] for Tracking Visual “Dynamic Zhou, D. and Liu, Y.-H. Wang, H. [10] sensors. with tactile technique and this depth-cameras in e.g. fusing approach sensors, by this various applications, implement of world will instead real We tracker more markers. object advanced AR an approaches. the use using servoing to the try to visual also compared will image-based We when other scheme and proposed classic the of robustness 9 .H i,H ag .Wn,adK .Lm Uclbae visual “Uncalibrated Lam, K. K. and Wang, C. Wang, H. Liu, Y.-H. [9] laws control to “Generalizations with Janabi-Sharifi, F. trajectory and Nematollahi camera E. “Optimal [8] Chaumette, F. and Mezouar Y. feed- visual [7] “Model-based Mitchell, O. and Lee, G. basic S. of C. Feddema, “Comparison J. Wilson, [6] W. and Deng, L. visual Janabi-Sharifi, new F. and classical [5] of “Analysis Basic Chaumette, F. I. and control. Marey M. servo “Visual [4] Hutchinson, S. and Chaumette F. in [3] convergence and stability of problems servo “Potential visual Chaumette, on F. tutorial “A [2] Corke, P. and Hager, G. Hutchinson, S. [1] n oto-eodEdition Control-Second and (IROS) Robot Systems for and Servoing Robots Visual Dynamic in Image-based Manipulators,” Stereo 3D ibrated on Conference International 5564–5569. IEEE pp. 2010, 2010 in (ICRA), servoing,” visual Automation uncalibrated for estimation 2004. jacobian Feb. 143–147, pp. 1, no. 20, vol. servoing,” visual Systems and Robots in Jacobian,” Intelligent true of edge Camera,” Fixed Robotics on Uncalibrated an Using nipulators Robotics matrix,” interaction on depth-independent Transactions a using robots of servoing tronics servoing,” visual image-based of 2003. 781–804, pp. 10, no. 22, vol. control.” 1992. image-based Aug. 21–31, system,” pp. robotic 8, coordinated no. hand-eye 25, a vol. for control 2011. back Oct. 967–983, pp. 5, no. 16, vol. methods,” servoing visual Automation and in laws,” control servoing 2006. Dec. 82–90, approaches,” 66–78. pp. in Control servoing,” and Vision visual of position-based and image-based 1996. Oct. 651–670, pp. 5, no. control,” o.3 o ,p.1716 2009. 167–186, pp. 3, no. 3, vol. , EETascin nRbtc n Automation and Robotics on Transactions IEEE o.2,n.3 p 1–1,Jn 2007. June 610–617, pp. 3, no. 23, vol. , (e) EERbtc uoainMagazine Automation Robotics IEEE a 08 p 3244–3249. pp. 2008, May , EETascin nRbtc n Automation and Robotics on Transactions IEEE EERJItrainlCneec nIntelligent on Conference International IEEE/RSJ NI eis o27 pigrVra,1998, Springer-Verlag, 237, No Series, LNCIS . h nentoa ora fRbtc Research Robotics of Journal International The R o.2,n.4 p 0–1,Ag 2006. Aug. 804–817, pp. 4, no. 22, vol. , EFERENCES EEItrainlCneec nRobotics on Conference International IEEE EEAM rnatoso Mechatronics on Transactions IEEE/ASME EERJG nentoa ofrneon Conference International IEEE/RSJ/GI o.2013. Nov. , 2004. , o.1 e.19,p.186–193. pp. 1994, Sep. 1, vol. , (f) nentoa ora fOptomecha- of Journal International o.1,n.4 pp. 4, no. 13, vol. , (g) EETransactions IEEE oo Dynamics Robot h Confluence The ooisand Robotics Computer o.12, vol. , May , IEEE , , , ,