Homography Observer Design on Special Linear Group SL(3) with Application to Optical Flow Estimation

Ninad Manerikar, Minh-Duc Hua, Tarek Hamel

N. Manerikar, M.-D. Hua, and T. Hamel are with I3S, Université Côte d'Azur, CNRS, Sophia Antipolis, France, manerika(hua, hamel)@i3s.unice.fr.

Abstract— In this paper we present a novel approach for homography observer design on the Special Linear group SL(3). The proposed approach aims at computing online the optical flow estimate extracted from the continuous homography obtained by capturing video sequences from aerial vehicles equipped with a monocular camera. The novelty of the paper lies in the linearization approach undertaken to linearize a nonlinear observer on SL(3). Experimental results are presented to show the performance and robustness of the proposed approach.

I. INTRODUCTION

Originating from computer vision, the homography represents a mapping that relates two views of the same planar scene. It encodes the camera's pose (i.e. position and orientation), the distance between the camera and the scene, and the scene's normal vector in a single matrix. The homography has played an important role in various computer vision and robotics applications whose scenarios involve man-made environments composed of planar surfaces [1], [2]. It is also suitable for applications where the camera is sufficiently far from the viewed scene, such as when ground images are taken from an aerial drone.

The homography estimation problem has been extensively studied in the past [3], [4], [9]. Most existing works consider the homography as an incidental variable and tend to solve a system of algebraic constraints on a frame-by-frame basis [9]. The quality of the homography estimates is strongly dependent on the algorithm used as well as on the nature of the image features used for estimation. Image features for homography estimation are typically geometric (points, lines, conics, etc.) or texture-based. For a well textured scene, state-of-the-art methods can provide high quality homography estimates at the cost of high computational complexity (see [4]). For a scene with poor texture, significant computational effort is still required, and the poor quality of the homography estimates is often an important issue. This is particularly the case in varying lighting conditions and in the presence of specular reflections or moving shadows, where the photogrammetric error criteria used in texture-based methods become ill-posed. Feature-based methods of image-to-image homography estimation are robust to these issues as long as good features and good matching are obtained. They have been the mainstay of robust computer vision in the past.

In a previous work [5], [6] by some authors of this paper, a nonlinear observer for homography estimation was proposed based on the group structure of the set of homographies, the Special Linear group SL(3). This nonlinear observer enhances the overall homography estimate between any two given images by using the velocity information in order to interpolate across a sequence of images. However, this observer still requires the pre-computation of the individual image homographies, which are then smoothed using filtering techniques, thus making the approach computationally expensive. Another drawback of this approach is that, in order to obtain improved homography estimates, each pair of images needs to contain a sufficient number of data features for the computation of the homography. In another work [7] by some of the authors of this paper, the question of deriving an observer that directly uses point correspondences from an image sequence, without requiring the prior reconstruction of the individual homographies, has been addressed. This approach is not only computationally more efficient but also works in scenarios where there is a lack of data features for the reconstruction of the homography matrix.

In the present paper, we address the same problem as [7] but with a different approach that directly exploits the basis of the Lie algebra of the group SL(3) for observer design. The simplicity of the resulting observer makes it suitable for real-time implementation. The proposed observer is also ideal for the estimation of the so-called "continuous homography" and the optical flow, by exploiting the homography estimated from every two consecutive images obtained from a combined Camera-IMU (Inertial Measurement Unit) system.

The paper is structured into six sections including the introduction and the concluding sections.
Section II gives a brief description of the notation and the mathematics related to the homography. In section III a linear approach for observer design on SL(3) is proposed using point correspondences and the knowledge of the group velocity. In section IV the computation of the optical flow estimate extracted from the continuous homography is presented. Experimental results supporting the proposed approach are presented in section V. Finally, some concluding remarks and future works are provided in section VI.

II. NOTATION AND THEORETICAL BACKGROUND

A. Mathematical background

The special linear group SL(3) and its Lie algebra sl(3) are defined as

$$\mathrm{SL}(3) := \{ H \in \mathbb{R}^{3\times3} \mid \det(H) = 1 \}$$
$$\mathfrak{sl}(3) := \{ U \in \mathbb{R}^{3\times3} \mid \mathrm{tr}(U) = 0 \}$$

Since the Lie algebra sl(3) is of dimension 8, it can be spanned by 8 generators so that for any ∆ ∈ sl(3) there exists a unique vector δ ∈ R⁸ such that

$$\Delta = \sum_{i=1}^{8} \delta_i B_i \qquad (1)$$

where the basis of sl(3) is chosen as follows:

$$B_1 = e_1 e_2^\top,\quad B_2 = e_2 e_1^\top,\quad B_3 = e_2 e_3^\top$$
$$B_4 = e_3 e_2^\top,\quad B_5 = e_3 e_1^\top,\quad B_6 = e_1 e_3^\top$$
$$B_7 = e_1 e_1^\top - \tfrac{1}{3} I,\quad B_8 = e_2 e_2^\top - \tfrac{1}{3} I$$

with I the identity element of R^{3×3} and {e₁, e₂, e₃} the canonical basis of R³.
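As a concrete illustration of this parametrization, the short sketch below (our own, assuming the Eigen linear algebra library; the function names are ours, not from the paper) constructs the eight generators B₁, …, B₈ and implements the linear map δ ↦ ∆ of (1). By construction, every resulting ∆ is trace-free and hence belongs to sl(3).

```cpp
// Minimal sketch (our illustration, using Eigen) of the sl(3) basis
// B1..B8 of Section II-A and the linear map delta -> Delta of Eq. (1).
#include <Eigen/Dense>
#include <array>
#include <iostream>

using Mat3 = Eigen::Matrix3d;
using Vec8 = Eigen::Matrix<double, 8, 1>;

// Basis of sl(3) as chosen in Section II-A.
std::array<Mat3, 8> sl3Basis() {
  const Mat3 I = Mat3::Identity();
  auto E = [](int i, int j) {      // E(i,j) = e_i e_j^T (0-indexed)
    Mat3 M = Mat3::Zero();
    M(i, j) = 1.0;
    return M;
  };
  return {E(0,1), E(1,0), E(1,2), E(2,1), E(2,0), E(0,2),
          E(0,0) - I / 3.0, E(1,1) - I / 3.0};
}

// Eq. (1): Delta = sum_i delta_i B_i.
Mat3 deltaToDelta(const Vec8& delta) {
  const auto B = sl3Basis();
  Mat3 D = Mat3::Zero();
  for (int i = 0; i < 8; ++i) D += delta(i) * B[i];
  return D;
}

int main() {
  Vec8 delta = Vec8::Random();
  Mat3 D = deltaToDelta(delta);
  std::cout << "tr(Delta) = " << D.trace() << "\n";  // ~0: Delta in sl(3)
}
```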
B. Homographies

Let Å (resp. A) denote projective coordinates for the image plane of a camera Å (resp. A), and let {Å} (resp. {A}) denote its frame of reference. The position of the frame {A} with respect to {Å}, expressed in {Å}, is denoted by ξ ∈ R³. The orientation of the frame {A} with respect to {Å} is represented by a rotation matrix R ∈ SO(3). Let d̊ (resp. d) and η̊ ∈ S² (resp. η ∈ S²) denote the distance from the origin of {Å} (resp. {A}) to the planar scene and the normal vector pointing towards the scene expressed in {Å} (resp. {A}). Hence the coordinates of the same point in the reference frame (P̊ ∈ {Å}) and in the current frame (P ∈ {A}) are related by

$$\mathring{P} = R P + \xi$$

If the intrinsic parameters of the camera are known, meaning that the camera is calibrated, one can write¹

$$\mathring{P} \cong K^{-1}\mathring{p}, \qquad P \cong K^{-1} p$$

where K is the camera calibration matrix and p̊ ∈ Å (resp. p ∈ A) is the image of the point P̊ (resp. P) when the camera is aligned with frame {Å} (resp. {A}); it can be written in the form (u, v, 1)⊤ using the homogeneous coordinate representation of that 2D image point. The image homography matrix that maps pixel coordinates from the current frame to the reference frame is given by

$$H_{im} = \gamma K \left( R + \frac{\xi \eta^\top}{d} \right) K^{-1}$$

where γ is a scale factor that, without loss of generality, can be chosen so that H_im ∈ SL(3). In this case, γ is equal to (d/d̊)^{1/3} and corresponds to the second singular value of H_im [8]. The Euclidean homography H ∈ SL(3) is related to H_im by the following relation

$$H = K^{-1} H_{im} K = \gamma \left( R + \frac{\xi \eta^\top}{d} \right)$$

¹ Equalities defined up to a multiplicative constant are denoted using the notation ≅.

C. Homography kinematics

Let Ω and V denote the rigid body angular velocity and linear velocity of the current frame {A} w.r.t. the reference frame {Å}, both expressed in {A}. Then the rigid body kinematics are

$$\dot{R} = R\,\Omega_\times, \qquad \dot{\xi} = R V \qquad (2)$$

with [·]_× denoting the skew-symmetric matrix associated with the cross product, i.e. u_× v = u × v, ∀u, v ∈ R³.

Let us consider that the stationary planar scene is viewed by a camera attached to the rigid body with kinematics (2). Let H ∈ SL(3) denote the associated homography matrix. Then, its kinematics are given by

$$\dot{H} = H U \qquad (3)$$

where the group velocity U ∈ sl(3) induced by the relative motion between the camera and the stationary planar scene satisfies [5]

$$U = \Omega_\times + \frac{V \eta^\top}{d} - \frac{\eta^\top V}{3 d} I \qquad (4)$$

The group velocity U given by (4) is often referred to as the "continuous homography" in the literature [8].
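To make (4) concrete, the following fragment is a minimal sketch (ours, again with Eigen; the motion and scene values are illustrative placeholders) that assembles U from Ω, V, η and d, and verifies that it is trace-free, i.e. that U ∈ sl(3).

```cpp
// Sketch (ours) of the continuous homography of Eq. (4):
// U = [Omega]_x + V eta^T / d - (eta^T V / 3d) I.
#include <Eigen/Dense>
#include <iostream>

using Mat3 = Eigen::Matrix3d;
using Vec3 = Eigen::Vector3d;

Mat3 skew(const Vec3& u) {          // [u]_x such that [u]_x v = u x v
  Mat3 S;
  S <<     0, -u.z(),  u.y(),
       u.z(),      0, -u.x(),
      -u.y(),  u.x(),      0;
  return S;
}

Mat3 groupVelocity(const Vec3& Omega, const Vec3& V,
                   const Vec3& eta, double d) {
  return skew(Omega) + V * eta.transpose() / d
         - (eta.dot(V) / (3.0 * d)) * Mat3::Identity();
}

int main() {
  // Placeholder motion and scene parameters (illustrative only).
  Vec3 Omega(0.1, -0.2, 0.05), V(0.5, 0.0, -0.3), eta(0, 0, 1);
  Mat3 U = groupVelocity(Omega, V, eta, /*d=*/2.0);
  std::cout << "tr(U) = " << U.trace() << "\n";   // ~0: U in sl(3)
}
```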

III. OBSERVER DESIGN ON SL(3) FOR HOMOGRAPHY ESTIMATION

The equation of the proposed homography estimator can be expressed as a kinematic filter system on SL(3):

$$\dot{\hat{H}} = \hat{H} U + \Delta \hat{H} \qquad (5)$$

where the innovation term ∆ ∈ sl(3) has to be designed in order to drive the group error H̃ := ĤH⁻¹ to the identity, based on the assumption that we have a collection of n measurements

$$y_j = \frac{H^{-1}\mathring{y}_j}{|H^{-1}\mathring{y}_j|} \in S^2 \quad (j = 1,\dots,n)$$

with ẙ_j ∈ S² known and constant. Here y_j and ẙ_j represent calibrated image points normalized onto the unit sphere and can be computed as

$$y_j = \frac{K^{-1}p_j}{|K^{-1}p_j|}, \qquad \mathring{y}_j = \frac{K^{-1}\mathring{p}_j}{|K^{-1}\mathring{p}_j|}$$

The output errors e_j are defined as

$$e_j := \frac{\hat{H} y_j}{|\hat{H} y_j|} = \frac{\tilde{H}\mathring{y}_j}{|\tilde{H}\mathring{y}_j|} \qquad (6)$$

and can thus be viewed as estimates of ẙ_j.

In order to design the innovation term ∆, we first develop linear approximations of both the dynamics of H̃ and the system output errors e_j. Taking the time derivative of H̃ (= ĤH⁻¹) and using the first order approximation H̃ ≈ I, one obtains

$$\dot{\tilde{H}} = \Delta\tilde{H} = \sum_{i=1}^{8}\delta_i B_i \tilde{H} \approx \sum_{i=1}^{8}\delta_i B_i \qquad (7)$$

where the linear representation δ ∈ R⁸ of ∆ via relation (1) has been used. Let x ∈ R⁸ denote the linear representation of H̃. One then deduces the following approximation

$$\tilde{H} = \exp\Big(\sum_{i=1}^{8} x_i B_i\Big) \approx I + \sum_{i=1}^{8} x_i B_i \qquad (8)$$

Using (8) one obtains

$$\dot{\tilde{H}} \approx \sum_{i=1}^{8} \dot{x}_i B_i \qquad (9)$$

From (7) and (9) it is obvious that

$$\dot{x} \approx \delta \qquad (10)$$

Now we focus on the linearization of the output vectors. From (6) and (8) the output errors e_j can be expressed as

$$e_j \approx \frac{\mathring{y}_j + \sum_{i=1}^{8} x_i B_i \mathring{y}_j}{\big|\mathring{y}_j + \sum_{i=1}^{8} x_i B_i \mathring{y}_j\big|} \approx \Big(\mathring{y}_j + \sum_{i=1}^{8} x_i B_i \mathring{y}_j\Big)\Big(1 - \mathring{y}_j^\top \sum_{i=1}^{8} x_i B_i \mathring{y}_j\Big)$$

Neglecting high order terms one gets

$$e_j \approx \mathring{y}_j + \sum_{i=1}^{8} x_i B_i \mathring{y}_j - \sum_{i=1}^{8} x_i \mathring{y}_j \mathring{y}_j^\top B_i \mathring{y}_j \;\Rightarrow\; e_j - \mathring{y}_j \approx \sum_{i=1}^{8} x_i \pi_{\mathring{y}_j} B_i \mathring{y}_j = C_j x$$

with π_y := I − yy⊤, ∀y ∈ S², the projection operator onto the plane orthogonal to y, and

$$C_j := \big[\pi_{\mathring{y}_j} B_1 \mathring{y}_j \mid \cdots \mid \pi_{\mathring{y}_j} B_8 \mathring{y}_j\big] \in \mathbb{R}^{3\times8}$$

Stacking all n measurements in a vector as follows

$$Y := \begin{bmatrix} e_1 - \mathring{y}_1 \\ \vdots \\ e_n - \mathring{y}_n \end{bmatrix} \in \mathbb{R}^{3n}$$

one obtains

$$Y \approx C x \qquad (11)$$

with

$$C := \begin{bmatrix} C_1 \\ \vdots \\ C_n \end{bmatrix} \in \mathbb{R}^{3n\times8}$$

From here, the innovation term δ (i.e. ∆) can be directly designed on the linear approximation system (10) using the linear approximation (11) of the output vector Y. In fact, if the matrix C is of rank 8 (it is well known that the homography is observable from the measurements of at least 4 linearly independent points [7]), then the design of the innovation term δ is straightforward. An obvious solution is

$$\delta = -k C^\top Y$$

with k > 0, resulting in the following closed-loop system, stable in first order approximation:

$$\dot{x} = \delta = -k C^\top C x$$

Finally, ∆ ∈ sl(3) is computed from δ ∈ R⁸ using relation (1).
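In summary, one iteration of the observer builds C and Y from the current point measurements, computes δ = −kC⊤Y, maps δ to ∆ through (1), and integrates (5) over one sample period. The minimal single-step sketch below (ours, with Eigen, reusing the helpers from the previous sketches) illustrates this; the Euler discretization, the gain k, the step dt and the final renormalization to unit determinant are our illustrative choices, not prescribed by the paper.

```cpp
// One update step of the SL(3) homography observer (our sketch, Eigen).
#include <Eigen/Dense>
#include <array>
#include <cmath>
#include <vector>

using Mat3 = Eigen::Matrix3d;
using Vec3 = Eigen::Vector3d;
using Vec8 = Eigen::Matrix<double, 8, 1>;

std::array<Mat3, 8> sl3Basis();        // defined in the earlier sketch
Mat3 deltaToDelta(const Vec8& delta);  // Eq. (1), earlier sketch

// Inputs: current estimate Hhat, group velocity U, constant reference
// directions ringY[j] and current measurements y[j], both on S^2.
Mat3 observerStep(Mat3 Hhat, const Mat3& U,
                  const std::vector<Vec3>& ringY,
                  const std::vector<Vec3>& y,
                  double k, double dt) {
  const auto B = sl3Basis();
  const int n = static_cast<int>(y.size());      // needs n >= 4 points
  Eigen::MatrixXd C(3 * n, 8);
  Eigen::VectorXd Y(3 * n);
  for (int j = 0; j < n; ++j) {
    const Vec3 ej = (Hhat * y[j]).normalized();  // output error, Eq. (6)
    const Mat3 Pi = Mat3::Identity() - ringY[j] * ringY[j].transpose();
    for (int i = 0; i < 8; ++i)
      C.block<3, 1>(3 * j, i) = Pi * B[i] * ringY[j];  // columns of C_j
    Y.segment<3>(3 * j) = ej - ringY[j];         // stacked output, Eq. (11)
  }
  const Vec8 delta = -k * C.transpose() * Y;     // innovation delta = -k C^T Y
  const Mat3 Delta = deltaToDelta(delta);
  Hhat += (Hhat * U + Delta * Hhat) * dt;        // Euler step of Eq. (5)
  // Renormalization keeping det(Hhat) = 1 (our choice, not from the paper).
  return Hhat / std::cbrt(Hhat.determinant());
}
```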
IV. APPLICATION TO OPTICAL FLOW ESTIMATION

Using a moving Camera-IMU (i.e. a combined system composed of a camera and an Inertial Measurement Unit) that observes a stationary planar scene, the previously proposed algorithm can be applied to estimate the homography matrix H relating every two consecutive images. If the camera frequency is fast enough, then the continuous homography U defined by (4) can be approximately computed via the logarithm operator as

$$U \approx \frac{1}{T}\log(H)$$

with T the camera sample time. Since H is normally close to the identity matrix, log(H) can be approximated using a Taylor expansion as follows:

$$\log(H) \approx \widetilde{\log}(H) := -W - \frac{W^2}{2} - \frac{W^3}{3} - \cdots$$

with W := I − H. However, such an approximation no longer ensures that both log̃(H) and the resulting U remain in sl(3). Hence a reprojection onto sl(3) is needed:

$$U \approx \frac{1}{T}\Big(\widetilde{\log}(H) - \frac{1}{3}\mathrm{tr}\big(\widetilde{\log}(H)\big) I\Big)$$

Let us denote φ := V/d and φ_⊥ := η⊤V/d = −ḋ/d, which respectively correspond to the so-called translational optical flow and its component along the normal direction η. Our objective consists in obtaining estimates of both φ and φ_⊥ from the decomposition of the already computed continuous homography U.

Using the fact that the angular velocity of the Camera-IMU is measured by the gyrometers, one deduces

$$\bar{U} := \frac{V\eta^\top}{d} = U - \frac{1}{2}\gamma_2(U + U^\top)\, I - \Omega_\times$$

with γ₂(U + U⊤) the second largest eigenvalue of U + U⊤ [8]. Taking into account the fact that η ∈ S², one deduces

$$\Phi := \phi\phi^\top = \bar{U}\bar{U}^\top$$

Defining β as the vector of the diagonal elements of Φ,

$$\beta = \begin{bmatrix} \Phi_{11} \\ \Phi_{22} \\ \Phi_{33} \end{bmatrix}$$

one verifies that

$$\bar{U}^\top \beta = \eta\,(\phi_1^3 + \phi_2^3 + \phi_3^3)$$

From here, the estimate of φ is calculated as

$$\hat{\phi} = \begin{cases} \bar{U}\,\dfrac{\bar{U}^\top\beta}{|\bar{U}^\top\beta|} & \text{if } |\bar{U}^\top\beta| > \epsilon_\eta \\[1ex] 0 & \text{if } |\bar{U}^\top\beta| \le \epsilon_\eta \end{cases}$$

with ε_η > 0 a small given threshold. Finally, the estimate of φ_⊥ is straightforwardly obtained by

$$\phi_\perp = \mathrm{tr}(\bar{U})$$
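The complete decomposition chain of this section, from the homography H between two consecutive images to the estimates of φ and φ_⊥, can be sketched as follows (our illustration with Eigen, reusing skew() from the earlier sketch; the truncation order of the logarithm series and the threshold eps are illustrative choices).

```cpp
// Optical flow extraction from the homography H between two consecutive
// images (our sketch, Eigen), following Section IV.
#include <Eigen/Dense>

using Mat3 = Eigen::Matrix3d;
using Vec3 = Eigen::Vector3d;

Mat3 skew(const Vec3& u);  // defined in the earlier sketch

// U ~ (1/T)(logt(H) - tr(logt(H))/3 I), with logt the truncated series.
Mat3 continuousHomography(const Mat3& H, double T, int order = 3) {
  const Mat3 W = Mat3::Identity() - H;
  Mat3 Wp = W;
  Mat3 L = Mat3::Zero();
  for (int p = 1; p <= order; ++p) {  // logt(H) = -W - W^2/2 - W^3/3 - ...
    L -= Wp / static_cast<double>(p);
    Wp *= W;
  }
  L -= (L.trace() / 3.0) * Mat3::Identity();  // reprojection onto sl(3)
  return L / T;
}

// Recover phi = V/d and phi_perp = -d'/d from U and the gyrometer reading.
void opticalFlow(const Mat3& U, const Vec3& Omega,
                 Vec3& phi, double& phi_perp, double eps = 1e-6) {
  Eigen::SelfAdjointEigenSolver<Mat3> es(U + U.transpose());
  const double gamma2 = es.eigenvalues()(1);  // middle = 2nd largest of 3
  const Mat3 Ubar = U - 0.5 * gamma2 * Mat3::Identity() - skew(Omega);
  const Mat3 Phi = Ubar * Ubar.transpose();   // Phi = phi phi^T
  const Vec3 beta(Phi(0, 0), Phi(1, 1), Phi(2, 2));
  const Vec3 v = Ubar.transpose() * beta;     // = eta (phi_1^3+phi_2^3+phi_3^3)
  phi = (v.norm() > eps) ? Vec3(Ubar * v.normalized()) : Vec3(Vec3::Zero());
  phi_perp = Ubar.trace();
}
```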

V. EXPERIMENTAL RESULTS

A. Experimental Setup

A Visual-Inertial (VI) sensor developed by the company Skybotix and the Autonomous Systems Lab (ETH Zurich) has been used to perform the experimental validation (see Fig. 1). This VI-sensor consists of two cameras and two IMUs (each composed of a 3-axis gyrometer and a 3-axis accelerometer). However, in order to validate the proposed approach, only one camera and one IMU are sufficient. One of the main reasons for using the VI-sensor is the possibility to obtain perfectly time-synchronized images and IMU measurements (20Hz for the camera and 200Hz for the IMU). For validation purposes, the ground truth is obtained by using the highly accurate Optitrack Motion Capture system available at the I3S lab, which provides the full pose of the Camera-IMU system at 120Hz.

Fig. 1. Experimental setup

The proposed algorithm has been implemented using C++ on an Intel Core i7 CPU running at 3.40GHz. A high speed ethernet cable is used to carry out the transmission of data from the camera to the PC. The Linux based PC is in charge of carrying out two principal software tasks:

• Acquisition of data (images as well as IMU data) by interfacing with the camera hardware.
• Continuous homography estimation based on two consecutive images, which is further decomposed to obtain the estimates of φ (= V/d) and φ_⊥ (= −ḋ/d) in real-time.

The detection of features and the extraction of descriptors from the images have been carried out using the FAST Feature Detector and ORB Descriptor Extractor algorithms already implemented in the OpenCV library. These algorithms are used for the simple reason that they are very time-efficient and thus well suited for real-time implementation; a sketch of this pipeline is given below.
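For reference, this feature pipeline can be reproduced with a few OpenCV calls. The fragment below is our sketch of it, not the authors' code; in particular, the brute-force matcher with cross-checking is our illustrative choice. It detects FAST corners, computes ORB descriptors, and matches them between two consecutive frames. The matched pixel coordinates p̊_j, p_j are then normalized by K⁻¹ and projected onto the unit sphere to form the direction pairs (ẙ_j, y_j) used by the observer of section III.

```cpp
// FAST detection + ORB description + matching between two consecutive
// images (our sketch of the pipeline described above, using OpenCV).
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <vector>

void matchFeatures(const cv::Mat& prev, const cv::Mat& curr,
                   std::vector<cv::Point2f>& ptsPrev,
                   std::vector<cv::Point2f>& ptsCurr) {
  auto detector = cv::FastFeatureDetector::create();
  auto extractor = cv::ORB::create();
  std::vector<cv::KeyPoint> kpPrev, kpCurr;
  cv::Mat descPrev, descCurr;
  detector->detect(prev, kpPrev);
  detector->detect(curr, kpCurr);
  extractor->compute(prev, kpPrev, descPrev);
  extractor->compute(curr, kpCurr, descCurr);
  // Hamming distance is the natural metric for binary ORB descriptors.
  cv::BFMatcher matcher(cv::NORM_HAMMING, /*crossCheck=*/true);
  std::vector<cv::DMatch> matches;
  matcher.match(descPrev, descCurr, matches);
  for (const auto& m : matches) {
    ptsPrev.push_back(kpPrev[m.queryIdx].pt);
    ptsCurr.push_back(kpCurr[m.trainIdx].pt);
  }
}
```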

Remark 1: The continuous homography estimate is obtained at 20Hz due to the fact that the VI-sensor only provides images at that frequency. However, our offline tests have shown that the continuous homography and the optical flow estimates φ and φ_⊥ can be obtained at a higher frequency of 50Hz. This has motivated us to develop our own Camera-IMU hardware using a fast camera in the near future.

B. Experimental Results

The experiment reported below has been performed online with the VI-sensor camera looking downward to observe a well textured planar horizontal ground (see Fig. 2). A video showing this experiment is provided as supplementary material and is also available at https://goo.gl/i8zGj2. From Figs. 3 and 4, one can observe that the estimates of the optical flow φ and its perpendicular component φ_⊥ obtained from the decomposition of the estimated continuous homography are quite accurate when compared to the corresponding ground-truth data.

Fig. 2. Textured planar horizontal ground (target) used for the experimental validation

Fig. 3. Optical flow components estimated from images (blue curves) and derived from ground-truth pose (red curves) versus time (s)

Fig. 4. φ_⊥ = −ḋ/d estimated from images (blue curve) and derived from ground-truth pose (red curve) versus time (s)

VI. CONCLUSION

This paper presents a general analysis of a novel linear approach for observer design posed directly on the special linear group SL(3), and its application to translational optical flow estimation. In order to support the theoretical approach, the performance of this observer was demonstrated with a set of real-world experiments. The experiments showed the good quality of the estimates that can be obtained in real time. The proposed approach for the estimation of optical flow can also be used as a cue for haptic force feedback [10]. In the near future, the idea is to perform this experiment with a camera which provides images at a much higher frequency (e.g. 100Hz) in order to obtain the optical flow estimates at a much faster rate (50Hz).

ACKNOWLEDGMENT

This work was supported by the FUI GreenExplorer project and the EQUIPEX Robotex project.

REFERENCES

[1] Y. Fang, W. Dixon, D. Dawson, and P. Chawda. Homography-based visual servoing of wheeled mobile robots. IEEE Trans. on Systems, Man, and Cybernetics - Part B, 35: 1041-1050, 2005.
[2] F. Fraundorfer, C. Engels, and D. Nister. Topological mapping, localization and navigation using image collections. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3872-3877, 2007.
[3] C. Mei, S. Benhimane, E. Malis, and P. Rives. Efficient homography-based tracking and 3-D reconstruction for single-viewpoint sensors. IEEE Trans. on Robotics, 24(6), December 2008.
[4] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, second edition, 2003.
[5] R. Mahony, T. Hamel, P. Morin, and E. Malis. Nonlinear complementary filters on the special linear group. International Journal of Control, 85(10): 1557-1573, 2012.
[6] E. Malis, T. Hamel, R. Mahony, and P. Morin. Dynamic estimation of homography transformations on the special linear group for visual servo control. In IEEE International Conference on Robotics and Automation (ICRA), pages 1498-1503, 2009.
[7] T. Hamel, R. Mahony, J. Trumpf, P. Morin, and M.-D. Hua. Homography estimation on the Special Linear group based on direct point correspondence. In IEEE Conference on Decision and Control (CDC), pages 7902-7908, 2011.
[8] Y. Ma, S. Soatto, J. Kosecka, and S. S. Sastry. An Invitation to 3-D Vision: From Images to Geometric Models. Springer Verlag, 2003.
[9] A. Agarwal, C.V. Jawahar, and P.J. Narayanan. A survey of planar homography estimation techniques. Centre for Visual Information Technology, Tech. Rep. IIIT/TR/2005/12, 2005.
[10] S. Omari, M.-D. Hua, G. Ducard, and T. Hamel. Bilateral haptic teleoperation of VTOL UAVs. In IEEE International Conference on Robotics and Automation (ICRA), pages 2393-2399, 2013.