An Examination of Feature Detection for Real-time Visual Odometry in Untextured Natural Terrain

Kyohei Otsu1, Masatsugu Otsuki2, Genya Ishigami2, and Takashi Kubota2

1 The University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo, Japan, [email protected] 2 ISAS/JAXA, 3-1-1 Yoshinodai, Chuo, Sagamihara, Kanagawa, Japan

Abstract. Estimating its position is an essential requirement for an autonomous mobile robot. Visual Odometry is a promising localization method for slippery natural terrain, which drastically degrades the accuracy of Wheel Odometry, while relying neither on external infrastructure nor on any prior knowledge. Visual Odometry, however, suffers from unstable feature extraction on untextured natural terrain. To date, a number of feature detectors have been proposed for stable feature detection. This paper compares commonly used detectors in terms of robustness, localization accuracy, and computational efficiency, and points out the trade-offs among these criteria. To address the trade-offs, a hybrid algorithm is proposed which dynamically switches between multiple detectors according to the texture of the terrain. The validity of the algorithm is demonstrated in simulation using a dataset collected in volcanic areas in Japan.

Keywords: Visual odometry, Outdoor environment, Feature detection

1 Introduction

Exploring extreme environments such as planetary surfaces and the deep sea is a challenging but beneficial task for humankind. Several missions have tackled such environments, e.g., the Mars Science Laboratory (MSL) program by NASA3. Due to the severe conditions of these environments, autonomous mobile robots are regarded as an effective means for such missions. One of the essential capabilities of an autonomous mobile robot is self-localization. Especially in such environments, robots have to estimate their position without any external infrastructure (such as GPS satellites) or prior knowledge about the location. To date, numerous localization methods have been proposed and implemented for mobile ground vehicles. The most popular are Wheel Odometry (WO), Inertial Measurement Units (IMU), or a combination of the two. These methods offer high resolution with low-cost sensors. Even so, they have several weaknesses: WO is vulnerable to wheel slips, and inertial sensors are prone to drift. These shortcomings can be crucial when exploring environments containing loose terrain and steep slopes. Doppler sensors are used as velocity sensors insensitive to wheel slips, but they are applicable only to fast-moving robots. Active ranging sensors, such as ultrasonic sensors and Laser Range Finders (LRF), are also commonly used to localize robots. These sensors measure the distance from the robot to objects, and the robot estimates its current position from the physical relationship to those objects. However, such active sensors consume considerable electric power, so they are not feasible in energy-limited environments.

Recently, attention has turned to another powerful sensor, the vision sensor, which consumes little energy yet provides rich information about the environment. The technique of estimating motion from visual input is called Visual Odometry (VO). It is regarded as a promising localization method, helped by the rapid improvement of computational resources in recent years. The basic principle of VO is to iteratively estimate the relative camera pose by finding feature point correspondences between images, which is a key technique of the two-view Structure from Motion (SfM) problem. VO is immune to wheel slips and also more stable against drift error, since the drift can be canceled by vision-based approaches (e.g., Bundle Adjustment [1] or the loop-closing technique of the Simultaneous Localization and Mapping (SLAM) problem [2]). VO can also be easily installed, since vision sensors are nowadays mounted on most robots because of their many possible uses. SLAM, on the other hand, is another powerful localization method actively researched in the robotics field. Radio sensors and vision sensors tend to be used as input to SLAM algorithms.

3 http://mars.jpl.nasa.gov/msl/
The method is very useful for robot navigation since it builds a map while localizing the robot. However, the algorithm is complex and requires substantial computational resources, which may make it difficult to install on low-performance onboard computers. From this viewpoint, VO focuses on computing the robot trajectory alone, which requires far less computational power and allows easy installation on a system. VO is becoming more and more popular thanks to these advantages. Still, it faces several challenges arising from the properties of vision:

1. Stability: feature point tracking should be robust to the terrain appearance. VO becomes stable if every pair of images yields adequate feature correspondences.
2. Accuracy: the robot should be accurately localized even if the algorithm uses error-prone images. The accuracy can be improved with statistical methods.
3. Computational efficiency: most onboard computers on mobile robots are not computationally powerful. For real-time VO, algorithmic efficiency is a serious concern.

Generally speaking, these criteria depend on the appearance of the ground and cannot be fully estimated beforehand. In addition, they are often in trade-off. Several implementations of real-time VO have been presented (e.g., [3–5]). Despite these successful results, VO in outdoor environments suffers from a crucial problem: detecting feature points on untextured terrain. VO assumes that the terrain exhibits rich texture so that feature points are easily tracked. However, in contrast to indoor environments, certain outdoor scenery makes point tracking difficult. In fact, VO localization in the Mars Exploration Rover (MER) mission by NASA/JPL revealed that the rovers encountered many areas with few visual features on the ground surface in a real extreme environment [6]. To address the challenge of VO in untextured terrain, roughly two approaches have been proposed.
One simple but effective approach is to use a proper feature detection algorithm. A number of detectors have been proposed to detect points with intended properties; the common detectors are discussed in detail in Sect. 2. Since these detectors focus on different characteristics of the image, the proper detector for a given scene depends on the terrain appearance and the intended properties. The other approach is to divide images into several blocks and find the most characteristic point in each region [7, 8]. This method enables feature detection even on feature-less terrain. However, it has several weaknesses; e.g., forcing extraction from extremely low-textured regions lowers the matching rate, because the extracted points are only weakly characteristic. The proposed method adopts the former approach, i.e., selecting an effective feature detector. The rest of the paper is organized as follows: Sect. 2 discusses the commonly used detectors and evaluates them on a dataset from volcanic fields. Sect. 3 introduces a hybrid algorithm combining several detectors, designed to overcome the weaknesses of the individual detectors. Sect. 4 presents a comparative study of the conventional and proposed detectors. Finally, Sect. 5 concludes the paper.

2 Conventional Feature Detectors

Detector Description The main focus of this paper is detecting stable features on smooth terrain, where feature tracking is difficult. A large number of feature detectors have been proposed. Generally speaking, they can be divided into two groups:

Corner detectors These detectors find corners in given images, since corners tend to be invariant to changes of view. This group includes the Harris [9], Shi-Tomasi [10], and FAST [11, 12] detectors.

Scale-space feature detectors These detectors obtain scale-invariant features. This characteristic benefits VO, as it is robust to scale changes and enables longer tracking. However, the invariance may degrade computational efficiency to some extent. SIFT [13], SURF [14], and STAR, which is based on CenSurE [15], belong to this group.

Fig. 1. The average number of correspondences between features

Fig. 2. The percentage of correct matches among all extracted features

Table 1. Average runtime per frame (320x240 grayscale images on an Intel Core 2 Quad 2667 MHz CPU)

Detector          Harris  Shi-Tomasi  FAST  SIFT   SURF   STAR
Ave. runtime [ms] 11.87   14.36       1.32  54.98  27.90  9.86

The algorithms mentioned above are implemented in the OpenCV Library [16] and widely used in various applications including VO. Typically, corner detectors such as Harris and FAST are used in VO, since they are high-speed and accurately localized. Yet, such corners are sometimes difficult to find in untextured natural terrain. The number of features can be increased by changing parameters such as the threshold, but this tends to introduce noise and outliers. SIFT and SURF are also used when scale change is a major concern. These algorithms require considerable computational time and lose pixel-level accuracy in exchange for scale invariance. Agrawal et al. [15] proposed a novel detector called CenSurE, which is scale invariant but computationally cheaper, as the feature detector in their real-time VO implementation [4]. The STAR detector is an implementation based on CenSurE.
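The corner detectors in the first group all build on a corner response of this kind. Below is a minimal sketch of the Harris response R = det(M) - k·trace(M)², not the OpenCV implementation: it uses a uniform 3x3 window in place of the usual Gaussian weighting, and the image and function names are illustrative.

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)**2 per pixel,
    where M is the structure tensor smoothed over a 3x3 window."""
    img = img.astype(float)
    Iy, Ix = np.gradient(img)  # row and column image gradients

    def box3(a):
        # uniform 3x3 smoothing (edge-padded), standing in for a Gaussian
        p = np.pad(a, 1, mode="edge")
        h, w = a.shape
        return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

    Ixx, Iyy, Ixy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    det = Ixx * Iyy - Ixy ** 2
    trace = Ixx + Iyy
    return det - k * trace ** 2

# A bright square on a dark background: R peaks near the square's corners,
# is negative along its edges, and zero in flat regions.
img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0
R = harris_response(img)
corner = np.unravel_index(np.argmax(R), R.shape)
```

Thresholding R and keeping local maxima yields the corner list; Shi-Tomasi differs only in scoring each pixel by the smaller eigenvalue of M rather than by det(M) - k·trace(M)².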

Performance test A performance test of these detectors was conducted using the dataset, which includes more than 900 stereo image pairs of volcanic areas (see examples in Fig. 3). Statistical results are shown in Figs. 1 and 2. Accurate VO localization requires a certain number of persistent feature correspondences in order to compensate errors statistically; more than 20-30 correct matches are typically regarded as sufficient for estimation. The Harris corner detector performs better than the others. However, the Harris and the similar Shi-Tomasi detector cannot achieve a high matching rate in the matching process, which can affect the matching efficiency and accuracy. The timing results on a laptop machine are presented in Table 1. In terms of average detection time per image, the corner detectors are superior to the scale-space feature detectors, with the exception of the STAR detector. The STAR detector shows high computational efficiency, but it is not stable, at least for this parameter setting and dataset. The FAST detector is the most efficient of all: its detection is fast, and its repeatability and distinctiveness are high. Even so, its performance depends on the terrain; i.e., the FAST detector is not robust to all kinds of terrain.

Fig. 3. Examples of terrain types (left: feature-less, right: feature-rich)

[Diagram for Fig. 4: a per-frame texture assessment (feature-rich = T, feature-poor = F) feeds a detector switcher that moves between Detector A (high speed, less stability) and Detector B (low speed, more stability).]

Fig. 4. Saturating counter with four states

These results suggest the following: the Harris detector offers stability as well as localization accuracy, while the FAST detector is highly efficient but not robust. This trade-off can be resolved by a hybrid method that simultaneously exploits both of these advantages.
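As a concrete illustration of why FAST is so cheap, here is a toy sketch of its segment test (pure Python, illustrative names; the real detector additionally uses a machine-learned decision tree [12] and non-maximum suppression): a pixel is declared a corner if at least n contiguous pixels on the 16-pixel Bresenham circle of radius 3 are all brighter, or all darker, than the center by a threshold t.

```python
# 16 offsets of the radius-3 Bresenham circle used by FAST
CIRCLE = [(-3, 0), (-3, 1), (-2, 2), (-1, 3), (0, 3), (1, 3), (2, 2), (3, 1),
          (3, 0), (3, -1), (2, -2), (1, -3), (0, -3), (-1, -3), (-2, -2), (-3, -1)]

def is_fast_corner(img, r, c, t=0.2, n=9):
    """FAST-n segment test at pixel (r, c): True if n contiguous circle
    pixels are all brighter than img[r][c] + t or all darker than it - t."""
    center = img[r][c]
    ring = [img[r + dr][c + dc] for dr, dc in CIRCLE]
    for sign in (+1, -1):  # check the brighter arc, then the darker arc
        flags = [sign * (v - center) > t for v in ring]
        run = 0
        for f in flags + flags:  # doubled so a run may wrap around
            run = run + 1 if f else 0
            if run >= n:
                return True
    return False

# Same synthetic scene as before: a bright square (rows/cols 4..11) on dark ground
img = [[1.0 if 4 <= r < 12 and 4 <= c < 12 else 0.0 for c in range(16)]
       for r in range(16)]
```

On this image the test fires at the square's corners (e.g., at (4, 4), where 11 contiguous circle pixels are darker than the center) but rejects edge midpoints such as (4, 7), whose darker arc is only 7 pixels long. The test involves only comparisons and no gradient computation, which is the source of FAST's speed.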

3 Adaptive Detector Selection

The proposed algorithm switches feature detectors so as to improve stability and accuracy while maintaining computational efficiency. The switching rule is described in this section.

In the proposed scheme, an appropriate feature detector is selected according to the texture of the ground. Examples of different textures in natural terrain are shown in Fig. 3. For terrain with large features on the ground surface (referred to as ROUGH), high-speed feature detectors are preferred in order to obtain overall efficiency. On the other hand, if the robot is on terrain with few features (referred to as SMOOTH), the detector should be sensitive and stable. The terrain type is estimated from the result of feature detection and tracking. For instance, the number of detected features and/or the percentage of successful tracking can serve as the switching condition.

The proposed method switches among multiple detectors with different properties. These detectors are selected from the conventional detectors based on their suitability for different terrain: one should be high-speed to improve the overall performance, and another should be sensitive enough to detect features even on SMOOTH terrain. Two or three detectors should be used, so as not to increase the cost of switching.

The cost of switching is explained as follows. In the developed VO system, features are matched by computing the normalized correlation between the feature points detected in every pair of succeeding frames. Feature matching cannot be performed if the two frames use different feature detectors, since each detector focuses on different characteristics. Therefore, if the selected detector differs between succeeding frame pairs, an extra detection pass is required, which certainly deteriorates the computational performance.
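The normalized correlation used for matching can be sketched as below; this is a minimal zero-mean NCC over flattened intensity patches, with names of our own choosing (the paper does not specify the patch size or exact normalization):

```python
import math

def ncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equal-length
    flattened patches: +1 for identical patterns (up to gain/offset),
    -1 for inverted ones, 0 when either patch is textureless."""
    ma = sum(patch_a) / len(patch_a)
    mb = sum(patch_b) / len(patch_b)
    da = [a - ma for a in patch_a]
    db = [b - mb for b in patch_b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0

# The same pattern under doubled gain still correlates perfectly
score = ncc([1, 2, 3, 4], [2, 4, 6, 8])  # -> 1.0
```

A feature in one frame is matched to the candidate in the next frame with the highest score. This comparison is only meaningful when both frames used the same detector, which is exactly the switching cost discussed above.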
The simplest switching method may be to select the high-speed detector on all ROUGH terrain and the sensitive, stable detector on all SMOOTH terrain. However, this strategy performs poorly on certain terrain, e.g., terrain intermediate between SMOOTH and ROUGH. Such an environment causes frequent switching between the detectors, and efficiency decreases due to the excess computation. One approach to avoiding excessive switching is a saturating counter, as used for branch prediction in the field of computer architecture (Fig. 4). This technique is known to be simple yet quite effective for predicting future branches. The saturating counter is a state machine with several states, and the number of states can be tuned to the type of environment explored. This counter mitigates the excess cost of switching.
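The switching logic can be sketched with a small state machine. This is a hedged illustration using a 4-state counter, where the detector labels and the half-way threshold are our own assumptions (the paper leaves the state count configurable):

```python
class SaturatingCounter:
    """N-state saturating counter, as in branch prediction: the lower half
    of the states selects the high-speed detector, the upper half the
    sensitive one, so one contrary frame cannot flip a saturated choice."""

    def __init__(self, n_states=4):
        self.n = n_states
        self.state = 0  # start saturated on the high-speed side

    def update(self, terrain_is_smooth):
        """Feed one per-frame observation (True = SMOOTH, i.e. few features)."""
        if terrain_is_smooth:
            self.state = min(self.state + 1, self.n - 1)
        else:
            self.state = max(self.state - 1, 0)

    @property
    def detector(self):
        return "sensitive" if self.state >= self.n // 2 else "fast"

# Mostly SMOOTH terrain with one ROUGH frame in the middle: the single
# contrary observation does not switch the detector back.
ctr = SaturatingCounter()
choices = []
for smooth in [True, True, True, False, True]:
    ctr.update(smooth)
    choices.append(ctr.detector)
# choices == ["fast", "sensitive", "sensitive", "sensitive", "sensitive"]
```

With only two states, this degenerates to the naive immediate-switching rule, which corresponds to the 2-state baseline evaluated in Sect. 4.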

4 Experiments

4.1 Experimental Setup

Field experiments were conducted in two off-road environments in Japan: the Ura-Sabaku desert on Izu-Oshima and the Aso volcano. These sites are covered with volcanic products as well as scattered rocks and pebbles. Two experimental rovers (Fig. 5) developed by JAXA were used to collect the dataset. The camera specifications of the rovers are shown in Table 2. More than 900 stereo pairs were collected in total. The frame rates in Table 2 may seem too slow; this is due to hardware constraints related to communication and scheduling. However, the traversal speed of the rovers is not high (approximately 0.1 m/sec), so the low frame rate does not cause a serious problem.

4.2 Adaptive Detector Selection

Detector combination To improve the performance of the proposed hybrid detector, finding a proper combination of detectors is essential. Following the guideline for selecting detectors in the previous section, three detectors (Harris, FAST, and SIFT) are chosen and combined to form hybrid detectors.

Fig. 5. Appearance of the experimental rovers: (a) Micro-6, (b) Cuatro

Table 2. Camera specifications of the experimental rovers

                        Micro-6  Cuatro
FOV (degree)            40x30    87x65
Resolution              320x240  640x400
Frame rate (Hz)         0.25     0.69
Baseline (m)            0.270    0.475
Height from ground (m)  1.450    0.770

The performance is compared in Fig. 6 in terms of the effective percentage of correct matches and the processing time including switching cost. Figure 6(a) shows a statistical result on repeatability and distinctiveness, presenting the sorted percentage of correct matches for every frame with more than 20 matches. Figure 6(b) shows the average processing time per frame. Clearly, the combination of Harris and FAST is effective: it is about 4 times faster than the alternatives, highly stable, and has a higher matching rate. This combination is therefore used for this dataset.

The saturating counter The performance with respect to the number of states in the saturating counter is evaluated in Fig. 7, with the number of states varied from 2 to 8. Note that the saturating counter with 2 states corresponds to the simplest method, which uses the high-speed detector on all ROUGH terrain and the sensitive detector on all SMOOTH terrain. The result in Fig. 7 shows that counters with 4 or more states are slightly better in stability and 1.5 times better in efficiency. For simplicity, the saturating counter with 4 states is adopted.

4.3 Comparison with the conventional detectors

The proposed method is compared with the Harris, Shi-Tomasi, FAST, SIFT, SURF, and STAR detectors. Figure 8 presents how many of the detected points are correctly matched for the frames with more than 0, 10, 20, and 30 matches, respectively. In general, 20-30 matches are enough to reduce errors by statistical methods. A good detector should exhibit a high matching rate for the frames with the required number of matches. The proposed method successfully obtains the benefits of both combined detectors.


Fig. 6. Performance evaluation over detector combinations (Harris-FAST, SIFT-FAST, SIFT-Harris): (a) effective percentage of correct matches, (b) processing time


Fig. 7. Performance evaluation over the number of states in the saturating counter: (a) effective percentage of correct matches, (b) processing time

Figure 9 shows the percentage of frames with fewer than N correct matches, i.e., how many frames would fail if a threshold on the number of inliers were imposed to assure reliable motion estimation. Harris, Shi-Tomasi, and the proposed detector miss fewer than 10% of frames with a minimum of 20 inliers, even on this extremely untextured dataset. The processing-time result also supports the superiority of the proposed method: in the same setup as Table 1, the proposed method detects features in an image in 9.63 msec on average. This can vary with the dataset; for example, the proposed method would perform even better on a dataset containing a high proportion of ROUGH images, since it would choose the high-speed detector for most images. Finally, a comprehensive analysis is given in Table 3, comparing the detectors in terms of stability, accuracy, and efficiency. The table shows the effectiveness of the proposed method on this dataset of more than 900 images from volcanic areas.

5 Conclusions

This paper compares the conventional feature detectors on untextured natural terrain in terms of stability, localization accuracy, and computational efficiency. In order to address the trade-off problems clarified by the examination,


Fig. 8. Statistical results on stability and accuracy: the sorted percentage of correct matches for frames with a minimum of N matches; (a) N=0, (b) N=10, (c) N=20, (d) N=30


Fig. 9. Missed frames: the percentage of frames as a function of the number of correct matches

a new hybrid detector is proposed. The detector switches among multiple detectors according to the texture of the terrain. In this dynamic switching process, a saturating counter of the kind used for branch prediction is adopted in order to mitigate the excess cost of switching.

The proposed algorithm has been verified using datasets collected in volcanic areas covered with feature-less volcanic products and a few rocks. Through the comprehensive analysis, the method is validated as a robust and efficient algorithm.

Table 3. Comprehensive evaluation

Detector    Type         Stability  Accuracy  Efficiency
Harris      Corner       +++        +++       ++
Shi-Tomasi  Corner       +++        +         ++
FAST        Corner       +          ++        ++++
SIFT        Scale-space  ++         +++       +
SURF        Scale-space  +          ++        ++
STAR        Scale-space  +          ++        +++
Proposed    Corner       +++        +++       +++

References

1. B. Triggs, P. McLauchlan, R. Hartley, and A. Fitzgibbon, “Bundle adjustment – a modern synthesis,” Vision Algorithms: Theory and Practice, pp. 153–177, 2000.
2. Open source software for SLAM and loop-closing: http://openslam.org
3. D. Nister, O. Naroditsky, and J. Bergen, “Visual odometry for ground vehicle applications,” J. of Field Robotics, vol. 23, pp. 3–20, 2006.
4. K. Konolige and M. Agrawal, “Large-scale visual odometry for rough terrain,” in International Symposium on Research in Robotics (ISRR ’07), vol. 66, 2007.
5. A. Howard, “Real-time stereo visual odometry for autonomous ground vehicles,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS ’08), pp. 3946–3952, IEEE, 2008.
6. M. Maimone, Y. Cheng, and L. Matthies, “Two years of visual odometry on the Mars Exploration Rovers,” J. of Field Robotics, vol. 24, no. 3, pp. 169–186, 2007.
7. A. E. Johnson, S. B. Goldberg, Y. Cheng, and L. H. Matthies, “Robust and efficient stereo feature tracking for visual odometry,” in IEEE International Conference on Robotics and Automation (ICRA ’08), pp. 39–46, 2008.
8. Y. Tamura, M. Suzuki, A. Ishii, and Y. Kuroda, “Visual odometry with effective feature sampling for untextured outdoor environment,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS ’09), pp. 3492–3497, IEEE, 2009.
9. C. Harris and M. Stephens, “A combined corner and edge detector,” in Alvey Vision Conference, vol. 15, pp. 147–151, Manchester, UK, 1988.
10. J. Shi and C. Tomasi, “Good features to track,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’94), pp. 593–600, IEEE, 1994.
11. E. Rosten and T. Drummond, “Fusing points and lines for high performance tracking,” in IEEE International Conference on Computer Vision (ICCV ’05), vol. 2, pp. 1508–1515, IEEE, 2005.
12. E. Rosten and T. Drummond, “Machine learning for high-speed corner detection,” in European Conference on Computer Vision (ECCV ’06), pp. 430–443, 2006.
13. D. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
14. H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “SURF: Speeded-up robust features,” Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.
15. M. Agrawal, K. Konolige, and M. Blas, “CenSurE: Center surround extremas for realtime feature detection and matching,” in European Conference on Computer Vision (ECCV ’08), pp. 102–115, Springer, 2008.
16. G. Bradski, “The OpenCV Library,” Dr. Dobb’s Journal of Software Tools, 2000.