Handbook of Robotics
Chapter 22: Range Sensors

Robert B. Fisher, School of Informatics, University of Edinburgh, [email protected]
Kurt Konolige, Artificial Intelligence Center, SRI International, [email protected]

June 26, 2008

Contents

22.1 Range sensing basics
     22.1.1 Range images and point sets
     22.1.2 Stereo vision
     22.1.3 Laser-based Range Sensors
     22.1.4 Time of Flight Range Sensors
     22.1.5 Modulation Range Sensors
     22.1.6 Triangulation Range Sensors
     22.1.7 Example Sensors
22.2 Registration
     22.2.1 3D Feature Representations
     22.2.2 3D Feature Extraction
     22.2.3 Model Matching and Multiple-View Registration
     22.2.4 Maximum Likelihood Registration
     22.2.5 Multiple Scan Registration
     22.2.6 Relative Pose Estimation
     22.2.7 3D Applications
22.3 Navigation and Terrain Classification
     22.3.1 Indoor Reconstruction
     22.3.2 Urban Navigation
     22.3.3 Rough Terrain
22.4 Conclusions and Further Reading

Range sensors are devices that capture the 3D structure of the world from the viewpoint of the sensor, usually measuring the depth to the nearest surfaces. These measurements could be at a single point, across a scanning plane, or a full image with depth measurements at every point. The benefit of this range data is that a robot can be relatively certain where the real world is, relative to the sensor, thus allowing the robot to more reliably find navigable routes, avoid obstacles, grasp objects, act on industrial parts, etc.

This chapter introduces the main representations for range data (point sets, triangulated surfaces, voxels), the main methods for extracting usable features from the range data (planes, lines, triangulated surfaces), the main sensors for acquiring it (Section 22.1 - stereo and laser triangulation and ranging systems), how multiple observations of the scene, e.g. as from a moving robot, can be registered (Section 22.2) and several indoor and outdoor robot applications where range data greatly simplifies the task (Section 22.3).

Figure 22.1: Above: Registered infrared reflectance image. Below: Range image where closer is darker.

22.1 Range sensing basics

Here we present: 1) the basic representations used for range image data, 2) a brief introduction to the main 3D sensors that are less commonly used in robotics applications and 3) a detailed presentation of the more common laser-based range image sensors.

22.1.1 Range images and point sets

Range data is a 2½D or 3D representation of the scene around the robot. The 3D aspect arises because we are measuring the (X,Y,Z) coordinates of one or more points in the scene. Often only a single range image is used at each time instance. This means that we only observe the front sides of objects - the portion of the scene visible from the robot. In other words, we don't have a full 3D observation of all sides of a scene. This is the origin of the term 2½D. Figure 22.1a shows a sample range image and (b) shows a registered reflectance image, where each pixel records the level of reflected infrared light.

There are two standard formats for representing range data. The first is an image d(i, j), which records the distance d to the corresponding scene point (X,Y,Z) for each image pixel (i, j). There are several common mappings from (i, j, d(i, j)) to (X,Y,Z), usually arising from the geometry of the range sensor. The most common image mappings are illustrated in Figures 22.2 and 22.3. In the formulas given here, α and β are calibrated values specific to the sensor.

1. Orthographic: Here (X,Y,Z) = (αi, βj, d(i, j)). These images often arise from range sensors that scan by translating in the x and y directions. (See Figure 22.2a.)

2. Perspective: Here d(i, j) is the distance along the line of sight of the ray through pixel (i, j) to point (X,Y,Z). Treating the range sensor focus as the origin (0, 0, 0) and assuming that its optical axis is the Z axis and the (X,Y) axes are parallel to the image (i, j) axes, then (X,Y,Z) = d(i, j)/√(α²i² + β²j² + f²) · (αi, βj, f), where f is the 'focal length' of the system. These images often arise from sensor equipment that incorporates a normal intensity camera. (See Figure 22.2b.)

3. Cylindrical: Here d(i, j) is the distance along the line of sight of the ray through pixel (i, j) to point (X,Y,Z). In this case, the sensor usually rotates to scan in the x direction, and translates to scan in the y direction. Thus (X,Y,Z) = (d(i, j) sin(αi), βj, d(i, j) cos(αi)) is the usual conversion. (See Figure 22.3c.)

4. Spherical: Here d(i, j) is the distance along the line of sight of the ray through pixel (i, j) to point (X,Y,Z). In this case, the sensor usually rotates to scan in the x direction, and, once each x scan, also rotates in the y direction. Thus (i, j) are the azimuth and elevation of the line of sight. Here (X,Y,Z) = d(i, j) (cos(βj) sin(αi), sin(βj), cos(βj) cos(αi)). (See Figure 22.3d.)

Figure 22.2: Different range image mappings: a) orthographic, b) perspective.

Figure 22.3: Different range image mappings: c) cylindrical and d) spherical.
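As an illustration of these mappings (a minimal sketch, not taken from the chapter), the following converts spherical and perspective range images d(i, j) into lists of (X, Y, Z) points. The function names are illustrative; alpha, beta and f are the calibrated sensor constants of the text, and the assumption that perspective pixel indices are measured from the optical axis (so the image centre is subtracted) is an assumption of this sketch.

```python
import numpy as np

def spherical_to_points(d, alpha, beta):
    """Spherical mapping: (X,Y,Z) = d(i,j) * (cos(bj) sin(ai), sin(bj), cos(bj) cos(ai))."""
    d = np.asarray(d, dtype=float)
    i, j = np.meshgrid(np.arange(d.shape[0]), np.arange(d.shape[1]), indexing="ij")
    az, el = alpha * i, beta * j                      # azimuth and elevation of each ray
    X = d * np.cos(el) * np.sin(az)
    Y = d * np.sin(el)
    Z = d * np.cos(el) * np.cos(az)
    return np.stack([X, Y, Z], axis=-1).reshape(-1, 3)

def perspective_to_points(d, alpha, beta, f):
    """Perspective mapping: (X,Y,Z) = d(i,j)/sqrt(a^2 i^2 + b^2 j^2 + f^2) * (a i, b j, f).

    Assumes (i, j) are offsets from the optical axis, so the image centre is
    subtracted first (an assumption of this sketch, not stated in the text).
    """
    d = np.asarray(d, dtype=float)
    i, j = np.meshgrid(np.arange(d.shape[0]), np.arange(d.shape[1]), indexing="ij")
    i = i - (d.shape[0] - 1) / 2.0
    j = j - (d.shape[1] - 1) / 2.0
    ray = np.stack([alpha * i, beta * j, np.full(d.shape, float(f))], axis=-1)
    scale = d / np.linalg.norm(ray, axis=-1)          # slant range divided by ray length
    return (scale[..., None] * ray).reshape(-1, 3)
```

The orthographic and cylindrical cases follow the same pattern, substituting the corresponding formulas above.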
Some sensors only record distances in a plane, so the scene (x, z) is represented by the linear image d(i) for each pixel i. The orthographic, perspective and cylindrical projection options listed above still apply in simplified form.

The second format is as a list {(Xi, Yi, Zi)} of 3D data points, but this format can be used with all of the mappings listed above. Given the conversions from image data d(i, j) to (X,Y,Z), the range data is sometimes supplied only as a list. Details of the precise mapping and data format are supplied with commercial range sensors.

22.1.2 Stereo vision

It is possible to acquire range information from many different sensors, but only a few have the reliability needed for most robotics applications. The more reliable ones, laser-based triangulation and LIDAR (laser radar), are discussed in the next section.

Realtime stereo analysis uses two or more input images to estimate the distance to points in a scene. The basic concept is triangulation: a scene point and the two camera points form a triangle, and knowing the baseline between the two cameras, and the angle formed by the camera rays, the distance to the object can be determined.
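To make the triangle concrete, here is a small sketch under assumed conventions (not code from the chapter): the left camera sits at the origin, the right camera a baseline B further along the x axis, and each camera ray's angle to the scene point is measured from the baseline direction. The law of sines then gives the distance. The function name and argument conventions are illustrative only.

```python
import math

def triangulate_from_angles(baseline, theta_l, theta_r):
    """Intersect two camera rays in the plane containing both camera centres and the point.

    baseline: distance between the camera centres (left at the origin, right at (baseline, 0)).
    theta_l, theta_r: angle of each camera's ray to the scene point, measured from the
    baseline direction, in radians.
    Returns (x, z), where z is the point's distance from the baseline.
    """
    gamma = theta_r - theta_l                       # angle subtended at the scene point
    if abs(math.sin(gamma)) < 1e-12:
        raise ValueError("rays are (nearly) parallel: point is at infinity")
    r_left = baseline * math.sin(theta_r) / math.sin(gamma)   # law of sines
    return r_left * math.cos(theta_l), r_left * math.sin(theta_l)
```

For example, triangulate_from_angles(0.1, math.radians(80), math.radians(100)) places the point roughly 0.28 m in front of the cameras, midway between them. As the two ray angles approach each other the intersection becomes ill-conditioned, which is one reason stereo depth error grows with distance.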
In practice, there are many difficulties in making a stereo imaging system that is useful for robotics applications. Most of these difficulties arise in finding reliable matches for pixels in the two images that correspond to the same point in the scene. A further consideration is that stereo analysis for robotics has a realtime constraint, and the processing power needed for some algorithms can be very high. But in recent years much progress has been made, and the advantage of stereo imaging is that it can provide full 3D range images, registered with visual information, potentially out to an infinite distance, at high frame rates - something which no other range sensor can match.

In this subsection we will review the basic algorithms of stereo analysis, and highlight the problems and potential of the method. For simplicity, we use binocular stereo.

Stereo image geometry

This subsection gives some more detail of the fundamental geometry of stereo, and in particular the relationship of the images to the 3D world via projection and reprojection. A more in-depth discussion of the geometry, and the rectification process, can be found in [31].

Figure 22.4: Ideal stereo geometry. The global coordinate system is centered on the focal point (camera center) of the left camera. It is a right-handed system, with positive Z in front of the camera and positive X to the right. The camera principal ray pierces the image plane at (Cx, Cy), which is the same in both cameras (a variation for verged cameras allows Cx to differ between the images). The focal length is also the same. The images are lined up, with y = y′ for the coordinates of any scene point projected into the images. The difference between the x coordinates is called the disparity. The vector between the focal points is aligned with the X axis.

The input images are rectified, which means that the original images are modified to correspond to ideal pinhole cameras with a particular geometry, illustrated in Figure 22.4. Any 3D point S projects to a point in the images along a ray through the focal point.
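Under this ideal rectified geometry, reprojecting an image point with known disparity into 3D needs only the quantities named in Figure 22.4. The sketch below applies the usual pinhole relations for that geometry (focal length f and disparity in pixels, baseline B between the focal points); the function and variable names are illustrative assumptions, not from the chapter.

```python
def reproject_rectified(x, y, disp, f, cx, cy, baseline):
    """Reproject a left-image pixel (x, y) with disparity disp = x - x' into the
    left-camera frame of Figure 22.4 (Z forward, X right, baseline along X)."""
    if disp <= 0:
        raise ValueError("non-positive disparity: point at or beyond infinity")
    z = f * baseline / disp                 # depth from similar triangles
    return (x - cx) * z / f, (y - cy) * z / f, z
```

Read the other way round, this gives the projection: a point at depth Z appears with disparity f·B/Z, so nearby points have large disparities and distant points have disparities approaching zero.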