Self-Supervised Monocular Road Detection in Desert Terrain
Hendrik Dahlkamp*, Adrian Kaehler†, David Stavens*, Sebastian Thrun*, and Gary Bradski†
*Stanford University, Stanford, CA 94305   †Intel Corporation, Santa Clara, CA 95052

Abstract— We present a method for identifying drivable surfaces in difficult unpaved and offroad terrain conditions, as encountered in the DARPA Grand Challenge robot race. Instead of relying on a static, pre-computed road appearance model, this method adjusts its model to changing environments. It achieves robustness by combining sensor information from a laser range finder, a pose estimation system, and a color camera. Using the first two modalities, the system first identifies a nearby patch of drivable surface. Computer Vision then takes this patch and uses it to construct appearance models to find drivable surface outward into the far range. This information is put into a drivability map for the vehicle path planner. In addition to evaluating the method's performance using a scoring framework run on real-world data, the system was entered in, and won, the 2005 DARPA Grand Challenge. Post-race log-file analysis proved that without the Computer Vision algorithm, the vehicle would not have driven fast enough to win.

Fig. 1. Image of our vehicle while driving in the 2005 DARPA Grand Challenge.

I. INTRODUCTION

This paper describes a Computer Vision algorithm developed for our entry to the 2005 DARPA Grand Challenge. Our vehicle (shown in Fig. 1) uses laser range finders as sensors for obstacle avoidance. Their range, however, is very limited due to high levels of environmental light, a shallow incident angle with the ground, and possible dirt on the sensors. Since a vehicle needs to be able to stop within this distance, using lasers limits the maximum speed at which a vehicle can travel while maintaining the ability to avoid obstacles.

Therefore, we propose a camera system as an additional long-range sensor. Our goal is to detect drivable surface in desert terrain, at long range.
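To make this range limitation concrete (the following numbers are our illustration, not the paper's): under a constant-deceleration stopping model, the stopping distance is d = v^2 / (2a), so a sensor range d caps the safe speed at v_max = sqrt(2 a d). Assuming a braking deceleration of roughly a = 6 m/s^2, the ~22m laser range quoted in Fig. 2 permits about 16 m/s (roughly 58 km/h), while the ~70m vision range permits about 29 m/s (roughly 104 km/h).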
In the existing body of literature, the primary concern has been the identification of the kind of paved roads observed in daily situations such as highways, country roads, and urban roads. Such roads are typically characterized by their asphalt or concrete color, as well as by standardized markings on or at the edges of the roads, and are hence much easier to identify than the undeveloped dirt roads occurring in the desert area where the Grand Challenge took place. Consequently, most published papers are not directly applicable to the problem:

Universität der Bundeswehr München (UBM) has one of the most mature autonomous driving systems, in development since 1985[1]. Their approach involves modeling roads as clothoids and estimating their parameters using an Extended Kalman Filter (EKF) on visual observations and odometry. More recent work extends that framework to dirt road detection[2], waypoint driving[3], and obstacle detection while driving off-road[4]. For the DARPA Grand Challenge, however, their system[5] is not directly applicable: their road finder operates on grey, not color, images, and the desert terrain relevant for the Grand Challenge has very subtle color tones that are hidden in grey images. Furthermore, the success of dirt-road finding depends on a manual initialization step. Finally, the system cannot handle arbitrarily shaped surfaces; even the dirt-road localization needs a road complying with model assumptions.

Both the ARGO project of the Università di Parma, Italy[6], [7], and Carnegie Mellon University (CMU)'s NavLab project[8] are only concerned with on-road driving. In addition to tracking lane markings, CMU also deals with inner-city driving involving cluttered scenes and collision avoidance[9].

Another system applicable to inner-city driving belongs to Fraunhofer IITB in Germany. Their use of computer vision for road finding includes edge extraction and fitting those edges to a detailed lane and intersection model for pose estimation[10].

The DARPA Learning Applied to Ground Robotics (LAGR) project has also created interest in vision-based terrain classification[11]. Due to the relatively slow speed of the LAGR vehicle, however, sensor lookahead distance is not as crucial in the LAGR project as in the DARPA Grand Challenge. Because LAGR sensors are limited to stereo cameras and a contact sensor, combining different sensor modalities is not possible.

Outside of robotics, a surprisingly similar application is skin-color detection. Previous work, e.g. [12], also uses a Gaussian-mixture model to deal with varying illumination (though not desert sun) and varying skin tone between individuals, comparable to road material changes. There, however, the option of fusing data with a different sensor modality does not exist.

Finally, previous work associated with our group[13] provides very good classification in the desired terrain types, but needs examples of non-drivable terrain as training data, which our robot cannot easily provide automatically.

To overcome the limitations of these published algorithms, and to achieve the level of robustness and accuracy required for the DARPA Grand Challenge, we have developed a novel algorithm whose main feature is adaptation to different terrain types while driving, based on a self-supervised learning algorithm. This self-supervised learning is achieved by combining the camera image with a laser range finder as a second sensor. Specifically, the laser is used to scan for flat, drivable surface area in the near vicinity of the vehicle. Once identified, this area is assumed to be road and used as training data for the computer vision algorithm. The vision algorithm then classifies the entire field of view of the camera and extracts a drivability map with a range of up to 70m. The combined sensors are integrated into a driving strategy that allows for very robust, long-range sensing. This is ultimately proven by winning the 2005 DARPA Grand Challenge.

Section II of this paper explains the different steps of this algorithm in detail, Section III provides numerical results that lead to the parameter selections for the race, and Section IV presents results achieved by running the algorithm in the 2005 DARPA Grand Challenge.

II. ALGORITHM DESCRIPTION

The overall method consists of seven major parts. These are: (A) Extracting the close-range road location from sensors invariant to lighting conditions, (B) Removing sky and shadow areas from the visual field, (C) Learning a visual model of the nearby road, (D) Scoring the visual field by that model, (E) Selecting identified road patches, (F) Constructing a sky-view drivability map, and (G) Exploiting that map for a driving strategy. We will address each of these separately.

A. Extracting close range road location from laser sensors

Scanning laser range finders mounted rigidly on a moving vehicle are used to sweep out a map of the terrain in front of the vehicle one line at a time. Referencing the laser scans with an exact 6 degree of freedom (DOF) pose estimate, the accumulation of such lines forms a point cloud corresponding to the locations of individual points on the surface of the ground, or on obstacles ahead. Using this point cloud, a 2-dimensional drivability map is established, dividing the area into drivable and non-drivable map cells. Our approach, described in [14], achieves this by looking for height differences within and across map cells. A quadrangle is then fit into the known drivable area directly ahead of the vehicle and passed to the computer vision algorithm (see Fig. 2).

Fig. 2. Real-time generated map of the vehicle vicinity. The left image shows the close-range map as generated by the laser scanners. Red cells are occupied by obstacles, white cells are drivable, and grey cells are (as yet) unknown. The black quadrangle is fit into the known empty area and shipped to the computer vision algorithm as training data for drivable road surface. The right image shows the long-range map as generated by the computer vision algorithm described in this paper. It can be seen that the road detection range is in this case about 70m, as opposed to only 22m using lasers.

Using the 6 DOF position estimate for the vehicle and the position of the camera with respect to the vehicle, we project the quadrangle into the visual plane of the camera. It is now safe to assume that this area in the image contains only drivable surface or road, because the lasers have just identified it to be flat and because GPS data indicates that we are inside the Grand Challenge driving corridor. We will henceforth refer to this quadrangle area as the "training" area: based on this area we will build our models of what the road "looks like".
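The drivability-map construction at the start of this subsection is detailed in [14]; purely as an illustration of the "height differences within cells" idea, a minimal sketch might look as follows. The grid resolution, threshold, and all names are our assumptions, and the cross-cell comparison mentioned in the text is omitted for brevity.

import numpy as np

def drivability_grid(points, cell=0.25, max_step=0.15):
    """Illustrative cell classifier: a cell is drivable if the height
    spread of the laser returns inside it stays below max_step.

    points: (N, 3) x, y, z laser returns in a ground-fixed frame (m).
    Returns {(i, j): True/False}; cells with no returns stay unknown.
    """
    idx = np.floor(points[:, :2] / cell).astype(np.int64)
    bounds = {}                      # cell -> (min z, max z)
    for (i, j), z in zip(map(tuple, idx), points[:, 2]):
        lo, hi = bounds.get((i, j), (z, z))
        bounds[(i, j)] = (min(lo, z), max(hi, z))
    return {c: (hi - lo) <= max_step for c, (lo, hi) in bounds.items()}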
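The paper does not give an implementation of the quadrangle projection just described; the following is a minimal sketch under a standard pinhole camera model. The function name, the intrinsic matrix K, and the pose inputs are our own illustrative choices, not the authors' code.

import numpy as np

def project_quadrangle(corners_world, R_wc, t_wc, K):
    """Project the 4 ground-frame corners of the laser-identified
    training quadrangle into pixel coordinates.

    corners_world: (4, 3) corner points in the world frame (meters).
    R_wc, t_wc:    camera-to-world rotation (3, 3) and position (3,),
                   obtained by chaining the vehicle's 6-DOF pose with
                   the camera-to-vehicle mounting calibration.
    K:             (3, 3) pinhole intrinsic matrix.
    Returns (4, 2) pixel coordinates of the quadrangle corners.
    """
    pts_cam = (corners_world - t_wc) @ R_wc        # world -> camera frame
    pts_hom = pts_cam @ K.T                        # perspective projection
    return pts_hom[:, :2] / pts_hom[:, 2:3]        # divide by depth

The four resulting pixel points define a polygon; rasterizing it with a standard polygon fill marks the training pixels used for learning the visual road model (part C).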
B. Removing sky and shadows

We must deal with cast shadows that are dark enough that anything within them is hidden. In our present case, we opt to simply remove shadowed terrain. This can cause occasional problems, in particular when a long shadow falls across a road, but for our present purpose the option of eliminating shadow pixels from the model provides acceptable results. We define a shadow pixel as any pixel whose brightness falls below a minimum threshold and whose blue content is greater than its red or green content (measured as 8 bits in each of the three channels).

The sky is not problematic for the algorithm, but measuring sky pixels against the learned models (see below) is computationally wasteful. We implement the horizon-finding algorithm originally proposed by Ettinger et al.[15] to eliminate all pixels above the horizon. We also use a flood fill to find pixels below the horizon that are the same color as the sky above, and eliminate these as well. Finally, we found that a dilation of the sky region thus defined removes pixels on distant horizon objects, saving compute time at no loss to the algorithm otherwise.
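As an illustration of the two masking rules just described, here is a minimal OpenCV-based sketch. All thresholds are placeholders, the horizon estimate is taken as given (the Ettinger et al. horizon finder itself is not reproduced), a channel-mean stands in for "brightness", and a simple color-similarity test stands in for the connectivity-based flood fill.

import cv2
import numpy as np

def ignore_mask(bgr, horizon_row, brightness_thresh=60,
                sky_color_thresh=30.0, dilate_px=5):
    """Boolean mask of pixels to exclude before road-model training
    and scoring: shadow pixels plus sky pixels.

    bgr:          8-bit color image, OpenCV channel order.
    horizon_row:  per-column horizon row indices, shape (W,).
    """
    b, g, r = cv2.split(bgr)
    brightness = bgr.mean(axis=2)    # channel mean as brightness proxy

    # Shadow rule from the text: brightness below a minimum threshold
    # and blue content greater than the red and green content.
    shadow = (brightness < brightness_thresh) & (b > r) & (b > g)

    # All pixels above the estimated horizon are sky.
    rows = np.arange(bgr.shape[0])[:, None]
    sky = rows < horizon_row         # (H, 1) < (W,) broadcasts to (H, W)

    # Stand-in for the flood fill: below-horizon pixels whose color is
    # close to the mean sky color are also eliminated.
    if sky.any():
        sky_mean = bgr[sky].mean(axis=0)
        near_sky = np.linalg.norm(bgr.astype(np.float32) - sky_mean,
                                  axis=2) < sky_color_thresh
        sky = sky | near_sky

    # Dilate the sky mask to remove pixels on distant horizon objects.
    kernel = np.ones((dilate_px, dilate_px), np.uint8)
    sky = cv2.dilate(sky.astype(np.uint8), kernel).astype(bool)

    return shadow | sky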