Self-supervised Monocular Road Detection in Desert Terrain

Hendrik Dahlkamp∗, Adrian Kaehler†, David Stavens∗, Sebastian Thrun∗, and Gary Bradski†
∗Stanford University, Stanford, CA 94305
†Intel Corporation, Santa Clara, CA 95052

Abstract— We present a method for identifying drivable surfaces in difficult unpaved and offroad terrain conditions as encountered in the DARPA Grand Challenge robot race. Instead of relying on a static, pre-computed road appearance model, this method adjusts its model to changing environments. It achieves robustness by combining sensor information from a laser range finder, a pose estimation system and a color camera. Using the first two modalities, the system first identifies a nearby patch of drivable surface. Computer Vision then takes this patch and uses it to construct appearance models to find drivable surface outward into the far range. This information is put into a drivability map for the vehicle path planner. In addition to evaluating the method's performance using a scoring framework run on real-world data, the system was entered in, and won, the 2005 DARPA Grand Challenge. Post-race log-file analysis proved that without the Computer Vision algorithm, the vehicle would not have driven fast enough to win.

Fig. 1. Image of our vehicle while driving in the 2005 DARPA Grand Challenge.

I. INTRODUCTION

This paper describes a Computer Vision algorithm developed for our entry to the 2005 DARPA Grand Challenge. Our vehicle (shown in Fig. 1) uses laser range finders as sensors for obstacle avoidance. Their range, however, is very limited due to high levels of environmental light, a shallow incident angle with the ground, and possible dirt on the sensors. Since a vehicle needs to be able to stop within this distance, using lasers limits the maximum speed at which a vehicle can travel while maintaining the ability to avoid obstacles.

Therefore, we are proposing a camera system as an additional long-range sensor. Our goal is to detect drivable surface in desert terrain, at long range. In the existing body of literature, the primary concern has been the identification of the kind of paved roads observed in daily situations such as highways, country roads and urban roads. Such roads are typically characterized by their asphalt or concrete color, as well as by standardized markings in or at the edges of the roads, and are hence much easier to identify than the undeveloped dirt roads occurring in the desert area where the Grand Challenge took place. Consequently, most published papers are not directly applicable to the problem:

Universität der Bundeswehr München (UBM) has one of the most mature autonomous driving systems, in development since 1985[1]. Their approach involves modeling roads as clothoids and estimating their parameters using an Extended Kalman Filter (EKF) on visual observations and odometry. More recent work extends that framework to dirt road detection[2], waypoint driving[3], and obstacle detection while driving off-road[4]. For the DARPA Grand Challenge, however, their system[5] is not directly applicable: their road finder operates on grey, not color, images, and the desert terrain relevant for the Grand Challenge has very subtle color tones that are hidden in grey images. Furthermore, the success of dirt-road finding depends on a manual initialization step. Finally, the system cannot handle arbitrarily shaped surfaces; even the dirt-road localization needs a road complying with model assumptions.

Both the ARGO project of the Università di Parma, Italy[6], [7], as well as Carnegie Mellon University (CMU)'s NavLab project[8] are only concerned with on-road driving. In addition to tracking lane markings, CMU also deals with inner-city driving involving cluttered scenes and collision avoidance[9].

Another system applicable to inner-city driving belongs to Fraunhofer IITB in Germany. Their use of computer vision for road finding includes edge extraction and fitting those edges to a detailed lane- and intersection model for pose estimation[10].

The DARPA Learning Applied to Ground Robots (LAGR) project has also created interest in vision-based terrain classification[11]. Due to the relatively slow speed of the LAGR vehicle, however, sensor lookahead distance in the LAGR project is not as crucial as in the DARPA Grand Challenge. Because LAGR sensors are limited to stereo cameras and a contact sensor, combination of different sensor modalities is not possible.

Outside of robotics, a surprisingly similar application is skin-color detection. Previous work, e.g. [12], also uses a Gaussian-mixture model to deal with varying illumination (though not desert sun) and varying skin tone between individuals, comparable to road material changes. There, however, the option of fusing data with a different sensor modality does not exist.
Finally, previous work associated with our group[13] provides very good classification in the desired terrain types, but needs examples of non-drivable terrain as training data, which our robot cannot easily provide automatically.

To overcome the limitations of these published algorithms, and achieve the level of robustness and accuracy required for the DARPA Grand Challenge, we have developed a novel algorithm whose main feature is the adaptation to different terrain types while driving, based on a self-supervised learning algorithm. This self-supervised learning is achieved by combining the camera image with a laser range finder as a second sensor. Specifically, the laser is used to scan for flat, drivable surface area in the near vicinity of the vehicle. Once identified, this area is assumed to be road and used as training data for the computer vision algorithm. The vision algorithm then classifies the entire field of view of the camera, and extracts a drivability map with a range of up to 70m. The combined sensors are integrated into a driving strategy, which allows for very robust, long range sensing. This is ultimately proven by winning the 2005 DARPA Grand Challenge.

Section II of this paper explains the different steps of this algorithm in detail, Section III provides numerical results that lead to parameter selections for the race, and Section IV presents results achieved by running the algorithm in the 2005 DARPA Grand Challenge.

Fig. 2. Real-time generated map of the vehicle vicinity. The left image shows the close-range map as generated by the laser scanners. Red cells are occupied by obstacles, white cells are drivable and grey cells are (as yet) unknown. The black quadrangle is fit into the known empty area and shipped to the computer vision algorithm as training data for drivable road surface. The right image shows a long-range map as generated by the computer-vision algorithm described in this paper. It can be seen that the road detection range is in this case about 70m, as opposed to only 22m using lasers.

II. ALGORITHM DESCRIPTION

The overall method consists of seven major parts. These are: (A) extracting the close-range road location from sensors invariant to lighting conditions, (B) removing sky and shadow areas from the visual field, (C) learning a visual model of the nearby road, (D) scoring the visual field by that model, (E) selecting identified road patches, (F) constructing a sky-view drivability map, and (G) exploiting that map for a driving strategy. We will address each of these separately.

A. Extracting close range road location from laser sensors

Scanning laser range finders mounted rigidly on a moving vehicle are used to sweep out a map of the terrain in front of the vehicle one line at a time. Referencing the laser scans with an exact 6 degree of freedom (DOF) pose estimate, the accumulation of such lines forms a point cloud corresponding to the locations of individual points on the surface of the ground, or obstacles ahead. Using this point cloud, a 2-dimensional drivability map is established, dividing the area into drivable and non-drivable map cells. Our approach, described in [14], achieves this by looking for height differences within and across map cells while modeling point uncertainties in a temporal Markov chain. Once such a map is created, a quadrangle is fit to the largest drivable region in front of the vehicle. This quadrangle, usually shaped like a trapezoid, is shown in black on the left side of Figure 2.

Using the 6 DOF position estimate for the vehicle and the position of the camera with respect to the vehicle, we project the quadrangle into the visual plane of the camera. It is now safe to assume that this area in the image contains only drivable surface or road, because the lasers have just identified it to be flat and because GPS data indicates that we are inside the Grand Challenge driving corridor. We will henceforth refer to this quadrangle area as the "training" area: based on this area we will build our models of what the road "looks like".
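To make this projection step concrete, the following sketch (not our race code) shows how the four ground-plane corners of the laser quadrangle could be mapped into pixel coordinates with a plain pinhole model. The focal length, camera height, rotation convention and quadrangle coordinates are hypothetical values chosen for illustration; only the image size matches the 320 × 240 resolution used on race day.

```python
import numpy as np

def project_quad_to_image(quad_vehicle, R_cam, t_cam, K):
    """Project ground-plane quadrangle corners (vehicle frame, meters)
    into pixel coordinates with a pinhole camera model.

    quad_vehicle: (4, 3) corner points (x right, y forward, z up)
    R_cam:        (3, 3) camera-to-vehicle rotation from the 6-DOF pose
    t_cam:        (3,)   camera position in the vehicle frame
    K:            (3, 3) camera intrinsics
    """
    pixels = []
    for p in quad_vehicle:
        p_cam = R_cam.T @ (p - t_cam)   # vehicle frame -> camera frame
        u, v, w = K @ p_cam             # perspective projection
        pixels.append((u / w, v / w))
    return np.array(pixels)

# Hypothetical calibration: 320x240 image, camera 1.5 m above the ground,
# looking straight ahead (camera z = vehicle y, camera y = vehicle -z).
K = np.array([[800.0, 0.0, 160.0],
              [0.0, 800.0, 120.0],
              [0.0, 0.0, 1.0]])
R_cam = np.array([[1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, -1.0, 0.0]])
t_cam = np.array([0.0, 0.0, 1.5])
quad = np.array([[-2.0, 10.0, 0.0], [2.0, 10.0, 0.0],
                 [3.0, 25.0, 0.0], [-3.0, 25.0, 0.0]])
print(project_quad_to_image(quad, R_cam, t_cam, K))
```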
B. Removing sky and shadows

We must deal with cast shadows that are dark enough that anything in them is hidden. In our present case, we opt to simply remove shadowed terrain. This can cause occasional problems, in particular when a long shadow falls across a road, but for our present purpose the option of eliminating shadow pixels from the model provides acceptable results.

We define a shadow pixel as any pixel whose brightness falls below a minimum threshold, and whose blue content is greater than its red or green content (measured as 8 bits in each of the three channels). The sky is not problematic for the algorithm, but measurement of sky pixels against the learned models (see below) is computationally wasteful. We implement the horizon finding algorithm originally proposed by Ettinger et al.[15] to eliminate all pixels above that horizon. We also use a flood fill to find pixels below the horizon which are the same color as the sky above, and eliminate these as well. Finally, we found that a dilation of the sky thus defined removes pixels on distant horizon objects, saving compute time at no loss to the algorithm otherwise. The right part of Fig. 4 shows a sample sky-ground segmentation.
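A minimal sketch of the shadow test just described; the brightness cutoff is a hypothetical value, and the horizon-finding and flood-fill sky removal of [15] is omitted here.

```python
import numpy as np

def shadow_mask(rgb, brightness_thresh=60):
    """Flag pixels that are both dark and blue-dominant as shadow.

    rgb: (H, W, 3) uint8 image with channels ordered R, G, B.
    brightness_thresh: assumed 8-bit brightness cutoff.
    """
    r = rgb[..., 0].astype(np.int16)
    g = rgb[..., 1].astype(np.int16)
    b = rgb[..., 2].astype(np.int16)
    dark = (r + g + b) // 3 < brightness_thresh
    blue_dominant = (b > r) & (b > g)
    # Pixels flagged here are excluded from both training and scoring.
    return dark & blue_dominant
```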

Fig. 3. Input raw image (left) and output data of the computer vision part of the algorithm (right). The right image duplicates the left, overlaying the training data from the laser map (blue polygon) and the pixels identified as drivable (red area). The scene is the same as in Fig. 2.

C. Learning a visual model of the nearby road

Once we have the training area, and have removed any shadows from that training area, the next task is to abstract the data into a model which we subsequently use to score pixels outside of the training area. Our basic model of the road appearance is a mixture of Gaussians (MOG) model in RGB space, with k Gaussians to be found. Our method is to classify all pixels in the training area using k-means clustering[16], and then to model each cluster by its average value, its covariance matrix, and its mass (where the mass is defined as the total number of pixels in the cluster). These abstractions are the training models. In addition to the k training models, the method provides for n learned color models (where n > k). Initially, all of these learned models are null. Each of the training models is compared to the learned models at any given frame of the incoming video stream. If the training models overlap with a learned model according to the following relation:

(µ_L − µ_T)^T (Σ_L + Σ_T)^{-1} (µ_L − µ_T) ≤ 1    (1)

(where µ_i and Σ_i are the mean and covariance of the two models, respectively), the training model is interpreted as new data for that learned model, and the learned model is updated according to the formulas

µ_L ← (m_L µ_L + m_T µ_T) / (m_L + m_T)    (2)

Σ_L ← (m_L Σ_L + m_T Σ_T) / (m_L + m_T)    (3)

m_L ← m_L + m_T,    (4)

where m_i is the mass of the associated model. We acknowledge there are statistically more sound ways of integrating these. If the training model matches no existing learned model, then one of two things happens: if there are still null models, one of them is replaced by the training model. If no null models remain, then the learned model with the lowest mass is eliminated and replaced with the training model. In this way, new road types which are encountered generate new models until the space for storing models is saturated. Once this space is saturated, new models can still be learned, but known models which are not associated with a large mass (i.e. a large amount of experience of that model) will be lost to make room for the new model.

The setup of distinguishing k Gaussians describing the current training area from n Gaussians describing the training history allows the system to tightly fit its model to the training area while keeping a richer history of the past road appearance. In addition, the matching process allows for dynamic adaptation to situations of both slowly varying and quickly changing road appearance. In the first case of smooth trail driving, the training model is matched to a learned one and updates it slowly by merging data. In the second case of a sudden road surface change, the system reacts instantly by replacing an old learned Gaussian with a new one from the training area.
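The sketch below shows one way to implement the training-model extraction of Section II-C together with the match/merge rule of Eqs. (1)-(4). The use of scikit-learn's k-means, the dictionary data structure, and the default parameter values are our own illustrative assumptions, not the original implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_training_models(pixels_rgb, k=3):
    """Cluster the training-area pixels and summarize each cluster by its
    mean, covariance and mass (pixel count), as in Section II-C."""
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(pixels_rgb)
    models = []
    for c in range(k):
        cluster = pixels_rgb[labels == c]
        models.append({"mu": cluster.mean(axis=0),
                       "sigma": np.cov(cluster, rowvar=False),
                       "mass": float(len(cluster))})
    return models

def update_learned_models(learned, training, n_max=10):
    """Merge the k training models into the learned set, using Eq. (1) as
    the match test and Eqs. (2)-(4) as the merge rule."""
    for t in training:
        match = None
        for l in learned:
            d = t["mu"] - l["mu"]
            if d @ np.linalg.inv(l["sigma"] + t["sigma"]) @ d <= 1.0:  # Eq. (1)
                match = l
                break
        if match is not None:
            m_l, m_t = match["mass"], t["mass"]
            match["mu"] = (m_l * match["mu"] + m_t * t["mu"]) / (m_l + m_t)           # Eq. (2)
            match["sigma"] = (m_l * match["sigma"] + m_t * t["sigma"]) / (m_l + m_t)  # Eq. (3)
            match["mass"] = m_l + m_t                                                 # Eq. (4)
        elif len(learned) < n_max:
            learned.append(dict(t))      # fill one of the remaining null slots
        else:
            weakest = min(range(len(learned)), key=lambda i: learned[i]["mass"])
            learned[weakest] = dict(t)   # replace the lowest-mass learned model
    return learned
```

In such a sketch, the learned list would persist across frames while fit_training_models is re-run on each new training quadrangle; with (n, k) = (1, 1), as chosen on race day, the scheme collapses to maintaining a single running Gaussian.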

D. Scoring the visual field by the road model

At this point, it is possible to score all pixels in the image according to the learned models. Individual pixels are assigned a score which is the Mahalanobis distance from that point to the mean of the learned model Gaussian which minimizes this distance, and whose mass is above some threshold; this threshold saves compute time by not testing individual pixels against models with very low mass. We set this threshold to 30% (of the mass of the most massive model) and found this to provide a good balance between efficiency and result quality. By using a Mahalanobis distance, we handle roads with different color and texture variation by adapting the thresholds to what constitutes drivable terrain. In this way, every point in the image (excluding sky and shadows, which we have already removed from consideration) is assigned a "roadness" score. The left part of Figure 4 shows the output of this processing step.

Fig. 4. Intermediate processing steps: The left image shows the similarity of each pixel to our road model (before thresholding). The right image shows the border between ground and sky, as found by the algorithm.

E. Selecting identified patches

By thresholding away image points further away than 3σ, we get a binary image indicating the drivability of each pixel. Most roads, however, still have individual outlier pixels not classified as drivable. These pixels usually correspond to small leaves, stones, and other items not considered obstacles. To avoid being influenced by these impurities, we run a dilate filter followed by two erode filters on the image. This removes small non-drivable areas while preserving bigger obstacles and even encircling them with a non-drivable corona. Furthermore, we only accept pixels that have a connected path to the location of the training polygon. This gets rid of unwanted clutter from distant mountains or other small regions that just happen to have the same properties as the main road. The resulting, final classification is visible as red pixels in the right part of Figure 3.
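A sketch of the scoring and cleanup of Sections II-D and II-E, using the same model dictionaries as the previous sketch; the 3×3 structuring element and the use of OpenCV's morphology and connected-component routines are illustrative choices rather than a description of our race code.

```python
import numpy as np
import cv2

def classify_drivable(rgb, learned_models, mass_floor=0.30, seed=None):
    """Score every pixel against the learned MOG (Section II-D) and clean
    the result up as in Section II-E.

    rgb: (H, W, 3) float image; learned_models: list of dicts with keys
    'mu', 'sigma', 'mass'; seed: (row, col) of a pixel inside the training
    polygon.  Returns a uint8 mask (1 = drivable).
    """
    h, w, _ = rgb.shape
    flat = rgb.reshape(-1, 3)
    min_mass = mass_floor * max(m["mass"] for m in learned_models)
    best = np.full(flat.shape[0], np.inf)
    for m in learned_models:
        if m["mass"] < min_mass:
            continue                                  # skip very low-mass models
        d = flat - m["mu"]
        dist2 = np.einsum("ij,jk,ik->i", d, np.linalg.inv(m["sigma"]), d)
        best = np.minimum(best, dist2)
    # Squared distance <= 9 is the same as Mahalanobis distance <= 3 sigma.
    mask = (best.reshape(h, w) <= 9.0).astype(np.uint8)

    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.dilate(mask, kernel)                   # remove small non-drivable specks
    mask = cv2.erode(mask, kernel, iterations=2)      # ...and grow a corona around obstacles

    if seed is not None:                              # keep only the region connected
        _, labels = cv2.connectedComponents(mask)     # to the training polygon
        mask = (labels == labels[seed]).astype(np.uint8)
    return mask
```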
F. Constructing a sky-view drivability map

The next step is to re-project this 2D drivability image into a sky-view map similar to the laser occupancy map which was used to extract the training area. The major obstacle here is the absence of depth information from the image. As an approximation, we reverse the projective transform by assuming the road is flat but sloped by a constant degree upwards or downwards. This assumption is valid for almost all roads encountered in the Grand Challenge, since it allows for inclines and declines as long as the slope does not change much. The places where this assumption is invalid usually lie in highly irregular mountainous terrain. In the mountains the vehicle is slow anyway, so computer vision as a long-range sensor is not required for safe driving. The road slope is accurately estimated by a low-pass filter on the vehicle pitch obtained from the pose estimation system. This distinction between vehicle and road pitch allows driving inclines and declines while maintaining the ability to factor out fast camera movement, as caused by road ruts and general bumpy off-road driving.

Given 6D pose information for the camera, projection of the camera image into a vision map is straightforward. As a result, the map created by this method (right part of Fig. 2) reaches much further than the one created using only the lasers (left part of Fig. 2).

Fig. 5. These 4 images were acquired within a second during the Grand Challenge National Qualifying Event (NQE) and show the process of adaptation to different terrain types. The algorithm initially learns to identify grey asphalt as road (top left), but as the training area contains more and more green grass, the algorithm switches to accept this grass (bottom right).
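The flat-but-pitched ground assumption can be written as a ray-plane intersection. The sketch below, using the same hypothetical camera convention as the earlier projection sketch, recovers the ground coordinates of a pixel, or reports that the pixel is at or above the horizon.

```python
import numpy as np

def pixel_to_ground(u, v, K, R_cam, t_cam, road_pitch):
    """Intersect the viewing ray of pixel (u, v) with a ground plane that is
    pitched by road_pitch radians about the vehicle's lateral (x) axis.

    Returns (x, y) ground coordinates in the vehicle frame, or None if the
    ray does not descend to the plane.
    """
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])   # viewing ray, camera frame
    ray = R_cam @ ray_cam                                # rotate into the vehicle frame
    n = np.array([0.0, -np.sin(road_pitch), np.cos(road_pitch)])  # pitched plane normal
    denom = n @ ray
    if denom >= -1e-9:                                   # at or above the horizon
        return None
    s = -(n @ t_cam) / denom                             # ray length to the intersection
    hit = t_cam + s * ray
    return hit[0], hit[1]
```

One plausible rasterization would then bin each returned (x, y) point into its sky-view grid cell and mark cells hit by drivable pixels as drivable.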
G. Incorporating the drivability map into a driving strategy

While the vision map is remarkably accurate in most driving situations, it only has two states, namely drivable and unknown cells. Simply because a part of the image does not look like the training set does not imply that it is an obstacle. The laser map, on the other hand, is tri-state and distinguishes between obstacles and unknown areas.

For this reason, our driving strategy performs path-planning only in the laser map and instead uses the vision map as part of the speed selection strategy: by default, the vehicle has a maximum velocity of 11.2m/s (25mph). Only if the vision map contains only drivable cells among the next 40m of the planned path is the maximum velocity set to 15.6m/s (35mph). Computer vision therefore acts as a pre-warning system. If our vehicle is driving towards an obstacle at high speed, it is slowed down as soon as vision fails to see clear road out to 40m. Once the obstacle comes within laser range, the vehicle has already slowed down and has no problem avoiding the obstacle using the laser range finders without exceeding lateral acceleration constraints. Though not needed for the race, avoidance of stationary, car-sized obstacles in the middle of the path was demonstrated at up to 17.9m/s (40mph), on a slippery gravel surface.

III. PARAMETER SELECTION

A. Scoring classification results

In order to compare different algorithms and optimize parameters, we need a scoring method that enables us to evaluate the efficiency of a particular approach. Instead of tedious hand-labeling of images as ground-truth for drivability, we chose a practical approach: since development and testing of our vehicle was performed on the previous year's Grand Challenge course, we decided to train on the corridor data associated with that race. To comply with California law, DARPA had to provide participating teams with a route file that describes the allowed driving corridor and extends no wider than the actual road's width (at least for the Californian segment of the race). To make the race feasible for truck-based entrants, DARPA also had to make sure that this corridor is not much smaller than the road. Therefore, we already have a ground-truth labeling. Due to Stanley's pose estimation error while following the course, this labeling is of course not always entirely accurate, but in good, open-sky sections 20cm-accurate localization is obtained, and in worst-case conditions the error is about 2m. Fortunately, the error is independent of algorithmic choices and therefore not biased towards certain parameters. So comparisons of algorithms using this ground-truth are valid, while the absolute performance numbers may be slightly inaccurate. Furthermore, the DARPA labeling is specified in the sky-view map instead of the image, which allows scoring in the final drivability map produced by the algorithm and not in the image, which is a mere intermediate product. Based on the corridor, our scoring function counts the number of drivable cells inside the corridor that are correctly identified as road (true positives, TP) and the number of cells outside the corridor that are wrongly classified as drivable (false positives, FP). While not all cells outside the corridor are non-drivable, they are usually not road and hence desirable to be left unmarked. To make our scoring even more meaningful, we only scored cells between 25m and 50m in front of the robot, as that is the target range for vision-based sensing. The closer range is already covered by laser range finders, and looking out farther is not necessary at the speeds driven in the Grand Challenge. In order to compare TP and FP rates, we chose to plot Receiver Operating Characteristic (ROC) curves.
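This scoring metric can be computed per frame from three aligned grids. The sketch below assumes boolean sky-view rasters and a per-cell range image; the 25-50m band is taken from the text, while the normalization into rates for the ROC plot is our own assumption.

```python
import numpy as np

def score_frame(vision_drivable, corridor, cell_range, r_min=25.0, r_max=50.0):
    """Count scoring statistics for one sky-view drivability map.

    vision_drivable: (H, W) bool map produced by the vision algorithm
    corridor:        (H, W) bool DARPA-corridor ground truth
    cell_range:      (H, W) distance of each cell from the robot in meters
    Returns (TP rate, FP rate) for plotting one ROC point.
    """
    band = (cell_range >= r_min) & (cell_range <= r_max)
    tp = np.sum(vision_drivable & corridor & band)
    fp = np.sum(vision_drivable & ~corridor & band)
    pos = np.sum(corridor & band)        # cells that should be marked drivable
    neg = np.sum(~corridor & band)       # cells that should stay unmarked
    return tp / max(pos, 1), fp / max(neg, 1)
```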

Fig. 6. The impact of k on tracking performance. The black line shows averages over different values of n ∈ {6, 7, 8, 10, 15}, which are plotted in clusters colored by value of k ∈ {1 ... 5} (φ = 1.0 is constant). Left and right plots are for two subsets of the Grand Challenge 05 data, each 1000 image frames in length. As expected, an increasing k leads to a tighter fit of the road model to the training data, which results in a reduction of both false and true positives.

Fig. 7. The impact of φ on tracking performance. Increasing the parameter φ while keeping other parameters constant increases the overall number of cells identified as road. Left and right plots are for the same datasets as in Figure 6.

B. Further testing results

During our 2-month testing phase before the race, the vision system was confronted with a number of different obstacle types, including parked cars, black and white trashcans, and a PVC tank trap model - all of which we were able to detect. During reliability testing, Stanley drove 200 autonomous miles on a 2.2-mile loop course with two white trashcans per loop, placed on light grey gravel in a 35mph speed zone. Stanley managed to avoid 180 out of the 181 obstacles and only sideswiped the remaining one. The main limitation of the system is its color-focused design and thus the inability to detect objects with little color difference to the underlying road. A brown object on a brown dirt road without significant shadow is undetectable.

C. Optimal number of Gaussians

A major design choice in our algorithm is how many Gaussians to select in k-means clustering (k). The model storage capacity n is less important, as long as n > k. As n is increased, the number of frames where more than a few of the n models have a mass above the threshold is very small, so that the effect of increasing n is negligible.

Choosing a small value of k carries the danger of not being able to model multi-colored roads (like pavement with lane markers), leading to a loose fit that includes colors between the ones in the training set. Choosing a large value, on the other hand, can lead to loss of generality and overfitting the training patch to an extent where too little actual road is found. Figure 6 provides ROC curves for different values of these parameters. The mentioned tradeoff between small and large k is clearly visible.

D. Balancing road and obstacle finding

The other obvious place to tune the TP vs. FP rate is the threshold on the maximum Mahalanobis distance at which pixels are accepted by the MOG, introduced in Section II-D. Testing the algorithm on different kinds of road, however, unearthed a problem with very uniformly colored road surfaces. Since we do not incorporate textures in our algorithm, our camera was set to not enhance edges, and consequently the color variation in some training areas can be very low - less than one brightness value standard deviation. Even the slightest color variation further out (for example caused by CCD blooming in desert light, or just lens imperfections) would cause the road in this situation to be called non-drivable. Paradoxically, this meant that uniformly colored road was detected with less certainty than textured road. To solve this problem and provide accurate classification in both road situations, we introduce an explicit minimum noise term φ·I_{3×3} added to the measured training data covariance. By modifying this noise term, we can tune the ROC curves for true positives vs. false negatives and balance between constant and environment-adaptive color thresholding. Figure 7 displays ROC curves for different values of these thresholds.
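The noise floor of Section III-D amounts to adding φ·I to each measured training covariance before the model is stored; a one-line sketch, with φ as the tunable parameter and the usage comment referring to the earlier hypothetical fit_training_models helper:

```python
import numpy as np

def regularize_covariance(sigma, phi=1.0):
    """Add the minimum-noise term phi * I_3x3 so that an almost perfectly
    uniform training patch still yields a usable, non-degenerate Gaussian."""
    return sigma + phi * np.eye(3)

# e.g. inside fit_training_models (earlier sketch):
#   sigma = regularize_covariance(np.cov(cluster, rowvar=False), phi=1.0)
```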
IV. GRAND CHALLENGE RESULTS

Our main result consists in successfully completing and winning the 2005 DARPA Grand Challenge. Figure 8 shows a significant overtaking maneuver during the race, and Figure 10 shows some other scenes that Stanley encountered during the race.

A. Real-time performance

For our application it is crucial that the algorithm runs in real time and reacts predictably to unrehearsed environmental conditions. Since onboard power comes from the stock alternator, which has to supply numerous other consumers as well, the computing power allocated for computer vision was that of just one Pentium M 1.6 GHz blade computer, comparable to a low-end laptop. Furthermore, this computing power is also used for receiving images from the ethernet camera, for calculating image statistics to adjust camera parameters, for video compression, for logging on the hard disk, and for distributing the map to the path planning algorithm. Therefore, only half of that computing power is available for the actual computer vision part of the algorithm.


Fig. 9. Average speed (50 second window) of our vehicle during the 2005 Grand Challenge. All velocities above 11.2m/s (25mph) would not have been achieved without the help of the computer vision algorithm. Please note that other factors (for example speed limits and road roughness) would also slow down the vehicle. The top speed is only achieved when all sensors indicate perfect driving conditions.

To achieve a performance of about 12fps and to keep the algorithm simple and hence more robust, we made the following parameter choices on race day:

• (n, k) = (1, 1): Although results indicate that values of k = 2-3 and n = 5-10 perform slightly better, the algorithm is more predictable for simple parameters. Furthermore, the runtime of the pixel evaluation depends linearly on n, so smaller values increase the frame rate. If the race were held a second time, or if more knowledge about the terrain types encountered had been available beforehand, we would have run it with values (3, 10).
• Image resolution was 320 × 240.
• Shadow and sky removal was disabled. We chose to mount the camera and select the camera zoom such that the overwhelming proportion of the visible scene would not contain sky. Thus the computation time required to locate the horizon was not warranted by the small reduction in pixels requiring analysis.

Our code made use of the Open Source Computer Vision Library (OpenCV)[18] maintained by Intel. OpenCV is a comprehensive collection of optimized computer vision algorithms in C under a BSD license. We also utilized the Intel Integrated Performance Primitives[19] software library, containing assembly-level optimized routines that OpenCV makes use of.

Fig. 8. Overtaking maneuver: At mile 101.5, we overtook CMU's H1ghlander vehicle. The processed image in the upper right part shows how the obstacle is clearly identified by the algorithm. The lower part shows the generated map, where H1ghlander is visible at 50m range.

B. Impact on the race outcome

The completion time of our robot, measured by DARPA officials, was 6:53:58 hours, resulting in 1st place. Without the computer vision module, our top speed would have been 11.2m/s (25mph) instead of 15.6m/s (35mph). Figure 9 shows the average speed during the 132.2-mile race. Simulating our finishing time by capping vehicle velocity at the safe laser speed of 25mph, we obtain a new finishing time of more than 7 hours and 5 minutes. The improvement seems unimpressive, but it has to be considered that the entire route had DARPA speed limits attached. In fact, post-race analysis concluded that 67.7% of the time, the DARPA speed limit or a lateral acceleration constraint limited Stanley's maximum speed. 17.6% of the time, road roughness estimated by [17] limited the speed, and only 14.0% of the time was the algorithm described in this article responsible for not driving faster. Furthermore, turning vision off would have guaranteed Stanley only a second place behind CMU's Sandstorm vehicle, which finished in 7:04:50 hours.
V. CONCLUSIONS

We have presented a novel method for locating roads in difficult, uneven, and inconsistent terrain. The context of the DARPA Grand Challenge involved driving 132 miles in 7 hours, starting before dawn and continuing through the diverse illumination conditions of the Nevada desert. By relying on short range sensors to train the vision system, we were able to continuously learn and adapt to new conditions throughout this extended period of time. Because our method relies on fast, well-developed algorithms, the overall method shows high performance in real time even on modest computer hardware. This combination of performance and robustness played a decisive role in the race victory.

While the system as-is was developed with the application of the DARPA Grand Challenge in mind, the underlying approaches can be generalized to help solve other real-world problems and finally bring Computer Vision into production systems: systems are much more robust if they integrate multiple sensor modalities in a way that takes the advantages and disadvantages of each sensor into account. Vision should be treated as an add-on to a system, and as such it can play a role that extends capabilities significantly.

At the current development level, vision may still have failure modes if the conditions don't allow it to operate confidently. However, systems can be designed such that they gracefully fall back to base functionality provided by other modalities. In our case, Stanley simply slowed down if vision was unable to prove road drivability out to a safe distance.

ACKNOWLEDGMENT

The authors would like to thank the members of the team, as well as the sponsors, for making the project possible.

Fig. 10. Scenes of the 2005 Grand Challenge: The first row shows a media watch point, with the raw camera image on the left and the classification result on the right. The next rows show classification results during traversal of the mountainous Beer Bottle Pass. The blue training area varies significantly because the rugged nature of the environment causes Stanley to shake more, which in turn sometimes leaves holes in the laser map. The module that fits an area-maximized training quadrangle into this map is conservative and tries not to cover these holes. A training area that varies from frame to frame, however, does not harm segmentation performance, since the algorithm is executed at high frame rate and learns a road model that incorporates history.

REFERENCES

[1] E. D. Dickmanns and A. Zapp, Guiding Land Vehicles Along Roadways by Computer Vision. Proc. AFCET Congres Automatique, Toulouse, France, October 23-25, 1985.
[2] M. Lützeler and S. Baten, Road Recognition for a Tracked Vehicle. Proc. SPIE Conf. on Enhanced and Synthetic Vision, AeroSense 2000, Orlando, FL, USA, April 24-28, 2000.
[3] R. Gregor, M. Lützeler, and E. D. Dickmanns, EMS-Vision: Combining On- and Off-Road Driving. Proc. SPIE Conf. on Unmanned Ground Vehicle Technology III, AeroSense 2001, Orlando, FL, USA, April 16-17, 2001.
[4] S. Baten, M. Lützeler, E. D. Dickmanns, R. Mandelbaum, and P. Burt, Techniques for Autonomous, Off-Road Navigation. IEEE Intelligent Systems, Vol. 13(6): 57-65, 1998.
[5] R. Gregor, M. Lützeler, M. Pellkofer, K.-H. Siedersberger, and E. D. Dickmanns, A Vision System for Autonomous Ground Vehicles With a Wide Range of Maneuvering Capabilities. Computer Vision Systems, Second International Workshop, ICVS 2001, Vancouver, Canada, July 7-8, 2001.
[6] A. Broggi, M. Bertozzi, and A. Fascioli, ARGO and the MilleMiglia in Automatico Tour. IEEE Intelligent Systems, Vol. 14(1): 55-65, 1999.
[7] M. Bertozzi, A. Broggi, A. Fascioli, and S. Nichele, Stereo Vision-Based Vehicle Detection. Proc. IEEE Intelligent Vehicles Symposium, Dearborn, MI, USA, October 3-5, 2000.
[8] P. H. Batavia, D. A. Pomerleau, and C. E. Thorpe, Predicting Lane Position for Roadway Departure Prevention. Proc. IEEE Intelligent Vehicles Symposium, Stuttgart, Germany, October 28-30, 1998.
[9] R. Aufrère, J. Gowdy, C. Mertz, C. Thorpe, C. Wang, and T. Yata, Perception for Collision Avoidance and Autonomous Driving. Mechatronics, Vol. 13(10): 1149-1161, 2003.
[10] F. Heimes and H.-H. Nagel, Towards Active Machine-Vision-Based Driver Assistance for Urban Areas. International Journal of Computer Vision, Vol. 50(1): 5-34, 2002.
[11] D. Lieb, A. Lookingbill, and S. Thrun, Adaptive Road Following Using Self-Supervised Learning and Reverse Optical Flow. Proc. Robotics Science and Systems, Cambridge, MA, USA, June 8-11, 2005.
[12] H. Greenspan, J. Goldberger, and I. Eshet, Mixture Model for Face-Color Modeling and Segmentation. Pattern Recognition Letters, Vol. 22(14): 1525-1536, 2001.
[13] B. Davies and R. Lienhart, Using CART to Segment Road Images. Proc. SPIE Multimedia Content Analysis, Management, and Retrieval, San Jose, CA, USA, January 15-19, 2006.
[14] S. Thrun, M. Montemerlo, and A. Aron, Probabilistic Terrain Analysis for High-Speed Desert Driving. Proc. Robotics Science and Systems, Philadelphia, PA, USA, August 16-19, 2006.
[15] S. Ettinger, M. Nechyba, P. Ifju, and M. Waszak, Vision-Guided Flight Stability and Control for Micro Air Vehicles. Advanced Robotics, Vol. 17: 617-640, 2003.
[16] R. Duda and P. Hart, Pattern Classification and Scene Analysis. John Wiley and Sons, New York, NY, 1973.
[17] D. Stavens and S. Thrun, A Self-Supervised Terrain Roughness Estimator for Off-Road Autonomous Driving. Proc. Conference on Uncertainty in AI (UAI), Cambridge, MA, USA, July 13-16, 2006.
[18] http://www.intel.com/technology/computing/opencv/index.htm
[19] http://www.intel.com/cd/software/products/asmo-na/eng/perflib/ipp/