sensors
Concept Paper

Obstacle Detection and Safely Navigate the Autonomous Vehicle from Unexpected Obstacles on the Driving Lane
Malik Haris and Jin Hou *
School of Information Science and Technology, Southwest Jiaotong University Xipu campus, West Section, High-tech Zone, Chengdu 611756, Sichuan, China; [email protected]
* Correspondence: [email protected]
Received: 5 July 2020; Accepted: 18 August 2020; Published: 21 August 2020
Abstract: Nowadays, autonomous vehicles are an active research area, especially since the emergence of machine vision tasks with deep learning. In such a visual navigation system for an autonomous vehicle, the controller captures images and predicts information so that the autonomous vehicle can navigate safely. In this paper, we first introduce small and medium-sized obstacles that are intentionally or unintentionally left on the road and can pose hazards for both autonomous and human driving. Then, we discuss a Markov random field (MRF) model that fuses three potentials (gradient potential, curvature prior potential, and depth variance potential) to segment obstacles from non-obstacles in a hazardous environment. Once the obstacles have been segmented by the MRF model, a DNN model predicts the information needed to navigate the autonomous vehicle safely away from the hazard on the roadway. We found that our proposed method can accurately segment obstacles from the blended background road and improve the navigation skills of the autonomous vehicle.
Keywords: roadway hazard; Markov random field; autonomous vehicle; deep learning; image processing; self-driving car
1. Introduction

The Global Status Report on Road Safety released by the World Health Organization (WHO, Geneva 27, Switzerland) in 2018 claims that about 1.35 million people die each year in road traffic accidents [1,2]. Similarly, the American Automobile Association (AAA) Foundation reported in 2016 that 50,658 road accidents occurred in America alone between 2011 and 2014 due to roadway obstacles. Roadway obstacles were the main factor in these crashes and caused 9850 injuries and 125 deaths annually in the United States [3]. Reports indicate that over 90% of crashes are caused by driver error [4]. To improve this situation, governments, municipal departments, and car manufacturers have made significant investments to support the development of technologies such as autonomous vehicles and cognitive robots. About 1 billion euros have already been invested by EU agencies in such projects [5]. In 2009, autonomous vehicles were developed and tested in four different states in the United States under the supervision of companies such as Waymo (Google, Mountain View, CA, USA) and Uber, with the support of traditional car manufacturers such as Ford and BMW [6]. Since then, this technology has evolved, and it is currently deployed in 33 states of the United States, each with its own regulations. In addition, the Victoria Transport Policy Institute projects that this technology will be widely used after 2040–2050 [7]. Currently, the most advanced features found in autonomous vehicles are Lane Changing (LC) Control [8,9], Adaptive Cruise Control (ACC) [10], Automatic Emergency
Sensors 2020, 20, 4719; doi:10.3390/s20174719 www.mdpi.com/journal/sensors
Braking System (AEBS) [11], Light Detection and Ranging (LIDAR) [12], Street Sign Recognition [13], Vehicle to Vehicle (V2V) Communication [14], and Object or Collision Avoidance System (CAS) [15]. With the continuous development of highways, roads, city streets, and expressways, the detection problem is becoming increasingly challenging because the road environment is complex and constantly changing. It is influenced by small obstacles or debris, shadows, light, water spots, and other factors; such objects fall from vehicles or construction sites, or are littered. Different types of sensors, from active sensors (RADAR or LIDAR) to passive sensors (cameras), have been used to solve this problem. Active sensors such as RADAR or LIDAR offer high precision in measuring distance and speed from point to point, but they often suffer from low resolution and high cost. For passive sensors such as cameras, in contrast, accuracy is crucial for the timely detection of small obstacles and an appropriate response by safety-critical moving platforms. Detecting a small obstacle that occupies a very small area of the image, in all possible shapes and forms, is also a very challenging problem. Gradients induced by the edges of obstacles can also be caused by paper or soil dust on the road, moisture gradients after rain, mud, or road markings, all of which are potential sources of false positives. Figure 2 illustrates this phenomenon well. In recent research and development, Convolutional Neural Network (CNN) models are used in autonomous vehicles to navigate safely. For example, during training, a CNN-based end-to-end driving model learns a mapping between roadway images collected from a stereo camera during human driving and the corresponding steering wheel angle [16–18]; during testing, the CNN model predicts the steering wheel angle to navigate the autonomous vehicle safely. Consequently, the autonomous vehicle depends on its training dataset.
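The end-to-end image-to-steering mapping described above can be sketched in miniature. The snippet below is only a hedged stand-in: it substitutes a linear least-squares regressor for the CNN, and all shapes, names, and synthetic data are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "training" data: 200 grayscale road frames (16x32) paired with
# the steering wheel angle a human driver chose for each frame.
images = rng.random((200, 16, 32))
angles = rng.uniform(-0.5, 0.5, size=200)  # radians, illustrative

# Flatten each frame into a feature vector (a CNN would learn features instead)
# and append a bias term.
X = np.hstack([images.reshape(200, -1), np.ones((200, 1))])

# "Training": fit the image -> steering-angle mapping by least squares.
w, *_ = np.linalg.lstsq(X, angles, rcond=None)

# "Testing": predict the steering angle for a new, unseen frame.
new_frame = rng.random((16, 32)).reshape(1, -1)
pred = np.hstack([new_frame, np.ones((1, 1))]) @ w
print(float(pred[0]))
```

The point of the sketch is the dependence highlighted in the text: the predictor is only as good as its training set, so a frame containing an obstacle type absent from training yields an unreliable steering estimate.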
If the CNN model is not trained on roadway obstacles, then the navigation system of the autonomous vehicle may generate incorrect information about the steering wheel angle and, as a result, cause a collision. In addition, a number of research studies show that the autonomous vehicle navigation system may fail to navigate safely for several reasons, such as radar sensor failure, camera sensor failure, and software failure [19]. For example, a Tesla Model 3 with Autopilot engaged failed to stop for an overturned truck on a highway in Taiwan, slamming right into it and even ignoring a pedestrian. This study addresses how to improve the robustness of obstacle detection in a complex environment by integrating a Markov random field (MRF) for obstacle detection and road segmentation with a CNN model for safe navigation [20]. We segment the obstacle out of the image in the MRF framework by fusing the intensity gradient, curvature cues, and disparity variance. After analyzing the obstacle in the captured image, the CNN model helps to navigate the autonomous vehicle safely. The main contributions of this study are as follows:
• Pixel label optimization of images, with small obstacles or hindrances on the road detected by using an MRF model.
• Navigating an autonomous vehicle on a roadway away from an unexpected obstacle.

The remaining part of the research is organized as follows:
• Section 2—Reviews the relevant works carried out and developed in the past few years.
• Section 3—Introduces the method for detecting physical obstacles or hindrances on the road and predicting the steering wheel angle for the AV.
• Section 4—Shows the demonstration and simulation.
• Section 5—Discusses the results and their comparison.

2. Related Work

The occupancy probability map is one of the main lines of work for obstacle detection [21]. It is developed through orthogonal projection of the 3D world onto a planar road surface (assuming that the structured road environment is almost planar). The plane is discretized into cells to form a grid, and the algorithm predicts the occupancy probability of each cell. We conclude that this method can accurately detect large obstacles (such as cyclists or cars) by using a stereo vision sensor [22] and can also help to identify road boundaries by using LIDAR data [23]. However, since the probability function is closely related to the number of measurements in one grid cell, this method may not be suitable for small obstacle detection with stereo vision when the observations are scarce and noisy. The Digital Elevation Map (DEM) [24] is one of the algorithms that tries to detect obstacles by relying on the fact that they protrude from a dominant ground surface. The obstacle detection algorithm proposed in [24] marks DEM cells as road or obstacle, using the density of 3D points as a criterion. It also involves fitting a surface model to the road surface. Oniga and Nedevschi [25] presented a random sample consensus (RANSAC)-based algorithm for road surface estimation and a density-based classifier for obstacle detection using a DEM constructed from stereo data, whereas in [26], the authors used a similar RANSAC-based methodology for curb detection by polynomial fitting in the stereo DEM.
Although a RANSAC-based algorithm is not suitable for the detection of small obstacles, because a small obstacle's disparity variation in the image is often similar to the noise level and a least-squares fit may smooth the small obstacle out along with the road surface, RANSAC does provide an accurate estimate of the vehicle's position relative to the road [27]. Fusion of 3D LIDAR and a high-resolution camera seems to be a reasonable approach for robust curb detection. Traditional range–visual fusion methods [28,29] use the detection results from the range data to guide the curb search in the image, which has the advantage of enlarging the detection spot. However, fusing 3D LIDAR and visual data to search for small obstacles in the visual images is more reliable in enhancing the robustness of obstacle and edge detection than traditional range–visual fusion methods [30]. It recovers a dense depth image of the scene by propagating depth information from sparse range points to the whole high-resolution visual image [31]. J. Tan and J. Li et al. [32] exploited geometric properties for robust curb detection through depth image recovery, curb point detection, curb point linking, curb refinement, and parametrization. A scene-flow-based approach is also used for obstacle detection [33,34], where each point in the constructed 3D cloud is tracked temporally and the flow is analyzed to classify objects in the image such as road surface, obstacles, and pedestrians. This approach has limited applicability because it detects only moving obstacles such as moving vehicles, bicycles, or pedestrians. In addition, the decision-making process is based on the flow of nearby points in the 3D cloud, which can be too sparse for small obstacles. For obstacle detection in [35], the authors used advanced stereo-based algorithms combining full 3D data with a motion-based approach, yet their work focuses only on large obstacle detection such as vehicles. Hadsell et al.
[36] presented a vision-based neural network classifier. Their idea was to use an online classifier, optimized for long-distance prediction, together with deep stereo analysis to predict obstacles and navigate vehicles safely. Ess A. et al. [37] defined a method combining image-based category detectors (such as vehicle or pedestrian detectors) with geometric information received from stereo cameras. However, due to the large differences in shape, size, and visual appearance, it is difficult to train a vision sensor on the small obstacles found along the way. Bernini and Bertozzi et al. [38] presented an overview of several stereo-based generic obstacle detection methods, such as the Stixels algorithm [39] and geometric point clusters [40,41]. The Stixels algorithm distinguishes between a global ground surface model and a set of obstacle segments represented as vertical rectangles, thus providing a compact and robust representation of the 3D scene. In contrast, the geometric relation between 3D points is used to detect and cluster obstacle points. This method is suitable for detecting medium-sized obstacles over close or medium distances. As the distance increases and the size of the obstacle decreases, positional accuracy and obstacle detection become challenging. Zhou J. and J. Li [42] proposed a solution to detect obstacles using a ground plane estimation algorithm based on homography. This work extends [43] to smaller obstacles by combining several indicators, such as a homography score, super-pixel segmentation, and a line segment detector in an MRF framework. However, appearance-based indicators such as line segment detection and pixel segmentation fail in such cases. Therefore, we directly use the lowest level of detail available, i.e., image gradients in both stereo depth images and appearance.
Many researchers [44–46] work under a flat-surface assumption, treating free space or the ground as a single plane and characterizing obstacles according to their height above the ground. The geometric deviation from the reference plane can be estimated directly from the image data, from a precalculated point cloud [47], or by extraction from a v-disparity histogram model [48]. Instead of relying on the flatness of the road [49], the vertical road profile can be modeled as a clothoid [50] or splines [51], estimated from the lateral projection of the 3D points. Similarly, a free ground profile model was examined using adaptive threshold values in the v-disparity domain and multiple filter steps [52]. In addition, we also explored existing datasets used in the autonomous vehicle community. The dataset provided by Udacity (Mountain View, CA, USA) [53] supports end-to-end data and image segmentation, but it does not provide ground truth for obstacles in the roadway lane. Similarly, the KITTI [54] and Cityscapes [55] datasets also do not provide obstacle detection ground truth. The dataset that matches our requirements is the Lost and Found dataset [56]. Pinggera P. and Kanzow C. have worked on planar hypothesis testing (PHT) [57,58], fast direct planar hypothesis testing (FPHT), point compatibility (PC) [40,41], Stixels [39,59], and a mid-level representation, Cluster-Stixels (CStix) [60–62], for detecting small obstacles in this dataset. In addition, Ramos and Gehrig et al. [63] further extended this work by merging deep learning with the Lost and Found dataset hypothesis results. However, the work on the Lost and Found dataset [56] does not discuss how to navigate an autonomous vehicle safely or predict the steering wheel angle to steer an autonomous vehicle away from an unexpected obstacle on the roadway.
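The v-disparity model mentioned above can be illustrated with a small synthetic sketch (all sizes and values are illustrative): for a planar road, disparity decreases roughly linearly with image row, so a per-row histogram of disparities concentrates along a line, and pixels whose disparity lies far from that line are obstacle candidates.

```python
import numpy as np

H, W, D = 100, 120, 64  # image height/width and disparity range (illustrative)

# Synthetic disparity map of a planar road: disparity grows linearly with row v.
rows = np.arange(H)
disp = np.tile((rows * 0.6).astype(int)[:, None], (1, W))

# Insert a small obstacle: a patch whose disparity exceeds the road's.
disp[40:45, 60:70] = 50

# v-disparity: one disparity histogram per image row.
v_disp = np.zeros((H, D), dtype=int)
for v in range(H):
    v_disp[v] = np.bincount(np.clip(disp[v], 0, D - 1), minlength=D)

# The road dominates each row, so the per-row argmax traces the ground line,
# while the obstacle appears as off-line histogram mass in rows 40-44.
road_line = v_disp.argmax(axis=1)
print(road_line[40], disp[40, 65])
```

In a real pipeline the ground line would be fitted (e.g., by Hough transform or RANSAC) rather than read off the argmax, but the separation between on-line road pixels and off-line obstacle pixels is the same idea.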
Since the existing datasets do not meet our requirements, we carefully created our own dataset using a steering angle sensor (SAS) and a ZED stereo device mounted on an electric vehicle, as discussed in the experimental setup section. After that, pixel labels of small obstacles or hindrances on the road are detected by the MRF model and a road segmentation model, which help the CNN model to navigate the autonomous vehicle on roadways away from unexpected obstacles by predicting the steering wheel angle.
3. Method

In this section, we describe how to develop a safe autonomous driving system for unexpected and dangerous environments. We discuss three models.
(a) In the first model, we used various stochastic techniques (curvature prior potential, gradient potential, and depth variance potential) to segment obstacles in the image within a Markov random field (MRF) framework. These three techniques measure pixel-level image cues to extract useful information and store it for orientation. In this method, each pixel of interest is a node in the MRF. Finally, instead of using OR gates, we used AND gates to combine the results of the above techniques.
(b) In the second model, semantic segmentation was used to segment the road and filter out outliers and other important obstacles.
(c) The third model was used to predict the steering wheel angle of the autonomous vehicle. We analyzed the unexpected obstacle on the roadway and determined the threat factor (Ttf). This threat factor helped us decide whether to ignore the obstacle or treat it as an accident risk.
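The AND-style cue combination in model (a) can be sketched as follows. This is a minimal illustration only: the binary masks, threshold, and array sizes are stand-ins for the actual potentials defined later in this section.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three per-pixel cue responses in [0, 1] on a 4x6 image patch:
# gradient, curvature prior, and depth variance (values illustrative).
gradient = rng.random((4, 6))
curvature = rng.random((4, 6))
variance = rng.random((4, 6))

thr = 0.5  # illustrative per-cue decision threshold

# OR-style fusion flags a pixel if ANY cue fires -> more false positives.
or_mask = (gradient > thr) | (curvature > thr) | (variance > thr)

# AND-style fusion (the choice made here) requires ALL cues to agree,
# which suppresses false alarms from any single noisy cue.
and_mask = (gradient > thr) & (curvature > thr) & (variance > thr)

# AND can never flag more pixels than OR.
print(int(and_mask.sum()), int(or_mask.sum()))
```

This mirrors the multiplicative combination of the three unary potentials given later in Equation (2): a pixel is labeled an obstacle only when all three cues support it.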
The overall pipeline is graphically illustrated in Figure 1. The system input consisted of optical encoders, used as a SAS for steering angle detection in vehicles, and two stereo images, which were used to compute depth using the semi-global block matching (SGBM) algorithm proposed by Hirschmuller [64]. The depth and color images were used to calculate three different cues, i.e., image gradient, disparity variance, and depth curvature. These three cues were combined into a unary potential in an MRF with an additional pairwise potential. Then, standard graph cuts were used to obtain the obstacles. In addition, deep learning-based semantic segmentation [65] was used to segment the roads and filter out the abnormal values.
Figure 1. Pipeline of detection of hazard objects and prediction of steering wheel angles.
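The stereo depth step can be illustrated with the basic block-matching idea underlying SGBM. The toy function below is not SGBM itself (SGBM adds smoothness costs aggregated along multiple paths); it only shows how a disparity is recovered by comparing blocks along the epipolar line, with all sizes and names illustrative.

```python
import numpy as np

def block_match_row(left, right, v, u, half=2, max_disp=10):
    """Best disparity at pixel (v, u) by minimising the sum of absolute
    differences (SAD) between small blocks along the epipolar line.
    A toy stand-in for SGBM, which additionally enforces smoothness."""
    block = left[v - half:v + half + 1, u - half:u + half + 1]
    best_d, best_cost = 0, np.inf
    for d in range(0, min(max_disp, u - half) + 1):
        cand = right[v - half:v + half + 1, u - d - half:u - d + half + 1]
        cost = np.abs(block.astype(float) - cand).sum()
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

rng = np.random.default_rng(2)
right = rng.random((20, 40))
true_disp = 4
left = np.roll(right, true_disp, axis=1)  # left view shifted by a known disparity

print(block_match_row(left, right, v=10, u=20))  # recovers the shift: 4
```

In practice one would call an optimized implementation (e.g., OpenCV's `StereoSGBM`) rather than this loop; the recovered disparity map is what feeds the depth variance and depth curvature cues described above.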
3.1. The Markov Random Field (MRF) Model

During the past decade, various MRF models, inference methods, and learning methods have been developed to solve many low-, medium-, and high-level vision problems [66], where inference concerns the underlying image and scene structure in problems such as image restoration, reconstruction, segmentation, edge detection, 3D vision, and object labeling in the image [67]. In our research work, we address small obstacle segmentation from the road by defining an energy function (i.e., a cost) over an MRF. We associate the image with a random process X with elements X_s, where s ∈ S represents the position of a pixel in the image. Each pixel position in the image is considered a node in the MRF, and each node is affiliated with a unary or pairwise cost [68]. The energy function E to be minimized is defined as:

E(X) = \sum_{s \in S} E_u(s) + \sum_{(r,s) \in N} E_p(X_r, X_s)    (1)

where E_u(s) represents the unary term and E_p(X_r, X_s) represents the pairwise term. X = {X_1, X_2, ..., X_n} is the set of random variables associated with the set of nodes S, where each X_s takes a label in [0, 1] depending on the nature of the appearance, i.e., whether it is road, texture, or a small obstacle on the road. Around each pixel, we calculate the unary term E_u(s) independently. It is the combination of three cues: the gradient potential E_u^g(s), the depth curvature potential E_u^c(s), and the depth variance potential E_u^v(s). The unary term E_u(s) is defined as follows:

E_u(s) = E_u^g(s) \cdot E_u^c(s) \cdot E_u^v(s)    (2)
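To make Equations (1) and (2) concrete, here is a toy sketch of the energy minimization. The paper uses standard graph cuts; iterated conditional modes (ICM) is substituted here only because it is simpler to show, and the cue maps, the Potts pairwise weight `lam`, and the convention that a low unary value favors the obstacle label are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
H, W = 12, 12

# Synthetic unary cues in (0, 1]; Eq. (2): E_u = E_u^g * E_u^c * E_u^v.
# Assumed convention (illustrative): low E_u(s) means "s looks like an obstacle".
Eg, Ec, Ev = rng.random((3, H, W)) * 0.9 + 0.1
Eu = Eg * Ec * Ev

def energy(X, Eu, lam=0.3):
    """Eq. (1): sum of unary terms plus a Potts pairwise term charging
    lam for each 4-connected neighbour pair with differing labels."""
    unary = np.where(X == 1, Eu, 1.0 - Eu).sum()
    pair = lam * ((X[:, 1:] != X[:, :-1]).sum() + (X[1:, :] != X[:-1, :]).sum())
    return unary + pair

# ICM: start from the unary-only labelling, then greedily flip any single
# pixel whose flip lowers E(X), until no flip helps. Each accepted flip
# strictly decreases the energy, so the loop terminates.
X = (Eu < 0.5).astype(int)
improved = True
while improved:
    improved = False
    for v in range(H):
        for u in range(W):
            Y = X.copy()
            Y[v, u] = 1 - Y[v, u]
            if energy(Y, Eu) < energy(X, Eu):
                X, improved = Y, True

print(energy(X, Eu))
```

Graph cuts solve this binary-label energy exactly (the Potts pairwise term is submodular), whereas ICM only reaches a local minimum; the sketch is meant to show the role of the unary and pairwise terms, not to reproduce the paper's solver.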
3.2. Gradient Potential

The gradient potential at site (i, j) is defined as follows: