International Journal of Pure and Applied Mathematics, Volume 119, No. 12, 2018, 14501-14507
ISSN: 1314-3395 (on-line version)
url: http://www.ijpam.eu
Special Issue

Real-Time Disparity Map Generation for Stereo Vision Based Obstacle Detection System Using SAD Algorithm

David Rajkumar Jayakumar
M.Tech in Remote Sensing and GIS, Developer - Proprocure
London, United Kingdom
E-mail: [email protected]

Abstract—Stereo vision sensors capture images to navigate an autonomous ground vehicle by visual perception of the environment. Stereo vision is the ability to infer information about the 3-D structure of a scene using two images captured simultaneously by two different imaging sensors at different viewpoints. Disparity estimation is the process of computing the relative depth of objects within a scene. The disparity map for the two images captured by the stereo vision camera is generated to compute the range of the obstacle. On identification of the obstacle, the vehicle decides which path to navigate. This paper presents a methodology for real-time multi-resolution disparity map generation for a stereo vision based obstacle detection system.

Keywords—stereo vision; disparity; range estimation; digital image processing; sum of absolute differences algorithm

I. INTRODUCTION

A stereo vision system can theoretically yield an accurate and detailed 3D representation of the environment around a vehicle, with a passive sensor at relatively low cost. The disparity map is generated to calculate the range, or depth, of objects. Capturing a scene (left and right images) from two points of view at the same time, stereo techniques aim at defining conjugate pairs of pixels, one in each image, that correspond to the same spatial point in the scene. The difference between the positions of conjugate pixels, called the disparity, yields the depth of the point in the 3D scene. The disparity map is generated using the Sum of Absolute Differences (SAD) algorithm.

II. STEREO VISION

Stereoscopy, stereoscopic imaging or 3D (three-dimensional) imaging is any technique capable of recording three-dimensional visual information or creating the illusion of depth in an image. The illusion of depth in a photograph, movie, or other two-dimensional image is created by presenting a slightly different image to each eye. Stereoscopy is also useful in viewing images rendered from large multi-dimensional data sets such as those produced by experimental data[5]. Three-dimensional depth information can be reconstructed from two images using a computer by corresponding the pixels in the left and right images. Solving the correspondence problem in the field of computer vision aims to create meaningful depth information from two images.

Artificial stereo vision is based on the same principles as biological stereo vision. A perfect example of stereo vision is the human visual system. Each person has two eyes that see two slightly different views of the observer's environment. An object seen by the right eye is in a slightly different position in the observer's field of view than an object seen by the left eye. The closer the object is to the observer, the greater that difference in position. Anybody can see this for oneself by holding up a finger in front of his or her face and closing one eye. Line the finger up with any object in the distance, then switch eyes and watch the finger.

A. Geometry

An artificial stereo vision system uses two cameras at two known positions. Both cameras take a picture of the scene at the same time. Using the geometry of the cameras, the geometry of the environment can be computed. As in the biological system, the closer the object is to the cameras, the greater its difference in position in the two pictures taken with those cameras. The measure of that difference is called the disparity.

Fig 1. Geometry of stereo vision

The distance f is the perpendicular distance from each focal point to its corresponding image plane. Point P is some point in space which appears in the images taken by these cameras. Point P has coordinates (x, y, z) measured with respect to a reference frame that is fixed to the two cameras and whose origin is at the midpoint of the line connecting the focal points. The projection of point P is shown as Pr in the right image and Pl in the left image, and the coordinates of these points are written as (xr, yr) and (xl, yl) in terms of the image plane coordinate systems shown in the figure. Note that the disparity defined above is xl - xr. Using simple geometry, the depth of P can be recovered from this disparity, as sketched below.
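With b denoting the baseline, the distance between the two focal points, the depth z of P follows from similar triangles. The short derivation below is a standard reconstruction supplied for completeness, not a verbatim part of the paper:

    xl / f = (x + b/2) / z        xr / f = (x - b/2) / z

so that

    xl - xr = (f * b) / z        =>        z = (f * b) / (xl - xr)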


The distance is inversely proportional to disparity, and disparity is directly proportional to the baseline. When the cameras are aligned horizontally, each image shows a horizontal difference, xl - xr, in the location of Pr and Pl, but no vertical difference. Each horizontal line in one image has a corresponding horizontal line in the other image. These two lines have the same pixels, with a disparity in the location of the pixels. The process of stereo correlation finds the matching pixels so that the disparity of each point can be known. Note that objects at a great distance will appear to have no disparity. Since disparity and baseline are proportional, increasing the baseline will make it possible to detect a disparity in objects that are farther away. However, it is not always advantageous to increase the baseline, because objects that are closer will disappear from the view of one or both cameras.

B. Methodologies

The steps for range estimation involve the tasks described below. The task of developing a stereo vision system presents many issues with both software and hardware. If the system is to be used outdoors, problems with variable lighting and weather are added. A system where the scene in the images is not stationary adds timing issues with respect to image capture. Mounting the system on a robotic platform which traverses a rugged landscape adds vibrations to the system which can sometimes be intense. The stereo vision system must accomplish the following tasks:
1. Capture two images of the scene: This requires two cameras and two camera lenses. This is mostly a hardware issue.
2. Transfer these images to a computer for processing: This may be done with capture cards or some other means of digital data transfer such as FireWire. This is both a hardware and a software issue.
3. Process the images for 3D data: This requires stereo processing software, which may be purchased.
4. Process the 3D data into a traversability grid: This requires grid computing software which must be written for this task.

C. Image Rectification

Image rectification is a transformation process used to project multiple images onto a common image surface. It is used to correct a distorted image into a standard coordinate system.
1. It is used in computer stereo vision to simplify the problem of finding matching points between images.
2. It is used in geographic information systems to merge images taken from multiple perspectives into a common map coordinate system.

Stereo vision uses triangulation based on epipolar geometry to determine the distance of an object. Between two cameras there is the problem of finding a corresponding point viewed by one camera in the image of the other camera (this is called the correspondence problem)[7]. In most camera configurations, finding correspondences requires a search in two dimensions. However, if the two cameras are aligned to have a common image plane, the search is simplified to one dimension - a line that is parallel to the line between the cameras (the baseline). Image rectification is an equivalent (and more often used) alternative to this precise camera alignment: it transforms the images so that the epipolar lines align horizontally (an illustrative sketch is given at the end of this section).

D. Disparity

The measure of the difference between positions of pixels is called the disparity. There are two types of disparity: horizontal disparity and vertical disparity. If the two cameras are mounted side by side on a frame, so that the baseline is horizontal, there will be horizontal disparity alone and no vertical disparity[6]. If the two cameras are mounted one above the other, so that the baseline is vertical, there will be vertical disparity and no horizontal disparity.

Fig 2. Offset of the image

The figure displays stereo geometry. Two images of the same object are taken from different viewpoints. The distance between the viewpoints is called the baseline (b). The focal length of the lenses is f. The horizontal distance from the image center to the object image is dl for the left image, and dr for the right image. Normally, the stereo cameras are set up so that their image planes are embedded within the same plane. Under this condition, the difference between dl and dr is called the disparity, and is directly related to the distance r of the object normal to the image plane: r = b * f / d, where d = dl - dr.
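As an illustration of the rectification step described in subsection C above, the following Python/OpenCV sketch is offered. It is not part of the paper's implementation; the intrinsics, distortion coefficients, and relative pose below are placeholder values that would normally come from stereo calibration (e.g., cv2.stereoCalibrate):

    import cv2
    import numpy as np

    # Placeholder calibration: assumed focal length (pixels), principal point,
    # zero lens distortion, and a purely horizontal 0.176 m baseline.
    K1 = K2 = np.array([[887.0,   0.0, 320.0],
                        [  0.0, 887.0, 240.0],
                        [  0.0,   0.0,   1.0]])
    dist1 = dist2 = np.zeros(5)
    R = np.eye(3)                        # relative rotation between the cameras
    T = np.array([-0.176, 0.0, 0.0])     # translation of the right camera
    size = (640, 480)                    # assumed image width, height

    # Rectifying rotations/projections: epipolar lines become horizontal rows.
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, dist1, K2, dist2,
                                                      size, R, T)
    map1x, map1y = cv2.initUndistortRectifyMap(K1, dist1, R1, P1, size,
                                               cv2.CV_32FC1)
    map2x, map2y = cv2.initUndistortRectifyMap(K2, dist2, R2, P2, size,
                                               cv2.CV_32FC1)
    # left_rect  = cv2.remap(left,  map1x, map1y, cv2.INTER_LINEAR)
    # right_rect = cv2.remap(right, map2x, map2y, cv2.INTER_LINEAR)

After remapping, corresponding points lie on the same image row, so the correspondence search of Section III reduces to a one-dimensional scan along each scan line.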


E. Range Estimation

A pixel in the disparity image represents the range of an object. This range, together with the position of the pixel in the image, determines the 3D position of the object. The coordinate system for the 3D image is taken from the optic center of the left camera of the stereo rig. Z is along the optic axis, with positive Z in front of the camera; X is along the camera scan lines, with positive values to the right when looking along the Z axis; Y is vertical, perpendicular to the scan lines, with positive values down. Finally, the viewpoint can be rotated around a point in the image, to allow good assessment of the 3D quality of the stereo processing[6]. The rotation point is selected automatically by finding the point closest to the left camera, near the optic ray of that camera.

F. Image Coordinates

The image coordinates of the mouse are given by the x, y values. The values in square brackets are the pixel values of the left and right images. If the right image is displaying stereo disparities, then the right value is the disparity value. Finally, the X, Y, Z values are the real-world coordinates of the image point, in meters.

III. DISPARITY MAP GENERATION

A. Disparity Map

The disparity map is an array constructed by calculating the disparity values for each element in the left and right image arrays. It is constructed by finding pixel-to-pixel correspondences between the left and right image arrays: for each pixel in the left image, search for the most similar pixel in the right image. The disparity map will have greater accuracy when neighborhood windows are used[4].

Fig 3. Disparity estimation

B. Epipolarity

The epipolar line is an imaginary line considered across the left and right images in order to compare the pixels in the image arrays and to calculate the disparity. If the two cameras are mounted side by side and the images warped accordingly, the epipolar line is taken horizontally; if the two cameras are mounted one above the other and warped, the epipolar line is taken vertically.

Fig 4. Epipolar line

C. Sum of Absolute Differences

The disparity map generation algorithm uses a simple technique, SAD, which stands for Sum of Absolute Differences[4]:

    C(i, j, d) = Σk Σl | Al(i + k, j + l) - Br(i + k + d, j + l) |

where C(i, j, d) is the disparity image array, Al is the left image array, and Br is the right image array.

D. Steps

The steps involved in disparity map generation are:
1. Read the left and right images.
2. Store the images in image arrays.
3. Calculate the absolute differences of each pixel in the image array using 5x5 neighborhood windows.
4. Find the sum of the absolute differences.
5. Compute the sum of absolute differences by linearly searching along 32 pixels in the right image for each pixel in the left image; the search covers 32 pixels because a 32-pixel disparity range is used.
6. Compare and find the minimum sum of absolute differences.
7. Plot the minimum sum of absolute differences in a new disparity image array.
8. Write the image array and display the disparity image.
A sketch of these steps is given after Fig 5.

Fig 5. Linear Search
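The following Python sketch illustrates steps 1-8. It is an illustrative reconstruction only (the paper's implementation is a C# WPF application); the window size and 32-pixel search come from the steps above, while the use of an unnormalized box filter to accumulate window sums and the file names are assumptions:

    import cv2
    import numpy as np

    def sad_disparity(left, right, window=5, max_disp=32):
        """SAD block matching: 5x5 window, 32-pixel linear search."""
        h, w = left.shape
        left = left.astype(np.float32)
        right = right.astype(np.float32)
        best_sad = np.full((h, w), np.inf, dtype=np.float32)  # running minimum SAD
        disparity = np.zeros((h, w), dtype=np.uint8)          # winning disparity
        for d in range(max_disp):
            # Candidate match: shift the right image d pixels along the
            # horizontal epipolar line so that shifted[y, x] = right[y, x - d].
            shifted = np.zeros_like(right)
            shifted[:, d:] = right[:, :w - d]
            abs_diff = np.abs(left - shifted)                 # step 3
            # Step 4: sum the differences over each 5x5 neighborhood
            # (an unnormalized box filter computes all window sums at once).
            sad = cv2.boxFilter(abs_diff, -1, (window, window), normalize=False)
            # Steps 6-7: keep the disparity giving the minimum SAD per pixel.
            better = sad < best_sad
            best_sad[better] = sad[better]
            disparity[better] = d
        return disparity

    # Steps 1-2 and 8: read the stereo pair, compute, scale for display, write.
    # left = cv2.imread("left.bmp", cv2.IMREAD_GRAYSCALE)
    # right = cv2.imread("right.bmp", cv2.IMREAD_GRAYSCALE)
    # cv2.imwrite("disparity.bmp", sad_disparity(left, right) * 8)

This performs the 32-pixel linear search of step 5 and minimizes C(i, j, d) at each pixel, with the sign convention that a left-image feature appears d pixels to the left in the right image.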


E. Multi Resolution

Decomposing an image into a set of spatial frequency band pass component images is called multi resolution analysis[2]. Individual samples of a component image represent image pattern information that is appropriately localized, while the band passed image as a whole represents information about a particular fineness of detail, or scale. There is evidence that the human visual system uses such a representation, and multi resolution schemes are becoming increasingly popular in machine vision and in image processing in general.

The importance of analyzing images at many scales arises from the nature of images themselves. Scenes in the world contain objects of many sizes, and these objects contain features of many sizes. Moreover, objects can be at various distances from the viewer. As a result, any analysis procedure that is applied only at a single scale may miss information at other scales. The solution is to carry out analyses at all scales simultaneously.

F. Gaussian Pyramids

The bottom, or zero, level of the pyramid, G0, is equal to the original image. This is low pass filtered and subsampled by a factor of two to obtain the next pyramid level, G1. G1 is then filtered in the same way and subsampled to obtain G2. Further repetitions of the filter/subsample steps generate the remaining pyramid levels. To be precise, for 0 < l < N the levels of the pyramid are obtained iteratively as

    Gl(i, j) = Σm Σn w(m, n) Gl-1(2i + m, 2j + n)

However, it is convenient to refer to this process as a standard REDUCE operation, and simply write

    Gl = REDUCE[Gl-1]

The weighting function w(m, n) is called the "generating kernel." For reasons of computational efficiency it should be small and separable; a five-tap filter was used to generate the pyramid[5].

Fig 6. Gaussian Pyramid

Pyramid construction is equivalent to convolving the original image with a set of Gaussian-like weighting functions. Note that the functions double in width with each level. The convolution acts as a low pass filter with the band limit reduced correspondingly by one octave with each level. Because of this resemblance to the Gaussian density function, we refer to the pyramid of low pass images as the "Gaussian pyramid"[5].

Band pass, rather than low pass, images are required for many purposes. These may be obtained by subtracting each Gaussian (low pass) pyramid level from the next lower level in the pyramid. Because these levels differ in sample density, it is necessary to interpolate new sample values between those in a given level before that level is subtracted from the next lower level. Interpolation can be achieved by reversing the REDUCE process; this is called the EXPAND operation. Let Gl,k be the image obtained by expanding Gl k times. Then Gl,k = EXPAND[Gl,k-1]; or, to be precise, Gl,0 = Gl and, for k > 0,

    Gl,k(i, j) = 4 Σm Σn w(m, n) Gl,k-1((2i + m)/2, (2j + n)/2)

Here, only the terms for which (2i + m)/2 and (2j + n)/2 are integers contribute to the sum. The expand operation doubles the size of the image with each iteration, so that Gl,1 is the size of Gl-1, and Gl,l is the same size as the original image.

G. Laplacian Pyramids

The levels of the band pass pyramid, L0, L1, ..., LN, may now be specified in terms of the low pass pyramid levels as follows:

    Ll = Gl - EXPAND[Gl+1] = Gl - Gl+1,1

Just as each level of the Gaussian pyramid could have been obtained directly by convolving a Gaussian-like equivalent weighting function with the original image, each value of this band pass pyramid could be obtained by convolving a difference of two Gaussians with the original image. These functions closely resemble the Laplacian operators commonly used in image processing, and for this reason the band pass pyramid is referred to as a "Laplacian pyramid"[5].

Fig 7. Laplacian pyramid

An important property of the Laplacian pyramid is that it is a complete image representation: the steps used to construct the pyramid may be reversed to recover the original image exactly. The top pyramid level, LN, is first expanded and added to LN-1 to form GN-1; then this array is expanded and added to LN-2 to recover GN-2, and so on. Alternatively, we may write

    G0 = Σl Ll,l

The pyramid has been introduced here as a data structure for supporting scaled image analysis; the same structure is well suited for a variety of other image processing tasks. A sketch of these operations follows.
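To make REDUCE, EXPAND, and the Laplacian pyramid concrete, here is a small Python/OpenCV sketch. It is not taken from the paper: cv2.pyrDown and cv2.pyrUp stand in for REDUCE and EXPAND, using OpenCV's fixed separable five-tap kernel ([1, 4, 6, 4, 1]/16) as the generating kernel:

    import cv2
    import numpy as np

    def gaussian_pyramid(image, levels):
        """G0..GN: each REDUCE blurs with a 5-tap kernel and subsamples by 2."""
        gp = [image.astype(np.float32)]
        for _ in range(levels):
            gp.append(cv2.pyrDown(gp[-1]))           # Gl = REDUCE[Gl-1]
        return gp

    def laplacian_pyramid(gp):
        """Ll = Gl - EXPAND[Gl+1]; the top level keeps the final low pass image."""
        lp = []
        for l in range(len(gp) - 1):
            size = (gp[l].shape[1], gp[l].shape[0])  # (width, height) of Gl
            lp.append(gp[l] - cv2.pyrUp(gp[l + 1], dstsize=size))
        lp.append(gp[-1])
        return lp

    def reconstruct(lp):
        """Reverses the construction: expand the top level, add each Ll back."""
        image = lp[-1]
        for l in range(len(lp) - 2, -1, -1):
            size = (lp[l].shape[1], lp[l].shape[0])
            image = cv2.pyrUp(image, dstsize=size) + lp[l]
        return image                                 # recovers G0 exactly

Because the expand-and-add loop simply reverses the subtraction used to build each level, reconstruct(laplacian_pyramid(gaussian_pyramid(img, N))) returns the original image, illustrating the completeness property stated above.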

IV. AUTONOMOUS NAVIGATION

Range is the relative distance of the obstacle from the stereo rig. The depth at which the obstacle is located is calculated using the disparity value, the focal length of the camera and the baseline. On finding the range, the autonomous navigation of the vehicle is performed based on the instructions given by the microcontroller.


A. Range Estimation

Fig 8. Range estimation

    r = (f * B) / d = (f * B) / (XL - XR)

where r is the range, B the baseline, f the focal length, and d the disparity (XL - XR).
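A minimal sketch of this conversion, assuming the focal length of 887 pixels and the 0.176 m baseline used in the sample range calculation of Section VI (zero disparities, i.e., very distant points, are mapped to infinity):

    import numpy as np

    def range_map(disparity, focal_px=887.0, baseline_m=0.176):
        """Convert a disparity map to a range map via r = (f * B) / d."""
        d = disparity.astype(np.float32)
        r = np.full(d.shape, np.inf, dtype=np.float32)  # d = 0: no finite range
        np.divide(focal_px * baseline_m, d, out=r, where=d > 0)
        return r

For example, a disparity of 6 pixels gives r = (887 * 0.176) / 6, roughly 26 m, matching the sample calculation in Section VI.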

B. Unmanned Ground Vehicles

Unmanned ground vehicles (UGVs) are robotic platforms that are used as an extension of human capability. This type of robot is generally capable of operating outdoors and over a wide variety of terrain, functioning in place of humans.

C. Navigation

The navigation of the vehicle is performed as per the instructions given to the microcontroller. The action to be performed is fed into the microcontroller for the various inputs perceived from the environment. The microcontroller is interfaced with the system and the stereo cameras for choosing the navigation path.

V. IMPLEMENTATION

A. Bitmap Files

A bitmap is a file in which an image is stored in a specific format. The images obtained from the cameras are stored in memory in the form of bitmaps. The bitmap images are processed to produce the disparity map. The generated outputs, the disparity map and the range map, are stored in memory in the bitmap format[1].

B. WPF Application

A WPF application is a window-based project. The source files are created implicitly on creating the workspace; the files are integrated, and any modification of code is performed in the function definitions in the class files. The application runs as a Windows service watching the image files in the workspace location[1]. As soon as the images are captured and saved by the camera in the workspace location, the service picks up the files for processing and generates the disparity map.

Fig 9. Display of disparity map

The left image, right image and the range map are displayed in a new window.

Fig 10. Range Map

VI. RESULTS

A. Disparity Map

The disparity map is generated, which informs the system about the existence of an obstacle. Obstacle identification and classification are done using the disparity map.

Fig 11. Disparity Map


B. Multi Resolution Disparity Map

The multi resolution disparity map is generated using the Gaussian pyramid. The pyramid is constructed, and the disparity maps are generated by creating resized image arrays. The classification of obstacles is performed using the generated disparity maps: farther objects are detected using the disparity map of lower range, and nearer objects are easily identified using the disparity map of higher range. The generation of the multi resolution disparity map minimizes the search range and makes the obstacle detection task simpler; a sketch follows below.

Fig 12. Display of Gaussian pyramid
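A coarse-to-fine sketch of this idea, reusing the gaussian_pyramid and sad_disparity functions from the earlier sketches; the number of levels is an assumption, since the paper does not specify it:

    def multires_disparity(left, right, levels=3):
        """One SAD disparity map per pyramid level (level 0 = full resolution).
        A disparity of d pixels at level n spans d * 2**n full-resolution
        pixels, so each level covers a different band of the disparity,
        and hence distance, range while keeping each search short."""
        gl = gaussian_pyramid(left, levels)
        gr = gaussian_pyramid(right, levels)
        return [sad_disparity(gl[n], gr[n]) for n in range(levels + 1)]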

C. Range Map

The range map generated gives the approximate range of the obstacle.

Fig 13. Range Map

D. Sample Range Calculation

    r = (f * B) / d

Assuming d = 6,

    r = (887 * 0.176) / 6 = 26 m

An obstacle at a range between 5 m and 52 m could be detected.

E. Advantages

The advantages of using a stereo vision system are:
1. Highly reliable and effective.
2. Stereo cameras are passive sensors.
3. The system can be easily integrated with other routines and hardware.
4. Economical when compared with RADAR systems.

F. Future Enhancements

There are possibilities for much advancement in the future.
1. The multi resolution disparity images can be fused to obtain an enhanced disparity image.
2. The system can be integrated with a GPS tracking device which autonomously drives the vehicle from the source to the destination.
3. The setup can be modified to adjust its baseline automatically to identify closer and farther obstacles.

G. Summary

The disparity map is generated using the Sum of Absolute Differences (SAD) algorithm and the obstacle is identified efficiently. The stereo vision based obstacle detection system is a real-time application which could be applied to the autonomous navigation of transportation vehicles.

REFERENCES

[1] C. Nagel, B. Evjen, J. Glynn, Professional C# 2005 with .NET 3.0, Wiley India Pvt. Ltd., ISBN: 81-265-1332-2.
[2] W. K. Pratt, Digital Image Processing, John Wiley (2001).
[3] M. Sonka, V. Hlavac, R. Boyle, Image Processing, Analysis and Machine Vision, Brooks/Cole, Thomson Learning (1999).
[4] N. N. Dawoud, B. Belhaouari Samir, J. Janier, "Fast template matching method based optimized sum of absolute difference algorithm for face localization", International Journal of Computer Applications, vol. 18, no. 8, March 2011.
[5] D. Scharstein, H. Hirschmüller, Y. Kitajima, G. Krathwohl, N. Nešić, X. Wang, and P. Westling, "High-resolution stereo datasets with subpixel-accurate ground truth", in German Conference on Pattern Recognition, pages 31-42, Springer, 2014.
[6] S. R. Fanello, C. Rhemann, V. Tankovich, A. Kowdle, S. Orts Escolano, D. Kim, and S. Izadi, "HyperDepth: Learning depth from structured light without matching", in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5441-5450, 2016.
[7] M. Shahbazi, G. Sohn, J. Théau, and P. Ménard, "Revisiting intrinsic curves for efficient dense stereo matching", ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, pages 123-130, June 2016.
