IJRECE VOL. 4 ISSUE 4 OCT.-DEC. 2016 ISSN: 2393-9028 (PRINT) | ISSN: 2348-2281 (ONLINE)
INTERNATIONAL JOURNAL OF RESEARCH IN ELECTRONICS AND COMPUTER ENGINEERING (A UNIT OF I2OR)

A Review on Comparative Study of Different Algorithms for Object Detection
ABHISHEK DAS, KESHAV KISHORE, NISHA GAUTAM
AP Goyal Shimla University

Abstract- Object detection and tracking play an important role in many computer vision and pattern recognition applications, such as video classification, vehicle navigation, surveillance and autonomous robot routing. Object detection is a challenging field in computer vision and pattern analysis research, and many techniques have been proposed and developed. In this paper we present different approaches to detecting objects. The objective of this paper is to focus on the main techniques and algorithms for object detection.

Keywords- Object Detection, Computer Vision, Algorithms, SLAM, SIFT, SURF, HOG, DPM, Occlusion.

I. INTRODUCTION

Object tracking and detection has been a classical research area in the field of computer vision for decades. Numerous applications depend on object detection, such as advanced driver assistance systems, traffic surveillance, scene understanding and autonomous navigation [1]. The core challenge, and the basic step in tracking, is to accurately detect the object in different environments: complex backgrounds, weather conditions, cast shadows and occlusions all make it difficult to track an object. In this paper we will be focusing on different object detection algorithms.

Many computer vision algorithms suffer in the presence of occluded objects in a scene. Which region is occluded depends on the camera viewpoint; in some scenarios the angle of the camera determines which part is occluded and which is not. Hence minimization approaches, temporal selection, graph-cut methods and sum-of-squared-distance matching are used to handle the occlusion problem [2].

Object detection is the process of finding instances of real-world objects such as faces, bicycles and buildings in images or videos. Object detection algorithms typically use extracted features and learning algorithms to recognize instances of an object category. Object detection is commonly used in applications such as image retrieval, security, surveillance and automated vehicle parking systems [3]. Objects can be detected using a variety of models, including the following.

A. Feature-based object detection
Fig 1: Detecting a reference object (left) in a cluttered scene (right) using feature extraction and matching. RANSAC is used to estimate the location of the object in the test image.

B. Viola-Jones object detection
Fig 2: Face detection (left) and stop sign detection (right) using the Viola-Jones object detector.

C. SVM classification with histogram of oriented gradients (HOG) features
Fig 3: Human detection using a pretrained SVM with HOG features.

D. Image segmentation and blob analysis
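The feature-plus-classifier paradigm described above can be illustrated with a toy sliding-window detector; this is a minimal sketch, not code from the paper, and `mean_intensity` with a fixed threshold is a hypothetical stand-in for real features (e.g. HOG) and a trained classifier (e.g. an SVM).

```python
# Minimal sliding-window detection sketch. `mean_intensity` is a toy
# feature extractor and the threshold test a toy classifier; real
# detectors would use HOG/SIFT features and a trained model.

def mean_intensity(patch):
    """Toy feature: average pixel value of the patch."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)

def sliding_window_detect(image, win=3, threshold=0.5):
    """Scan the image with a win x win window and report every window
    whose feature score exceeds the decision threshold."""
    h, w = len(image), len(image[0])
    detections = []
    for y in range(h - win + 1):
        for x in range(w - win + 1):
            patch = [row[x:x + win] for row in image[y:y + win]]
            score = mean_intensity(patch)
            if score > threshold:
                detections.append((x, y, score))
    return detections

# 6x6 image, all zeros except a bright 3x3 block at (3, 3).
img = [[0.0] * 6 for _ in range(6)]
for y in range(3, 6):
    for x in range(3, 6):
        img[y][x] = 1.0

print(sliding_window_detect(img))  # strongest hit at window (3, 3)
```

Overlapping windows near the bright block also fire, which is why real detectors follow this scan with non-maximum suppression, as discussed for DPM below.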
Fig 4: Moving cars are detected using blob analysis.

Other methods for detecting objects with computer vision include gradient-based, derivative-based and template-matching approaches [3].

II. LITERATURE REVIEW

During the literature review we came across a wide range of techniques and algorithms; here we focus on the impactful algorithms shown in Fig 5.

Fig 5: Important and reviewed techniques/algorithms: SLAM, SIFT, SURF, HOG and DPM.

A. SLAM – Simultaneous Localization and Mapping [4]
SLAM can be implemented in many ways. First of all, there is a huge amount of different hardware that can be used. Secondly, SLAM is more a concept than a single algorithm: it involves many steps, and each of these steps can be implemented using a number of different algorithms. SLAM is the process by which a mobile device or robot builds a map of an environment and at the same time uses this map to deduce its own location in it. It is commonly used for robot navigation systems in unknown environments: SLAM is concerned with the problem of building a map of an unknown environment with a mobile robot while simultaneously navigating the environment using that map. SLAM consists of multiple parts: landmark extraction, data association, state estimation, state update and landmark update, and there are many ways to solve each of these smaller parts. The first step in the SLAM process is to obtain data about the surroundings of the robot. The Extended Kalman Filter (EKF), a traditional approach, is quite often used for state estimation in robotics; EKF-based SLAM maintains an estimate of the robot's pose together with the map of landmark locations, and during navigation the robot localizes itself using this map. The objective of the SLAM problem is to estimate the position and orientation of the robot together with the locations of all the features. SLAM is frequently used in autonomous underwater vehicles, unmanned aerial vehicles and autonomous ground vehicles.

B. SIFT – Scale Invariant Feature Transform [5]
SIFT is used in computer vision to detect and describe local features in images. In any particular image, interest points can be found and described by a set of feature vectors. The description extracted from a training image can then be used to identify a particular object in a test image. For the best recognition results, the features should be detectable even in noisy or differently illuminated scenes. SIFT matching works on the principle of Euclidean distance between feature vectors: first the SIFT keypoints are extracted from a reference image and stored in a database, and an object is then recognized in a new image by comparing the features of both images. The initial step of SIFT is to create an internal representation of the original image that ensures scale invariance, which is done by generating a scale space. The Laplacian of Gaussian (LoG) is excellent at finding interest points and is the nucleus of this algorithm; SIFT uses the Difference of Gaussians, which is an approximation of LoG. Scale-space filtering is used so that larger corners are detected with larger windows. Finally, with scale and rotation invariance established, the last set of distinctive features is identified.

C. SURF – Speeded Up Robust Features [6]
Speeded Up Robust Features (SURF) is inspired by the Scale Invariant Feature Transform (SIFT). It is a local feature detector and descriptor that can be used for tasks such as object recognition or 3D reconstruction. The basic version of SURF is several times faster than SIFT and is claimed to be more robust. The algorithm works in two basic steps, feature detection and description, together with matching over local neighbourhoods. The detection of features in SURF relies on a scale-space representation combined with first- and second-order differential operators; the originality of the SURF algorithm is that these operations are sped up by the use of box filter techniques. For the descriptor, SURF uses wavelet responses in the horizontal and vertical directions over a neighbourhood of pixels.
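The box-filter speed-up mentioned for SURF rests on the integral image (summed-area table). The following sketch, written for illustration rather than taken from the paper, shows why: after one pass to build the table, the sum over any rectangle costs four lookups regardless of its size.

```python
# Integral-image trick behind SURF's box filters: constant-time
# rectangle sums after a single preprocessing pass.

def integral_image(img):
    """Summed-area table with an extra zero row/column for easy indexing."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y),
    computed in O(1) from four table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 0, 0, 2, 2))  # 1+2+4+5 = 12
print(box_sum(ii, 1, 1, 2, 2))  # 5+6+8+9 = 28
```

Approximating Gaussian second-order derivatives with such box filters is what lets SURF evaluate its detector at any scale at the same cost.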
D. DPM – Deformable Part Based Model [7]
A deformable part-based model represents an object category through the appearance of its parts and how the parts relate to each other. A part is any element of an object or scene that can be reliably detected using only local image evidence; in a part-based model, each part represents local visual properties. The basic idea behind deformable parts is to represent an object model using a lower-resolution root template and a set of spatially flexible, higher-resolution part templates. The deformable part-based model was the next revolutionary idea in object detection after the histogram of oriented gradients. The threshold employed in DPM's non-maximum suppression filter is a key parameter of the algorithm: the lower the threshold, the higher the number of detections.

Fig 6: Basic idea behind DPM.

E. HOG – Histogram of Oriented Gradients [8]
The histogram of oriented gradients (HOG) is a feature descriptor used to detect objects in computer vision and image processing. The HOG descriptor technique counts occurrences of gradient orientations in localized portions of an image, the detection window or region of interest (ROI). Computation of the HOG descriptor requires the following basic configuration parameters:
- the masks used to compute derivatives and gradients;
- the geometry for splitting an image into cells and grouping cells into a block;
- the block overlap;
- the normalization parameters.
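The first configuration choice above, the derivative masks, can be made concrete with a small sketch (an illustration under assumed conventions, not code from the paper): simple centred [-1, 0, 1] masks give the gradient at a pixel, from which the magnitude and orientation that feed the histograms are derived.

```python
# Per-pixel gradient for HOG using centred [-1, 0, 1] derivative masks.
import math

def gradient_at(img, x, y):
    """Central-difference gradient at an interior pixel: returns the
    gradient magnitude and its unsigned orientation in [0, 180)."""
    gx = img[y][x + 1] - img[y][x - 1]   # horizontal derivative
    gy = img[y + 1][x] - img[y - 1][x]   # vertical derivative
    magnitude = math.hypot(gx, gy)
    # Unsigned orientation, as commonly used for HOG.
    angle = math.degrees(math.atan2(gy, gx)) % 180.0
    return magnitude, angle

# Vertical step edge: intensity rises left to right, so the gradient
# points horizontally (orientation 0 degrees).
img = [[0, 0, 10],
       [0, 0, 10],
       [0, 0, 10]]
mag, ang = gradient_at(img, 1, 1)
print(mag, ang)  # 10.0 0.0
```

Each such (magnitude, angle) pair is then voted into the angular bins of its cell's histogram, as described in the implementation steps below.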
Implementation of the HOG descriptor algorithm is as follows:
1. Divide the image into small connected regions called cells, and for each cell compute a histogram of gradient directions or edge orientations for the pixels within the cell.
2. Discretize each cell into angular bins according to the gradient orientation.
3. Each pixel of a cell contributes a weighted gradient to its corresponding angular bin.
4. Groups of adjacent cells are considered as spatial regions called blocks.
5. The normalized group of histograms represents the block histogram, and the set of these block histograms represents the descriptor.

Fig 7: HOG algorithm implementation scheme.

III. RELATED PAPER SUMMARY

1. P M Panchal, S R Panchal and S K Shah, "A Comparison of SIFT and SURF", compared the SIFT and SURF algorithms [9]. Two images are taken and features are detected in both images using the SIFT and SURF algorithms.

Fig 8: (a) Original image 1; (b) Original image 2.
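The comparison above ultimately comes down to matching descriptors between the two images. As a minimal sketch (not from the compared paper), each query descriptor can be matched to its nearest database descriptor by Euclidean distance, with ambiguous matches rejected by a ratio test; the 0.8 ratio is an assumed value.

```python
# Nearest-neighbour descriptor matching with a ratio test, as used when
# comparing SIFT/SURF feature sets between two images.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match(query_descs, db_descs, ratio=0.8):
    """Return (query_idx, db_idx) pairs whose nearest neighbour is
    clearly closer than the second-nearest (ratio test)."""
    matches = []
    for qi, q in enumerate(query_descs):
        dists = sorted((euclidean(q, d), di) for di, d in enumerate(db_descs))
        best, second = dists[0], dists[1]
        if best[0] < ratio * second[0]:  # unambiguous nearest neighbour
            matches.append((qi, best[1]))
    return matches

db = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
queries = [[0.1, 0.0],   # clearly closest to db[0] -> kept
           [5.0, 0.0]]   # equidistant from db[0] and db[1] -> rejected
print(match(queries, db))  # [(0, 0)]
```

The surviving matches can then feed a geometric verification step such as the RANSAC localization shown in Fig 1.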