Efficient Nearest-Neighbor Computation for GPU-Based Motion Planning
Jia Pan, Christian Lauterbach and Dinesh Manocha
Department of Computer Science, UNC Chapel Hill
http://gamma.cs.unc.edu/gplanner

(Jia Pan, Christian Lauterbach and Dinesh Manocha are with the Department of Computer Science, University of North Carolina at Chapel Hill, NC 27599 USA, {panj, cl, dm}@cs.unc.edu. This work was supported in part by ARO Contract W911NF-04-1-0088, NSF awards 0636208, 0917040 and 0904990, DARPA/RDECOM Contract WR91CRB-08-C-0137, and Intel.)

Abstract— We present a novel k-nearest neighbor search (KNNS) algorithm for proximity computation in motion planning that exploits the computational capabilities of many-core GPUs. Our approach uses locality-sensitive hashing and cuckoo hashing to construct an efficient KNNS algorithm that has linear space and time complexity and exploits the multiple cores and data parallelism effectively. In practice, we observe an order-of-magnitude improvement in speed and scalability over prior GPU-based KNNS algorithms. On some benchmarks, our KNNS algorithm improves the performance of the overall planner by 20-40 times for a CPU-based planner and up to 2 times for a GPU-based planner.

I. INTRODUCTION

Sampling-based motion planning algorithms are widely used in robotics and related areas to compute collision-free motions for robots. These methods use randomized techniques to generate samples in the configuration space and connect nearby samples using local planning methods. A key component in these algorithms is the computation of the k-nearest neighbors (KNN) of each sample, which are defined based on a given distance metric.

The problem of nearest neighbor computation has been well studied in computational geometry, databases and image processing, besides robotics. Some of the well-known algorithms are based on Kd-trees, bounding volume hierarchies (BVHs), R-trees, X-trees, M-trees, VP-trees, GNAT, iDistance, etc. [18].

In this paper, we address the problem of k-nearest neighbor search (KNNS) for real-time sampling-based planning. Our work is based on recent developments in motion planning that use the computational capabilities of commodity many-core graphics processing units (GPUs) for real-time motion planning [15]. The resulting planner performs all the steps of sampling-based planning, including sample generation, collision checking, nearest neighbor computation, local planning and graph search, using parallel algorithms. In practice, the overall planning can be one or two orders of magnitude faster than current CPU-based planners.

We present a novel algorithm for GPU-based nearest neighbor computation. Our formulation is based on locality-sensitive hashing (LSH) and cuckoo hashing techniques, which can compute approximate k-nearest neighbors in higher dimensions. We present parallel algorithms for LSH computation as well as KNN computation that exploit the high number of cores and the data parallelism of GPUs. Our approach can perform the computation using Euclidean as well as non-Euclidean distance metrics. Furthermore, it is memory efficient and can handle millions of sample points within 1GB memory of a commodity GPU. Our preliminary results indicate that the novel KNNS algorithm can be faster by one or two orders of magnitude than prior GPU-based KNNS algorithms. This also improves the performance of the GPU-based planner by 25-100% as compared to gPlanner [15]. We highlight its performance on different benchmarks.

The rest of the paper is organized as follows. We survey related work in Section II. Section III gives an overview of our motion planning framework. We describe the novel GPU-based parallel algorithm for KNN computation in Section IV. We highlight its performance on motion planning benchmarks in Section V.

II. RELATED WORK

In this section, we briefly survey related work on motion planning, nearest neighbor computation, locality sensitive hashing and GPU-based algorithms.

A. Motion Planning

An excellent survey of various motion planning algorithms is given in [13].
Most of the recent work is based on randomized or sampling-based methods such as PRMs and RRTs.

Many task planning applications need a real-time motion planning algorithm for dynamic environments with moving objects. Some approaches use application-specific heuristics to accelerate the planners. Many parallel algorithms have also been proposed for real-time motion planning that can be parallelized on multi-CPU systems, such as [8], [3], etc.

B. k-Nearest Neighbor Search

k-nearest neighbor search (KNNS), also known as proximity computation, is an optimization problem in a metric space: given a set S of points in a d-dimensional space M with metric ‖·‖_p and a query point q ∈ M, find the k points in S closest to q.

Various solutions to KNNS computation have been proposed. Some of them are based on spatial data structures, e.g. Kd-trees, and compute the exact k-nearest neighbors efficiently when the dimension d is low (e.g., up to 10). However, these methods suffer from either space or query time complexity that tends to grow as an exponential function of d. For large enough d, these methods may provide only little improvement over the brute-force search algorithm that compares a query to each point of S. This phenomenon is called the curse of dimensionality. Other approaches overcome the difficulty by using approximate methods for KNNS computation [4], [14], [1]. In these formulations, the algorithm is allowed to return a point whose distance from the query point is at most 1 + ǫ times the distance from the query to its nearest points; ǫ > 0 is called the approximation factor. The appeal of these approaches is that in most cases an approximate nearest neighbor is almost as good as the exact one.

C. Locality Sensitive Hashing

Locality-sensitive hashing (LSH) is another popular algorithm for approximate KNNS in high-dimensional spaces [10]. The key idea is to hash the points in M using several hashing functions to ensure that, for each function, the probability of collision is much higher for points that are close to each other than for those that are far apart. One formal description is:

  P_{h∈F}[h(x) = h(y)] ∝ sim(x, y),   (1)

where h(·) is the hash function and belongs to the so-called locality-sensitive hash (LSH) function family F; x, y are two points in d-dimensional space; and sim(·) is a function that measures the similarity between x and y: for KNNS, a large sim(x, y) means that ‖x − y‖_p, the distance between x and y, is small. Different function families F can be used for different metrics. Datar et al. [5] proposed LSH families for the l2 metric based on p-stable distributions, where each hash function is defined as:

  g(x) = ⟨h_1(x), h_2(x), ..., h_M(x)⟩   (2)

  h_i(x) = ⌊(a_i · x + b_i) / W⌋,   i = 1, 2, ..., M.   (3)

In this case, a_i is a d-dimensional random vector with entries chosen independently from a normal distribution N(0, 1) and b_i is drawn uniformly from the range [0, W]. M and W are parameters used to control the locality sensitivity of the hash function.

The KNNS search structure is a hash table: points with the same hash values are pushed into the same bucket of the hash table. The query process scans the buckets to which the query point is hashed. In order to improve the quality of KNNS, generally L different hash functions g_j(·), j = 1, ..., L are chosen randomly from the function family F. As a result, the query algorithm must be performed repeatedly on all L hash tables. LSH-based KNNS can perform one query in nearly constant time, which means that the nearest neighbor phase of a motion planning algorithm can be completed in time linear in the number of samples.

D. GPU-based Algorithms

Many-core GPUs have been used for many geometric and scientific computations. The rasterization capabilities of a GPU can be used for motion planning of multiple low-DOF robots [19], motion planning in dynamic scenes [9], or to improve the sample generation in narrow passages [6], [16]. However, rasterization-based planning algorithms are accurate only to the resolution of image space.

The computational capabilities of many-core GPUs have also been exploited to improve KNNS. Garcia et al. [7] describe an implementation of brute-force KNNS on a GPU. Recently, some efficient GPU-based algorithms have been proposed to construct Kd-trees [20]. Pan et al. [15] propose a new approach to compute KNNS in low-dimensional space, which is based on the collision operations between bounding volume hierarchies (BVHs) on a GPU [12], [11].

III. OVERVIEW

In this section, we give an overview of the GPU-based motion planning framework, which is relatively easy to parallelize and can be used for single-query and multiple-query problems.

Our work is built on the GPU-based sampling-based planning algorithm called gPlanner [15]. We choose PRM as the underlying motion planning algorithm, because it is more suitable to exploit the multiple cores and data parallelism on GPUs. The PRM algorithm is composed of several steps and each step performs similar operations on the input samples or the links joining those samples.

A. Parallel LSH Computation

In this step, each GPU thread computes the LSH value for one milestone q. We assume that M is a Euclidean space with the weighted l2 metric ‖q‖_λ = (Σ_{i=1}^{d} q_i² λ_i²)^{1/2}, where λ is the weight vector. We will discuss how to handle the non-Euclidean case in Section IV-D.
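Concretely, the hash each thread evaluates is just the p-stable family of Equations 2-3 applied to its milestone. The following is a minimal sequential NumPy sketch; the function names are illustrative and this is not the paper's CUDA implementation:

```python
import numpy as np

def make_g(d, M, W, rng):
    """One p-stable LSH function g = <h_1, ..., h_M> (Equations 2-3):
    each a_i ~ N(0, 1)^d and each b_i ~ Uniform[0, W)."""
    A = rng.standard_normal((M, d))   # rows of A are the vectors a_i
    b = rng.uniform(0.0, W, size=M)
    def g(x):
        # h_i(x) = floor((a_i . x + b_i) / W), concatenated into one key
        return tuple(np.floor((A @ np.asarray(x, float) + b) / W).astype(int))
    return g

rng = np.random.default_rng(0)
g = make_g(d=3, M=4, W=1.0, rng=rng)
# identical points always collide; nearby points collide with high probability
print(g([0.2, 0.5, 0.1]))
```

The tuple of M bucket indices serves as the key under which the milestone is stored; increasing M makes collisions more selective, while increasing W makes them more permissive.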
The LSH function for the weighted Euclidean space is similar to Equations 2 and 3, except that the weights must be taken into account, since LSH itself only handles the l2 metric:

  g^λ(x) = ⟨h_1^λ(x), h_2^λ(x), ..., h_M^λ(x)⟩   (4)

  h_i^λ(x) = ⌊(a_i · (λ ⊙ x) + b_i) / W⌋,   i = 1, 2, ..., M,   (5)

where λ ⊙ x denotes the element-wise product, so that each hash is computed on the weighted point and ‖x‖_λ = ‖λ ⊙ x‖_2.
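The overall construction and query flow described above (L independently drawn functions g_j, buckets keyed by hash value, and a scan of the buckets matching the query) can be sketched sequentially as follows. On the GPU each milestone's hashes and each query are handled by a separate thread, and the paper uses cuckoo hashing for the tables; the Python dict and the class name here are stand-in assumptions for illustration:

```python
import heapq
import numpy as np
from collections import defaultdict

class LSHKNN:
    """Sequential sketch of LSH-based k-NN with L hash tables (weighted l2)."""
    def __init__(self, points, lam, L=4, M=6, W=2.0, seed=0):
        rng = np.random.default_rng(seed)
        self.pts = np.asarray(points, float) * lam   # pre-scale by weights (Eq. 5)
        d = self.pts.shape[1]
        # one (A_j, b_j) pair per table j = 1..L, i.e. g_j = <h_1, ..., h_M>
        self.funcs = [(rng.standard_normal((M, d)), rng.uniform(0.0, W, M))
                      for _ in range(L)]
        self.W = W
        self.tables = [defaultdict(list) for _ in range(L)]
        for idx, p in enumerate(self.pts):
            for j, (A, b) in enumerate(self.funcs):
                key = tuple(np.floor((A @ p + b) / W).astype(int))
                self.tables[j][key].append(idx)   # bucket points by hash value

    def query(self, q, lam, k):
        q = np.asarray(q, float) * lam
        cand = set()
        for j, (A, b) in enumerate(self.funcs):
            key = tuple(np.floor((A @ q + b) / self.W).astype(int))
            cand.update(self.tables[j].get(key, ()))   # scan matching buckets
        # rank the candidate set by (weighted) Euclidean distance
        return heapq.nsmallest(k, cand,
                               key=lambda i: np.linalg.norm(self.pts[i] - q))
```

With enough tables L, the union of the matched buckets almost always contains the true neighbors, so the per-query cost stays nearly constant rather than linear in the number of samples.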