Faster Cover Trees
Mike Izbicki
Christian Shelton
University of California Riverside, 900 University Ave, Riverside, CA 92521

Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015. JMLR: W&CP volume 37. Copyright 2015 by the author(s).

Abstract

The cover tree data structure speeds up exact nearest neighbor queries over arbitrary metric spaces (Beygelzimer et al., 2006). This paper makes cover trees even faster. In particular, we provide

1. A simpler definition of the cover tree that reduces the number of nodes from O(n) to exactly n,
2. An additional invariant that makes queries faster in practice,
3. Algorithms for constructing and querying the tree in parallel on multiprocessor systems, and
4. A more cache efficient memory layout.

On standard benchmark datasets, we reduce the number of distance computations by 10–50%. On a large-scale bioinformatics dataset, we reduce the number of distance computations by 71%. On a large-scale image dataset, our parallel algorithm with 16 cores reduces tree construction time from 3.5 hours to 12 minutes.

1. Introduction

Data structures for fast nearest neighbor queries are most often used to speed up the k-nearest neighbor classification algorithm. But many other learning tasks also require neighbor searches, and cover trees can speed up these tasks as well: localized support vector machines (Segata & Blanzieri, 2010), dimensionality reduction (Lisitsyn et al., 2013), and reinforcement learning (Tziortziotis et al., 2014). Making cover trees faster, the main contribution of this paper, also makes these other tasks faster.

Given a space of points $\mathcal{X}$, a dataset $X \subseteq \mathcal{X}$, a data point $p \in \mathcal{X}$, and a distance function $d : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, the nearest neighbor of p in X is defined as

$$p_{nn} = \operatorname*{argmin}_{q \in X - \{p\}} d(p, q)$$

The naive method for computing $p_{nn}$ involves a linear scan of all the data points and takes time $\Theta(n)$, but many data structures have been created to speed up this process. The kd-tree (Friedman et al., 1977) is probably the most famous. It is simple and effective in practice, but it can only be used on Euclidean spaces. We must turn to other data structures when given an arbitrary metric space. The simplest and oldest of these structures is the ball tree (Omohundro, 1989). Although attractive for its simplicity, it provides only the trivial runtime guarantee that queries will take time O(n). Subsequent research focused on providing stronger guarantees, producing more complicated data structures like the metric skip list (Karger & Ruhl, 2002) and the navigating net (Krauthgamer & Lee, 2004). Because these data structures were complex and had large constant factors, they were mostly of theoretical interest. The cover tree (Beygelzimer et al., 2006) simplified navigating nets while maintaining good run time guarantees. Further research has strengthened the theoretical runtime bounds provided by the cover tree (Ram et al., 2010). Our contributions make the cover tree faster in practice.

The cover tree as originally introduced is simpler than related data structures, but it is not simple. The original explanation required an implicit tree with an infinite number of nodes; but a smaller, explicit tree with O(n) nodes actually gets implemented. We refer to this presentation as the original cover tree. In the remainder of this paper we introduce the simplified cover tree and the nearest ancestor cover tree; parallel construction and querying algorithms; and a cache-efficient layout suitable for all three cover trees. We conclude with experiments showing: on standard benchmark datasets we outperform both the original implementation (Beygelzimer et al., 2006) and MLPack's implementation (Curtin et al., 2013a); on a large bioinformatics dataset, our nearest ancestor tree uses 71% fewer distance comparisons than the original cover tree; and on a large image data set, our parallelization algorithm reduces tree construction time from 3.5 hours to 12 minutes.

2. Simplified cover trees

Definition 1. Our simplified cover tree is any tree where: (a) each node p in the tree contains a single data point (also denoted by p); and (b) the following three invariants are maintained.

1. The leveling invariant. Every node p has an associated integer level(p). For each child q of p,
   $$\mathrm{level}(q) = \mathrm{level}(p) - 1.$$
2. The covering invariant. For every node p, define the function $\mathrm{covdist}(p) = 2^{\mathrm{level}(p)}$. For each child q of p,
   $$d(p, q) \le \mathrm{covdist}(p).$$
3. The separating invariant. For every node p, define the function $\mathrm{sepdist}(p) = 2^{\mathrm{level}(p)-1}$. For all distinct children $q_1$ and $q_2$ of p,
   $$d(q_1, q_2) > \mathrm{sepdist}(p).$$

Algorithm 1 Find nearest neighbor
function findNearestNeighbor(cover tree p, query point x, nearest neighbor so far y)
  1: if d(p, x) < d(y, x) then
  2:     y ← p
  3: for each child q of p sorted by distance to x do
  4:     if d(y, x) > d(x, q) − maxdist(q) then
  5:         y ← findNearestNeighbor(q, x, y)
  6: return y

Algorithm 2 Simplified cover tree insertion
function insert(cover tree p, data point x)
  1: if d(p, x) > covdist(p) then
  2:     while d(p, x) > 2 covdist(p) do
  3:         Remove any leaf q from p
  4:         p′ ← tree with root q and p as only child
  5:         p ← p′
  6:     return tree with x as root and p as only child
  7: return insert_(p, x)

function insert_(cover tree p, data point x)
prerequisites: d(p, x) ≤ covdist(p)
  1: for q ∈ children(p) do
  2:     if d(q, x) ≤ covdist(q) then
  3:         q′ ← insert_(q, x)
  4:         p′ ← p with child q replaced with q′
  5:         return p′
  6: return p with x added as a child

Throughout this paper, we will use the functions children(p) and descendants(p) to refer to the set of nodes that are children or descendants of p respectively. We also define the function maxdist as

$$\mathrm{maxdist}(p) = \max_{q \in \mathrm{descendants}(p)} d(p, q)$$

In words, this is the greatest distance from p to any of its descendants.
The value of maxdist(p) is upper bounded by $2^{\mathrm{level}(p)+1}$, and its exact value can be cached within the data structure.¹

¹ As in the original cover tree, practical performance is improved on most datasets by redefining $\mathrm{covdist}(p) = 1.3^{\mathrm{level}(p)}$ and $\mathrm{sepdist}(p) = 1.3^{\mathrm{level}(p)-1}$. All of our experiments use this modified definition.

In order to query a cover tree (any variant), we use the generic "space tree" framework developed by Curtin et al. (2013). This framework provides fast algorithms for finding the k-nearest neighbors, points within a specified distance, kernel density estimates, and minimum spanning trees. Each of these algorithms has two variants: a single tree algorithm for querying one point at a time, and a dual tree algorithm for querying many points at once. Our faster cover tree algorithms speed up all of these queries; but for simplicity, in this paper we focus only on the single tree nearest neighbor query. The pseudocode is shown in Algorithm 1.

Analysis of the cover tree's runtime properties is done using the data-dependent doubling constant c: the minimum value c such that every ball in the dataset can be covered by c balls of half the radius. We state without proof two facts: (a) any node in the cover tree can have at most $O(c^4)$ children; (b) the depth of any node in the cover tree is at most $O(c^2 \log n)$. These were proven for the original cover tree (Beygelzimer et al., 2006) and the proofs for our simplified cover tree are essentially the same. We can use these two facts to show that the runtime of Algorithm 1 is $O(c^6 \log n)$ for both the original and simplified cover tree.

Algorithm 2 shows how to insert into the simplified cover tree. It is divided into two cases. In the first case, we cannot insert our data point x into the tree without violating the covering invariant. So we raise the level of the tree p by taking any leaf node and using that as the new root. Because maxdist(p) ≤ 2 covdist(p), we are guaranteed that d(p, x) ≤ covdist(x), and so we do not violate the covering constraint. In the second case, the insert_ function recursively descends the tree. On each function call, we search through children(p) to find a node we can insert into without violating the covering invariant. If we find such a node, we recurse; otherwise, we know we can add x to children(p) without violating the separating invariant. In all cases, exactly one node is added per data point, so the resulting tree will have exactly n nodes. Since every

[Figure 2: two trees on the points 7 to 13, drawn on levels 3, 2, and 1. Both have 10 at level 3 and 8, 12 at level 2; the level 1 nodes appear in the order 7, 11, 9, 13 in the left tree and 7, 9, 11, 13 in the right tree.]

Figure 2. Using the metric d(a, b) = |a − b|, both trees are valid simplified cover trees; but only the right tree is a valid nearest ancestor cover tree. Moving the 9 and 11 nodes reduces the value of maxdist for their ancestor nodes. This causes pruning to

[Figure 1: y-axis, fraction of nodes in the original cover tree required for the simplified cover tree (0 to 1); x-axis, datasets yearpredict, twitter, tinyImages, mnist, corel, covtype, artificial40, faces.]

Figure 1. Fraction of nodes required for the simplified cover tree.
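The insertion and query procedures of Algorithms 1 and 2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the `Node` class, the in-place mutation (the pseudocode returns new trees), the helper names `_pop_leaf`, `_insert`, and `build`, the choice of level 0 for the first node, and the handling of a lone root with no leaf to remove are all assumptions. The pruning test is written with the query-to-child distance d(x, q), the form the triangle inequality justifies, and maxdist is recomputed on every call where a real implementation would cache it.

```python
import math


class Node:
    """Hypothetical node: one data point, an integer level, child nodes."""
    def __init__(self, point, level):
        self.point = point
        self.level = level
        self.children = []


def covdist(p):
    return 2.0 ** p.level


def maxdist(p, d):
    """Greatest distance from p to any descendant (recomputed, not cached)."""
    best, stack = 0.0, list(p.children)
    while stack:
        q = stack.pop()
        best = max(best, d(p.point, q.point))
        stack.extend(q.children)
    return best


def _pop_leaf(p):
    """Detach and return some leaf strictly below p (p must have children)."""
    parent, node = p, p.children[-1]
    while node.children:
        parent, node = node, node.children[-1]
    parent.children.pop()
    return node


def _insert(p, x, d):
    """Second case: requires d(p, x) <= covdist(p); descend or add a child."""
    for q in p.children:
        if d(q.point, x) <= covdist(q):
            _insert(q, x, d)
            return p
    p.children.append(Node(x, p.level - 1))
    return p


def insert(p, x, d):
    """Insert point x into the tree rooted at p and return the new root."""
    if d(p.point, x) > covdist(p):
        # First case: raise the tree until x can cover the old root.
        while d(p.point, x) > 2 * covdist(p):
            if not p.children:
                p.level += 1        # assumption: a lone node may take any level
                continue
            q = _pop_leaf(p)        # reuse a leaf as the new root
            q.level = p.level + 1
            q.children = [p]
            p = q
        root = Node(x, p.level + 1)
        root.children = [p]
        return root
    return _insert(p, x, d)


def find_nearest(p, x, d, y=None):
    """Return the node in the tree rooted at p whose point is nearest to x."""
    if y is None or d(p.point, x) < d(y.point, x):
        y = p
    for q in sorted(p.children, key=lambda c: d(c.point, x)):
        # Every point under q is at least d(x, q) - maxdist(q) away from x.
        if d(y.point, x) > d(x, q.point) - maxdist(q, d):
            y = find_nearest(q, x, d, y)
    return y


def build(points, d):
    """Build a tree by inserting the points one at a time."""
    root = Node(points[0], 0)
    for pt in points[1:]:
        root = insert(root, pt, d)
    return root
```

For example, `build(points, math.dist)` on a list of coordinate tuples followed by `find_nearest(root, query, math.dist).point` returns a nearest neighbor of `query` under the Euclidean metric. Because each inserted point creates exactly one node and raising the root only moves an existing leaf, the tree built this way has exactly as many nodes as data points.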