Tree-Independent Dual-Tree Algorithms

Ryan R. Curtin [email protected] William B. March [email protected] Parikshit Ram [email protected] David V. Anderson [email protected] Alexander G. Gray [email protected] Charles L. Isbell, Jr. [email protected] Georgia Institute of Technology, 266 Ferst Drive NW, Atlanta, GA 30332 USA

Abstract

Dual-tree algorithms are a widely used class of branch-and-bound algorithms. Unfortunately, developing dual-tree algorithms for use with different trees and problems is often complex and burdensome. We introduce a four-part logical split: the tree, the traversal, the point-to-point base case, and the pruning rule. We provide a meta-algorithm which allows development of dual-tree algorithms in a tree-independent manner and easy extension to entirely new types of trees. Representations are provided for five common algorithms; for k-nearest neighbor search, this leads to a novel, tighter pruning bound. The meta-algorithm also allows straightforward extensions to massively parallel settings.

1. Introduction

In large-scale machine learning applications, algorithmic scalability is paramount. Hence, much study has been put into fast algorithms for machine learning tasks. One commonly-used approach is to build trees on data and then use branch-and-bound algorithms to minimize runtime. A popular example of a branch-and-bound algorithm is the use of the kd-tree for nearest neighbor search, pioneered by Bentley (1975), and subsequently modified to use two trees ("dual-tree") (Gray & Moore, 2001). Later, an optimized tree structure, the cover tree, was designed (Beygelzimer et al., 2006), giving provably linear scaling in the number of queries (Ram et al., 2009)—a significant improvement over the quadratically-scaling brute-force algorithm.

Asymptotic speed gains as dramatic as described above are common for dual-tree branch-and-bound algorithms. These types of algorithms can be applied to a class of problems referred to as 'n-body problems' (Gray & Moore, 2001). The n-point correlation, important in astrophysics, is an n-body problem and can be solved quickly with trees (March et al., 2012). In addition, Euclidean minimum spanning trees can be found quickly using tree-based algorithms (March et al., 2010). Other dual-tree algorithms include kernel density estimation (Gray & Moore, 2003a), mean shift (Wang et al., 2007), Gaussian summation (Lee & Gray, 2006), fast singular value decomposition (Holmes et al., 2008), range search, furthest-neighbor search, and many others.

The dual-tree algorithms referenced above are each quite similar, but no formal connections between the algorithms have been established. The types of trees used to solve each problem may differ, and in addition, the manner in which the trees are traversed can differ (depending on the problem or the tree). In practice, a researcher may have to implement entirely separate algorithms to solve the same problems with different trees; this is time-consuming and makes it difficult to explore the properties of tree types that make them more suited for particular problems. Worse yet, parallel dual-tree algorithms are difficult to develop and appear to be far more complex than serial implementations; yet, both solve the same problem. We make these contributions to address these shortcomings:

• A representation of dual-tree algorithms as four separate components: a space tree, a traversal, a base case, and a pruning rule.

• A meta-algorithm that produces dual-tree algorithms, given those four separate components.

• Base cases and pruning rules for a variety of dual-tree algorithms, which can be used with any space tree and any traversal.

• A theoretical framework, used to prove the correctness of these meta-algorithms and develop a new, tighter bound for k-nearest neighbor search.

• Implications of our representation, including easy creation of large-scale distributed dual-tree algorithms via our meta-algorithm.

Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, 2013. JMLR: W&CP volume 28. Copyright 2013 by the author(s).
2. Overview of Meta-Algorithm

In other works, dual-tree algorithms are described as standalone algorithms that operate on a query dataset Sq and a reference dataset Sr. By observing commonalities in these algorithms, we propose the following logical split of any dual-tree algorithm into four parts:

• A space tree (a type of data structure).

• A pruning dual-tree traversal, which visits nodes in two space trees, and is parameterized by a BaseCase() and a Score() function.

• A BaseCase() function that defines the action to take on a combination of points.

• A Score() function that determines if a subtree should be visited during the traversal.

We can use this to define a meta-algorithm:

Given a type of space tree, a pruning dual-tree traversal, a BaseCase() function, and a Score() function, use the pruning dual-tree traversal with the given BaseCase() and Score() functions on two space trees Tq (built on Sq) and Tr (built on Sr).

In Sections 3 and 4, space trees, traversals, and related quantities are rigorously defined. Then, Sections 5–8 define BaseCase() and Score() for various dual-tree algorithms. Section 9 discusses implications and future possibilities, including large-scale parallelism.

3. Space Trees

To develop a framework for understanding dual-tree algorithms, we must introduce some terminology.

Definition 1. A space tree on a dataset S ∈

• The set of points held in a node Ni is denoted P(Ni) or Pi.

• The convex subset represented by node Ni is denoted S(Ni) or Si.

• The set of descendant nodes of a node Ni, denoted D^n(Ni) or Di^n, is the set of nodes C(Ni) ∪ C(C(Ni)) ∪ ... .

• The set of descendant points of a node Ni, denoted D^p(Ni) or Di^p, is the set of points { p : p ∈ P(D^n(Ni)) ∪ P(Ni) }.

• The parent of a node Ni is denoted Par(Ni).

Figure 1. An example space tree: (a) abstract representation; (b) representation in R^2.

An abstract representation of an example space tree on a five-point dataset in R^2 is shown in Figure 1(a). In this illustration, Nr is the root node of the tree; it has no parent and it contains the points x1 and x3 (that is, Pr = {x1, x3}). The node Np contains points x1 and x5 and has children Nc and Nd (which each have no children and contain points x2 and x4, respectively). In Figure 1(b), the points in the tree and the subsets Sr (darker rectangle) and Sp (lighter rectangle) are shown.

Definition 5. The maximum descendant distance of a node Ni is defined as the maximum distance between the centroid Ci of Si and points in Di^p:

λ(Ni) = max_{p ∈ Di^p} ||Ci − p||.

It is straightforward to show that kd-trees, octrees, metric trees, ball trees, cover trees (Beygelzimer et al., 2006), R-trees, and vantage-point trees all satisfy the conditions of a space tree. The quantities dmin(Nq, Nr), dmax(Nq, Nr), λ(Ni), and ρ(Ni) are easily derived (or bounded, which in many cases is sufficient) for each of these types of trees.

For a kd-tree, dmin(Ni, Nj) is bounded below by the minimum distance between Si and Sj; dmax(Ni, Nj) is bounded above similarly by the maximum distance between Si and Sj. Both ρ(Ni) and λ(Ni) are bounded above by dmax(Ci, Ni).

For the cover tree (Beygelzimer et al., 2006), each node Ni contains only one point pi and has 'scale' si. dmin(Ni, Nj) is bounded below by d(pi, pj) − 2^(si+1) − 2^(sj+1), and dmax(Ni, Nj) is bounded above by d(pi, pj) + 2^(si+1) + 2^(sj+1). Because pi is the centroid of Si, ρ(Ni) = 0. λ(Ni) is simply 2^(si+1).

4. Tree Traversals

In general, the nodes of each space tree can be traversed in a number of ways. However, there have been no attempts to formalize tree traversal. Therefore, we introduce several definitions which will be useful later.

Definition 6. A single-tree traversal is a process that, given a space tree, will visit each node in that tree once and perform a computation on any points contained within the node that is being visited.

As an example, the standard depth-first traversal or breadth-first traversal are single-tree traversals. From a programming perspective, the computation in the single-tree traversal can be implemented with a simple callback BaseCase(point) function. This allows the computation to be entirely independent of the single-tree traversal itself. As an example, a simple single-tree algorithm to count the number of points in a given tree would increment a counter variable each time BaseCase(point) was called. However, this concept by itself is not very useful; without pruning branches, no computations can be avoided.

Definition 7. A pruning single-tree traversal is a process that, given a space tree, will visit nodes in the tree and perform a computation to assign a score to that node. If the score is above some bound, the node is "pruned" and none of its descendants will be visited; otherwise, a computation is performed on any points contained within that node. If no nodes are pruned, then the traversal will visit each node in the tree once.

Clearly, a pruning single-tree traversal that does not prune any nodes is a single-tree traversal. A pruning single-tree traversal can be implemented with two callbacks: BaseCase(point) and Score(node). This allows both the point-to-point computation and the scoring to be entirely independent of the traversal. Thus, single-tree branch-and-bound algorithms can be expressed in a tree-independent manner. Extensions to the dual-tree case are given below.

Definition 8. A dual-tree traversal is a process that, given two space trees Tq (query tree) and Tr (reference tree), will visit every combination of nodes (Nq, Nr) once, where Nq ∈ Tq and Nr ∈ Tr. At each visit (Nq, Nr), a computation is performed between each point in Nq and each point in Nr.

Definition 9. A pruning dual-tree traversal is a process which, given two space trees Tq (query tree) and Tr (reference tree), will visit combinations of nodes (Nq, Nr) such that Nq ∈ Tq and Nr ∈ Tr no more than once, and perform a computation to assign a score to that combination. If the score is above some bound, the combination is pruned and no combinations (Nqc, Nrc) such that Nqc ∈ Dq^n and Nrc ∈ Dr^n will be visited; otherwise, a computation is performed between each point in Nq and each point in Nr.

Similar to the pruning single-tree traversal, a pruning dual-tree algorithm can use two callback functions BaseCase(pq, pr) and Score(Nq, Nr). An example implementation of a depth-first pruning dual-tree traversal is given in Algorithm 1. The traversal is started on the root of Tq and the root of Tr.

Algorithm 1 DepthFirstTraversal(Nq, Nr).
  if Score(Nq, Nr) = ∞ then return
  for each pq ∈ Pq, pr ∈ Pr do
    BaseCase(pq, pr)
  for each Nqc ∈ Cq, Nrc ∈ Cr do
    DepthFirstTraversal(Nqc, Nrc)

Algorithm 1 provides only one example of a commonly-used pruning dual-tree traversal. Other possibilities not explicitly documented here include breadth-first traversals and the unique cover tree dual-tree traversal described by Beygelzimer et al. (2006), which can be adapted to the generic space tree case.

The rest of this work is devoted to using these concepts to represent existing dual-tree algorithms in the four parts described in Section 2.

5. k-Nearest Neighbor Search

k-nearest neighbor search is a well-studied problem with a plethora of algorithms and results. The problem can be stated as follows:

Given a query dataset Sq ∈ R^(n×d), a reference dataset Sr ∈ R^(m×d), and an integer k : 0 < k < m, for each point pq ∈ Sq, find the k nearest neighbors in Sr and their distances from pq. The list of nearest neighbors for a point pq can be referred to as Npq, and the distances to nearest neighbors for pq can be referred to as Dpq. Thus, the k-th nearest neighbor to point pq is Npq[k], and Dpq[k] = ||pq − Npq[k]||.

This can be solved using a brute-force approach: compare every possible point combination and store the k smallest distance results for each pq. However, this scales poorly – O(nm); hence the importance of fast algorithms to solve the problem. Many existing algorithms employ tree-based branch-and-bound strategies (Beygelzimer et al., 2006; Cover & Hart, 1967; Friedman et al., 1977; Fukunaga & Narendra, 1975; Gray & Moore, 2001; Ram et al., 2009).

We unify all of these branch-and-bound strategies by defining methods BaseCase(pq, pr) and Score(Nq, Nr) for use with a pruning dual-tree traversal.

At the initialization of the tree traversal, the lists Npq and Dpq are empty lists for each query point pq. After the traversal is complete, for a query point pq, the set {Npq[1], ..., Npq[k]} is the ordered set of k nearest neighbors of the point pq, and each Dpq[i] = ||pq − Npq[i]||. If we assume that Dpq[i] = ∞ if i is greater than the length of Dpq, we can formulate BaseCase() as given in Algorithm 2. (In practice, k-nearest-neighbors is often run with identical reference and query sets. In that situation it may be useful to modify this implementation of BaseCase() so that a point does not return itself as the nearest neighbor, with distance 0.)

Algorithm 2 k-nearest-neighbors BaseCase()
  Input: query point pq, reference point pr, list of k nearest candidate points Npq and k candidate distances Dpq (both ordered by ascending distance)
  Output: distance d between pq and pr
  d ← ||pq − pr||
  if d < Dpq[k] and BaseCase(pq, pr) not yet called then
    insert d into ordered list Dpq and truncate list to length k
    insert pr into Npq such that Npq is ordered by distance and truncate list to length k
  return d

With the base case established, only the pruning rule remains. A valid pruning rule will, for a given query node Nq and reference node Nr, prune the reference subtree rooted at Nr if and only if it is known that there are no points in Dr^p that are in the set of k nearest neighbors of any points in Dq^p. Thus, at any point in the traversal, we can prune the combination (Nq, Nr) if and only if dmin(Nq, Nr) ≥ B1(Nq), where

B1(Nq) = max_{p ∈ Dq^p} Dp[k].

Now, we can describe this bound recursively. This is important for implementation; a recursive function can cache previous calculations for large speedups.

B1(Nq) = max( max_{p ∈ Pq} Dp[k], max_{p ∈ Dq^p, p ∉ Pq} Dp[k] )
       = max( max_{p ∈ Pq} Dp[k], max_{Nc ∈ Cq} max_{p ∈ Dc^p} Dp[k] )
       = max( max_{p ∈ Pq} Dp[k], max_{Nc ∈ Cq} B1(Nc) ).

Suppose we have, at some point in the traversal, two points p0, p1 ∈ Dq^p for some node Nq, with Dp0[k] = ∞ and Dp1[k] < ∞. This means there exist k points {pr^1, ..., pr^k} in Sr such that d(p1, pr^i) ≤ Dp1[k] for i = {1, ..., k}. Because p0, p1 ∈ Dq^p, we can apply the triangle inequality to see that d(p0, p1) ≤ 2λ(Nq). Therefore, d(p0, pr^i) ≤ Dp1[k] + 2λ(Nq) for i = {1, ..., k}. Using this observation we can construct an alternate bound function B2(Nq):

B2(Nq) = min_{p ∈ Dq^p} Dp[k] + 2λ(Nq),

which can, like B1(Nq), be rearranged to provide a recursive definition. In addition, if p0 ∈ Pq and p1 ∈ Dq^p, we can bound d(p0, p1) more tightly with ρ(Nq) + λ(Nq) instead of 2λ(Nq). These observations yield

B2(Nq) = min( min_{p ∈ Pq} (Dp[k] + ρ(Nq) + λ(Nq)), min_{Nc ∈ Cq} (B2(Nc) + 2(λ(Nq) − λ(Nc))) ).

Both B1(Nq) and B2(Nq) provide valid pruning rules. We can combine both to get a tighter pruning rule by taking the tighter of the two bounds. In addition, B1(Nq) ≥ B1(Nc) and B2(Nq) ≥ B2(Nc) for all Nc ∈ Cq. Therefore, we can prune (Nq, Nr) if dmin(Nq, Nr) ≥ min{B1(Par(Nq)), B2(Par(Nq))}.

These observations are combined for a better bound:

B(Nq) = min{ max( max_{p ∈ Pq} Dp[k], max_{Nc ∈ Cq} B(Nc) ),
             min_{p ∈ Pq} Dp[k] + ρ(Nq) + λ(Nq),
             min_{Nc ∈ Cq} ( B(Nc) + 2(λ(Nq) − λ(Nc)) ),
             B(Par(Nq)) }.

As a result of this bound function being expressed recursively, previous bounds can be cached and used to calculate the bound B(Nq) quickly. We can use this to structure Score() as given in Algorithm 3.

Algorithm 3 k-nearest-neighbors Score()
  Input: query node Nq, reference node Nr
  Output: a score for the node combination (Nq, Nr), or ∞ if the combination should be pruned
  if dmin(Nq, Nr) < B(Nq) then
    return dmin(Nq, Nr)
  return ∞

Applying the meta-algorithm in Section 2 with any tree type and any pruning dual-tree traversal gives a correct implementation of k-nearest neighbor search. Proving the correctness is straightforward: first, a (non-pruning) dual-tree traversal which uses BaseCase() as given in Algorithm 2 will give correct results for any space tree. Then, we already know that B(Nq) is a bounding function that, at any point in the traversal, will not prune any subtrees which could contain better nearest neighbor candidates than the current candidates. Thus, the true nearest neighbors for each query point will always be visited, and the results will be correct.

We now show that this algorithm is a generalization of the standard kd-tree k-NN search, which uses a pruning dual-tree depth-first traversal. The archetypal algorithm for all-nearest-neighbor search (k-nearest neighbor search with k = 1) given for kd-trees in Alex Gray's Ph.D. thesis (2003) is shown here in Algorithm 4 with converted notation. δq^nn is the bound for a node Nq and is initialized to ∞; Dpq represents the nearest distance for a query point pq, and Npq represents the nearest neighbor for a query point pq. Nq.left represents the left child of Nq and is defined to be Nq if Nq has no children; Nq.right is similarly defined.

Algorithm 4 AllNN(Nq, Nr) (Gray, 2003)
  if dmin(Nq, Nr) ≥ δq^nn then return
  if Nq is leaf and Nr is leaf then
    for each pq ∈ Pq, pr ∈ Pr do
      dqr ← ||pq − pr||
      if dqr < Dpq then Dpq = dqr; Npq = pr
      if dqr < δq^nn then δq^nn ← dqr
  AllNN(Nq.left, closer-of(Nr.left, Nr.right))
  AllNN(Nq.left, farther-of(Nr.left, Nr.right))
  AllNN(Nq.right, closer-of(Nr.left, Nr.right))
  AllNN(Nq.right, farther-of(Nr.left, Nr.right))
  δq^nn = min(δq^nn, max(δq.left^nn, δq.right^nn))

The structure of the algorithm matches Algorithm 1; it is a dual-tree depth-first recursion. Because this is a depth-first recursion, δq^nn = ∞ for a node Nq if no descendants of Nq have been recursed into. Otherwise, δq^nn is the maximum of Dpq for all pq ∈ Dq^p. That is, δq^nn = B1(Nq). Thus, the comparison in the first line of Algorithm 4 is equivalent to Algorithm 3 with B1(Nq) instead of B(Nq).

The first two lines of the inside of the for each loop (the base case) are equivalent to Algorithm 2 with k = 1. kd-trees only hold points in leaves; therefore, the base case is called for all combinations of points in each node combination, identical to the depth-first traverser (Algorithm 1).

A kd-tree is a space tree and the dual depth-first recursion is a pruning dual-tree traversal. Also, we showed the equivalency of the pruning rule (that is, the Score() function) and the equivalency of the base case. So, it is clear that Algorithm 4 is produced using our meta-algorithm with these parameters. In addition, because B(Nq) is always less than B1(Nq), Algorithm 3 provides a tighter bound than the pruning rule in Algorithm 4.

This algorithm is also a generalization of the standard cover tree k-NN search (Beygelzimer et al., 2006). The cover tree search is a pruning dual-tree traversal where the query tree is traversed depth-first while the reference tree is simultaneously traversed breadth-first. The pruning rule (after simple adaptation to the k-nearest-neighbor search problem instead of the nearest-neighbor search problem) is equivalent to

dmin(Nq, Nr) ≥ Dpq[k] + λ(Nq),

where pq is the point contained in Nq (remember, each node of a cover tree contains one point). This is equivalent to B2(Nq) because ρ(Nq) = 0 for cover trees. The transformation from the algorithm given by Beygelzimer et al. (2006) to our representation is made clear in Appendix A (supplementary material) and in the k-nearest neighbor search implementation of the C++ library MLPACK (Curtin et al., 2011); this is implemented in terms of our meta-algorithm.

Specific algorithms for ball trees, metric trees, VP trees, octrees, and other space trees are trivial to create using the BaseCase() and Score() implementation given here (and in MLPACK). Note also that this implementation will work in any metric space.

An extension to k-furthest neighbor search is straightforward. The bound function must be 'inverted' by changing 'max' to 'min' (and vice versa); in addition, the distances Dpq[i] must be initialized to 0 instead of ∞, and the lists Dpq and Npq must be sorted by descending distance instead of ascending distance. Lastly, the comparison d < Dpq[k] must be changed to d > Dpq[k]. With these simple changes, we have easily solved an entirely different problem using our meta-algorithm. An implementation using our meta-algorithm for both kd-trees and cover trees is also available in MLPACK.

6. Range Search

Range search is another popular neighbor searching problem related to k-nearest neighbor search. In addition to being a fairly standard machine learning task, it has numerous uses in applications such as databases and geographic information systems (GIS). A treatise on the history of the problem and solutions is given by Agarwal & Erickson (1999). The problem is:

Given query and reference datasets Sq, Sr and a range [δ1, δ2], for each point pq ∈ Sq, find all points pr ∈ Sr such that δ1 ≤ ||pq − pr|| ≤ δ2. As with k-nearest neighbor search, refer to the list of neighbors for each query point pq as Npq and the corresponding distances as Dpq. These lists are not sorted in any particular order, and at initialization time, they are empty.

In different settings, the problem of range search may not be stated identically; however, our results are easily adaptable. A BaseCase() implementation is given in Algorithm 5, and a Score() implementation is given in Algorithm 6. The only bounds to consider are [δ1, δ2], so no complex bound handling is necessary.

Algorithm 5 Range search BaseCase().
  Input: query point pq, reference point pr, neighbor list Npq, distance list Dpq
  Output: distance d between pq and pr
  d ← ||pq − pr||
  if δ1 ≤ d ≤ δ2 and BaseCase(pq, pr) not yet called then
    Npq ← Npq ∪ {pr}
    Dpq ← Dpq ∪ {d}
  return d

Algorithm 6 Range search Score().
  Input: query node Nq, reference node Nr
  Output: a score for (Nq, Nr), or ∞ if the combination should be pruned
  if δ1 ≤ dmin(Nq, Nr) ≤ δ2 then
    return dmin(Nq, Nr)
  return ∞

While range search is sometimes mentioned in the context of dual-tree algorithms (Gray & Moore, 2001), the focus is usually on k-nearest neighbor search. As a result, we cannot find any explicitly published dual-tree algorithms to generalize; however, a single-tree algorithm was proposed by Bentley and Friedman (1979). Thus, the BaseCase() and Score() proposed here can be used with our meta-algorithm to produce entirely novel range search implementations; MLPACK has kd-tree and cover tree implementations.

7. Borůvka's Algorithm

Finding a Euclidean minimum spanning tree has been a relevant problem since Borůvka's algorithm was proposed in 1926. Recently, a dual-tree version of Borůvka's algorithm was developed (March et al., 2010) for kd-trees and cover trees. We unify these two algorithms and generalize to other types of space tree by formulating BaseCase() and Score() functions.

For a dataset Sr ∈ R^(N×D), Borůvka's algorithm connects each point to its nearest neighbor, giving many 'components'. For each component c, the nearest point in Sr to any point of c that is not part of c is found. The points are connected, combining those components. This process repeats until only one component—the minimum spanning tree—remains.

During the algorithm, we maintain a list F made up of i components Fi : {Ei, Vi}, where Ei is the list of edges and Vi is the list of vertices in the component Fi (these are points in Sr). Each point in Sr belongs to only one Fi. At initialization, |F| = |Sr| and Fi = {∅, {pi}} for i = {1, ..., |Sr|}, where pi is the i-th point in Sr. For p ∈ Sr we define F(p) = Fi if Fi is the component containing p. During the algorithm, we maintain N(Fi) as the candidate nearest neighbor of component Fi and pc(Fi) as the point in component Fi nearest to N(Fi). Then, D(Fi) = ||pc(Fi) − N(Fi)||. Remember that F(N(Fi)) ≠ Fi.

To run Borůvka's algorithm with a space tree Tr built on the set of points Sr, a pruning dual-tree traversal is run with BaseCase() as Algorithm 7, Score() as Algorithm 8, Tr as both of the trees, and F as initialized before. Note that Score() uses B(Nq) from Section 5 with k = 1. Upon traversal completion, we have a list N(Fi) of nearest neighbors of each component Fi.

Algorithm 7 Borůvka's algorithm BaseCase().
  Input: query point pq, reference point pr, nearest candidate point N(F(pq)) and distance D(F(pq))
  Output: distance d between pq and pr
  if pq = pr then
    return 0
  if F(pq) ≠ F(pr) and ||pq − pr|| < D(F(pq)) then
    D(F(pq)) ← ||pq − pr||
    N(F(pq)) ← pr; pc(F(pq)) ← pq
  return ||pq − pr||

Algorithm 8 Borůvka's algorithm Score().
  Input: query node Nq, reference node Nr
  Output: a score for the node combination (Nq, Nr), or ∞ if the combination should be pruned
  if dmin(Nq, Nr) < B(Nq) then
    if F(pq) = F(pr) ∀ pq ∈ Dq^p, pr ∈ Dr^p then
      return ∞
    return dmin(Nq, Nr)
  return ∞

The edge (N(Fi), pc(Fi)) is added to Fi for each Fi. Then, any components in F with shared edges are merged into a new list F′ where |F′| < |F|. The pruning dual-tree traversal is then run again with F = F′, and the traversal-merge process repeats until |F| = 1. When |F| = 1, then F1 is the minimum spanning tree of Sr.

To prove the correctness of the meta-algorithm, see Theorem 4.1 in March et al. (2010). That proof can be adapted from kd-trees to general space trees. Our representation is a generalization of their algorithms; our meta-algorithm produces their kd-tree and cover tree implementations with a tighter distance bound B(Nq). Our meta-algorithm produces a provably correct dual-tree algorithm with any type of space tree.

8. Kernel Density Estimation

Much work has been produced regarding the use of dual-tree algorithms for kernel density estimation (KDE), including by Gray & Moore (2001; 2003b) and later by Lee et al. (2006; 2008). KDE is an important machine learning problem with a vast range of applications, from signal processing to econometrics. The problem is, given query and reference sets Sq, Sr, to estimate a probability density fq at each point pq ∈ Sq using each point pr ∈ Sr and a kernel function Kh. The exact probability density at a point pq is the sum of K(||pq − pr||) for all pr ∈ Sr.

In general, the kernel function is some zero-centered probability density function, such as a Gaussian. This means that when ||pq − pr|| is very large, the contribution of K to fq is very small. Therefore, we can approximate small values using a dual-tree algorithm to avoid unnecessary computation; this is the idea set forth by Gray & Moore (2001). Because K is a function which is decreasing with distance, the maximum difference between K values for a given combination (Nq, Nr) can be bounded above with

BK(Nq, Nr) = K(dmin(Nq, Nr)) − K(dmax(Nq, Nr)).

The algorithm takes a parameter ε; when BK(Nq, Nr) is less than ε/|Sr|, the kernel values are approximated using the kernel value of the centroid Cr of the reference node. The division by |Sr| ensures that the total approximation error is bounded above by ε. The base case on pq and pr merely needs to add K(||pq − pr||) to the existing density estimate fq. When the algorithm is initialized, fq = 0 for all query points. BaseCase() is Algorithm 10 and Score() is Algorithm 9.

Algorithm 9 KDE Score(Nq, Nr).
  Input: query node Nq, reference node Nr
  Output: a score for (Nq, Nr), or ∞ if the combination should be pruned
  if BK(Nq, Nr) < ε/|Sr| then
    for each pq ∈ Dq^p do
      fq ← fq + |Dr^p| K(||pq − Cr||)
    return ∞
  return dmin(Nq, Nr)

Algorithm 10 KDE BaseCase(pq, pr).
  Input: query point pq, reference point pr, density estimate fq
  Output: distance between pq and pr
  if BaseCase(pq, pr) already called then return
  fq ← fq + Kh(||pq − pr||)
  return ||pq − pr||

Again we emphasize the flexibility of our meta-algorithm. To our knowledge, cover trees, octrees, and ball trees have never been used to perform KDE in this manner. Our meta-algorithm can produce these implementations with ease.
9. Discussions

We have now shown five separate algorithms for which we have taken existing dual-tree algorithms and constructed a BaseCase() and Score() function that can be used with any space tree and any dual-tree traversal. Single-tree extensions of these methods are straightforward simplifications.

This modular way of viewing tree-based algorithms has several useful immediate applications. The first is implementation. Given a tree implementation and a dual-tree traversal implementation, all that is required is BaseCase() and Score() functions. Thus, code reuse can be maximized, and new algorithms can be implemented simply by writing two new functions. More importantly, the code is now modular. MLPACK (Curtin et al., 2011), written in C++, uses templates for this. One example is the DualTreeBoruvka class, which implements the meta-algorithm discussed in Section 7, and has the following arguments:

template<typename TreeType, typename TraversalType>
class DualTreeBoruvka;

This means that any class satisfying the constraints of the TreeType template parameter can be designed without any consideration or knowledge of the DualTreeBoruvka class or of the TraversalType class; it is entirely independent. Then, assuming a TreeType and TraversalType without bugs, the dual-tree Borůvka's algorithm is guaranteed to work. An immediate example of the advantage of this is that cover trees were implemented for MLPACK for use with k-nearest neighbor search. This cover tree implementation could, without any additional work, be used with DualTreeBoruvka—which was never an intended goal during the cover tree implementation but still a particularly valuable result!

Of course, the utility of these abstractions is not limited to implementation details. Each of the papers cited in the previous sections describes algorithms in terms of one specific tree type. March et al. (2010) discuss implementations of Borůvka's Algorithm on both kd-trees and cover trees and give algorithms for both. Each algorithm given is quite different and it is not easy to see their similarities. Using our meta-algorithm, any of these tree-based algorithms can be expressed with less effort—especially for more complex trees like the cover tree—and in a more general sense.

In addition, correctness proofs for our algorithms tend to be quite simple. The proofs for each algorithm here can be given in two simple sub-proofs: (1) prove the correctness of BaseCase() when no prunes are made, and (2) prove that Score() does not prune any subtrees on which the algorithm's correctness depends.

The logical split of base case, pruning rule, tree type, and traversal can also be advantageous. A strong example of this is the function B(Nq) devised in Section 5, which is a novel, tighter bound. When not considering a particular tree, the path to a superior algorithm can often be simpler (as in that case).

9.1. Parallelism

Nowhere in this paper has parallelism been discussed in any detail. In fact, all of the given algorithms seem to be suited to serial implementation. However, the pruning dual-tree traversal is entirely separate from the rest of the dual-tree algorithm; therefore, a parallel pruning dual-tree traversal can be used without modifying the rest of the algorithm.

For instance, consider k-nearest neighbor search. Most large-scale parallel implementations of k-NN do not use a dual-tree approach, and no available software exists that implements distributed dual-tree k-nearest neighbor search.

As a simple (and not necessarily efficient) proof-of-concept idea for a distributed traversal, suppose we have t^2 machines and a "master" machine for some t > 0. Then, for a query tree Tq and a reference tree Tr, we can split Tq into t subtrees and one "master tree" Tqm. The reference tree Tr is split the same way. Each possible combination of query and reference subtrees is stored on one of the t^2 machines, and the master trees are stored on the master machine. The lists D and N can be stored on the master machine and can be updated or queried by other machines.

The traversal starts at the roots of the query tree and reference tree and proceeds serially on the master. When a combination in two subtrees is reached, Score() and BaseCase() are performed on the machine containing those two subtrees, and that subtree traversal continues in parallel. This idea satisfies the conditions of a pruning dual-tree traversal; thus, we can use it to make any dual-tree algorithm parallel.

Recently, a distributed dual-tree algorithm was developed for kernel summation (Lee et al., 2012); this work could be adapted to a generalized distributed pruning dual-tree traversal for use with our meta-algorithm.

10. Conclusion

We have proposed a tree-independent representation of dual-tree algorithms and a meta-algorithm which can be used to create these algorithms. A dual-tree algorithm is represented as four parts: a space tree, a pruning dual-tree traversal, a BaseCase() function, and a Score() function. We applied this representation to generalize and extend five example dual-tree algorithms to different types of trees and traversals. During this process, we also devised a novel bound for k-nearest neighbor search that is tighter than existing bounds. Importantly, this abstraction can be applied to help approach the problem of parallel dual-tree algorithms, which currently is not well researched.

References

Agarwal, P.K. and Erickson, J. Geometric range searching and its relatives. Contemporary Mathematics, 223:1–56, 1999.

Bentley, J.L. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509–517, 1975.

Bentley, J.L. and Friedman, J.H. Data structures for range searching. ACM Computing Surveys (CSUR), 11(4):397–409, 1979.

Beygelzimer, A., Kakade, S.M., and Langford, J. Cover trees for nearest neighbor. In Proceedings of the 23rd International Conference on Machine Learning (ICML '06), volume 23, pp. 97, 2006.

Cover, T.M. and Hart, P.E. Nearest neighbor pattern classification. Information Theory, IEEE Transactions on, 13(1):21–27, 1967.

Curtin, R.R., Cline, J.R., Slagle, N.P., Amidon, M.L., and Gray, A.G. MLPACK: a scalable C++ machine learning library. In BigLearning: Algorithms, Systems, and Tools for Learning at Scale, 2011.

Friedman, J.H., Bentley, J.L., and Finkel, R.A. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software (TOMS), 3(3):209–226, 1977.

Fukunaga, K. and Narendra, P.M. A branch and bound algorithm for computing k-nearest neighbors. Computers, IEEE Transactions on, 100(7):750–753, 1975.

Gray, A.G. Bringing Tractability to Generalized N-Body Problems in Statistical and Scientific Computation. PhD thesis, Carnegie Mellon University, 2003.

Gray, A.G. and Moore, A.W. 'N-Body' problems in statistical learning. In Advances in Neural Information Processing Systems 14 (NIPS 2001), volume 4, pp. 521–527, 2001.

Gray, A.G. and Moore, A.W. Rapid evaluation of multiple density models. In Advances in Neural Information Processing Systems 16 (NIPS 2003), 2003a.

Gray, A.G. and Moore, A.W. Nonparametric density estimation: Toward computational tractability. In SIAM International Conference on Data Mining (SDM), 2003b.

Holmes, M.P., Gray, A.G., and Isbell Jr., C.L. QUIC-SVD: Fast SVD using cosine trees. Advances in Neural Information Processing Systems (NIPS), 21, 2008.

Lee, D. and Gray, A.G. Faster Gaussian summation: Theory and experiment. In Proceedings of the Twenty-second Conference on Uncertainty in Artificial Intelligence (UAI), 2006.

Lee, D. and Gray, A.G. Fast high-dimensional kernel summations using the Monte Carlo multipole method. Advances in Neural Information Processing Systems (NIPS), 21, 2008.

Lee, D., Gray, A.G., and Moore, A.W. Dual-tree fast Gauss transforms. In Advances in Neural Information Processing Systems 18, pp. 747–754. MIT Press, Cambridge, MA, 2006.

Lee, D., Vuduc, R., and Gray, A.G. A distributed kernel summation framework for general-dimension machine learning. In SIAM International Conference on Data Mining, 2012.

March, W.B., Ram, P., and Gray, A.G. Fast Euclidean minimum spanning tree: algorithm, analysis, and applications. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '10), pp. 603–612, 2010.

March, W.B., Connolly, A.J., and Gray, A.G. Fast algorithms for comprehensive n-point correlation estimates. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '12), pp. 1478–1486. ACM, 2012.

Ram, P., Lee, D., March, W.B., and Gray, A.G. Linear-time algorithms for pairwise statistical problems. Advances in Neural Information Processing Systems 22 (NIPS 2009), 2009.

Wang, P., Lee, D., Gray, A.G., and Rehg, J.M. Fast mean shift with accurate and stable convergence. In Workshop on Artificial Intelligence and Statistics (AISTATS), 2007.