From Cover Trees to Net-Trees

From Cover Trees to Net-Trees Mahmoodreza Jahanseir∗ Donald R. Sheehyy 1 Introduction The constant τ, called the scale factor of the tree de- termines the change in scale between levels as will be Cover trees are a popular data structure for (approxi- seen in the following definitions. mate) nearest neighbor search on metric spaces of low intrinsic dimension [1]. They are superficially similar Cover Trees. A cover tree is defined by the follow- to the net-trees of Har-Peled & Mendel [2] as both ing properties. (Packing) For all distinct p; q 2 L`, 0 structures may be interpreted as generalizations of d(p; q) > τ `.(Covering) If q` is the parent of p` then 0 compressed quadtrees beyond the Euclidean setting. d(p; q) ≤ cτ ` . We call c the covering constant, and Cover trees are the simplest of these data structures, in a usual cover tree c = 1. requiring linear space, independent of the ambient Net-trees. For each node p` in a net-tree, the or intrinsic dimension of the data set. An efficient τ−5 following invariants hold. (Packing) B(p; 2(τ−1) · implementation of cover trees is available. On the ` T 2τ ` 1 τ ) P ⊂ P ` .(Covering) P ` ⊂ B(p; · τ ). other hand, net-trees have better theoretical guaran- p p τ−1 tees for preprocessing and query times. However, to The main difference in the definitions lies in the achieve subquadratic preprocessing time, it uses com- packing conditions. The net-tree requires the packing plex techniques which are not efficient in practice. to be consistent with the hierarchical structure of the tree, a property not necessarily satisfied by the cover We show how a slight modification to the definition trees. Also, Har-Peled and Mendel [2] set τ = 11, of a cover tree allows it to satisfy the stronger condi- whereas optimized cover tree code sets τ = 1:3. tions of a net-tree. Leveraging this structural result, A net-tree can be augmented to maintain a list of we give a simple algorithm to turn a given cover tree nearby nodes with no additional cost. For each node into a net-tree in linear time. p`, Rel(p`) is defined as follows. 2 Background Rel(p`) =fqi 2 T with parent rj j i ≤ ` < j; The input is a set of n points P in a metric space. ` The closed metric ball centered at p with radius r and d(p; q) ≤ 13 · τ g is denoted B(p; r) := fq 2 P j d(p; q) ≤ rg. The We add a new, easy to implement condition on doubling constant ρ of P is the minimum ρ 2 such R cover trees. We require that children of a node p` are that every ball B(p; r) can be covered by ρ balls of closest to p than to any other point in L . radius r=2. We assume ρ is constant. ` Cover trees and net-trees are both examples of hi- 3 From cover trees to net-trees erarchical trees. In such trees, the input points are Theorem 1. A cover tree T with scale factor τ > 3 is leaves and each point p can be associated with many ` a net-tree, such that for each node p`, B(p; (τ−3)τ ) \ internal nodes. Each node is uniquely identified by its 2(τ−1) τ `+1 associated point and an integer called its level. The P ⊂ Pp` and B(p; τ−1 ) ⊃ Pp` . node in level ` associated with a point p is denoted p`. 0 If q` is the parent of p`, then `0 > ` and if `0 > ` + 1 Proof. Fix a node p` 2 T . The covering property ` then p = q. Moreover, if p is a non-leaf node, then and the triangle inequality imply that d(p; Pp` ) ≤ m 1 `+1 `+1 it has a child p , where m < `. Let L` be the points P `−i τ τ i=0 τ = τ−1 . Therefore, Pp` ⊂ B(p; τ−1 ). associated with nodes in level at least `. Let Pp` de- Suppose for contradiction there exists a point r 2 note leaves of the subtree rooted at p`. The levels of τ−3 ` B(p; 2(τ−1) τ ) such that r2 = Pp` . Then, there exists the tree represent the metric space at different scales. 1The packing condition we give is slightly different from [2], ∗University of Connecticut [email protected] but it is an easy exercise to prove this (more useful) version is yUniversity of Connecticut [email protected] equivalent. 1 a node xi 2 T that it is the lowest node with i ≥ ` Then, we define a mapping between nodes of T and 0 ` `0 and r 2 Pxi . T . Here, each node p in T maps to a node p = First, assume that i = `. Let yj be the child of pb`=kc in T 0. If ` = k`0, then levels `; `+1; : : : ; `+k−1 i 0 0 x such that r 2 Pyj . If j < i − 1, then x = y and of T combine into the level ` in T . Note that the d(p; y) > τ `. By the triangle inequality, level of root is 1. If rm 2 T is the descendant at most k levels ` ` ` ` (τ − 3)τ τ down from some p , then by the covering property d(y; r) ≥ d(y; p) − d(p; r) > τ − > : Pk−1 `−i 2(τ − 1) τ − 1 and the triangle inequality, d(p; r) ≤ i=0 τ = 0 `0 Pk−1 i ` (τ ) i=0 1/τ . Because we are combining sets of k τ 0 However, d(y; r) ≤ τ−1 , because y is the ancestor of consecutive levels, it follows that each node in T will r. Otherwise, when j = i−1, x is the nearest neighbor have a node in the level above whose distance is at of y among Li. So, d(y; p) ≥ d(p; x) − d(x; y) > most this amount. It follows that T 0 has a covering τ `−d(y; p) > τ `=2. This case as well as the case when Pk−1 i constant i=0 1/τ . i > ` also yield a contradiction by similar arguments If ` = k`0, then the minimum distance between 0 0 (In the latter case, y = x). Therefore, r 2 Pp` . points in level ` of T is equal to the minimum distance between points in level ` of T , which is at least 0 Theorem 2. For a given cover tree T on a set of n τ ` = (τ 0)` . Thus, the points in level `0 of T 0 satisfy points P with doubling constant ρ, one can augment the packing condition. O(1) 0 T with relative lists in ρ n. To find a correct parent for a node p` in T 0, we find m ` 0 0 Proof. When constructing a cover tree, we also build the lowest ancestor q of p in T such that m > ` . 0 0 i m a hash table to support constant time access to the if m = ` +1, then we find a node r in relatives of q m list of nodes in a specific level with the same asymp- or descendants of relatives of q such that it mini- ` i O(lg( τ )) totic cost. We process each list of nodes from highest mizes d(p ; r ). This step can be done in kρ τ−1 . k(l0+1) to lowest. For each unvisited node u, find Rel(u) for Otherwise, we pretend that we have a node q the part of the tree that corresponds to the visited in T and we find its relatives. The total amortized nodes using a method similar to [2]. Then, mark cost of finding relatives of such dummy nodes follows u as visited and continue until process all nodes of from the same argument as updating relatives in [2], O(lg( τ )) the cover tree. The time complexity follows from the and it is nρ τ−1 . Then, we do the same search same analysis as [2]. as we did for the first case. It can be proved that this algorithm correctly finds such ri. We omit the proof In Theorem 1, we assumed that τ > 3. However, due to lack of space. 0 0 Beygelzimer et. al. [1] set τ = 2, and they found By finding ri in T , we connect p` and r` as chil- 0 τ = 1:3 is even more efficient in practice. Because dren of r` +1. Also, we might need to update the 0 such cover trees may not be net-trees, we give an parent and children of some nodes in a path from r` 0 0 0 algorithm to increase the scale factor of a cover tree, to ri or from ri to r` +1. and show when such a coarsened cover tree is a net- tree. Corollary 4. Given a cover tree T with τ > 1 and c = 1 on a set of n points P , one can find a net-tree T 0 Theorem 3. Given a cover tree T with τ > 1 and with scale factor τ 0 = τ k, for a constant integer k ≥ 0 0 0 Pk−1 i c = 1, and a constant integer k > 1. T can be turned 1, and τ > 1 + 2c , where c = i=0 1/τ , from T in 0 0 k O(lg( τ )) 0 into a cover tree T with scale factor τ = τ and nkρ τ−1 time such that for each node p` in T 0, τ 0 Pk−1 i O(lg( )) 0 0 `0+1 covering constant c = 1/τ in knρ τ−1 τ 0−1−2c0 0 `0 c (τ ) i=0 B(p; (τ ) ) \ P ⊂ P `0 and B(p; ) ⊃ time.

From Cover Trees to Net-Trees

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support