From Cover Trees to Net-Trees

Mahmoodreza Jahanseir∗ Donald R. Sheehy†

∗ University of Connecticut, [email protected]
† University of Connecticut, [email protected]

1 Introduction

Cover trees are a popular data structure for (approximate) nearest neighbor search on metric spaces of low intrinsic dimension [1]. They are superficially similar to the net-trees of Har-Peled & Mendel [2], as both structures may be interpreted as generalizations of compressed quadtrees beyond the Euclidean setting. Cover trees are the simplest of these data structures, requiring linear space, independent of the ambient or intrinsic dimension of the data. An efficient implementation of cover trees is available. On the other hand, net-trees have better theoretical guarantees for preprocessing and query times. However, to achieve subquadratic preprocessing time, net-tree construction uses complex techniques which are not efficient in practice.

We show how a slight modification to the definition of a cover tree allows it to satisfy the stronger conditions of a net-tree. Leveraging this structural result, we give a simple algorithm to turn a given cover tree into a net-tree in linear time.

2 Background

The input is a set of $n$ points $P$ in a metric space. The closed metric ball centered at $p$ with radius $r$ is denoted $B(p, r) := \{q \in P \mid d(p, q) \le r\}$. The doubling constant $\rho$ of $P$ is the minimum $\rho \in \mathbb{R}$ such that every ball $B(p, r)$ can be covered by $\rho$ balls of radius $r/2$. We assume $\rho$ is constant.

Cover trees and net-trees are both examples of hierarchical trees. In such trees, the input points are leaves and each point $p$ can be associated with many internal nodes. Each node is uniquely identified by its associated point and an integer called its level. The node in level $\ell$ associated with a point $p$ is denoted $p^\ell$. If $q^{\ell'}$ is the parent of $p^\ell$, then $\ell' > \ell$, and if $\ell' > \ell + 1$ then $p = q$. Moreover, if $p^\ell$ is a non-leaf node, then it has a child $p^m$, where $m < \ell$. Let $L_\ell$ be the points associated with nodes in level at least $\ell$. Let $P_{p^\ell}$ denote the leaves of the subtree rooted at $p^\ell$. The levels of the tree represent the metric space at different scales. The constant $\tau$, called the scale factor of the tree, determines the change in scale between levels, as will be seen in the following definitions.

Cover Trees. A cover tree is defined by the following properties.
(Packing) For all distinct $p, q \in L_\ell$, $d(p, q) > \tau^\ell$.
(Covering) If $q^{\ell'}$ is the parent of $p^\ell$, then $d(p, q) \le c\tau^{\ell'}$.
We call $c$ the covering constant, and in a usual cover tree $c = 1$.

Net-trees. For each node $p^\ell$ in a net-tree, the following invariants hold.
(Packing) $B\left(p, \frac{\tau-5}{2(\tau-1)} \cdot \tau^\ell\right) \cap P \subset P_{p^\ell}$.
(Covering) $P_{p^\ell} \subset B\left(p, \frac{2\tau}{\tau-1} \cdot \tau^\ell\right)$.
(Footnote 1: The packing condition we give is slightly different from [2], but it is an easy exercise to prove this (more useful) version is equivalent.)

The main difference in the definitions lies in the packing conditions. The net-tree requires the packing to be consistent with the hierarchical structure of the tree, a property not necessarily satisfied by cover trees. Also, Har-Peled and Mendel [2] set $\tau = 11$, whereas optimized cover tree code sets $\tau = 1.3$.

A net-tree can be augmented to maintain a list of nearby nodes with no additional cost. For each node $p^\ell$, $\mathrm{Rel}(p^\ell)$ is defined as follows.

$\mathrm{Rel}(p^\ell) = \{ q^i \in T \text{ with parent } r^j \mid i \le \ell < j \text{ and } d(p, q) \le 13 \cdot \tau^\ell \}$

We add a new, easy to implement condition on cover trees. We require that the children of a node $p^\ell$ are closer to $p$ than to any other point in $L_\ell$.

3 From cover trees to net-trees

Theorem 1. A cover tree $T$ with scale factor $\tau > 3$ is a net-tree, such that for each node $p^\ell$, $B\left(p, \frac{(\tau-3)\tau^\ell}{2(\tau-1)}\right) \cap P \subset P_{p^\ell}$ and $B\left(p, \frac{\tau^{\ell+1}}{\tau-1}\right) \supset P_{p^\ell}$.

Proof. Fix a node $p^\ell \in T$. The covering property and the triangle inequality imply that, for every $q \in P_{p^\ell}$, $d(p, q) \le \sum_{i=0}^{\infty} \tau^{\ell-i} = \frac{\tau^{\ell+1}}{\tau-1}$. Therefore, $P_{p^\ell} \subset B\left(p, \frac{\tau^{\ell+1}}{\tau-1}\right)$.
Suppose for contradiction there exists a point $r \in B\left(p, \frac{\tau-3}{2(\tau-1)} \tau^\ell\right)$ such that $r \notin P_{p^\ell}$. Then, there exists
a node $x^i \in T$ that is the lowest node with $i \ge \ell$ and $r \in P_{x^i}$.

First, assume that $i = \ell$. Let $y^j$ be the child of $x^i$ such that $r \in P_{y^j}$. If $j < i - 1$, then $x = y$ and $d(p, y) > \tau^\ell$. By the triangle inequality,

$d(y, r) \ge d(y, p) - d(p, r) > \tau^\ell - \frac{(\tau-3)\tau^\ell}{2(\tau-1)} > \frac{\tau^\ell}{\tau-1}$.

However, $d(y, r) \le \frac{\tau^\ell}{\tau-1}$, because $y$ is the ancestor of $r$. Otherwise, when $j = i - 1$, $x$ is the nearest neighbor of $y$ among $L_i$. So, $d(y, p) \ge d(p, x) - d(x, y) > \tau^\ell - d(y, p)$, which gives $d(y, p) > \tau^\ell/2$. This case, as well as the case when $i > \ell$, also yields a contradiction by similar arguments (in the latter case, $y = x$). Therefore, $r \in P_{p^\ell}$.

Theorem 2. For a given cover tree $T$ on a set of $n$ points $P$ with doubling constant $\rho$, one can augment $T$ with relative lists in $\rho^{O(1)} n$ time.

Proof. When constructing a cover tree, we also build a hash table to support constant time access to the list of nodes in a specific level, with the same asymptotic cost. We process each list of nodes from highest to lowest. For each unvisited node $u$, find $\mathrm{Rel}(u)$ for the part of the tree that corresponds to the visited nodes using a method similar to [2]. Then, mark $u$ as visited and continue until all nodes of the cover tree are processed. The time complexity follows from the same analysis as [2].

In Theorem 1, we assumed that $\tau > 3$. However, Beygelzimer et al. [1] set $\tau = 2$, and they found $\tau = 1.3$ is even more efficient in practice. Because such cover trees may not be net-trees, we give an algorithm to increase the scale factor of a cover tree, and show when such a coarsened cover tree is a net-tree.

Theorem 3. Given a cover tree $T$ with $\tau > 1$ and $c = 1$, and a constant integer $k > 1$, $T$ can be turned into a cover tree $T'$ with scale factor $\tau' = \tau^k$ and covering constant $c' = \sum_{i=0}^{k-1} 1/\tau^i$ in $k n \rho^{O(\lg(\tau/(\tau-1)))}$ time.

Proof. Using Theorem 2, we augment $T$ with relative lists. However, instead of the constant 13 in the definition of relatives, we use a constant greater than $\frac{3\tau}{\tau-1}$. This can be done in $n \rho^{O(\lg(\tau/(\tau-1)))}$ time. Next, for each point which is located in a level higher than the lowest level of $T$, we create a node for this point in this level and we move this point to the lowest level of $T$. This step takes $O(n)$ time.

Then, we define a mapping between nodes of $T$ and $T'$. Here, each node $p^\ell$ in $T$ maps to a node $p^{\ell'} = p^{\lfloor \ell/k \rfloor}$ in $T'$. If $\ell = k\ell'$, then levels $\ell, \ell+1, \ldots, \ell+k-1$ of $T$ combine into the level $\ell'$ in $T'$. Note that the level of the root is $\infty$.

If $r^m \in T$ is a descendant at most $k$ levels down from some $p^\ell$, then by the covering property and the triangle inequality, $d(p, r) \le \sum_{i=0}^{k-1} \tau^{\ell-i} = (\tau')^{\ell'} \sum_{i=0}^{k-1} 1/\tau^i$. Because we are combining sets of $k$ consecutive levels, it follows that each node in $T'$ will have a node in the level above whose distance is at most this amount. It follows that $T'$ has covering constant $\sum_{i=0}^{k-1} 1/\tau^i$.

If $\ell = k\ell'$, then the minimum distance between points in level $\ell'$ of $T'$ is equal to the minimum distance between points in level $\ell$ of $T$, which is at least $\tau^\ell = (\tau')^{\ell'}$. Thus, the points in level $\ell'$ of $T'$ satisfy the packing condition.

To find a correct parent for a node $p^{\ell'}$ in $T'$, we find the lowest ancestor $q^m$ of $p^\ell$ in $T$ such that $m' > \ell'$. If $m' = \ell' + 1$, then we find a node $r^i$ among the relatives of $q^m$ or the descendants of relatives of $q^m$ that minimizes $d(p^\ell, r^i)$. This step can be done in $k \rho^{O(\lg(\tau/(\tau-1)))}$ time. Otherwise, we pretend that we have a node $q^{k(\ell'+1)}$ in $T$ and we find its relatives. The total amortized cost of finding relatives of such dummy nodes follows from the same argument as updating relatives in [2], and it is $n \rho^{O(\lg(\tau/(\tau-1)))}$. Then, we do the same search as we did for the first case. It can be proved that this algorithm correctly finds such $r^i$. We omit the proof due to lack of space.

By finding $r^i$ in $T$, we connect $p^{\ell'}$ and $r^{\ell'}$ as children of $r^{\ell'+1}$. Also, we might need to update the parent and children of some nodes in a path from $r^{\ell'}$ to $r^{i'}$ or from $r^{i'}$ to $r^{\ell'+1}$.

Corollary 4. Given a cover tree $T$ with $\tau > 1$ and $c = 1$ on a set of $n$ points $P$, one can find a net-tree $T'$ with scale factor $\tau' = \tau^k$, for a constant integer $k \ge 1$, and $\tau' > 1 + 2c'$, where $c' = \sum_{i=0}^{k-1} 1/\tau^i$, from $T$ in $n k \rho^{O(\lg(\tau/(\tau-1)))}$ time such that for each node $p^{\ell'}$ in $T'$, $B\left(p, \frac{\tau'-1-2c'}{2(\tau'-1)} (\tau')^{\ell'}\right) \cap P \subset P_{p^{\ell'}}$ and $B\left(p, \frac{c'(\tau')^{\ell'+1}}{\tau'-1}\right) \supset P_{p^{\ell'}}$.

References

[1] A. Beygelzimer, S. Kakade, and J. Langford. Cover trees for nearest neighbor. In ICML, 2006.
[2] S. Har-Peled and M. Mendel. Fast construction of nets in low dimensional metrics, and their applications. SIAM Journal on Computing, 35(5):1148–1184, 2006.
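Illustration. As a small aid for reading Theorem 3 and Corollary 4, the following Python sketch evaluates the coarsened constants $\tau' = \tau^k$ and $c' = \sum_{i=0}^{k-1} 1/\tau^i$, tests the condition $\tau' > 1 + 2c'$, and applies the level map $\ell \mapsto \lfloor \ell/k \rfloor$. It is not part of the algorithm itself: the function names and the search bound `k_max` are hypothetical choices made for this sketch, and exact rational arithmetic is used only to avoid rounding issues.

```python
from fractions import Fraction


def coarsened_constants(tau, k):
    """Return (tau', c') for merging k consecutive levels.

    tau' = tau**k and c' = sum_{i=0}^{k-1} 1/tau**i, as in Theorem 3.
    Pass tau as an int, a Fraction, or a decimal string such as "1.3".
    """
    tau = Fraction(tau)
    tau_prime = tau ** k
    c_prime = sum(Fraction(1) / tau ** i for i in range(k))
    return tau_prime, c_prime


def satisfies_corollary_4(tau, k):
    """Check the condition tau' > 1 + 2c' stated in Corollary 4."""
    tau_prime, c_prime = coarsened_constants(tau, k)
    return tau_prime > 1 + 2 * c_prime


def smallest_k(tau, k_max=64):
    """Smallest k in [1, k_max] meeting the condition, or None.

    k_max is an arbitrary search bound chosen for this sketch.
    """
    for k in range(1, k_max + 1):
        if satisfies_corollary_4(tau, k):
            return k
    return None


def coarse_level(level, k):
    """Map a level of T to its level floor(level / k) in T'.

    Python's // rounds toward negative infinity, so negative
    levels are handled as a mathematical floor.
    """
    return level // k


if __name__ == "__main__":
    for tau in (2, "1.3"):
        print(tau, smallest_k(tau))
```

Under these definitions, the smallest $k$ meeting the condition is 3 for $\tau = 2$ and 9 for $\tau = 1.3$, while any $\tau > 3$ already meets it with $k = 1$, which matches the hypothesis of Theorem 1.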
