An Intoductory Tutorial on Kd-Trees
Total Page:16
File Type:pdf, Size:1020Kb
An into ductory tutorial on kd-trees Andrew W. Mo ore Carnegie Mellon University [email protected] Extract from Andrew Mo ore's PhD Thesis: Ecient Memory-basedLearning for Robot Control PhD. Thesis; Technical Rep ort No. 209, Computer Lab oratory, University of Cambridge. 1991. Chapter 6 Kd-trees for Cheap Learning This chapter gives a speci cation of the nearest neighbour algorithm. It also gives both an informal and formal introduction to the kd-tree data structure. Then there is an explicit, detailedaccount of how the nearest neighbour search algorithm is implemented eciently, which is fol lowed by an empirical investigation into the al- gorithm's performance. Final ly, there is discussion of some other algorithms related to the nearest neighbour search. 6.1 Nearest Neighb our Sp eci cation k k r d Given twomulti-dimensional spaces Domain = < and Range = < , let an exemplar be a member of Domain Range and let an exemplar-set b e a nite set of exemplars. Given an exemplar-set , E, and a target domain vector, d, then a nearest neighbour of d is any. exemplar 0 0 0 d ; r 2 E such that None-nearerE; d; d . Notice that there might b e more than one suit- able exemplar . This ambiguity captures the requirement that any nearest neighb our is adequate. None-nearer is de ned thus: 0 00 00 0 00 None-nearerE; d; d , 8d ; r 2 E j d d jj d d j 6.1 In Equation 6.1 the distance metric is Euclidean, though any other L -norm could have b een used. p v u i=k d u X 0 t 0 2 j d d j= d d 6.2 i i i=1 where d is the ith comp onentofvector d. i In the following sections I describ e some algorithms to realize this abstract sp eci cation with the additional informal requirement that the computation time should b e relatively short. 6-1 Algorithm: Nearest Neighb our by Scanning. Data Structures: domain-vector Avector of k oating p ointnumb ers. d range-vector Avector of k oating p ointnumb ers. r exemplar A pair: domain-vector ; range-vector Input: exlist ,oftyp e list of exemplar dom,oftyp e domain-vector Output: nearest,oftyp e exemplar Preconditions: exlist is not empty 0 0 Postconditions: if nearest represents the exemplar d ; r , and exlist represents the exemplar set E , and dom represents the vector d, 0 0 0 then d ; r 2 E and None-nearerE; d; d . Co de: 1. nearest-dist := in nity 2. nearest := unde ned 3. for ex := each exemplar in exlist 3.1 dist := distance b etween dom and the domain of ex 3.2 if dist < nearest-dist then 3.2.1 nearest-dist := dist 3.2.2 nearest := ex Table 6.1: Finding Nearest Neighb our by scanning a list. 6.2 Naive Nearest Neighb our This op eration could b e achieved by representing the exemplar-set as a list of exemplars. In Ta- ble 6.1, I give the trivial nearest neighb our algorithm which scans the entire list. This algorithm has time complexity O N where N is the size of E. By structuring the exemplar-set more intelligently, it is p ossible to avoid making a distance computation for every memb er. 6.3 Intro duction to kd-trees A kd-tree is a data structure for storing a nite set of p oints from a k -dimensional space. It was [ ] examined in detail by J. Bentley Bentley, 1980; Friedman et al., 1977 . Recently, S. Omohundro has recommended it in a survey of p ossible techniques to increase the sp eed of neural network [ ] learning Omohundro, 1987 . A kd-tree is a binary tree. The contents of each no de are depicted in Table 6.2. Here I provide an informal description of the structure and meaning of the tree, and in the following subsection I 6-2 Field Name: Field Typ e Description dom-elt domain-vector A p oint from k -d space d range-elt range-vector A p oint from k -d space r split integer The splitting dimension left kd-tree A kd-tree representing those p oints to the left of the splitting plane right kd-tree A kd-tree representing those p oints to the right of the splitting plane Table 6.2: The elds of a kd-tree no de give a formal de nition of the invariants and semantics. The exemplar-set E is represented by the set of no des in the kd-tree, each no de representing one exemplar. The dom-elt eld represents the domain-vector of the exemplar and the range-elt eld represents the range-vector. The dom-elt comp onent is the index for the no de. It splits the space into two subspaces according to the splitting hyp erplane of the no de. All the p oints in the \left" subspace are represented by the left subtree, and the p oints in the \right" subspace by the right subtree. The splitting hyp erplane is a plane which passes through dom-elt and whichis p erp endicular to the direction sp eci ed by the split eld. Let i b e the value of the split eld. Then a p oint is to the left of dom-elt if and only if its ith comp onent is less than the ith comp onentof dom-elt . The complimentary de nition holds for the right eld. If a no de has no children, then the splitting hyp erplane is not required. Figure 6.1 demonstrates a kd-tree representation of the four dom-elt p oints 2; 5, 3; 8, 6; 3 and 8; 9. The ro ot no de, with dom-elt 2; 5 splits the plane in the y -axis into two subspaces. The p oint3; 8 lies in the lower subspace, that is fx; y j y<5g, and so is in the left subtree. Figure 6.2 shows how the no des partition the plane. 6.3.1 Formal Sp eci cation of a kd-tree The reader who is content with the informal description ab ove can omit this section. I de ne a mapping exset-rep : kd-tree ! exemplar-set 6.3 which maps the tree to the exemplar-set it represents: exset-repempty = exset-rep< d; r; ; empty; empty > = fd; rg 6.4 exset-rep< d; r; split; tree ; tree > = left right exset-reptree [fd; rg[ exset-reptree left right 6-3 [2,5] Figure 6.1 A 2d-tree of four elements. [6,3] [3,8] The splitting planes are not indicated. The [2,5] no de splits along the y = 5 plane and the [3,8] no de splits [8,9] along the x = 3 plane. [8,9] [3,8] Figure 6.2 w the tree of Figure 6.1 [2,5] Ho splits up the x,y plane. [6,3] 6-4 The invariant is that subtrees only ever contain dom-elt s which are on the correct side of all their ancestors' splitting planes. Is-legal-kdtreeempty: Is-legal-kdtree< d; r; ; empty; empty >: Is-legal-kdtree< d; r; split ; tree ; tree > , left right 0 0 0 d ^ 8d ; r 2 exset-reptree d 6.5 left split split 0 0 0 >d ^ 8d ; r 2 exset-reptree d right split split Is-legal-kdtreetree ^ left Is-legal-kdtreetree right 6.3.2 Constructing a kd-tree Given an exemplar-set E,akd-tree can b e constructed by the algorithm in Table 6.3. The pivot- cho osing pro cedure of Step 2 insp ects the set and cho oses a \go o d" domain vector from this set to use as the tree's ro ot. The discussion of how sucharootischosen is deferred to Section 6.7. Whichever exemplar is chosen as ro ot will not a ect the correctness of the kd-tree, though the tree's maximum depth and the shap e of the hyp erregions wil l b e a ected. 6.4 Nearest Neighb our Search In this section, I describ e the nearest neighb our algorithm which op erates on kd-trees. I b egin with an informal description and worked example, and then give the precise algorithm. A rst approximation is initially found at the leaf no de which contains the target p oint. In Figure 6.3 the target p oint is marked X and the leaf no de of the region containing the target is coloured black. As is exempli ed in this case, this rst approximation is not necessarily the nearest neighb our, but at least we knowany p otential nearer neighb our must lie closer, and so must lie within the circle centred on X and passing through the leaf no de. Wenow back up to the parent of the current no de. In Figure 6.4 this parent is the black no de. We compute whether it is possible for a closer solution to that so far found to exist in this parent's other child. Here it is not p ossible, b ecause the circle do es not intersect with the shaded space o ccupied by the parent's other child. If no closer neighb our can exist in the other child, the algorithm can immediately move up a further level, else it must recursively explore the other child. In this example, the next parent whichis checked will need to b e explored, b ecause the area it covers i.e.