
IEICE TRANS. FUNDAMENTALS/COMMUN./ELECTRON./INF. & SYST., VOL. E89-A, 2006

LETTER
Fast K Nearest Neighbors Search Algorithm Based on Wavelet Transform

Yu-Long QIAO and Zhe-Ming LU, Nonmembers, and Sheng-He SUN, Regular Member

Summary: This letter proposes a fast k nearest neighbors search algorithm based on the wavelet transform. The technique exploits the important information carried by the approximation coefficients of the transform coefficient vector, from which we obtain two crucial inequalities that can be used to reject those vectors that cannot be among the k nearest neighbors. The computational complexity of searching for the k nearest neighbors can thereby be largely reduced. Experimental results on texture classification verify the effectiveness of our algorithm.

Key words: k nearest neighbors, wavelet transform, texture image classification, vector quantization, image retrieval

1. Introduction

The k nearest neighbors search plays an important role in many fields of signal processing, including image retrieval, vector quantization, texture classification, and so on [1]. Its principle is straightforward: for a data set of n vectors, $\Omega = \{y^1, y^2, \ldots, y^n\}$, and a query vector q, find the k vectors of this data set closest to q. Assume that the dimension of a vector is N. The naive technique, exhaustive search, needs nN multiplications and (2N-1)n additions. This computational burden largely limits its application.

Many fast algorithms [1]-[3] have been proposed to reduce the computational complexity. Here, we review only the two fast algorithms based on the wavelet transform that are closely related to our algorithm. References [2] and [3] search for the k closest vectors in the wavelet domain. The motivation is that the cost of performing the wavelet transform is low and the energy of a vector is compacted into a few coefficients. Hwang and Wen [2] proposed a fast algorithm based on partial distance search in the wavelet domain (WKPDS). Although their method speeds up the k nearest neighbors search, it does not adequately exploit the transform coefficients. Pan et al. [3] exploited the important information of vectors fully decomposed with the Haar wavelet and proposed a fast search algorithm. Their technique largely reduces the search time compared with WKPDS, while also preserving the other advantages of WKPDS.

In this letter, we study the information carried by the approximate coefficients of the wavelet transform and obtain two inequalities that can be used to reject those vectors that cannot be among the k nearest neighbors. The proposed fast search algorithm based on these inequalities can largely reduce the computational complexity.

2. The Theory Background

We can use the Haar wavelet to iteratively decompose and reconstruct a vector x [4] as follows:

$$X_{s_{j,r}} = \frac{1}{\sqrt{2}}\left(X_{s_{j-1,2r}} + X_{s_{j-1,2r+1}}\right), \qquad X_{d_{j,r}} = \frac{1}{\sqrt{2}}\left(X_{s_{j-1,2r}} - X_{s_{j-1,2r+1}}\right)$$
$$X_{s_{j-1,2r}} = \frac{1}{\sqrt{2}}\left(X_{s_{j,r}} + X_{d_{j,r}}\right), \qquad X_{s_{j-1,2r+1}} = \frac{1}{\sqrt{2}}\left(X_{s_{j,r}} - X_{d_{j,r}}\right) \qquad (1)$$

where $X_{s_{j,r}}$ and $X_{d_{j,r}}$ are the approximate and detail coefficients at scale j, respectively. The original spatial vector x is denoted by $X_{s_0}$ at scale 0. When the dimension N of x satisfies $2^{L-1} < N \le 2^L$, x can be fully decomposed by an L-level wavelet transform, that is, there is only one approximate coefficient, denoted by $X_{s_{L,0}}$. In the rest of this letter, except where specifically noted, "transform" means full decomposition with the Haar wavelet. Owing to the orthonormality of the Haar wavelet transform, $d(x, y) = d(X, Y)$ for two vectors x and y, where X and Y are the transform coefficient vectors of x and y, respectively, and $d(x, y) = \sqrt{\sum_{i=0}^{N-1}(x_i - y_i)^2}$ is the Euclidean distance between x and y. Thus, the k nearest neighbors found in the spatial domain are the same as those found in the wavelet domain.

Assume that the current k nearest neighbors of the query vector Q are $Y^{j_1}, Y^{j_2}, \ldots, Y^{j_k}$ among the checked vectors of $\tilde{\Omega} = \{Y^1, Y^2, \ldots, Y^n\}$ (Q and $Y^1, \ldots, Y^n$ are the transform coefficient vectors of q and $y^1, \ldots, y^n$, respectively), and that the corresponding k distances satisfy $d(Q, Y^{j_1}) \le d(Q, Y^{j_2}) \le \cdots \le d(Q, Y^{j_k})$. For a vector $Y^i \in \tilde{\Omega}$, to determine the relation between $d(Q, Y^i)$ and $d(Q, Y^{j_k})$, WKPDS uses the following test: if

$$\sum_{r=0}^{p}\left(Q_r - Y_r^i\right)^2 \ge d^2(Q, Y^{j_k}) \quad \text{for some } p \le N' - 1,$$

then $d(Q, Y^i) \ge d(Q, Y^{j_k})$, where $N'$ is the dimension of a transform coefficient vector.
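As a concrete illustration of Eq. (1) and of the distance preservation that justifies searching in the wavelet domain, the following sketch is our addition, not code from the letter; it assumes a power-of-two dimension, and the name `haar_full` is ours.

```python
# A minimal sketch: full Haar decomposition per Eq. (1) and a check that the
# orthonormal transform preserves Euclidean distances, d(x, y) = d(X, Y).
import numpy as np

def haar_full(x):
    """Full Haar decomposition of a length-2^L vector; returns all coefficients."""
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) > 1:
        s = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximate coefficients
        d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
        details.append(d)
        x = s                                   # recurse on the approximations
    return np.concatenate([x] + details[::-1])  # [X_{s_{L,0}}, details...]

rng = np.random.default_rng(0)
x, y = rng.normal(size=16), rng.normal(size=16)
X, Y = haar_full(x), haar_full(y)
assert np.isclose(np.linalg.norm(x - y), np.linalg.norm(X - Y))
```

Because the transform is orthonormal, the nearest-neighbor ranking is identical in either domain, which is what WKPDS and the algorithms below rely on.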

Manuscript received June 2005.
Manuscript revised March 2006.
The authors are with the Department of Automatic Test and Control, Harbin Institute of Technology, Harbin, China.


Pan et al. [3] adopted an additional judgment condition: if

$$Y_{s_{L,0}}^i \le Q_{s_{L,0}} - d(Q, Y^{j_k}) \quad \text{or} \quad Y_{s_{L,0}}^i \ge Q_{s_{L,0}} + d(Q, Y^{j_k}), \qquad (2)$$

then $d(Q, Y^i) \ge d(Q, Y^{j_k})$, where $Q_{s_{L,0}}$ and $Y_{s_{L,0}}^i$ are the approximate coefficients of q and $y^i$ at scale L, respectively. Once $d(Q, Y^i) < d(Q, Y^{j_k})$, $Y^{j_k}$ is replaced with $Y^i$, and the new current k nearest neighbors and the corresponding k distances are reordered as in [2]. When all vectors of the data set have been checked, the current k closest vectors are the final k nearest neighbors.

3. Proposed Algorithm

Here we introduce our new algorithm. If the vector x is transformed by an (L-1)-level decomposition with the Haar wavelet, we say that x is sub-fully decomposed. We now present the core proposition of our fast algorithm.

Proposition. For two N-dimensional vectors x and y with $2^{L-1} < N \le 2^L$, after sub-full decomposition with the Haar wavelet,

$$\left|X_{s_{L-1,0}} - Y_{s_{L-1,0}}\right| \le d(X', Y') \le d(X, Y) = d(x, y)$$
$$\left|X_{s_{L-1,1}} - Y_{s_{L-1,1}}\right| \le \sqrt{\frac{N - 2^{L-1}}{2^{L-1}}}\, d(X'', Y'') \le \sqrt{\frac{N - 2^{L-1}}{2^{L-1}}}\, d(X, Y) = \sqrt{\frac{N - 2^{L-1}}{2^{L-1}}}\, d(x, y) \qquad (3)$$

where $X_{s_{L-1,r}}$ and $Y_{s_{L-1,r}}$, r = 0, 1, are the approximate coefficients at scale L-1, and X' and X'' are the transform coefficient vectors of x' and x'', respectively. Here x' denotes the sub-vector of x constituted by the first $2^{L-1}$ elements of x, and x'', comprising the last $N - 2^{L-1}$ elements of x, is the other sub-vector of x. Y' and Y'' are defined analogously from y.

Proof: Let $x = (x_0, x_1, \ldots, x_{N-1})$, $x' = (x_0, x_1, \ldots, x_{2^{L-1}-1})$ and $x'' = (x_{2^{L-1}}, x_{2^{L-1}+1}, \ldots, x_{N-1})$. The mean values of x, x' and x'' are $M_x = \frac{1}{N}\sum_{i=0}^{N-1} x_i$, $M_{x'} = \frac{1}{2^{L-1}}\sum_{i=0}^{2^{L-1}-1} x_i$ and $M_{x''} = \frac{1}{N - 2^{L-1}}\sum_{i=2^{L-1}}^{N-1} x_i$, respectively. From the Haar wavelet transform of Eq. (1), it can be seen that the approximate coefficient $X_{s_{j,r}}$ at scale j is related to, and only to, the two approximate coefficients $X_{s_{j-1,2r}}$ and $X_{s_{j-1,2r+1}}$ at scale j-1, as shown in Fig. 1. So

$$X_{s_{L-1,0}} = \frac{1}{\sqrt{2}}\left(X_{s_{L-2,0}} + X_{s_{L-2,1}}\right) = \cdots = \frac{1}{\sqrt{2^{L-1-j}}}\sum_{i=0}^{2^{L-1-j}-1} X_{s_{j,i}} = \frac{1}{\sqrt{2^{L-j}}}\sum_{i=0}^{2^{L-j}-1} X_{s_{j-1,i}} = \cdots = \frac{1}{\sqrt{2^{L-1}}}\sum_{i=0}^{2^{L-1}-1} x_i = \sqrt{2^{L-1}}\, M_{x'}.$$

Pan et al. [3] have stated that $X_{s_{L,0}}$ equals $N M_x / \sqrt{2^L}$. Since $X_{s_{L,0}} = \frac{1}{\sqrt{2}}\left(X_{s_{L-1,0}} + X_{s_{L-1,1}}\right)$, we have

$$X_{s_{L-1,1}} = \sqrt{2}\, X_{s_{L,0}} - X_{s_{L-1,0}} = \frac{N M_x}{\sqrt{2^{L-1}}} - \sqrt{2^{L-1}}\, M_{x'} = \frac{1}{\sqrt{2^{L-1}}}\sum_{i=2^{L-1}}^{N-1} x_i = \frac{N - 2^{L-1}}{\sqrt{2^{L-1}}}\, M_{x''}.$$

In the same manner, we obtain $Y_{s_{L-1,0}} = \sqrt{2^{L-1}}\, M_{y'}$ and $Y_{s_{L-1,1}} = \frac{N - 2^{L-1}}{\sqrt{2^{L-1}}}\, M_{y''}$. Guan and Kamel [5] have presented the inequality $d(x, y) \ge \sqrt{N}\left|M_x - M_y\right|$. By combining the above formulae, we easily get

$$\left|X_{s_{L-1,0}} - Y_{s_{L-1,0}}\right| = \sqrt{2^{L-1}}\left|M_{x'} - M_{y'}\right| \le d(x', y') = d(X', Y') \le d(X, Y) = d(x, y).$$

The second inequality of Eq. (3) can be proved similarly.

Fig. 1: The relationship among approximate coefficients at neighboring scales (the coefficient tree from $X_{s_{0,0}}, \ldots, X_{s_{0,N-1}}$ at scale 0, through the first-level coefficients $X_{s_{1,0}}, \ldots, X_{s_{1,\lfloor N/2 \rfloor}}$, up to $X_{s_{L,0}}$ at the Lth level).

Notice that if $\left|Y_{s_{L-1,0}}^i - Q_{s_{L-1,0}}\right| \ge d(Q, Y^{j_k})$ or $\left|Y_{s_{L-1,1}}^i - Q_{s_{L-1,1}}\right| \ge \sqrt{(N - 2^{L-1})/2^{L-1}}\, d(Q, Y^{j_k})$, then $d(Q, Y^i) \ge d(Q, Y^{j_k})$ by Eq. (3). That is to say, if

$$Y_{s_{L-1,0}}^i \le Q_{s_{L-1,0}} - d(Q, Y^{j_k}) \quad \text{or} \quad Y_{s_{L-1,0}}^i \ge Q_{s_{L-1,0}} + d(Q, Y^{j_k})$$
$$\text{or} \quad Y_{s_{L-1,1}}^i \le Q_{s_{L-1,1}} - \sqrt{\frac{N - 2^{L-1}}{2^{L-1}}}\, d(Q, Y^{j_k}) \quad \text{or} \quad Y_{s_{L-1,1}}^i \ge Q_{s_{L-1,1}} + \sqrt{\frac{N - 2^{L-1}}{2^{L-1}}}\, d(Q, Y^{j_k}), \qquad (4)$$

then $d(Q, Y^i) \ge d(Q, Y^{j_k})$.

With the above results in hand, one can determine whether $Y^i$ is closer than $Y^{j_k}$ to Q in three steps. First, the vector $Y^i$ is checked with Eq. (2) as in [3].
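To make the proposition concrete, the following sketch (our illustration, not code from the letter) checks Eq. (3) numerically on random vectors, using the closed forms for $X_{s_{L-1,0}}$ and $X_{s_{L-1,1}}$ derived in the proof; the function name and the choice N = 13 are ours.

```python
# Numerical check of Eq. (3), assuming the closed forms from the proof:
# X_{s_{L-1,0}} = sqrt(2^{L-1}) * M_{x'} and
# X_{s_{L-1,1}} = (N - 2^{L-1}) * M_{x''} / sqrt(2^{L-1}).
import numpy as np

N = 13                              # 2^{L-1} < N <= 2^L holds with L = 4
L = int(np.ceil(np.log2(N)))
half = 2 ** (L - 1)                 # length of the sub-vector x'

def approx_coeffs(x):
    """Scale-(L-1) approximate coefficients of a sub-fully decomposed x."""
    a0 = np.sqrt(half) * x[:half].mean()                  # X_{s_{L-1,0}}
    a1 = (N - half) * x[half:].mean() / np.sqrt(half)     # X_{s_{L-1,1}}
    return a0, a1

rng = np.random.default_rng(0)
c = np.sqrt((N - half) / half)      # factor in the second line of Eq. (3)
for _ in range(1000):
    x, y = rng.normal(size=N), rng.normal(size=N)
    (x0, x1), (y0, y1) = approx_coeffs(x), approx_coeffs(y)
    d  = np.linalg.norm(x - y)                   # d(x, y) = d(X, Y)
    dp = np.linalg.norm(x[:half] - y[:half])     # d(x', y') = d(X', Y')
    assert abs(x0 - y0) <= dp + 1e-9             # first line of Eq. (3)
    assert dp <= d + 1e-9                        # d(X', Y') <= d(X, Y)
    assert abs(x1 - y1) <= c * d + 1e-9          # second line of Eq. (3)
print("Eq. (3) held on all random trials")
```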


If no conclusion can be drawn, Eq. (4) is employed. Once any inequality of Eq. (4) is true, we have $d(Q, Y^i) \ge d(Q, Y^{j_k})$. Otherwise, the WKPDS algorithm is adopted to determine the relation between $d(Q, Y^i)$ and $d(Q, Y^{j_k})$. During the judgment stage with Eq. (4), $Y_{s_{L-1,0}}^i$ and $Y_{s_{L-1,1}}^i$ can be obtained from $Y_{s_{L,0}}^i$ and $Y_{d_{L,0}}^i$ by the inverse Haar wavelet transform, so no extra storage is needed for them before searching for the k nearest neighbors. However, we notice that there are two multiplication operations in the inverse Haar wavelet transform of Eq. (1), which would consume a great deal of time during the search process. So we modify it as

$$mX_{s_{L-1,0}} = X_{s_{L,0}} + X_{d_{L,0}}, \qquad mX_{s_{L-1,1}} = X_{s_{L,0}} - X_{d_{L,0}}.$$

Then Eq. (3) becomes

$$\left|mX_{s_{L-1,0}} - mY_{s_{L-1,0}}\right| \le \sqrt{2}\, d(X', Y') \le \sqrt{2}\, d(X, Y)$$
$$\left|mX_{s_{L-1,1}} - mY_{s_{L-1,1}}\right| \le \sqrt{\frac{2N - 2^{L}}{2^{L-1}}}\, d(X'', Y'') \le \sqrt{\frac{2N - 2^{L}}{2^{L-1}}}\, d(X, Y).$$
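The point of the modification is that the scaled coefficients $mX_{s_{L-1,r}}$ are simply $\sqrt{2}$ times the true ones, so they can be computed with additions only while the $\sqrt{2}$ factor is absorbed into the thresholds. A small check of this equivalence (our illustration; `modified_inverse` is a name we introduce):

```python
# The modified inverse transform uses only additions; the sqrt(2) factor it
# introduces is compensated inside the thresholds of Eq. (5).
import numpy as np

def modified_inverse(a_L0, d_L0):
    """Return (mX_{s_{L-1,0}}, mX_{s_{L-1,1}}) = sqrt(2) * true coefficients."""
    return a_L0 + d_L0, a_L0 - d_L0          # no multiplications

a, d = 3.0, -1.25                            # X_{s_{L,0}}, X_{d_{L,0}}
m0, m1 = modified_inverse(a, d)
t0, t1 = (a + d) / np.sqrt(2), (a - d) / np.sqrt(2)   # true inverse, Eq. (1)
assert np.isclose(m0, np.sqrt(2) * t0) and np.isclose(m1, np.sqrt(2) * t1)
```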

Accordingly, Eq. (4) is adjusted as follows: if

$$mY_{s_{L-1,0}}^i \le mQ_{s_{L-1,0}} - \sqrt{2}\, d(Q, Y^{j_k}) \quad \text{or} \quad mY_{s_{L-1,0}}^i \ge mQ_{s_{L-1,0}} + \sqrt{2}\, d(Q, Y^{j_k})$$
$$\text{or} \quad mY_{s_{L-1,1}}^i \le mQ_{s_{L-1,1}} - \sqrt{\frac{2N - 2^{L}}{2^{L-1}}}\, d(Q, Y^{j_k}) \quad \text{or} \quad mY_{s_{L-1,1}}^i \ge mQ_{s_{L-1,1}} + \sqrt{\frac{2N - 2^{L}}{2^{L-1}}}\, d(Q, Y^{j_k}), \qquad (5)$$

then $d(Q, Y^i) \ge d(Q, Y^{j_k})$. This modified version saves many multiplication operations, so Eq. (5) replaces Eq. (4) and serves as the second judgment condition. Our proposed algorithm is now summarized as follows:

Step 0: Transform all vectors of the vector set $\Omega$ and sort them in ascending order of their approximate coefficients at the coarsest scale. Assume the preprocessed vector set is $\tilde{\Omega} = \{Y^1, Y^2, \ldots, Y^n\}$, where $Y^i$ is the transform coefficient vector of $y^i$, $i = 1, \ldots, n$. For simplicity of expression, we implicitly assume that the order of the original vector set is the same as that of the rearranged transform coefficient vector set. These procedures can be performed off-line.

Step 1: For a query vector q, fully decompose it with the Haar wavelet to get the transform coefficient vector Q. During the decomposition, store $mQ_{s_{L-1,0}}$ and $mQ_{s_{L-1,1}}$.

Step 2: Compute $J = \arg\min_j \left|Q_{s_{L,0}} - Y_{s_{L,0}}^j\right|$ (J is the index of the sorted vector whose coarsest-scale approximate coefficient is closest to that of Q) and take the k vectors with indices $j_i = J - \lfloor k/2 \rfloor + i - 1$, $i = 1, \ldots, k$, as the initial k nearest neighbors. When $J - \lfloor k/2 \rfloor < 1$, set $j_i = i$; if $J - \lfloor k/2 \rfloor > n - k + 1$, set $j_i = n - k + i$. Let $UpI = j_k$ and $LowI = j_1$. Then reorder the k distances such that $d_1 \le d_2 \le \cdots \le d_k$ and rearrange the corresponding k closest vectors accordingly. Set $l = UpI + 1$. The search proceeds in an up-and-down manner, as in Fig. 2.

Step 3: Check the termination condition of the program (there is no vector left to be tested). Test Eq. (2) on the vector $Y^l$ as follows: when $1 \le l < LowI$, perform step I); otherwise, perform step II).
I) If the first inequality is satisfied, then all vectors $Y^j$, $1 \le j \le l$, are rejected; choose the next $Y^l$ from $\tilde{\Omega}$ according to the search order shown in Fig. 2 and go to Step 3. Otherwise, go to Step 4.
II) If the second inequality is satisfied, then all vectors $Y^j$, $l \le j \le n$, are rejected; choose the next $Y^l$ from $\tilde{\Omega}$ according to the search order shown in Fig. 2 and go to Step 3. Otherwise, go to Step 4.

Step 4: Compute $mY_{s_{L-1,0}}^l$ and $mY_{s_{L-1,1}}^l$, then check Eq. (5) on $Y^l$. If one of the inequalities is true, $Y^l$ can be rejected, since it cannot be one of the k nearest neighbors; choose the next $Y^l$ from $\tilde{\Omega}$ according to the search order shown in Fig. 2 and go to Step 3. Otherwise, go to the next step.

Step 5: Perform WKPDS to determine the relation between $d(Q, Y^l)$ and $d_k$. If $d(Q, Y^l) < d_k$, replace $Y^{j_k}$ with $Y^l$ and reorder the current k nearest neighbors and the corresponding distances as in [2]. Then choose the next $Y^l$ from $\tilde{\Omega}$ according to the search order shown in Fig. 2 and go to Step 3.

Note that once the vectors in one direction have been rejected in Step 3, the algorithm stops searching in that direction and continues in the other direction. For example, if the vectors in the upper direction have been rejected at step p-1, then steps p and p+1 check only vectors in the downward direction. When the algorithm terminates, the final "so far" k closest vectors are the k nearest neighbors of the query vector Q.

Fig. 2: Search order (the sorted vectors from the 1st to the nth are visited alternately upward from the UpIth vector and downward from the LowIth vector around the Jth vector; once Eq. (2) rejects a direction, only the other direction is continued).
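To show how Steps 0-5 fit together, here is a compact, self-contained sketch. It is our illustration, not the authors' implementation: it uses the closed-form coefficients from the proof instead of an explicit transform, replaces the WKPDS partial distance search in Step 5 with a plain Euclidean distance, and the names (`haar_feats`, `knn_search`) are ours. A brute-force check confirms that the search remains exact.

```python
import numpy as np

def haar_feats(v, half):
    """(Y_{s_{L,0}}, mY_{s_{L-1,0}}, mY_{s_{L-1,1}}) via the proof's closed forms."""
    a0 = np.sqrt(half) * v[:half].mean()                    # Y_{s_{L-1,0}}
    a1 = (len(v) - half) * v[half:].mean() / np.sqrt(half)  # Y_{s_{L-1,1}}
    return (a0 + a1) / np.sqrt(2), np.sqrt(2) * a0, np.sqrt(2) * a1

def knn_search(data, q, k):
    n, N = data.shape
    L = int(np.ceil(np.log2(N)))
    half = 2 ** (L - 1)
    c1 = np.sqrt((2 * N - 2 ** L) / half)                # factor in Eq. (5)
    F = np.array([haar_feats(v, half) for v in data])    # Step 0 (off-line)
    order = np.argsort(F[:, 0])                          # sort by Y_{s_{L,0}}
    A, M0, M1 = F[order, 0], F[order, 1], F[order, 2]
    qa, qm0, qm1 = haar_feats(q, half)                   # Step 1
    J = int(np.argmin(np.abs(A - qa)))                   # Step 2
    lo = min(max(J - k // 2, 0), n - k)                  # initial window
    cand = sorted((np.linalg.norm(q - data[order[i]]), i) for i in range(lo, lo + k))
    down, up, was_down = lo - 1, lo + k, False
    while down >= 0 or up < n:                           # Step 3: up-and-down scan
        go_down = (down >= 0) and (not was_down or up >= n)
        was_down = go_down
        l = down if go_down else up
        dk = cand[-1][0]                                 # current d(Q, Y^{j_k})
        if go_down and A[l] <= qa - dk:                  # Eq. (2): reject all below
            down = -1; continue
        if (not go_down) and A[l] >= qa + dk:            # Eq. (2): reject all above
            up = n; continue
        rejected = (M0[l] <= qm0 - np.sqrt(2) * dk or M0[l] >= qm0 + np.sqrt(2) * dk
                    or M1[l] <= qm1 - c1 * dk or M1[l] >= qm1 + c1 * dk)  # Eq. (5)
        if not rejected:                                 # Step 5 (full distance here)
            d = np.linalg.norm(q - data[order[l]])
            if d < dk:
                cand[-1] = (d, l)
                cand.sort()
        if go_down: down -= 1
        else:       up += 1
    return [(d, int(order[i])) for d, i in cand]

# sanity check against brute force
rng = np.random.default_rng(1)
data, q = rng.normal(size=(500, 13)), rng.normal(size=13)
ours = knn_search(data, q, 5)
brute = sorted((np.linalg.norm(q - v), i) for i, v in enumerate(data))[:5]
assert np.allclose([d for d, _ in ours], [d for d, _ in brute])
```

The up-and-down scan over the sorted coarsest-scale coefficients is what lets Eq. (2) prune a whole direction at once, so the Eq. (5) test and the final distance computation run on only a small remainder of the data set.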


Our proposed fast algorithm has the following advantages. First, it finds the same k nearest neighbors as exhaustive search. Second, compared with WKPDS and the algorithm of Pan et al. [3], the preprocessed vector set needs no extra storage. Finally, it is much faster than the two techniques mentioned above.

4. Simulation Results

This part presents simulation results for texture image classification. Thirty Brodatz texture images [6] are used as the test material. Each image is of size 512×512 pixels with 256 gray levels. For each image, 400 sample images of size 128×128 are randomly chosen from the original image; 300 of these samples are used for training and 100 for testing the algorithm. In this letter, we use the densities of the wavelet coefficients' extrema as the feature vector [7] for classifying the texture images, and on this task the above-mentioned algorithms are compared. The method of [8] is used to extract the features of these texture images. The k nearest neighbors rule [9] serves as the classification method.

Table 1 lists the experimental results on classification time. "Dimension" is the number of features. "Algorithm" indicates which k nearest neighbors search algorithm is used when classifying the texture images with the k nearest neighbors rule. "k=5" means that the 5-NN classifier is used; other values of k have similar meanings. "Error number" is the number of classification errors. "Time" is the classification time in seconds.

Table 1: Comparison of classification time

                             k=5               k=3               k=1
Dimension  Algorithm        Error  Time(s)    Error  Time(s)    Error  Time(s)
8          WKPDS             6     20.34       6     19.95       3     18.60
8          Pan et al. [3]    6      2.04       6      1.70       3      1.13
8          Our Algorithm     6      1.10       6      0.84       3      0.48
13         WKPDS             6     28.55       3     27.73       4     24.59
13         Pan et al. [3]    6      4.28       3      3.64       4      2.50
13         Our Algorithm     6      2.58       3      2.08       4      1.23
16         WKPDS            43     26.64      42     25.67      34     23.13
16         Pan et al. [3]   43      5.95      42      5.16      34      3.78
16         Our Algorithm    43      3.84      42      3.17      34      2.05

Compared with the algorithm of Pan et al. [3], the average decrease in classification time is about 38.7%, 42.0% and 49.3% for k = 5, 3 and 1, respectively. The largest decrease in classification time is around 57.9%. Compared with WKPDS, the smallest decrease is about 86%. From the experimental results, we can see that our proposed algorithm largely reduces the computational burden of the k nearest neighbors search while finding the same vectors.

In order to evaluate the computational complexity of our proposed method more comprehensively, the numbers of additions (Add), multiplications (Mult) and comparisons (Comp) required per test vector for finding the nearest neighbor are shown in Tables 2 to 4. "Num" denotes the average number of vectors in the design set on which WKPDS is performed for a test feature vector. Our algorithm uses Eq. (2) and Eq. (5) to reject those vectors that cannot be the k nearest neighbors. Those vectors that cannot be rejected by the two equations must be checked by WKPDS, which is the dominant contribution to the computational complexity. The difference between our algorithm and the method of Pan et al. [3] is that Eq. (5) is adopted, so the item "Num" indicates how many vectors have been rejected by the new Eq. (5).

Table 2: Arithmetic complexity for 8-dimensional feature vectors

Algorithm        Mult      Add       Comp      Num
Exhaust Search   72000     135000    8999      //
WKPDS            13125.0   35249.3   22103.0   9000
Pan et al. [3]   1196.4    2379.7    1696.4    495.6
Our Algorithm    398.0     1776.2    2661.8    90.9

Table 3: Arithmetic complexity for 13-dimensional feature vectors

Algorithm        Mult      Add       Comp      Num
Exhaust Search   117000    225000    8999      //
WKPDS            19608.0   48202.4   28565.0   9000
Pan et al. [3]   2628.4    5229.8    3589.9    978.5
Our Algorithm    1072.8    4077.7    5649.6    205.4

Table 4: Arithmetic complexity for 16-dimensional feature vectors

Algorithm        Mult      Add       Comp      Num
Exhaust Search   144000    279000    8999      //
WKPDS            17920.3   44822.5   26874.3   9000
Pan et al. [3]   3878.7    7728.3    5337.9    1479.5
Our Algorithm    1726.9    6385.9    8888.5    397.3



From the complexity analysis in the above tables, we can see that our fast algorithm eliminates a large number of multiplication and addition operations at the price of an increase in comparison operations, so that the overall computational burden is reduced. The reason is that the computational cost of a multiplication or addition operation is much higher than that of a comparison. The item "Num" shows that the proposed Eq. (5) rejects many vectors that cannot be the nearest neighbor, which largely reduces the computational complexity.

5. Conclusion

This letter proposes a fast k nearest neighbors search algorithm that adequately makes use of the important content of the approximate coefficients of a vector. The proposed algorithm rejects those vectors that cannot be among the k nearest neighbors by using two important inequalities. It largely speeds up the k nearest neighbors search, as confirmed by the experimental results.

References
[1] J. McNames, "A fast nearest-neighbor algorithm based on a principal axis search tree," IEEE Trans. Pattern Anal. Machine Intell., vol. 23, no. 9, pp. 964-976, 2001.
[2] W. J. Hwang and K. W. Wen, "Fast kNN classification algorithm based on partial distance search," Electron. Lett., vol. 34, no. 21, pp. 2062-2063, 1998.
[3] J. S. Pan, Y. L. Qiao and S. H. Sun, "A fast K nearest neighbors classification algorithm," IEICE Trans. Fundamentals, vol. E87-A, no. 4, pp. 961-963, 2004.
[4] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding, Prentice Hall, 1995.
[5] L. Guan and M. Kamel, "Equal-average hyperplane partitioning method for vector quantization of image data," Pattern Recognit. Lett., vol. 13, no. 10, pp. 693-699, 1992.
[6] P. Brodatz, Textures: A Photographic Album for Artists & Designers, Dover, New York, 1966.
[7] J. S. Pan and J. W. Wang, "Texture segmentation using separable and non-separable wavelet frames," IEICE Trans. Fundamentals, vol. E82-A, no. 8, pp. 1463-1474, 1999.
[8] S. Mallat and S. Zhong, "Characterization of signals from multiscale edges," IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 7, pp. 710-732, 1992.
[9] S. Theodoridis and K. Koutroumbas, Pattern Recognition, China Machine Press, pp. 44-45, 2003.