
IEICE TRANS. FUNDAMENTALS/COMMUN./ELECTRON./INF. & SYST., VOL. E89-A, 2006

LETTER
Fast K Nearest Neighbors Search Algorithm Based on Wavelet Transform

Yu-Long QIAO and Zhe-Ming LU, Nonmembers, and Sheng-He SUN, Regular Member

Summary: This letter proposes a fast k nearest neighbors search algorithm based on the wavelet transform. The technique exploits the important information carried by the approximation coefficients of the transform coefficient vector, from which we obtain two crucial inequalities that can be used to reject those vectors that cannot be among the k nearest neighbors. The computational complexity of searching for the k nearest neighbors can thereby be largely reduced. Experimental results on texture classification verify the effectiveness of our algorithm.

Key words: k nearest neighbors, wavelet transform, texture image classification, vector quantization, image retrieval

1. Introduction

The k nearest neighbors search plays an important role in many fields of signal processing, including image retrieval, vector quantization, texture classification, and so on [1]. Its principle is straightforward: for a data set of n vectors, $\Omega = \{y^1, y^2, \ldots, y^n\}$, and a query vector q, find the k vectors of this data set closest to q. Assume that the dimension of a vector is N. The naive technique, exhaustive search, needs nN multiplications and (2N-1)n additions. This computational burden largely limits its application.

Many fast algorithms [1]-[3] have been proposed to reduce the computational complexity. Here, we review only the two fast algorithms based on the wavelet transform that are closely related to our algorithm. References [2] and [3] search for the k closest vectors in the wavelet domain. The motivation is that the cost of performing the wavelet transform is low and the energy of a vector is compacted into a few coefficients. Hwang and Wen [2] proposed a fast algorithm based on partial distance search in the wavelet domain (WKPDS). Although their method speeds up the k nearest neighbors search, it does not adequately exploit the transform coefficients. Pan et al. [3] exploited the important information of vectors fully decomposed with the Haar wavelet and proposed a fast search algorithm. Their technique largely reduces the search time compared with WKPDS, while also preserving the other advantages of WKPDS.

In this letter, we study the information carried by the approximate coefficients of the wavelet transform and obtain two inequalities that can be used to reject those vectors that cannot be among the k nearest neighbors. The proposed fast search algorithm based on these inequalities can largely reduce the computational complexity.

2. The Theory Background

We can use the Haar wavelet to iteratively decompose and reconstruct a vector x [4] as follows:

$$X_{s_{j,r}} = \frac{1}{\sqrt{2}}\left(X_{s_{j-1,2r}} + X_{s_{j-1,2r+1}}\right), \qquad X_{d_{j,r}} = \frac{1}{\sqrt{2}}\left(X_{s_{j-1,2r}} - X_{s_{j-1,2r+1}}\right)$$
$$X_{s_{j-1,2r}} = \frac{1}{\sqrt{2}}\left(X_{s_{j,r}} + X_{d_{j,r}}\right), \qquad X_{s_{j-1,2r+1}} = \frac{1}{\sqrt{2}}\left(X_{s_{j,r}} - X_{d_{j,r}}\right) \qquad (1)$$

where $X_{s_{j,r}}$ and $X_{d_{j,r}}$ are the approximate and detail coefficients at scale j, respectively. The original spatial vector x is denoted by $X_{s_0}$ at scale 0. When the dimension N of x satisfies $2^{L-1} < N \le 2^L$, x can be fully decomposed by an L-level wavelet transform, that is, there is only one approximate coefficient, denoted by $X_{s_{L,0}}$. In the rest of this letter, except where specifically noted, "transform" means full decomposition with the Haar wavelet. Owing to the orthonormality of the Haar wavelet transform, $d(x, y) = d(X, Y)$ for two vectors x and y, where X and Y are the transform coefficient vectors of x and y, respectively, and $d(x, y) = \sqrt{\sum_{i=0}^{N-1}(x_i - y_i)^2}$ is the Euclidean distance between x and y. Thus, the k nearest neighbors found in the spatial domain are the same as those found in the wavelet domain.

Assume that the current k nearest neighbors of the query vector Q are $Y^{j_1}, Y^{j_2}, \ldots, Y^{j_k}$ among the checked vectors of $\tilde{\Omega} = \{Y^1, Y^2, \ldots, Y^n\}$ (Q and $Y^1, \ldots, Y^n$ are the transform coefficient vectors of q and $y^1, \ldots, y^n$, respectively), and that the corresponding k distances satisfy $d(Q, Y^{j_1}) \le d(Q, Y^{j_2}) \le \cdots \le d(Q, Y^{j_k})$. For a vector $Y^i \in \tilde{\Omega}$, to determine the relation between $d(Q, Y^i)$ and $d(Q, Y^{j_k})$, WKPDS uses the following test: if

$$\sum_{r=0}^{p}\left(Q_r - Y_r^i\right)^2 \ge d^2(Q, Y^{j_k}) \quad \text{for some } p \le N' - 1,$$

then $d(Q, Y^i) \ge d(Q, Y^{j_k})$, where $N'$ is the dimension of a transform coefficient vector.
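As a concrete illustration of Eq. (1) and of the distance preservation that justifies searching in the wavelet domain, the following sketch is our addition, not code from the letter; it assumes a power-of-two dimension, and the name `haar_full` is ours.

```python
# A minimal sketch: full Haar decomposition per Eq. (1) and a check that the
# orthonormal transform preserves Euclidean distances, d(x, y) = d(X, Y).
import numpy as np

def haar_full(x):
    """Full Haar decomposition of a length-2^L vector; returns all coefficients."""
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) > 1:
        s = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximate coefficients
        d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
        details.append(d)
        x = s                                   # recurse on the approximations
    return np.concatenate([x] + details[::-1])  # [X_{s_{L,0}}, details...]

rng = np.random.default_rng(0)
x, y = rng.normal(size=16), rng.normal(size=16)
X, Y = haar_full(x), haar_full(y)
assert np.isclose(np.linalg.norm(x - y), np.linalg.norm(X - Y))
```

Because the transform is orthonormal, the nearest-neighbor ranking is identical in either domain, which is what WKPDS and the algorithms below rely on.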

Manuscript received June 2005.
Manuscript revised March 2006.
The authors are with the Department of Automatic Test and Control, Harbin Institute of Technology, Harbin, China.


Pan et al. [3] adopted an additional judgment condition: if

$$Y_{s_{L,0}}^i \le Q_{s_{L,0}} - d(Q, Y^{j_k}) \quad \text{or} \quad Y_{s_{L,0}}^i \ge Q_{s_{L,0}} + d(Q, Y^{j_k}), \qquad (2)$$

then $d(Q, Y^i) \ge d(Q, Y^{j_k})$, where $Q_{s_{L,0}}$ and $Y_{s_{L,0}}^i$ are the approximate coefficients of q and $y^i$ at scale L, respectively. Once $d(Q, Y^i) < d(Q, Y^{j_k})$, $Y^{j_k}$ is replaced with $Y^i$, and the new current k nearest neighbors and the corresponding k distances are reordered as in [2]. When all vectors of the data set have been checked, the current k closest vectors are the final k nearest neighbors.

3. Proposed Algorithm

Here we introduce our new algorithm. If the vector x is transformed by an (L-1)-level decomposition with the Haar wavelet, we say that x is sub-fully decomposed. We now present the core proposition of our fast algorithm.

Proposition. For two N-dimensional vectors x and y with $2^{L-1} < N \le 2^L$, after sub-full decomposition with the Haar wavelet,

$$\left|X_{s_{L-1,0}} - Y_{s_{L-1,0}}\right| \le d(X', Y') \le d(X, Y) = d(x, y)$$
$$\left|X_{s_{L-1,1}} - Y_{s_{L-1,1}}\right| \le \sqrt{\frac{N - 2^{L-1}}{2^{L-1}}}\, d(X'', Y'') \le \sqrt{\frac{N - 2^{L-1}}{2^{L-1}}}\, d(X, Y) = \sqrt{\frac{N - 2^{L-1}}{2^{L-1}}}\, d(x, y) \qquad (3)$$

where $X_{s_{L-1,r}}$ and $Y_{s_{L-1,r}}$, r = 0, 1, are the approximate coefficients at scale L-1, and X' and X'' are the transform coefficient vectors of x' and x'', respectively. Here x' denotes the sub-vector of x constituted by the first $2^{L-1}$ elements of x, and x'', comprising the last $N - 2^{L-1}$ elements of x, is the other sub-vector of x. Y' and Y'' are defined analogously from y.

Proof: Let $x = (x_0, x_1, \ldots, x_{N-1})$, $x' = (x_0, x_1, \ldots, x_{2^{L-1}-1})$ and $x'' = (x_{2^{L-1}}, x_{2^{L-1}+1}, \ldots, x_{N-1})$. The mean values of x, x' and x'' are $M_x = \frac{1}{N}\sum_{i=0}^{N-1} x_i$, $M_{x'} = \frac{1}{2^{L-1}}\sum_{i=0}^{2^{L-1}-1} x_i$ and $M_{x''} = \frac{1}{N - 2^{L-1}}\sum_{i=2^{L-1}}^{N-1} x_i$, respectively. From the Haar wavelet transform of Eq. (1), it can be seen that the approximate coefficient $X_{s_{j,r}}$ at scale j is related to, and only to, the two approximate coefficients $X_{s_{j-1,2r}}$ and $X_{s_{j-1,2r+1}}$ at scale j-1, as shown in Fig. 1. So

$$X_{s_{L-1,0}} = \frac{1}{\sqrt{2}}\left(X_{s_{L-2,0}} + X_{s_{L-2,1}}\right) = \cdots = \frac{1}{\sqrt{2^{L-1-j}}}\sum_{i=0}^{2^{L-1-j}-1} X_{s_{j,i}} = \frac{1}{\sqrt{2^{L-j}}}\sum_{i=0}^{2^{L-j}-1} X_{s_{j-1,i}} = \cdots = \frac{1}{\sqrt{2^{L-1}}}\sum_{i=0}^{2^{L-1}-1} x_i = \sqrt{2^{L-1}}\, M_{x'}.$$

Pan et al. [3] have stated that $X_{s_{L,0}}$ equals $N M_x / \sqrt{2^L}$. Since $X_{s_{L,0}} = \frac{1}{\sqrt{2}}\left(X_{s_{L-1,0}} + X_{s_{L-1,1}}\right)$, we have

$$X_{s_{L-1,1}} = \sqrt{2}\, X_{s_{L,0}} - X_{s_{L-1,0}} = \frac{N M_x}{\sqrt{2^{L-1}}} - \sqrt{2^{L-1}}\, M_{x'} = \frac{1}{\sqrt{2^{L-1}}}\sum_{i=2^{L-1}}^{N-1} x_i = \frac{N - 2^{L-1}}{\sqrt{2^{L-1}}}\, M_{x''}.$$

In the same manner, we obtain $Y_{s_{L-1,0}} = \sqrt{2^{L-1}}\, M_{y'}$ and $Y_{s_{L-1,1}} = \frac{N - 2^{L-1}}{\sqrt{2^{L-1}}}\, M_{y''}$. Guan and Kamel [5] have presented the inequality $d(x, y) \ge \sqrt{N}\left|M_x - M_y\right|$. By combining the above formulae, we easily get

$$\left|X_{s_{L-1,0}} - Y_{s_{L-1,0}}\right| = \sqrt{2^{L-1}}\left|M_{x'} - M_{y'}\right| \le d(x', y') = d(X', Y') \le d(X, Y) = d(x, y).$$

The second inequality of Eq. (3) can be proved similarly.

Fig. 1: The relationship among approximate coefficients at neighboring scales (the coefficient tree from $X_{s_{0,0}}, \ldots, X_{s_{0,N-1}}$ at scale 0, through the first-level coefficients $X_{s_{1,0}}, \ldots, X_{s_{1,\lfloor N/2 \rfloor}}$, up to $X_{s_{L,0}}$ at the Lth level).

Notice that if $\left|Y_{s_{L-1,0}}^i - Q_{s_{L-1,0}}\right| \ge d(Q, Y^{j_k})$ or $\left|Y_{s_{L-1,1}}^i - Q_{s_{L-1,1}}\right| \ge \sqrt{(N - 2^{L-1})/2^{L-1}}\, d(Q, Y^{j_k})$, then $d(Q, Y^i) \ge d(Q, Y^{j_k})$ by Eq. (3). That is to say, if

$$Y_{s_{L-1,0}}^i \le Q_{s_{L-1,0}} - d(Q, Y^{j_k}) \quad \text{or} \quad Y_{s_{L-1,0}}^i \ge Q_{s_{L-1,0}} + d(Q, Y^{j_k})$$
$$\text{or} \quad Y_{s_{L-1,1}}^i \le Q_{s_{L-1,1}} - \sqrt{\frac{N - 2^{L-1}}{2^{L-1}}}\, d(Q, Y^{j_k}) \quad \text{or} \quad Y_{s_{L-1,1}}^i \ge Q_{s_{L-1,1}} + \sqrt{\frac{N - 2^{L-1}}{2^{L-1}}}\, d(Q, Y^{j_k}), \qquad (4)$$

then $d(Q, Y^i) \ge d(Q, Y^{j_k})$.

With the above results in hand, one can determine whether $Y^i$ is closer than $Y^{j_k}$ to Q in three steps. First, the vector $Y^i$ is checked with Eq. (2) as in [3].
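To make the proposition concrete, the following sketch (our illustration, not code from the letter) checks Eq. (3) numerically on random vectors, using the closed forms for $X_{s_{L-1,0}}$ and $X_{s_{L-1,1}}$ derived in the proof; the function name and the choice N = 13 are ours.

```python
# Numerical check of Eq. (3), assuming the closed forms from the proof:
# X_{s_{L-1,0}} = sqrt(2^{L-1}) * M_{x'} and
# X_{s_{L-1,1}} = (N - 2^{L-1}) * M_{x''} / sqrt(2^{L-1}).
import numpy as np

N = 13                              # 2^{L-1} < N <= 2^L holds with L = 4
L = int(np.ceil(np.log2(N)))
half = 2 ** (L - 1)                 # length of the sub-vector x'

def approx_coeffs(x):
    """Scale-(L-1) approximate coefficients of a sub-fully decomposed x."""
    a0 = np.sqrt(half) * x[:half].mean()                  # X_{s_{L-1,0}}
    a1 = (N - half) * x[half:].mean() / np.sqrt(half)     # X_{s_{L-1,1}}
    return a0, a1

rng = np.random.default_rng(0)
c = np.sqrt((N - half) / half)      # factor in the second line of Eq. (3)
for _ in range(1000):
    x, y = rng.normal(size=N), rng.normal(size=N)
    (x0, x1), (y0, y1) = approx_coeffs(x), approx_coeffs(y)
    d  = np.linalg.norm(x - y)                   # d(x, y) = d(X, Y)
    dp = np.linalg.norm(x[:half] - y[:half])     # d(x', y') = d(X', Y')
    assert abs(x0 - y0) <= dp + 1e-9             # first line of Eq. (3)
    assert dp <= d + 1e-9                        # d(X', Y') <= d(X, Y)
    assert abs(x1 - y1) <= c * d + 1e-9          # second line of Eq. (3)
print("Eq. (3) held on all random trials")
```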


If no conclusion can be drawn, Eq. (4) is employed. Once any inequality of Eq. (4) is true, we have $d(Q, Y^i) \ge d(Q, Y^{j_k})$. Otherwise, the WKPDS algorithm is adopted to determine the relation between $d(Q, Y^i)$ and $d(Q, Y^{j_k})$. During the judgment stage with Eq. (4), $Y_{s_{L-1,0}}^i$ and $Y_{s_{L-1,1}}^i$ can be obtained from $Y_{s_{L,0}}^i$ and $Y_{d_{L,0}}^i$ by the inverse Haar wavelet transform, so no extra storage is needed for them before searching for the k nearest neighbors. However, we notice that there are two multiplication operations in the inverse Haar wavelet transform of Eq. (1), which would consume a great deal of time during the search process. So we modify it as

$$mX_{s_{L-1,0}} = X_{s_{L,0}} + X_{d_{L,0}}, \qquad mX_{s_{L-1,1}} = X_{s_{L,0}} - X_{d_{L,0}}.$$

Then Eq. (3) becomes

$$\left|mX_{s_{L-1,0}} - mY_{s_{L-1,0}}\right| \le \sqrt{2}\, d(X', Y') \le \sqrt{2}\, d(X, Y)$$
$$\left|mX_{s_{L-1,1}} - mY_{s_{L-1,1}}\right| \le \sqrt{\frac{2N - 2^{L}}{2^{L-1}}}\, d(X'', Y'') \le \sqrt{\frac{2N - 2^{L}}{2^{L-1}}}\, d(X, Y).$$
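The point of the modification is that the scaled coefficients $mX_{s_{L-1,r}}$ are simply $\sqrt{2}$ times the true ones, so they can be computed with additions only while the $\sqrt{2}$ factor is absorbed into the thresholds. A small check of this equivalence (our illustration; `modified_inverse` is a name we introduce):

```python
# The modified inverse transform uses only additions; the sqrt(2) factor it
# introduces is compensated inside the thresholds of Eq. (5).
import numpy as np

def modified_inverse(a_L0, d_L0):
    """Return (mX_{s_{L-1,0}}, mX_{s_{L-1,1}}) = sqrt(2) * true coefficients."""
    return a_L0 + d_L0, a_L0 - d_L0          # no multiplications

a, d = 3.0, -1.25                            # X_{s_{L,0}}, X_{d_{L,0}}
m0, m1 = modified_inverse(a, d)
t0, t1 = (a + d) / np.sqrt(2), (a - d) / np.sqrt(2)   # true inverse, Eq. (1)
assert np.isclose(m0, np.sqrt(2) * t0) and np.isclose(m1, np.sqrt(2) * t1)
```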

Accordingly, Eq. (4) is adjusted as follows: if

$$mY_{s_{L-1,0}}^i \le mQ_{s_{L-1,0}} - \sqrt{2}\, d(Q, Y^{j_k}) \quad \text{or} \quad mY_{s_{L-1,0}}^i \ge mQ_{s_{L-1,0}} + \sqrt{2}\, d(Q, Y^{j_k})$$
$$\text{or} \quad mY_{s_{L-1,1}}^i \le mQ_{s_{L-1,1}} - \sqrt{\frac{2N - 2^{L}}{2^{L-1}}}\, d(Q, Y^{j_k}) \quad \text{or} \quad mY_{s_{L-1,1}}^i \ge mQ_{s_{L-1,1}} + \sqrt{\frac{2N - 2^{L}}{2^{L-1}}}\, d(Q, Y^{j_k}), \qquad (5)$$

then $d(Q, Y^i) \ge d(Q, Y^{j_k})$. This modified version saves many multiplication operations, so Eq. (5) replaces Eq. (4) and serves as the second judgment condition. Our proposed algorithm is now summarized as follows:

Step 0: Transform all vectors of the vector set $\Omega$ and sort them in ascending order of their approximate coefficients at the coarsest scale. Assume the preprocessed vector set is $\tilde{\Omega} = \{Y^1, Y^2, \ldots, Y^n\}$, where $Y^i$ is the transform coefficient vector of $y^i$, $i = 1, \ldots, n$. For simplicity of expression, we implicitly assume that the order of the original vector set is the same as that of the rearranged transform coefficient vector set. These procedures can be performed off-line.

Step 1: For a query vector q, fully decompose it with the Haar wavelet to get the transform coefficient vector Q. During the decomposition, store $mQ_{s_{L-1,0}}$ and $mQ_{s_{L-1,1}}$.

Step 2: Compute $J = \arg\min_j \left|Q_{s_{L,0}} - Y_{s_{L,0}}^j\right|$ (J is the index of the sorted vector whose coarsest-scale approximate coefficient is closest to that of Q) and take the k vectors with indices $j_i = J - \lfloor k/2 \rfloor + i - 1$, $i = 1, \ldots, k$, as the initial k nearest neighbors. When $J - \lfloor k/2 \rfloor < 1$, set $j_i = i$; if $J - \lfloor k/2 \rfloor > n - k + 1$, set $j_i = n - k + i$. Let $UpI = j_k$ and $LowI = j_1$. Then reorder the k distances such that $d_1 \le d_2 \le \cdots \le d_k$ and rearrange the corresponding k closest vectors accordingly. Set $l = UpI + 1$. The search proceeds in an up-and-down manner, as in Fig. 2.

Step 3: Check the termination condition of the program (there is no vector left to be tested). Test Eq. (2) on the vector $Y^l$ as follows: when $1 \le l < LowI$, perform step I); otherwise, perform step II).
I) If the first inequality is satisfied, then all vectors $Y^j$, $1 \le j \le l$, are rejected; choose the next $Y^l$ from $\tilde{\Omega}$ according to the search order shown in Fig. 2 and go to Step 3. Otherwise, go to Step 4.
II) If the second inequality is satisfied, then all vectors $Y^j$, $l \le j \le n$, are rejected; choose the next $Y^l$ from $\tilde{\Omega}$ according to the search order shown in Fig. 2 and go to Step 3. Otherwise, go to Step 4.

Step 4: Compute $mY_{s_{L-1,0}}^l$ and $mY_{s_{L-1,1}}^l$, then check Eq. (5) on $Y^l$. If one of the inequalities is true, $Y^l$ can be rejected, since it cannot be one of the k nearest neighbors; choose the next $Y^l$ from $\tilde{\Omega}$ according to the search order shown in Fig. 2 and go to Step 3. Otherwise, go to the next step.

Step 5: Perform WKPDS to determine the relation between $d(Q, Y^l)$ and $d_k$. If $d(Q, Y^l) < d_k$, replace $Y^{j_k}$ with $Y^l$ and reorder the current k nearest neighbors and the corresponding distances as in [2]. Then choose the next $Y^l$ from $\tilde{\Omega}$ according to the search order shown in Fig. 2 and go to Step 3.

Note that once the vectors in one direction have been rejected in Step 3, the algorithm stops searching in that direction and continues in the other direction. For example, if the vectors in the upper direction have been rejected at step p-1, then steps p and p+1 check only vectors in the downward direction. When the algorithm terminates, the final "so far" k closest vectors are the k nearest neighbors of the query vector Q.

Fig. 2: Search order (the sorted vectors from the 1st to the nth are visited alternately upward from the UpIth vector and downward from the LowIth vector around the Jth vector; once Eq. (2) rejects a direction, only the other direction is continued).
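To show how Steps 0-5 fit together, here is a compact, self-contained sketch. It is our illustration, not the authors' implementation: it uses the closed-form coefficients from the proof instead of an explicit transform, replaces the WKPDS partial distance search in Step 5 with a plain Euclidean distance, and the names (`haar_feats`, `knn_search`) are ours. A brute-force check confirms that the search remains exact.

```python
import numpy as np

def haar_feats(v, half):
    """(Y_{s_{L,0}}, mY_{s_{L-1,0}}, mY_{s_{L-1,1}}) via the proof's closed forms."""
    a0 = np.sqrt(half) * v[:half].mean()                    # Y_{s_{L-1,0}}
    a1 = (len(v) - half) * v[half:].mean() / np.sqrt(half)  # Y_{s_{L-1,1}}
    return (a0 + a1) / np.sqrt(2), np.sqrt(2) * a0, np.sqrt(2) * a1

def knn_search(data, q, k):
    n, N = data.shape
    L = int(np.ceil(np.log2(N)))
    half = 2 ** (L - 1)
    c1 = np.sqrt((2 * N - 2 ** L) / half)                # factor in Eq. (5)
    F = np.array([haar_feats(v, half) for v in data])    # Step 0 (off-line)
    order = np.argsort(F[:, 0])                          # sort by Y_{s_{L,0}}
    A, M0, M1 = F[order, 0], F[order, 1], F[order, 2]
    qa, qm0, qm1 = haar_feats(q, half)                   # Step 1
    J = int(np.argmin(np.abs(A - qa)))                   # Step 2
    lo = min(max(J - k // 2, 0), n - k)                  # initial window
    cand = sorted((np.linalg.norm(q - data[order[i]]), i) for i in range(lo, lo + k))
    down, up, was_down = lo - 1, lo + k, False
    while down >= 0 or up < n:                           # Step 3: up-and-down scan
        go_down = (down >= 0) and (not was_down or up >= n)
        was_down = go_down
        l = down if go_down else up
        dk = cand[-1][0]                                 # current d(Q, Y^{j_k})
        if go_down and A[l] <= qa - dk:                  # Eq. (2): reject all below
            down = -1; continue
        if (not go_down) and A[l] >= qa + dk:            # Eq. (2): reject all above
            up = n; continue
        rejected = (M0[l] <= qm0 - np.sqrt(2) * dk or M0[l] >= qm0 + np.sqrt(2) * dk
                    or M1[l] <= qm1 - c1 * dk or M1[l] >= qm1 + c1 * dk)  # Eq. (5)
        if not rejected:                                 # Step 5 (full distance here)
            d = np.linalg.norm(q - data[order[l]])
            if d < dk:
                cand[-1] = (d, l)
                cand.sort()
        if go_down: down -= 1
        else:       up += 1
    return [(d, int(order[i])) for d, i in cand]

# sanity check against brute force
rng = np.random.default_rng(1)
data, q = rng.normal(size=(500, 13)), rng.normal(size=13)
ours = knn_search(data, q, 5)
brute = sorted((np.linalg.norm(q - v), i) for i, v in enumerate(data))[:5]
assert np.allclose([d for d, _ in ours], [d for d, _ in brute])
```

The up-and-down scan over the sorted coarsest-scale coefficients is what lets Eq. (2) prune a whole direction at once, so the Eq. (5) test and the final distance computation run on only a small remainder of the data set.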


Our proposed fast algorithm has the following advantages. First, it finds the same k nearest neighbors as exhaustive search. Second, compared with WKPDS and the algorithm of Pan et al. [3], the preprocessed vector set needs no extra storage. Finally, it is much faster than the two techniques mentioned above.

4. Simulation Results

This part presents simulation results for texture image classification. Thirty Brodatz texture images [6] are used as the test material. Each image is of size 512×512 pixels with 256 gray levels. For each image, 400 sample images of size 128×128 are randomly chosen from the original image; 300 of these samples are used for training and 100 for testing the algorithm. In this letter, we use the densities of the wavelet coefficients' extrema as the feature vector [7] for classifying the texture images, and on this task the above-mentioned algorithms are compared. The method of [8] is used to extract the features of these texture images. The k nearest neighbors rule [9] serves as the classification method.

Table 1 lists the experimental results on classification time. "Dimension" is the number of features. "Algorithm" indicates which k nearest neighbors search algorithm is used when classifying the texture images with the k nearest neighbors rule. "k=5" means that the 5-NN classifier is used; other values of k have similar meanings. "Error number" is the number of classification errors. "Time" is the classification time in seconds.

Table 1: Comparison of classification time

                             k=5               k=3               k=1
Dimension  Algorithm        Error  Time(s)    Error  Time(s)    Error  Time(s)
8          WKPDS             6     20.34       6     19.95       3     18.60
8          Pan et al. [3]    6      2.04       6      1.70       3      1.13
8          Our Algorithm     6      1.10       6      0.84       3      0.48
13         WKPDS             6     28.55       3     27.73       4     24.59
13         Pan et al. [3]    6      4.28       3      3.64       4      2.50
13         Our Algorithm     6      2.58       3      2.08       4      1.23
16         WKPDS            43     26.64      42     25.67      34     23.13
16         Pan et al. [3]   43      5.95      42      5.16      34      3.78
16         Our Algorithm    43      3.84      42      3.17      34      2.05

Compared with the algorithm of Pan et al. [3], the average decrease in classification time is about 38.7%, 42.0% and 49.3% for k = 5, 3 and 1, respectively. The largest decrease in classification time is around 57.9%. Compared with WKPDS, the smallest decrease is about 86%. From the experimental results, we can see that our proposed algorithm largely reduces the computational burden of the k nearest neighbors search while finding the same vectors.

In order to evaluate the computational complexity of our proposed method more comprehensively, the numbers of additions (Add), multiplications (Mult) and comparisons (Comp) required per test vector for finding the nearest neighbor are shown in Tables 2 to 4. "Num" denotes the average number of vectors in the design set on which WKPDS is performed for a test feature vector. Our algorithm uses Eq. (2) and Eq. (5) to reject those vectors that cannot be the k nearest neighbors. Those vectors that cannot be rejected by the two equations must be checked by WKPDS, which is the dominant contribution to the computational complexity. The difference between our algorithm and the method of Pan et al. [3] is that Eq. (5) is adopted, so the item "Num" indicates how many vectors have been rejected by the new Eq. (5).

Table 2: Arithmetic complexity for 8-dimensional feature vectors

Algorithm        Mult      Add       Comp      Num
Exhaust Search   72000     135000    8999      //
WKPDS            13125.0   35249.3   22103.0   9000
Pan et al. [3]   1196.4    2379.7    1696.4    495.6
Our Algorithm    398.0     1776.2    2661.8    90.9

Table 3: Arithmetic complexity for 13-dimensional feature vectors

Algorithm        Mult      Add       Comp      Num
Exhaust Search   117000    225000    8999      //
WKPDS            19608.0   48202.4   28565.0   9000
Pan et al. [3]   2628.4    5229.8    3589.9    978.5
Our Algorithm    1072.8    4077.7    5649.6    205.4

Table 4: Arithmetic complexity for 16-dimensional feature vectors

Algorithm        Mult      Add       Comp      Num
Exhaust Search   144000    279000    8999      //
WKPDS            17920.3   44822.5   26874.3   9000
Pan et al. [3]   3878.7    7728.3    5337.9    1479.5
Our Algorithm    1726.9    6385.9    8888.5    397.3



From the complexity analysis in the above tables, we can see that our fast algorithm eliminates a large number of multiplication and addition operations at the price of an increase in comparison operations, so that the overall computational burden is reduced. The reason is that the computational cost of a multiplication or addition operation is much higher than that of a comparison. The item "Num" shows that the proposed Eq. (5) rejects many vectors that cannot be the nearest neighbor, which largely reduces the computational complexity.

5. Conclusion

This letter proposes a fast k nearest neighbors search algorithm that adequately makes use of the important content of the approximate coefficients of a vector. The proposed algorithm rejects those vectors that cannot be among the k nearest neighbors by using two important inequalities. It largely speeds up the k nearest neighbors search, as confirmed by the experimental results.

References
[1] J. McNames, "A fast nearest-neighbor algorithm based on a principal axis search tree," IEEE Trans. Pattern Anal. Machine Intell., vol. 23, no. 9, pp. 964-976, 2001.
[2] W. J. Hwang and K. W. Wen, "Fast kNN classification algorithm based on partial distance search," Electron. Lett., vol. 34, no. 21, pp. 2062-2063, 1998.
[3] J. S. Pan, Y. L. Qiao and S. H. Sun, "A fast K nearest neighbors classification algorithm," IEICE Trans. Fundamentals, vol. E87-A, no. 4, pp. 961-963, 2004.
[4] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding, Prentice Hall, 1995.
[5] L. Guan and M. Kamel, "Equal-average hyperplane partitioning method for vector quantization of image data," Pattern Recognit. Lett., vol. 13, no. 10, pp. 693-699, 1992.
[6] P. Brodatz, Textures: A Photographic Album for Artists & Designers, Dover, New York, 1966.
[7] J. S. Pan and J. W. Wang, "Texture segmentation using separable and non-separable wavelet frames," IEICE Trans. Fundamentals, vol. E82-A, no. 8, pp. 1463-1474, 1999.
[8] S. Mallat and S. Zhong, "Characterization of signals from multiscale edges," IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 7, pp. 710-732, 1992.
[9] S. Theodoridis and K. Koutroumbas, Pattern Recognition, China Machine Press, pp. 44-45, 2003.