<<

MANUSCRIPT TO TPAMI. PERSONAL USE OF THESE MATERIALS IS PERMITTED. HOWEVER, PERMISSION TO REPRINT/REPUBLISH THIS MATERIAL FOR ADVERTISING OR PROMOTIONAL PURPOSES OR FOR CREATING NEW COLLECTIVE WORKS FOR RESALE OR REDISTRIBUTION TO SERVERS OR LISTS, OR TO REUSE ANY COPYRIGHTED COMPONENT OF THIS WORK IN OTHER WORKS MUST BE OBTAINED FROM IEEE. 1 Fast for Walsh Hadamard Transform on Sliding Windows

Wanli Ouyang, W.K. Cham, Senior Member, IEEE

Abstract—This paper proposes a fast algorithm for Walsh Hadamard Transform on sliding windows which can be used to implement pattern matching most efficiently. The computational requirement of the proposed algorithm is about 1.5 additions per projection vector per sample which is the lowest among existing fast for Walsh Hadamard Transform on sliding windows.

Index Terms— Fast algorithm, Walsh Hadamard Transform, Pattern Matching, template matching, feature extraction

The rest of the paper is organized as follows. Section 2 de- fines terms and symbols used in this paper. Then the WHT 1 INTRODUCTION algorithm in [9] is briefly introduced. In Section 3, we intro- Pattern matching, also named as template matching, is widely duce two examples of the proposed algorithm. Section 4 illus- used in signal processing, computer vision, image and video trates the proposed algorithm for 1-D order-N WHT. The algo- processing. Pattern matching has found application in rithm computes order-N WHT using order-4 and order-N/4 manufacturing for quality control [1], image based rendering WHT. In Section 5, the number of additions required by the [2], [3], object detection [4] and video proposed algorithm is derived. Section 6 gives the experimen- compression. The block matching algorithm used for video tal result of motion estimation in video coding application, compression can be considered as a pattern matching problem which utilizes the proposed algorithm for computing 2-D [5]-[8]. To relieve the burden of high complexity and high WHT on sliding windows. Finally, Section 7 presents conclu- requirement of computational time for pattern matching, a lot sions. of fast algorithms have been proposed [9]-[13]. It has been found that pattern matching can be performed efficiently in Walsh Hadamard Transform (WHT) domain [9]. 2 WALSH HADAMARD TRANSFORM ON SLIDING In pattern matching, signal vectors obtained by a sliding win- WINDOWS dow need to be compared to a sought pattern. Hel-Or and Hel- Or’s algorithm [9] requires 2N-2 additions for obtaining all 2.1 Definitions WHT projection values in each window of size N. Note that Consider K input signal elements xn where n=0, 1, …, K-1, one subtraction is considered to be one addition regarding the which will be divided into overlapping windows of size N computational complexity in this paper. Their algorithm (K>N). Let the jth input window be: r T achieves efficiency by utilizing previously computed values in - X N (j)= [xj, xj+1, …, xj+N 1] for j = 0, 1, …, K-N. (1) the internal vertices of the tree structure in Fig. 1. Recently, A 1-D order-N WHT transforms N numbers into N projection the Gray Code Kernel (GCK) algorithm [10] which utilizes values. Let MN be an order-N WHT matrix and previously computed values in the leaves of the tree structure r r r T M= [ M (0), M (1), …, M (N-1)] , (2) in Fig. 1 was proposed. The GCK algorithm requires similar N N N N where r is the ith WHT vector. computation as [9] when all projection values are computed M N (i) Let y (i, j) for i = 0, 1, …N-1; j = 0, 1, …, K-N be the ith and requires less computation when only a small number of N WHT projection value for the jth window and projection vectors are computed. r T r yN(i,j)= M N(i) X N(j). (3) This paper proposes a fast algorithm for WHT on sliding r In [10], M (i) and yN(i, j) are called the ith projection kernel windows. Instead of performing order-N WHT by means of N r order-N/2 WHT and N additions in the tree structure, which is and projection result respectively. Let Y N(j) be the projection the technique adopted in [9], the proposed algorithm computes vector containing all projection values of the jth window and r T r order-N WHT by means of order-N/4 WHT and N+1 additions. Y N(j)=[yN(0, j), yN(1, j), …, yN(N-1, j)] = MN X N(j). (4) In this way, the proposed algorithm can obtain all WHT pro- For example, when N=4, we have: jection values using about 3N/2 additions per window. In the ⎛ y4 (0, j)⎞ ⎡1 1 1 1 ⎤⎛ x j ⎞ ⎜ ⎟ ⎢ ⎥⎜ ⎟ computation of partial projection values for sliding windows, y (1, j) 1 1 −1 −1 x . (5) v ⎜ 4 ⎟ ⎢ ⎥⎜ j+1 ⎟ r the proposed algorithm requires only 1.5 additions per projec- Y4 ( j) = ⎜ ⎟ = ⎜ ⎟ = M 4 X 4(j) y4 (2, j) ⎢1 −1 −1 1 ⎥ x j+2 tion vector for each window. As shown by experimental results ⎜ ⎟ ⎢ ⎥⎜ ⎟ ⎜ ⎟ ⎜ x ⎟ in Section 7, the computational time required by the proposed ⎝ y4 (3, j)⎠ ⎣1 −1 1 −1⎦⎝ j+3 ⎠ algorithm for computing ten or more projection values is about The WHT in (5) is in sequency order. Its basis vectors are 75% of that of the GCK algorithm. ordered according to the number of zero crossings. The relationship among WHTs in sequency order, dyadic order and ———————————————— natural order can be found in [15]. • The authors are with the Department of Electronic Engineering, The Chi- nese University of Hong Kong. E-mail: {wlouyang, wkcham}@ee.cuhk.edu.hk xxxx-xxxx/0x/$xx.00 © 200x IEEE

2 MANUSCRIPT TO TPAMI. PERSONAL USE OF THESE MATERIALS IS PERMITTED. HOWEVER, PERMISSION TO REPRINT/REPUBLISH THIS MATERIAL FOR ADVERTISING OR PROMOTIONAL PURPOSES OR FOR CREATING NEW COLLECTIVE WORKS FOR RESALE OR REDISTRIBUTION TO SERVERS OR LISTS, OR TO REUSE ANY COPYRIGHTED COMPONENT OF THIS WORK IN OTHER WORKS MUST BE OBTAINED FROM IEEE. 2.2 Previous WHT computation methods In [9], Hel-Or and Hel-Or proposed a fast algorithm that r r r computes Y N(j) from Y N/2(j) and Y N/2(j+N/2) using (6). ⎧yN / 2 ()⎣⎦i / 2 , j + yN / 2 ()⎣⎦i / 2 , j + N / 2 ,i%4 = 0 or 3 yN (i, j) = ⎨ , ⎩ yN / 2 ()⎣⎦i / 2 , j − yN / 2 ()⎣⎦i / 2 , j + N / 2 ,i%4 =1 or 2 (6) where % is the modulo operation; and ⌊ ·⌋ is the floor func- Fig. 2 Signal flow diagram of the Bottom up algorithm for window size tion. equal 4 As shown in Fig. 1, their algorithm first computes WHT projection values for window size N being 2, which are then used to compute WHT projection values for window size be- 3.2 Fast Algorithm for Window Size 8 ing 4 and so on. The computation starts at the root and moves The proposed algorithm and the GCK algorithm [10] for down the tree until the projection values represented by the window size being 8 are described in Table 2. The proposed r leaves are computed using (6). The algorithm in [9] requires 1 algorithm computes Y 8(j+2) for j=0, 1, …K-8, which is the r addition per window along each node of the tree in Fig. 1. The WHT projection vector in window j+2, using Y 8(j)=[y8(0,j), T GCK algorithm [10] utilizes previously computed order-N y8(1,j), …, y8(7,j)] , which is the computed projection vector in projection values for computing the current order-N projection window j. value. When a small number of projection values are com- Define tN/4(i, j) for i = 0, …, N/4-1; j = 0, 1, …, K-5N/4 as: puted, the GCK algorithm requires 2 additions per window for tN/4(i, j) = yN/4(i, j) - yN/4(i, j+N). (9) each projection value while the algorithm in [9] requires For N=8, we have: O(logN) additions. When all projection values are computed, ⎡t2 (0, j)⎤ ⎡y2 (0, j)⎤ ⎡y2 (0, j + 8)⎤ . both the algorithms in [9] and [10] require about 2N additions. ⎢ ⎥ = ⎢ ⎥ − ⎢ ⎥ ⎣t2 (1, j)⎦ ⎣ y2 (1, j)⎦ ⎣ y2 (1, j + 8)⎦ A new fast algorithm, which is more efficient than those re- From (3) and then (7), we have: ported in [9] and [10], is proposed in this paper. It can effi- ⎡t (0, j)⎤ ⎡1 1 ⎤⎡ x j ⎤ ⎡1 1 ⎤⎡x j+8 ⎤ ciently compute order-N WHT on sliding windows, i.e. yN(i, j) 2 = − ⎢ ⎥ ⎢ ⎥⎢x ⎥ ⎢ ⎥⎢x ⎥ for i = 0, …, P-1 (P≤N) and j = 0, 1, 2, …, K-N. ⎣t2 (1, j) ⎦ ⎣1 −1⎦⎣ j+1 ⎦ ⎣1 −1⎦⎣ j+9 ⎦ ⎡1 1 ⎤⎡ x − x ⎤ ⎡ d ( j) ⎤ = j j+8 = M 8 . (10) 3 FAST ALGORITHM FOR WHT ON SLIDING ⎢ ⎥⎢x − x ⎥ 2 ⎢ ⎥ ⎣1 − 1⎦⎣ j+1 j+9 ⎦ ⎣d 8 ( j + 1)⎦ WINDOWS FOR WINDOW SIZES 4 AND 8 As given by (10), t2(i, j) is the ith order-2 WHT projection T This section gives examples of computing order-N WHT on value of [d8(j), d8(j+1)] . According to Table 2 and (10), WHT sliding windows of sizes N=4 and 8 using the proposed algo- projection vector in window j+2 can be computed from those rithm. in window j as well as t2(i, j) as follows: ⎡y (0, j +2)⎤ ⎡ y (0, j) ⎤ ⎡−t (0, j)⎤ 3.1 Fast Algorithm for Window Size 4 8 8 2 ⎢ y (1, j +2)⎥ ⎢− y (2, j)⎥ ⎢ t (0, j) ⎥ The proposed algorithm and the GCK algorithm [10] for ⎢ 8 ⎥ ⎢ 8 ⎥ ⎢ 2 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ window size being 4 are described in Table 1. The proposed y8 (2, j +2) y8 (1, j) −t2 (0, j) r ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ . (11) algorithm computes Y (j+1) (the WHT projection values in y (3, j +2) − y (3, j) t (0, j) r 4 ⎢ 8 ⎥ = ⎢ 8 ⎥ + ⎢ 2 ⎥ window j+1) using Y 4(j) (the computed projection values in ⎢y8 (4, j +2)⎥ ⎢− y8 (4, j)⎥ ⎢ t2 (1, j) ⎥ th window j) as shown in Table 1. Except for the 0 projection ⎢y (5, j +2)⎥ ⎢ y (6, j) ⎥ ⎢−t (1, j)⎥ ⎢ 8 ⎥ ⎢ 8 ⎥ ⎢ 2 ⎥ value y4(0, j+1), the GCK algorithm utilizes the previous or- ⎢y8 (6, j +2)⎥ ⎢− y8 (5, j)⎥ ⎢ t2 (1, j) ⎥ der-4 projection values to compute the current order-4 projec- ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ tion value. ⎣y8 (7, j +2)⎦ ⎣ y8 (7, j) ⎦ ⎣−t2 (1, j)⎦

Define dN(j) as: In summary, (11) can be represented by: d (j) = x - x . (7) N j j+N ⎧ (−1)v+i{y (i, j) −t (v, j)},i = 0,3,4,7 We can see from Table 1 that the WHT projection values in 8 2 , (12) y8 (i, j + 2) = ⎨ v+i i window j+1 can be computed from those in window j and dN(j). ⎩(−1) {y8[i −(−1) , j]−t2 (v, j)},i =1,2,5,6 Therefore, we have where v =⌊ i/4⌋ . ⎡y4 (0, j +1)⎤ ⎡ y4 (0, j) ⎤ ⎡−d4 ( j)⎤ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ y (1, j +1) − y (2, j) d ( j) , (8) v ⎢ 4 ⎥ ⎢ 4 ⎥ ⎢ 4 ⎥ Y4 ( j +1) = = + ⎢y4 (2, j +1)⎥ ⎢ y4 (1, j) ⎥ ⎢−d4 ( j)⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎣y4 (3, j +1)⎦ ⎣− y4 (3, j)⎦ ⎣ d4 ( j) ⎦ and Fig. 2 shows the signal flow diagram. r Thus, after obtaining Y (j), the proposed algorithm obtains r 4 Y 4(j+1) by 5 additions as shown in (7)-(8) whereas the algo- rithm in [9] requires 6 additions and the GCK algorithm re- quires 8 additions.

WANLI OUYANG AND W. K. CHAM: FAST ALGORITHM FOR WALSH HADAMARD TRANSFORM ON SLIDING WINDOWS 3

TABLE 3 ⎡yN (8i + 0, j + N / 4)⎤ ⎡ yN (8i + 0, j) ⎤ ⎡−tN / 4 (2i + 0, j)⎤ COMPUTATION OF ALL ORDER-8 WHT PROJECTION VALUES IN ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ yN (8i +1, j + N / 4) − yN (8i + 2, j) tN / 4 (2i + 0, j) WINDOW J+2 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢yN (8i + 2, j + N / 4)⎥ ⎢ yN (8i +1, j) ⎥ ⎢−tN / 4 (2i + 0, j)⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ,(14) Step a d8(j+1) = xj+1‐xj+9. ⎢yN (8i +3, j + N / 4)⎥ ⎢− yN (8i +3, j)⎥ ⎢ tN / 4 (2i + 0, j) ⎥ ‐ One addition is required. = + ⎢yN (8i + 4, j + N / 4)⎥ ⎢− yN (8i + 4, j)⎥ ⎢ tN / 4 (2i +1, j) ⎥ T Step b t2(0,j) = [1, 1] [ d8(j), d8(j+1)] , ⎢y (8i +5, j + N / 4)⎥ ⎢ y (8i + 6, j) ⎥ ⎢−t (2i +1, j)⎥ T N N N / 4 t2(1,j) = [1, ‐1] [d8(j), d8(j+1)] . ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢y (8i + 6, j + N / 4)⎥ ⎢− y (8i +5, j)⎥ ⎢ t (2i +1, j) ⎥ ‐ Two additions are required. Note that d8(j) was obtained N N N / 4 r ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ during computation of Y8(j+1) in Step a. ⎣yN (8i + 7, j + N / 4)⎦ ⎣ yN (8i + 7, j) ⎦ ⎣−tN / 4 (2i +1, j)⎦ Step c where tN/4(i, j) is defined in (9). Equation (14), which is for v+i ⎧ (−1) {y8(i, j) −t2 (v, j)},i = 0,3,4,7 , order-N WHT, becomes (11) when N=8. y8 (i, j + 2) = ⎨ v+i i ⎩(−1) {y8[i −(−1) , j]−t2 (v, j)},i =1,2,5,6 Let us first consider the computation of tN/4(i, j) in (14). where v =⌊ i/4⌋ . Utilizing the dN(j) in (7), we define: v T D (j) = [d (j), d (j+1), …, d (j+N/4-1)] ‐ Eight additions are required in this step for i = 0, 1, …, 7. N/4 N N N r r = X N/4(j)- X N/4(j+N). (15) r Table 3 gives the three steps for computing Y (j+2) from From (3), (9) and then (15), we have: r r 8 Y 8(j) as well as Y 8(j+1) and the corresponding number of tN/4(i, j) = yN/4(i, j) - yN/4(i, j+N) r T r r T r operations required. = M N/4(i) X N/4(j) - M N/4(i) X N/4(j+N) Therefore, the proposed algorithm requires 11 additions r T r r = M N/4(i) [ X N/4(j) - X N/4(j+N) ] whereas the algorithm in [9] requires 14 additions for obtain- r T v r = M N/4(i) D N/4(j). (16) ing the 8 projection values in Y (j+2). The GCK algorithm 8 Equation (16) shows that t (i, j) is the ith projection value requires 16 additions. v N/4 of the order-N/4 WHT of D N/4(j). When N=8, (16) becomes (10). 4 FAST ALGORITHM FOR WHT ON SLIDING WINDOWS FOR WINDOW SIZE N

4.1 The algorithm

Let DN be the order-N reverse-, i.e., elements at the reverse-diagonal positions are 1 and 0 at others. For example, D4 is: ⎡0 0 0 1⎤ ⎢ ⎥ 0 0 1 0 . D = ⎢ ⎥ 4 ⎢0 1 0 0⎥ ⎢ ⎥ ⎣1 0 0 0⎦ The equation below is proven in the appendix:

⎡yN (8i + 0, j)⎤ ⎡ yN / 4 (2i, j) ⎤ ⎢ y (8i +1, j)⎥ ⎢ y (2i, j + N / 4) ⎥ ⎢ N ⎥ ⎢ N / 4 ⎥

⎢yN (8i + 2, j)⎥ ⎢ yN / 4 (2i, j + 2N / 4) ⎥ ⎢ ⎥ ⎢ ⎥ y (8i + 3, j) ⎡M 0 ⎤ y (2i, j + 3N / 4) ⎢ N ⎥ = 4 4×4 ⎢ N / 4 ⎥ y (8i + 4, j) ⎢ ⎥ y (2i +1, j) ⎢ N ⎥ ⎣04×4 D4 M 4 ⎦⎢ N / 4 ⎥ ⎢y (8i + 5, j)⎥ ⎢ y (2i +1, j + N / 4) ⎥ ⎢ N ⎥ ⎢ N / 4 ⎥ ⎢yN (8i + 6, j)⎥ ⎢yN / 4 (2i +1, j + 2N / 4)⎥ ⎢ ⎥ ⎢ ⎥ ⎣yN (8i + 7, j)⎦ ⎣yN / 4 (2i +1, j + 3N / 4)⎦ for i=0, 1, …, N/8-1. (13) Hence, order-N WHT is partitioned into N/4 groups of or- der-4 WHT. Utilizing the method in (8) for each of the order-4 WHT in (13), WHT projection values in window j+N/4 can be computed using projection values in window j as well as tN/4(2i, j) and t (2i+1, j) as follows: Fig. 3 Signal flow diagram of the bottom up algorithm for order-N se- N/4 quency WHT

The signal flow diagram in Fig. 3 depicts the computation of order-N WHT using (14)-(16). Table 4 describes the compu- r r tation of YN(j+N/4) from YN(j) and the corresponding number of operations as well as memory required. Since at most size

4 MANUSCRIPT TO TPAMI. PERSONAL USE OF THESE MATERIALS IS PERMITTED. HOWEVER, PERMISSION TO REPRINT/REPUBLISH THIS MATERIAL FOR ADVERTISING OR PROMOTIONAL PURPOSES OR FOR CREATING NEW COLLECTIVE WORKS FOR RESALE OR REDISTRIBUTION TO SERVERS OR LISTS, OR TO REUSE ANY COPYRIGHTED COMPONENT OF THIS WORK IN OTHER WORKS MUST BE OBTAINED FROM IEEE. 2K memory is required for the proposed algorithm at each step, BN(N) = 1+N/2 + N = 3N/2+1. the memory required for the proposed algorithm is 2K which The number of additions required for the GCK algorithm is the same as the GCK algorithm. and the proposed algorithm are summarized in Table 5 which TABLE 4 shows that the proposed algorithm requires about 3N/2 addi- COMPUTATION OF ORDER-N WHT tions while the GCK algorithm requires 2N additions. The number of additions required by our algorithm for order-4 and Overall procedure: order-8 WHT are 5 and 11 respectively because we can use For each j {Step a;} direct computation instead of the GCK for calculating tN/4(i, j) For each i: { in the Step b of Table 4. For example, if N=4, then tN/4(i, j)=d4(j) For each j { Step b; } and no computation is required in the Step b of Table 4 for For each j { Step c; } } obtaining tN/4(i, j).

Step a Compute dN(j) = xj ‐ x j+N. This step provides the v TABLE 5 D N/4(j) in (16) for the computation of tN/4(i, j) in Step b. NUMBERS OF ADDITIONS REQUIRED BY THE GCK ALGORITHM Analysis: One addition per window is required. Size 2K AND THE PROPOSED ALGORITHM FOR ALL PROJECTION VALUES memory is required for storing dN(j) and the input data OF ORDER-N WHT xj for j=0,… K‐1. Note that xj will not be used in the fol‐ Size 4 8 16 32 N lowing steps. r T v GCK 8 16 32 64 2N Step b Compute tN/4(⌊i/4⌋, j)= M N/4(⌊i/4⌋) D N/4(j). This Proposed 5 11 25 49 3N/2+1 step provides the tN/4 in (14) for the computation of yN(i,j+N/4) in Step c. We can use the GCK algorithm in [10] for computation. 5.2 When not all projection values are computed Analysis: N/2 additions per window are required for ob‐ v In many applications, not all projection values are required. taining the N/4 values of tN/4(⌊i/4⌋, j) from D N/4(j) for given j. As stated in [10], size 2K memory is required by In this part, we analyze the computational requirement when GCK. only the first P projection values are computed for window size N. Specifically, we shall derive the number of additions Step c Obtain yN(i,j+N/4) using (14). Note that the yN(i,j) in (14) is computed previously. for the computation of yN(0,j), yN(1, j), … and yN(P-1, j) for j = Analysis: N additions per window are required for the N N/4, N/4+1, …, K-N. Here we shall not consider the cases values of i. Size K memory is required for storing the when j

yN/4(i, a) for j≤a

WANLI OUYANG AND W. K. CHAM: FAST ALGORITHM FOR WALSH HADAMARD TRANSFORM ON SLIDING WINDOWS 5 tions/pixel/kernel using the proposed algorithm. 250 4x4 Snake 6 EXPERIMENTAL RESULTS 8x8 Snake 200 To investigate the computational efficiency of the proposed 16x16 Snake algorithm for pattern matching in practical applications, block matching in motion estimation is utilized. Block matching in 150 motion estimation using fast WHT was carried out on the first % 200 frames of a video sequence “tempete” which has a resolu- tion of 352×288. The experiment considers the execution time 100 required for obtaining different numbers of WHT projection values, which ranges from 1 to 20. The proposed algorithm is compared with the algorithm in [8], which utilized the GCK 50 0 5 10 15 20 algorithm. In a similar experiment reported in [8], two projection (a) Snake order value computation orders were used. They are the “snake or- der” and “increasing frequency order”. Fig. 4 shows the order- 250 ing of the first 20 projection values of these two orders. The 4x4 IF percentage of the time required by the proposed algorithm 8x8 IF 200 with respect to the GCK algorithm is given in Fig. 6. The pro- 16x16 IF posed algorithm outperforms the GCK algorithm when the number of projections is greater than 5. As the proposed algo- 150 rithm computes 3 or 4 projection values together to save com- % putation whereas the GCK algorithm does not, so the percent- age of computational time saved by the proposed algorithm in 100 comparison with the GCK algorithm depends on the number of projections. Generally, the proposed algorithm achieves a higher saving when most projection values to be computed can 50 0 5 10 15 20 take advantage of this property. This is why when the number of projection values approaches 13 and 16 for snake order, the (b) Increasing frequency order proposed algorithm requires the least percentage of time com- Fig. 5 The percentage of time required by our algorithm with respect to pared with the GCK algorithm. When the number of projec- GCK algorithm when different number of projection values are computed, tion values is less than 5, the proposed algorithm requires where Snake stands for the snake order and IF stands for the increasing more computational time because projection values cannot be frequency order. The experiment is implemented on a 2.13GHz PC using C on windows XP system with compiling environment VC 6.0. grouped together for computation. Therefore, we would sug- gest the use of the GCK algorithm when the number of projec- tion values is less than 5. 7 CONCLUSIONS 0 1 8 9 0 2 5 10 16 This paper proposes a fast computational algorithm for Walsh 3 2 7 10 1 3 7 12 18 Hadamard Transform on sliding windows, which requires about 1.5 additions per projection vector per window. The 4 5 6 11 4 6 9 14 computational time of the proposed algorithm is about 75% 8 11 13 19 15 14 13 12 that of the GCK algorithm which is the fastest algorithm re- 16 17 18 19 15 17 ported so far. In cases where not all projection values are (a) Snake order (b) Increasing frequency order needed, the proposed algorithm can outperform the GCK algo- Fig. 4 Two different projection orders rithm when the number of projection values is five or above. The proposed algorithm achieves its high efficiency in the computation of order-N WHT by using order-4 and order-N/4 WHT. This paper provides fast algorithm for 1-D WHT. In the future, we are going to seek even faster algorithm. We will also try to see if there exists the superset of GCK that can be computed by constant number of additions per window per projection value independent of the size and dimension of the transform.

ACKNOWLEDGMENT The authors are also thankful to Prof. Hel-Ors’ for providing their code implementing the GCK algorithm.

6 MANUSCRIPT TO TPAMI. PERSONAL USE OF THESE MATERIALS IS PERMITTED. HOWEVER, PERMISSION TO REPRINT/REPUBLISH THIS MATERIAL FOR ADVERTISING OR PROMOTIONAL PURPOSES OR FOR CREATING NEW COLLECTIVE WORKS FOR RESALE OR REDISTRIBUTION TO SERVERS OR LISTS, OR TO REUSE ANY COPYRIGHTED COMPONENT OF THIS WORK IN OTHER WORKS MUST BE OBTAINED FROM IEEE. Z r Z r Z r APPENDIX A yN (ai + b, j) ={M a [ f (b,i,a)]⊗ M N / a[i]}X N r z r z r This appendix provides the proof for (13). Except for this ap- = [M a [ f (b,i,a)]Ia ]⊗[I1M N / a (i)]X N pendix, sequency order is used for representing WHT. In this r r r = [I ⊗ M z [ f (b,i,a)]][I ⊗ M z (i)]X appendix, dyadic-ordered WHT will be utilized for proving 1 a a N / a N r z (13). Natural order-N WHT can be represented by: ⎡M N / a (i) ⎤ ⎢ r z ⎥ MN= M2 ⊗ MN/2 , r M (i) r = M Z [ f (b,i,a)]⎢ N / a ⎥X where ⊗ is the (A ⊗ B is a mp×nq matrix a ⎢ ... ⎥ N ⎢ ⎥ composed of the m×n blocks (ai,j B)) and r z ⎣⎢ M N / a (i)⎦⎥ 1 1 ⎡ ⎤ Z M2 ⎡ ⎤ = ⎢ ⎥. yN / a (i, j) 1 − 1 ⎢ ⎥ ⎣ ⎦ y Z (i, j N / a) Z ⎢ N / a + ⎥ . (a9) Both sequency and dyadic orders [15] are the reordering = M a [ f (b,a,i)] ⎢ ... ⎥ form of the natural order for WHT. Here we denote Z as the M N ⎢ Z ⎥ D y (i, j + N − N / a) ⎣⎢ N / a ⎦⎥a×1 order-N sequency-ordered WHT matrix; denote M N as the order-N dyadic-ordered WHT matrix and: The following equation is valid using (a9): D D D D T = [ (0), (1), …, (N-1)] , (a1) Z Z Z Z M N M N M N M N ⎡ y (2ai +0, j) ⎤ ⎡ ⎛ y (2i , j) ⎞ ⎤ N ⎢ ⎜ N / a ⎟ ⎥ where D is the ith WHT basis vector. The binary vector ⎢ Z Z ⎥ Z Z M N (i) y (2ai +1, j) ⎜ y (2i , j + N/a) ⎟ T ⎢ N ⎥ ⎢ Z N / a ⎥ representation of i in (a1) [i , i , …, i ] , where i are 0 or 1 for Ma ⎜ ⎟ 1 2 g k ⎢ ... ⎥ ⎢ ... ⎥ k = 1, … g and: ⎢ ⎜ ⎟ ⎥ ⎢ Z Z ⎥ ⎜ Z Z ⎟ g-1 g-2 y (2ai +a−1, j) ⎢ y (2i , j + N − N/a) ⎥ i = 2 i1+ 2 i2+…+2ig-1+ig . ⎢ N ⎥ = ⎝ N /a ⎠ Z Z ⎢ Z Z ⎥ For dyadic-ordered WHT, for b=0,…,a-1, a= 2, 4, 8, …,N, ⎢ yN (2ai +a, j) ⎥ ⎛ y (2i +1, j) ⎞ ⎢ ⎥ ⎢ ⎜ N /a ⎟⎥ we have: yZ (2aiZ +a+1, j) ⎢ ⎜ yZ (2iZ +1, j + N /a) ⎟⎥ r r r ⎢ N ⎥ Z N / a D D D . (a2) ⎢DaMa ⎜ ⎟⎥ MN (ai+b) = Ma (b) ⊗MN / a (i) ⎢ ... ⎥ ⎢ ⎜ ... ⎟⎥ Z D ⎢ Z Z ⎥ Let i and i be index of sequency-ordered and dyadic-ordered y (2ai +2a−1, j) ⎢ ⎜ Z Z ⎟⎥ ⎣⎢ N ⎦⎥ ⎣ ⎝ yN / a (2i +1, j + N −N /a)⎠⎦ WHT respectively. As pointed out in [15], the relationship Z D Z Z between the binary vector representation of i and i is: ⎡ yN / a (2i , j) ⎤ iZ=[W ] iD, (a3) ⎢ Z Z ⎥ D,Z g ⎢ yN / a (2i , j + N/a) ⎥ ⎡1 0 0 0⎤ ⎢ ... ⎥ ⎢ ⎥ Z ⎢ ⎥ where [WD,Z]g= 1 1 ... 0 . ⎡M ⎤ Z Z ⎢ ⎥ a ⎢ yN /a (2i , j + N − N/a) ⎥ = ⎢ ⎥ Z Z . ⎢...... ⎥ D M Z ⎢ y (2i +1, j) ⎥ ⎢ ⎥ ⎣ a a ⎦⎢ N /a ⎥ 1 1 ... 1 Z Z ⎣ ⎦ g×g ⎢ yN /a (2i +1, j + N /a) ⎥ According to (a3), if r Z Z Z = r D D D , where bZ, ⎢ ... ⎥ M N (ai + bi ) M N (ai + b ) ⎢ ⎥ (a10) D k Z D Z Z b r Z Z = r D D . M N (ai + bi ) M N (ai + b ) M N / a (i ) M N / a (i ) Equ. (13) is valid when a=4 in (a10). (a4) Denote f(b,a,i) as: REFERENCES ⎧b, i is an even number [1] M. S. Aksoy , O. Torkul, and I. H. Cedimoglu, “An industrial visual f (b,a,i) = ⎨ , (a5) ⎩a −1− b, i is an odd number inspection system that uses inductive learning,” Journal of Intelli- Z D gent Manufacturing, vol. 15(4), pp. 569-574, Aug. 2004. The bZ in (a4) is decided by both i and b : i [2] A.W. Fitzgibbon, Y. Wexler, and A. Zisserman, “Image-Based ren- Z D Z bi =f([WD,Z]b ,a, i ), dering using image-based priors,” Proc. Int’l Conf. Computer Vision, vol. 2, pp. 1176-1183, Oct. 2003. where the size of [WD,Z] is log2a×log2a. It is obvious that f[f(b,a,i),a,i]=b, so we have: [3] T. Luczak and W. Szpankowski, “A suboptimal lossy data compres- sion based on approximate pattern matching,” IEEE Trans. Informa- D D Z Z Z Z [WD,Z]b = f[ f([WD,Z]b ,a, i ), a, i ]=f(bi ,a, i ). (a6) tion Theory, vol. 43, pp. 1439-1451, Sept. 1997. r [4] R. M. Dufour, E. L. Miller, and N. P. Galatsanos, “Template match- M Z (aiZ + bZ , j) can be represented as follows using (a2), N i ing based object recognition with unknown geometric parameters,” (a4) and (a6): r r r r IEEE Trans. Image Process., vol. 11, no. 12, pp. 1385–1396, Dec. M Z (aiZ +bZ , j) = M D (aiD +bD, j) = M D (bD )⊗M D (iD ) N i N a N / a . (a7) 2002. r Z D r Z Z r Z Z Z r Z Z [5] C.M. Mak, N. Li, W.K. Cham, “Fast Motion Estimation in Walsh = Ma ([WD,Z ]b )⊗MN / a (i ) = Ma [ f (bi ,a,i )]⊗MN / a (i ) Hadamard Domain,” in Proc. Int. Sym. Intelligent Signal Processing According to (a7), we have: and Communication Systems, ISPACS, pp. 349-352, 13-16 Dec. r Z r Z r Z . (a8) M N (ai+b, j) = M a [ f (b,a,i)]⊗M N / a [i] 2005. Z Therefore, yN (ai + b, j) can be represented as follows: [6] ITU-T Rec. H.264 and ISO/IEC 14496-10 AVC, JVT-G050, March, 2003. [7] C.M. Mak., C.K. Fong, W.K. Cham, “Fast motion estimation for

WANLI OUYANG AND W. K. CHAM: FAST ALGORITHM FOR WALSH HADAMARD TRANSFORM ON SLIDING WINDOWS 7

H.264/AVC in Walsh Hadamard domain,” IEEE Trans. Circuits Syst. Pattern Matching using Projection,” in Proc. IEEE International Video Technol., 18(6):735 – 745, Jun. 2008. Conference on Image Processing, Genoa Italy, Sept. 2005. [8] Y. Moshe and H. Hel-Or, “A Fast Block Motion Estimation Algo- [14] J.L. Shanks, “Computation of the Fast Walsh-,” rithm Using Gray Code Kernels”, in Proc. IEEE Symp. Signal Proc- IEEE Trans. Comput, Vol. C-18, No. 5, pp.457 – 459, May 1969. essing and Information Technology, pp.185 - 190, Vancouver, Can- [15] W.K. Cham and R.J. Clarke, “Dyadic Symmetry and Walsh Matri- ada, Aug. 2006. ces,” IEE Proceedings, Pt.F., Vol.134, No.2, pp.141-145, Apr. 1987. [9] Y. Hel-Or and H. Hel-Or, “Real time pattern matching using projec- tion kernels,” IEEE Trans. Pattern Analysis and Machine Intelli- Wanli Ouyang received the B.S. degree in computer science from Xiang- gence, vol. 27, no. 9, pp. 1430-1445, Sept. 2005. tan University, Hunan, China, in 2003. He received the M.S. degree with [10] G. Ben-Artzi, H. Hel-Or, and Y. Hel-Or, “The gray-code filter ker- the College of Computer Science and Technology, Beijing University of nels,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. Technology, Beijing, China. He is now pursuing Ph.D in the Department 29, no. 3, pp.382 – 393, Mar. 2007. of Electronic Engineering, The Chinese University of Hong Kong. [11] S. Omachiand and M. Omachi, “Fast template matching with poly- Wai-Kuen Cham graduated from The Chinese University of Hong Kong nomials,” IEEE Trans. Image Process., vol. 16, no. 8, pp.2139 – in 1979 in Electronics. He received his M.Sc. and Ph.D. degrees from 2149, Aug. 2007. Loughborough University of Technology, U.K., in 1980 and 1983 respec- [12] G. J. VanderBrug and A. Rosenfeld, “Two-stage template match- tively. Since May 1985, he has been with the Department of Electronic ing,” IEEE Trans. Comput., vol. C-26, no. 4, pp. 384–393, Apr. 1977. Engineering, The Chinese University of Hong Kong. Prof. Cham is a Chartered Engineer and a senior member of IEEE. [13] M. Ben-Yehuda, L. Cadany, H. Hel-Or, and Y. Hel-Or, “Irregular

8

− −

− − − −

Fig. 1 Tree structure for Walsh-Hadamard Transform in sequency order

TABLE 1 FAST ALGORITHM WHEN WINDOW SIZE IS 4

xj xj+1 xj+2 xj+3 xj+4 Proposed algorithm GCK algorithm[10]

y4(0, j) 1 1 1 1 y4(0, j+1) y4(0, j+1)

y4(0, j+1) 1 1 1 1 = y4(0, j) - xj + x j+4 = y4(0, j) - xj + x j+4

y4(2, j) 1 -1 -1 1 y4(1, j+1) y4(1, j+2)

y4(1, j+1) 1 1 -1 -1 =-y4(2, j)+xj-x j+4 = y4(0, j)-y4(0, j+2)-y4(1, j)

y4(1, j) 1 1 -1 -1 y4(2, j+1) y4(2, j+2)

y4(2, j+1) 1 -1 -1 1 = y4(1, j)-xj+xj+4 = y4(1, j+1)-y4(1, j+2)-y4(2, j+1)

y4(3, j) 1 -1 1 -1 y4(3, j+1) y4(3, j+2)

y4(3, j+1) 1 -1 1 -1 = -y4(3, j)+ xj-xj+4 = y4(3, j) -y4(2, j)-y4(2, j+2)

TABLE 2 FAST ALGORITHM WHEN WINDOW SIZE IS 8

xj xj+1 xj+2 xj+3 xj+4 xj+5 xj+6 xj+7 xj+8 xj+9 Proposed algorithm GCK

y8(0, j) 1 1 1 1 1 1 1 1 y8(0,j+2) y8(0,j+2)

y8(0, j+2) 1 1 1 1 1 1 1 1 =y8(0,j)-xj-xj+1+xj+8+xj+9 =y8(0,j+1)-xj+1+xj+9

y8(2, j) 1 1 -1 -1 -1 -1 1 1 y8(1,j+2) y8(1, j+2)

y8(1, j+2) 1 1 1 1 -1 -1 -1 -1 =-y8(2,j)+xj+xj+1-xj+8-xj+9 = y8(0, j-2)-y8(0, j+2)-y8(1, j-2)

y8(1, j) 1 1 1 1 -1 -1 -1 -1 y8(2,j+2) y8(2, j+2)

y8(2, j+2) 1 1 -1 -1 -1 -1 1 1 =y8(1,j)-xj-xj+1+xj+8+xj+9 = y8(1, j)-y8(1, j+2)-y8(2, j)

y8(3, j) 1 1 -1 -1 1 1 -1 -1 y8(3,j+2) y8(3, j+2)

y8(3, j+2) 1 1 -1 -1 1 1 -1 -1 =-y8(3,j)+xj+xj+1-xj+8-xj+9 =y8(3, j-2)-y8(2, j+2)-y8(2, j-2)

y8(4, j) 1 -1 -1 1 1 -1 -1 1 y8(4,j+2) y8(4, j+2)

y8(4, j+2) 1 -1 -1 1 1 -1 -1 1 =-y8(4,j)+xj-xj+1-xj+8+xj+9 = y8(3, j+1)-y8(3, j+2)-y8(4, j+1)

y8(6, j) 1 -1 1 -1 -1 1 -1 1 y8(5,j+2) y8(5, j+2)

y8(5, j+2) 1 -1 -1 1 -1 1 1 -1 =y8(6,j)-xj+xj+1+xj+8-xj+9 = y8(4, j-2)-y8(4, j+2)-y8(5, j-2)

y8(5, j) 1 -1 -1 1 -1 1 1 -1 y8(6,j+2) y8(6, j+2) - - y8(6, j+2) 1 -1 1 -1 -1 1 -1 1 =-y8(5,j)+xj-xj+1-xj+8+xj+9 = y8(6, j) y8(5, j) y8(5, j+2)

y8(7, j) 1 -1 1 -1 1 -1 1 -1 y8(7,j+2) y8(7, j+2)

y8(7, j+2) 1 -1 1 -1 1 -1 1 -1 =y8(7,j)-xj+xj+1+xj+8-xj+9 = y8(7, j-2) -y8(6, j-2)-y8(6, j+2)