Journal of VLSI Signal Processing, 2007. © 2007 Springer Science + Business Media, LLC. Manufactured in The United States. DOI: 10.1007/s11265-007-0139-5

Implementation and Comparison of the 5/3 Lifting 2D Discrete Wavelet Transform Computation Schedules on FPGAs

MARIA E. ANGELOPOULOU AND PETER Y. K. CHEUNG Department of Electrical and Electronic Engineering, Imperial College London, Exhibition Road, London, SW7 2BT, UK

KONSTANTINOS MASSELOS Department of Computer Science and Technology, University of Peloponnese, Tripolis, 22100, Greece

YIANNIS ANDREOPOULOS Department of Electronic Engineering, Queen Mary University of London, Mile End Road, London, E1 4NS, UK

Received: 14 April 2007; Revised: 23 July 2007; Accepted: 9 August 2007

Abstract. The suitability of the 2D Discrete Wavelet Transform (DWT) as a tool in image and video compression is nowadays indisputable. For the execution of the multilevel 2D DWT, several computation schedules based on different input traversal patterns have been proposed. Among these, the most commonly used in practical designs are: the row–column, the line-based and the block-based. In this work, these schedules are implemented on FPGA-based platforms for the forward 2D DWT by using a lifting-based filter-bank implementation. Our designs were realized in VHDL and optimized in terms of throughput and memory requirements, in accordance with the principles of both the schedules and the lifting decomposition. The implementations are fully parameterized with respect to the size of the input image and the number of decomposition levels. We provide detailed experimental results concerning the throughput, the area, the memory requirements and the energy dissipation, associated with every point of the parameter space. These results demonstrate that the choice of the most suitable schedule should depend on the given algorithmic specifications.

Keywords: discrete wavelet transform, FPGA, implementation, lifting scheme, 5/3 lifting filter-pair, row-column, line-based, block-based, comparison

1. Introduction

The two-dimensional Discrete Wavelet Transform (2D DWT) is nowadays established as a key operation in image processing. In the area of image compression, the 2D DWT has clearly prevailed against its predecessor, the 2D Discrete Cosine Transform. This is mainly because it achieves higher compression ratios, due to the subband decomposition it involves, while it eliminates the 'blocking' artifacts that deprive the reconstructed image of the desired smoothness and continuity. The high algorithmic performance of the 2D DWT in image compression justifies its use as the kernel of both the JPEG-2000 still image compression standard [1] and the MPEG-4 texture coding standard [2].

1.1. The Dyadic Decomposition

The 2D DWT can be considered as a 'chain' of successive levels of decomposition, as depicted in Fig. 1. Because the 2D DWT is a separable transform, it can be computed by applying the 1D DWT along the rows and columns of the input image of each level, during the horizontal and vertical filtering stages. Every time the 1D DWT is applied on a signal, it decomposes that signal into two sets of coefficients: a low-frequency and a high-frequency set. The low-frequency set is an approximation of the input signal at a coarser resolution, while the high-frequency set includes the details that will be used at a later stage, during the reconstruction phase.

This procedure, presented in Fig. 1, is known as the dyadic decomposition of the image, and its impact upon the image's pixels can be presented by the diagram of Fig. 2, for the case of three decomposition levels. The shaded areas in Fig. 2 represent the low-frequency coefficients that comprise the coarse image at the input of each level.

Let us briefly describe the steps of this decomposition. The input of level j is the low-frequency 2D subband LLj, which is actually the coarse image at the resolution of that level. In the first level, the image itself constitutes the LL image block (LL0). The L (H) coefficients, produced after the horizontal filtering at a given level, are vertically filtered to produce the subbands LL and LH (HL and HH, respectively). The LL subband will either be the input of the horizontal filtering stage of the next level, if there is one, or will be stored, if the current level is also the last one. All LH, HL and HH subbands are stored, to contribute later in the reconstruction of the original image from the LL subband.

Figure 1. The 2D DWT decomposition as a 'chain' of successive levels (h.f.: horizontal filtering stage, v.f.: vertical filtering stage).

Figure 2. Diagrammatic representation of the dyadic decomposition for three decomposition levels.

1.2. The 1D DWT

In Fig. 1, the units that implement the 1D DWT are depicted as black boxes. These are now considered in detail. Two main options exist for the implementation of the 1D DWT: the traditional convolution-based implementation [3] and the lifting-based implementation [4, 5].

The convolution-based 1D DWT. The conventional convolution-based 1D DWT of [3] is presented in Fig. 3a. As shown in Fig. 3a, it consists of two analysis filters, h (low-pass) and g (high-pass), followed by subsampling units. The signal x[n] is decomposed into the approximation (low-frequency) signal lp[n] and the detail (high-frequency) signal hp[n]. Note that in the structure of Fig. 3a the downsampling is performed after the filtering has been completed. This is clearly inefficient since, in this case, half of the calculated coefficients are redundant, and the filtering is realized at full sampling rate.

Figure 3. The convolution-based implementation of the forward 1D DWT: (a) the conventional filtering-and-downsampling structure; (b) using the polyphase matrix of the analysis filter-bank.

Early research on filter-bank design proved that the execution of the 1D DWT can be accelerated by using the polyphase matrix of the filter-bank, instead of the conventional filtering-and-downsampling structure of Fig. 3a. As Fig. 3b shows, the signal is split into two signals (polyphase components) at half of the original sampling rate. The downsampling is now performed prior to the actual filtering, thereby avoiding the calculation of coefficients that will later be discarded. The polyphase components of the signal are filtered in parallel by the corresponding filter coefficients, producing the same result as if the downsampling was performed as described in [3].

The analysis polyphase matrix for Fig. 3b is defined (in the Z-domain) as:

$$E_0(z) = \begin{bmatrix} H_e(z) & H_o(z) \\ G_e(z) & G_o(z) \end{bmatrix} \qquad (1)$$

where $H_e(z)$ and $H_o(z)$ denote the Type-I even and odd polyphase components of the corresponding low-pass analysis filter, and $G_e(z)$ and $G_o(z)$ denote the Type-I even and odd polyphase components of the corresponding high-pass analysis filter.

Using the analysis polyphase matrix of (1), the wavelet decomposition can be written (in the Z-domain) as:

$$\begin{bmatrix} LP(z) \\ HP(z) \end{bmatrix} = E_0(z) \begin{bmatrix} X_e(z) \\ X_o(z) \end{bmatrix}, \qquad (2)$$

where $LP(z)$ denotes the approximation at the coarser resolution, $HP(z)$ denotes the detail signal, and $X_e(z)$ and $X_o(z)$ denote the Type-I even and odd polyphase components of the signal $X(z)$.

The convolution-based 1D DWT suffers from high computational complexity and high memory utilization requirements.

The lifting-based 1D DWT. The lifting scheme reduces the transform computational requirements by factorizing the polyphase matrix of the DWT into elementary matrices. We will briefly discuss the principles of the forward lifting. More information on the lifting scheme can be found in [4, 5]. As is proven in [4], if (H, G) is a perfect-reconstruction filter-pair, the following factorization of the matrix $E_0(z)$ of (1) into lifting steps is feasible:

$$E_0(z) = \begin{bmatrix} K & 0 \\ 0 & 1/K \end{bmatrix} \prod_{i=1}^{m} \begin{bmatrix} 1 & U_i(z) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ P_i(z) & 1 \end{bmatrix} \qquad (3)$$

where K is a constant and m is the number of predict-and-update steps; both depend on the type of filter-pair that is used. Note that the lifting factorization of (3) is not unique. That is, for a given wavelet transform we can often find multiple ways of factorizing its polyphase matrix.

The numerical factorization of (3) is represented by the schematic diagram of Fig. 4. The forward lifting scheme consists of the following steps:

1. The splitting step, where the signal is separated into even and odd samples.
2. The prediction steps, associated with the predict operator $P_i(z)$.
3. The update steps, associated with the update operator $U_i(z)$.
4. The scaling step, indicated by the scaling factors 1/K and K.

The inverse transform is realized by traversing the schematic of Fig. 4 from right to left (reverse signal flow) and switching the signs of the predict and update operators as well as the scaling factors. Because of the complete reversibility of the lifting scheme, even if non-linear predict and update operators are used in the schematic of Fig. 4, we can use a rounding operator to ensure integer form for all the data produced throughout the execution of the 2D DWT. In this case, the 2D DWT is completely reversible and, therefore, lossless. Such transforms are known as integer-to-integer transforms and are extremely useful in lossless coding. To build an integer version of a given wavelet transform, if scaling is present, then the scaling step should be either split further [6, 7] or omitted. In fact, the scaling performed at the end of each decomposition level in the conventional decomposition can be skipped altogether [8], or incorporated into the subsequent encoding or processing stage of each bitplane, and as a result it is not explicitly considered in this paper.

Figure 4. The lifting-based implementation of the forward 1D DWT.
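To make the forward lifting flow of Fig. 4 concrete, the following minimal Python sketch traces one generic pass through the splitting, prediction, update and scaling steps. It is an illustration only; the predict/update operators and the sample values in the usage lines are hypothetical placeholders, not the filters used later in the paper.

```python
def lifting_forward(x, predict, update, K=1):
    """One generic forward lifting pass: split, predict, update, scale (cf. Fig. 4)."""
    even, odd = list(x[0::2]), list(x[1::2])               # splitting step
    hp = [o - p for o, p in zip(odd, predict(even))]       # prediction: HP = odd - P(even)
    lp = [e + u for e, u in zip(even, update(hp))]         # update:     LP = even + U(HP)
    if K != 1:                                             # scaling step (often split or
        lp, hp = [K * v for v in lp], [v / K for v in hp]  # omitted in integer transforms)
    return lp, hp

# Illustration with Haar-like (S-transform-style) operators:
lp, hp = lifting_forward([1, 3, 2, 6, 5, 7, 4, 8],
                         predict=lambda even: even,
                         update=lambda hp: [v >> 1 for v in hp])
```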

Due to the lower computational cost, the flexibility offered in the transform factorization, and the perfect reconstruction property, the lifting-based implementation is generally preferable for the design of optimized systems.

1.3. The 2D DWT

Several computation schedules have been proposed to implement the 2D DWT. In practical designs, the most commonly used computation schedules are: the row–column (RC) [3], the line-based (LB) [9] and the block-based (BB) [10]. The simplest of these is RC, which adopts the level-by-level logic of Fig. 1. However, such an approach necessitates the use of large off-chip memory blocks as the only source of the filter's inputs. Contrary to RC, both LB and BB involve an on-chip memory structure that operates as a cache for the original image, minimizing the accesses of the large memory blocks. Thus, memory utilization and memory-access locality are improved. The main difference between LB and BB concerns the way the original image is traversed. Specifically, in LB, non-overlapping groups of lines are processed, whereas BB operates on non-overlapping blocks of the image.

Related work. Implementations of 2D DWT computation schedules can be found in [11] and [12]. In [13] a combined lifting line-based FPGA implementation for the single-level 5/3 and 9/7 2D DWT is presented. In [14, 15] and [16], 2D DWT computation schedules have been compared on a theoretical basis. In [17] and [18], they are compared on programmable architectures and on a VLIW DSP, respectively. Even though the above comparisons are particularly enlightening, none of them is based upon hardware implementations. Thus, the implementations involved do not take advantage of the implementation efficiency and the parallelism in data processing that hardware could offer. In addition, the vast majority of comparisons of the different alternatives focuses on convolution-based realizations, and lifting is not considered. A discussion of VLSI architectures which also considers lifting-based DWT modules is presented in [19]. However, very few performance figures are presented for the multilevel lifting 2D DWT, which is the case of interest in image coding applications. Hence this important case remains largely unexplored so far in the relevant literature. Our initial results reported in [20] present some first performance figures; in this paper we extend these results and present energy figures for the transform realization.

Contribution of this paper. In this paper, the three major 2D DWT lifting-based computation schedules are implemented on FPGA-based platforms and compared in terms of performance, area and energy requirements. The computation schedules are compared for different image sizes (M×M) and numbers of levels (L) of the transform. To the best of our knowledge, no detailed comparisons of hardware implementations of the three major lifting-based 2D DWT computation schedules exist in the literature. This comparison will give significant insight on which schedule is most suitable for given values of the relevant algorithmic parameters.

Structure of this paper. Section 2 presents decisions we made that are, for comparison reasons, common in all our FPGA implementations. Sections 3.1, 3.2 and 3.3 describe how we implemented RC, LB and BB, respectively. In Section 4, we discuss the results of the three FPGA implementations, concerning the throughput, the number of FPGA slices, and the memory and energy requirements. In Section 5 we briefly discuss what type of optimizations would be more appropriate for each schedule. Finally, Section 6 offers our conclusions.

2. Common Implementation Decisions and Assumptions

The comparison of the main 2D DWT computation schedules is performed on the basis of some common implementation decisions and assumptions. These concern the memory architecture used to store the image (image memory) and the filtering structure used to compute the DWT.

2.1. Image Memory

In this comparison, a single-port image memory is considered. The image memory is usually off-chip. However, it may also be on-chip, if we are dealing with small frames and/or using large FPGA devices. Considering the fact that the image memory can sometimes be on-chip, we made our comparison as generic as possible by using a common clock for the image memory and the rest of the system.

Employing a single-port image memory in our analysis adds flexibility as far as the available memory is concerned, as many boards do not include multi-port large memory blocks. Also, since the image memory is usually off-chip, the fact that single-port off-chip RAMs consume less energy per access than multi-port ones is definitely something to consider [14, 19]. In data-intensive algorithms, such as the 2D DWT, memory accesses are highly frequent. Thus, when the image memory is off-chip, single-port RAMs are ideal from an energy perspective, trading off the higher performance of multi-port RAMs [19].

Traditionally, two memory blocks are used in image processing systems: one to store the original image, and one for the outputs. To avoid the second block, we used the in-place mapping scheme: the filter's outputs are written over memory contents that are already consumed and no longer needed. To adopt this scheme, each memory location should have the same bit-width as the outputs of the transform.

The in-place mapping scheme is illustrated in Fig. 5. The dyadic decomposition is applied on a hypothetical 8×8 original image, and the outputs of each stage are written in memory locations that have already been read and filtered. The shaded areas in Fig. 5 represent the pixels of the coarse image at the input of each level. It might be interesting for the reader to compare Fig. 5 with Fig. 2, to see how the pixels of the different 2D subbands of the dyadic decomposition diagram are distributed over the memory array. Note that in Fig. 5 every distinct square represents one pixel and bears the name of the 2D subband to which it belongs. The decomposition of the hypothetical 8×8 original image cannot go beyond level 2, as there are not enough pixels in the LL2 subband to proceed with the filtering operations.

Figure 5. The in-place mapping scheme. The dyadic decomposition is applied on a hypothetical 8×8 original image.

2.2. Filter Implementation

A single filter is used in all three implementations, introducing the minimum hardware cost. That is, the same single filter is shared among all levels and also among the vertical and the horizontal filtering stages within each level. The choice of using a single filter is mostly appropriate for an RC architecture that uses a single-port RAM.
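To make the in-place mapping of Fig. 5 concrete, the following minimal Python sketch (an illustration, not the VHDL design) performs one horizontal filtering pass of a row and writes the L and H outputs back over the locations that supplied the inputs; the interleaved placement of L at even and H at odd locations follows the layout suggested by Fig. 5. Here `lift` stands for any single-level 1D DWT function returning the low- and high-frequency halves, such as the 5/3 lifting filter described next.

```python
def horizontal_in_place(row, lift):
    """Overwrite a consumed row with its L/H outputs (in-place mapping of Fig. 5)."""
    lp, hp = lift(row)                  # low- and high-frequency halves of the row
    for i in range(len(row) // 2):
        row[2 * i] = lp[i]              # L coefficient replaces the even location
        row[2 * i + 1] = hp[i]          # H coefficient replaces the odd location
    return row
```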

The 5/3 lifting-based filter-pair. As for the type of filter, considering the advantages that the lifting scheme offers, we used a lifting-based filter instead of a traditional convolution-based filter. Two types of lifting-based filter sets are mainly used in implementations of the DWT: the 5/3 lifting filter-pair [5] and the 9/7 lifting filter-pair [4]. We used the 5/3 lifting filter-bank, as it is more hardware efficient: for the same number of pipeline stages, it has a significantly smaller critical path than the 9/7 lifting filter-bank, while occupying significantly smaller area. This is due not only to the smaller number of lifting steps required, but also to the simplicity of the multiplying units: only two simple multiplications are involved (by 1/2 and by 1/4), which can be implemented with simple shifters.

For the conventional 5/3 filter-pair, the analysis low-pass filter h (Fig. 3) has 5 coefficients, while the analysis high-pass filter g (Fig. 3) has 3 coefficients. The filter coefficients are the following:

– Low-pass filter h: {−1/8, 2/8, 6/8, 2/8, −1/8}
– High-pass filter g: {−1/2, 1, −1/2}

The factorization of the polyphase matrix of the conventional 5/3 filter-bank renders two elementary matrices. Therefore, the lifting version of the 5/3 filter-bank includes only one predict-and-update step, or equivalently m equals 1 (see Section 1.2). For each pair of input samples, 2n and 2n+1, the lifting equations of this filter-bank are the following:

$$HP[2n+1] = X[2n+1] - \left\lfloor \frac{X[2n] + X[2n+2]}{2} \right\rfloor \qquad (4)$$

$$LP[2n] = X[2n] + \left\lfloor \frac{HP[2n-1] + HP[2n+1] + 2}{4} \right\rfloor \qquad (5)$$

where X are the signal samples, HP is the high-frequency output coefficient, LP is the low-frequency output coefficient, and ⌊·⌋ represents the floor operator. The floor operator ensures an integer-to-integer lossless transform.

Hardware implementation of the 5/3 lifting filter-pair. In our hardware implementation of the filter, a three-word FIFO stores the inputs X[2n+1], X[2n], X[2n+2], and a register stores HP[2n−1], which was calculated in the previous lifting step. According to the above equations, for a new pair of output coefficients to be computed, two (and not one) new filter inputs are needed. Thus, a filtering operation will take place, and a pair of output coefficients will be produced, every two cycles.

Figure 6. Mirroring at the borders of an 8-pixel incoming line, for a 5/3 lifting filter-bank.

Our hardware implementation of the 5/3 lifting filter-pair is shown in Fig. 8. Registers r1, r2 and r3 constitute the FIFO. Observing Fig. 8, one concludes that the filter's behavior is determined according to whether the initialization phase is over. The initialization_signal controls multiplexers m1 and m2, which determine if the data flows with a step of one or a step of two. Thus, the filter-bank may behave in one of the two following ways:

1. The initialization mode, where the data flows with a step of one. During the initialization phase, two samples are mirrored around the first sample, and a simple shifting takes place in the FIFO, as shown in Fig. 6.
2. The normal mode, where the data flows with a step of two. After the initialization phase is over, the we_NormalMode signal is forced high whenever the FIFO should be written. Since the filtering operation should take place every two cycles, the we_NormalMode signal gets high every second cycle. Thus, the pattern shown in Fig. 7 is achieved. The same pattern applies for the finalization mirroring (Fig. 6), but the source of r1's input is now the FIFO itself, eliminating the need to access memory during this step.

In RC, only one input enters the filter-bank per cycle, since the single-port image memory is the only source of input coefficients. Thus, for RC, the filter-bank of Fig. 8 is ideal.
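As a software reference for Eqs. (4) and (5), the following Python sketch applies the forward 5/3 lifting to one line with the symmetric border mirroring of Fig. 6. It is an illustrative model, not the VHDL filter-bank; integer arithmetic with shifts mirrors the floor operator.

```python
def lift53_1d(x):
    """Forward 5/3 lifting of Eqs. (4)-(5) on an even-length line, with
    symmetric (mirrored) extension at the borders, as in Fig. 6."""
    n = len(x)
    def xs(k):                       # sample with right-border mirroring
        return x[k] if k < n else x[2 * n - 2 - k]
    hp = [x[2 * i + 1] - ((xs(2 * i) + xs(2 * i + 2)) >> 1) for i in range(n // 2)]
    lp = [x[2 * i] + (((hp[i - 1] if i > 0 else hp[0]) + hp[i] + 2) >> 2)
          for i in range(n // 2)]    # HP[-1] = HP[1] by the left-border mirroring
    return lp, hp

lp, hp = lift53_1d([10, 12, 11, 9, 8, 8, 10, 13])   # one 8-pixel line, as in Fig. 6
```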

Figure 7. The contents of the FIFO with respect to time, for the filtering of an 8-pixel line.

Figure 8. Hardware implementation of the 5/3 lifting filter-pair, designed to perform the 1D DWT. The write enable signal (we) determines if the registers with write enable inputs will be written.

On the contrary, as we will see in the following sections, both LB and BB involve multi-port on-chip buffers that can supply the filter-bank with more than one input per cycle. In order to make the best possible use of the parallelism offered, we made a few small changes to the filter-bank of Fig. 8, to incorporate a multiple-input function (Fig. 9). Now, two or three inputs can be inserted in the filter-bank in a single cycle. However, during the horizontal filtering at level 0, the samples are drawn from the single-port memory, just like in the case of RC. Thus, during the horizontal filtering at level 0, the new filter-bank behaves in a single-input mode, apart from the multiple-input mode in which it functions in any other case.

3. Implementing the Computation Schedules

In this section we describe how we implemented the RC, LB and BB 2D DWT computation schedules. Figure 10 presents a generic block diagram of a 2D DWT FPGA-based system that uses an off-chip image memory (which is the most common case). The dashed line indicates that on-chip buffers are not used in all of the three schedules.

3.1. Row–Column Implementation

The RC is implemented by applying the forward 1D DWT in both the horizontal and the vertical direction of the image, for a chosen number of levels, in the way shown in Fig. 2. Specifically, in any given level, in order to proceed to the vertical filtering of the current level's LL image block, the horizontal filtering should be complete. In addition, in order to proceed to the next level, the filtering at the previous level should be finished. Figures 11 and 12 present the flowchart and the block diagram of RC, as implemented. Compared to the generic block diagram of Fig. 10, in the block diagram of RC (Fig. 12) the on-chip buffering block is omitted.

The RC architecture is the one with the simplest control path. The parallelism achieved during the filtering operations depends on the number of ports of the image memory. Its major disadvantage is the lack of locality, due to the use of large off-chip memory blocks. This decreases the performance and increases the power consumption.
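For illustration, the RC traversal just described can be summarized by the following Python sketch (a behavioural model under the in-place mapping of Fig. 5, not the RTL): at level j the LL_j block occupies the locations whose row and column indices are multiples of 2^j, so each level filters those locations horizontally and then vertically, in place. `lift53` is any 1D forward 5/3 lifting function, e.g. the one sketched in Section 2.2.

```python
def row_column_dwt(img, levels, lift53):
    """Level-by-level RC schedule on a square image (list of lists), in place."""
    M = len(img)
    for j in range(levels):
        step, size = 2 ** j, M >> j        # LL_j samples sit 'step' locations apart
        coords = range(0, M, step)
        for r in coords:                   # horizontal filtering of the LL_j block
            lp, hp = lift53([img[r][c] for c in coords])
            for i in range(size // 2):
                img[r][2 * i * step] = lp[i]
                img[r][(2 * i + 1) * step] = hp[i]
        for c in coords:                   # vertical filtering of the LL_j block
            lp, hp = lift53([img[r][c] for r in coords])
            for i in range(size // 2):
                img[2 * i * step][c] = lp[i]
                img[(2 * i + 1) * step][c] = hp[i]
    return img
```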

Figure 9. This filter-bank derives from that of Fig. 8, by applying a few changes (the shaded areas) to incorporate a multiple-input function. The lower part is not shown, as it remains the same.

Figure 10. Generic block diagram of a system which executes the 2D DWT using an off-chip image memory.

Figure 11. Flowchart of the RC algorithm as implemented (r/c = current row/column, j = current level).

3.2. Line-Based Implementation

Figure 13 presents the flowchart of the low-level LB algorithm, as implemented. In Fig. 13, r denotes the current row of the original image and j denotes the current level of processing. The counter COUNT(j) determines (a) whether there exists sufficient information for a vertical filtering to occur at level j+1 and (b) the filtering mode of the next vertical filtering at level j+1. Specifically, if COUNT(j)=4, a vertical filtering will occur at level j+1 in initialization mode, whereas if COUNT(j)=2, it will occur in normal mode. If COUNT(j)=3 or COUNT(j)=1, there is not sufficient information for a vertical filtering to occur at the current stage, but as soon as enough coefficients are produced a vertical filtering will occur in initialization or normal mode, respectively.

Figure 14 shows the block diagram of the LB architecture. Contrary to the RC, the LB uses on-chip buffers. The on-chip buffers used in all of the levels are included in the shaded area of Fig. 14. In Fig. 15 the buffers of level j are isolated. The LB uses these on-chip buffers to store coefficients of intermediate levels which will be used at subsequent levels. This improves the memory-access locality compared to RC, and, hence, improves the performance. Moreover, contrary to RC, where the single-port image memory imposes a serial nature on the filtering operations, these buffers may be multi-port to increase parallelism.

A group of lines is processed up to the final level, and a filter's output is stored in image memory only if it will be used as it is during reconstruction and never again during decomposition. Thus, LH, HL and HH coefficients are stored after the vertical filtering at any level, whereas the LL coefficients are stored only at the last level (Fig. 13).

After the completion of a vertical filtering at level j−1, the resulting LLj coefficients are written in R(j). Buffer R(j) will then be horizontally filtered and the resulting coefficients will be written, depending on the current stage of level j's vertical filtering, in one of the following: C(j), R(j) (implementing the in-place mapping scheme) or buf1(j) (only during the initialization phase of the vertical filtering at level j, when buf1(j) is still empty).

Using the still-empty buf1(j) during vertical initialization, we eliminate the extra line buffer that would otherwise store the extra information needed for the initialization mirroring to occur. When initialization is over, that is, during the normal-mode vertical filtering, the value of the FIFO's first register (r1) and the high-frequency output of the current lifting step are stored in buf1(j) and buf2(j), respectively. Thus, the coefficients of the preceding lifting step are retrieved at the current step from buf1(j) and buf2(j). In this way, continuity in the vertical filtering of each column is created, since in LB the vertical filtering, contrary to the horizontal filtering, is not inherently continuous. During vertical finalization, the values of buf1(j) are also the samples that are mirrored.

The filter used in LB is that of Fig. 9, which incorporates a multiple-input mode to take advantage of the multi-port nature of the on-chip RAMs. Figures 16 and 17 depict how the initialization-mode and normal-mode vertical filtering of each column is executed, and also show the multi-port filter's behavior.

Figure 12. Block diagram of the RC architecture.

3.3. Block-Based Implementation

The BB is implemented by bringing on chip blocks of the original image. Traditionally, the size of these blocks is equal to 2^L × 2^L, to allow the generation of either an LL_L/LH_L or an HL_L/HH_L pair, L denoting the final level. Thus, an on-chip memory of equal size should be used, where blocks are temporarily stored. This memory is known as the Inter-Pass Memory (IPM) [10]. The RC algorithm is then applied on the block up to the last level, the decomposition of the block is written back to image memory, and the next block is brought on chip. The traditional version of BB demands complicated control and addressing [14], is not effective in streaming applications and imposes high memory requirements (for six levels of decomposition the size of the IPM would be 4096 words).

The BB version that we implemented is the one that requires the minimum size of local buffers and, also, simplifies as much as possible the control and the addressing. Each level has its own IPM, where LL coefficients are stored to be filtered horizontally. The size of IPM(j) is such that it allows the generation of an Lj/Hj pair at level j. As we have seen, for a new Lj/Hj pair to be generated, two new input coefficients should enter the filter-bank. Thus, the size of IPM(j), where j = 1, 2, ..., L−1, will be only 2 words. No IPM is needed for level 0, as the filter's inputs come straight from image memory.

In the previous section, we saw that the vertical filtering is not inherently continuous in LB. In order for the vertical filtering to be executed correctly, a group of line-buffers was used. In BB, the horizontal filtering, and not only the vertical, is also deprived of inherent continuity. Therefore, apart from the line buffers that will be used to create vertical continuity, two double-word registers will now be used in each level in order to create continuity in the horizontal filtering. Specifically, IPM(j) will store the input samples that will enter the filter's FIFO, and bufH(j) will be needed to store the two intermediate results of horizontal filtering. So, the total storage needed to create horizontal continuity is much smaller than that needed for the vertical filtering to be continuous. This is because we traverse the input image first horizontally and then vertically.

In vertical filtering, line-buffers buf1(j) and buf2(j) will be used in exactly the same way as in LB.
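The buffer savings quoted above follow directly from the block and IPM sizes; as a quick check (plain arithmetic, not part of the design):

```python
L = 6
traditional_ipm_words = (2 ** L) ** 2      # whole 2^L x 2^L block: 4096 words for L = 6
implemented_ipm_words = 2 * (L - 1)        # a two-word IPM(j) for each of levels 1..L-1
print(traditional_ipm_words, implemented_ipm_words)   # 4096 10
```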

Figure 13. Flowchart of the LB algorithm as implemented (r = current row, j = current level).

An Lj/Hj pair, produced after the horizontal filtering of IPM(j), is either stored in a line-buffer (for the odd rows of level j) or consumed at once (for the even rows of level j) to produce either an LLj+1/LHj+1 or an HLj+1/HHj+1 pair (Fig. 18). Hence, in the case of even rows, a vertical filtering action is undertaken after the generation of every Lj/Hj pair, and no supplementary line-buffer is needed. On the contrary, in LB (Fig. 17), in the even rows of vertical filtering, a whole row of Lj and Hj coefficients had to be written in R(j), so that continuous horizontal filtering could be applied on it. Only after the completion of the horizontal filtering of the even row of level j would the vertical filtering begin. As a result, the number of line buffers of BB is reduced by one, compared to that of LB.

Figure 14. Block diagram of the LB architecture. The shaded area includes the on-chip buffers used in the architecture.

Figure 15. On-chip line buffers of level j, used in LB.

Figure 16. Vertical filtering in initialization mode in the LB architecture. The second step of initialization can be simplified by avoiding the overwriting of r2 with the same value it already has, by forcing its we input to low. Also, at the second step, r3 will be written with the previous value of r1.

Figure 17. Vertical filtering in normal mode in the LB architecture. Three inputs are loaded in parallel into the FIFO, while buf2(j) passes through multiplexer m4 of the filter-bank.

The block diagram of the BB architecture is shown in Fig. 19. The on-chip memory needed for level j is illustrated in Fig. 20. The control logic, implemented with the FSM, is a lot more complicated in the case of BB, compared to that of LB. Mainly, it differs from the control logic presented in Fig. 13 in the following:

1. Successive columns of Lj and Hj coefficients are no longer filtered vertically in a successive manner. After a vertical filtering of an even column occurs, if the current level is not also the final one, LLj+1 is written either in bufH(j+1) or in IPM(j+1), depending on the current stage of horizontal filtering at level j+1. A single step of horizontal filtering at level j+1 may occur (depending on the horizontal filtering stage), interrupting the vertical filtering of level j.
2. The horizontal filtering is no longer continuous. After a discrete step of horizontal filtering at level j+1, it should be decided whether a vertical filtering at level j+1 can occur. If it does occur, a single step of horizontal filtering at level j+2 might follow, and so on. This domino effect in the worst case reaches the final level, imposing highly frequent interchanges between successive levels that complicate the control logic.

As in LB, the filter-bank used in BB should make full use of the parallelism that multi-port buffers offer. Thus, the version of Fig. 9 is used, so that the filter-bank operates in a single-input mode during the horizontal filtering at level 0 and functions in a multiple-input mode in any other case.

4. Results and Comparisons

The architectures were implemented in VHDL, synthesized with Synplify Pro 7.7, and placed and routed on a Xilinx Virtex-4 XC4VLX15 FPGA, using Xilinx ISE v.8.1. In all of the implementations, the image memory shares the same clock with the rest of the system. This simplification renders our comparison as generic as possible, since the image memory is sometimes on-chip (e.g. in large FPGA devices). Under this assumption, it should be noted that the computation schedules with larger traffic towards the image memory are favored, and no interface circuit is needed.

The current section presents a comparative analysis of the implementations, in terms of throughput, number of FPGA slices, memory requirements and energy consumption. These measurements are relative to the image size (M) and the number of decomposition levels (L).

4.1. Throughput

The RC, the LB and the BB operate on the XC4VLX15 device at 172.4, 113.6 and 117.6 MHz, respectively. The three schemes have similar data paths: 174.6 MHz for the RC, 171.3 MHz for the LB and 170.9 MHz for the BB. However, the achievable frequency varies because the critical path lies in the control path.

The LB obtains the highest throughput among the three schedules. This is due to the small number of clock cycles it requires to complete the 2D DWT (Table 1); it starts from 150,024 cycles (M=256, L=3) and reaches 2,374,740 cycles (M=1024, L=6). The throughput of LB reaches 757 frames/second (M=256, L=3) and drops to 47 frames/second (M=1024, L=6). There is no great difference between the number of cycles needed for RC and BB. Specifically, RC needs 347,651–5,607,174 clock cycles (Table 1), while BB requires 354,827–5,767,246 clock cycles (Table 1).
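These frame rates follow directly from the clock frequencies and the cycle counts; for example, for M = 256 and L = 3 (small differences from Table 2 come from rounding of the reported frequencies):

```python
fps_lb = 113.6e6 / 150_024    # ~757 frames/s for LB
fps_rc = 172.4e6 / 347_651    # ~496 frames/s for RC
fps_bb = 117.6e6 / 354_827    # ~331 frames/s for BB
```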

Figure 18. The normal-mode vertical filtering at level j, in BB, is followed by initialization-mode horizontal filtering at level j+1. Being at the beginning of the vertical initialization stage at level j+1, Lj+1 is written in buf1(j+1).

Figure 19. Block diagram of the BB architecture. The shaded area includes the on-chip buffers used in the architecture.

Figure 20. On-chip memory of level j, used in BB.

However, because the RC operates at a higher frequency, its throughput is improved, resulting in the difference observed in Table 2. If the image memory operates at a lower frequency than the rest of the system, BB will outperform RC.

The larger number of cycles in RC, compared to LB, is due to the fact that a single-port memory is used as the only source of inputs for RC. Thus, inputs enter the filter-bank in a serial manner and no parallelism is involved in the filtering operations. On the contrary, in LB and BB, two or three inputs might enter the filter-bank in parallel. Also, in RC the filter's output pair cannot be written in image memory in a single cycle; one of the outputs should be buffered to be written at the next cycle. Our results show that using multi-port buffers, even with a single filter-bank, halves the number of cycles for the LB architecture. But this is not the case for the BB, even if multi-port buffers are also used to increase memory-access locality. This is due to the streaming nature of the operations taking place in LB, which is no longer the case for BB. Thanks to that, in LB at many points the next action is predetermined; for example, the horizontal filtering is continuous and successive columns are successively filtered. As a result, many low-level actions can occur in parallel, as it is pre-decided that they will not affect each other. Things are different for BB, since the control is deprived of such a streaming behavior and many options should be considered at a specific point, instead of following a predetermined route. As a result, in the case of BB, the number of cycles is increased, compared to LB.

Observing Table 2, one concludes that all three architectures can efficiently handle the processing of still images, with minimal hardware cost: just one filter-bank with only one pipeline stage for the whole structure. In order for a system to carry out video processing, the whole video-processing chain should operate at a speed of 30 frames/second. Therefore, a safety margin should be considered when judging the video-processing capability of a system based on the results of Table 2. Observing Table 2, we can safely conclude that all three architectures can handle the video processing of image sizes 256 and 512. The high throughput for such images would also allow an efficient operation of all three systems in a low-power mode. However, when it comes to image size 1,024, even though the throughput of the LB architecture still allows video processing, this is no longer the case for the other two.

4.2. FPGA Slices

The FPGA slices used in RC are much fewer than in LB and BB (Table 3). This is due to the simplicity of the control associated with the RC algorithm. The number of slices for RC covers a range from 280 (M=256, L=3) up to 329 slices (M=1024, L=6). For LB and BB this range is 2,659–3,001 and 2,646–3,597 slices, respectively.

4.3. Memory Issues

The RC does not involve any on-chip buffers, contrary to LB and BB. Thus, in RC the image memory is the only source of inputs for the filter-bank. As a result, the number of image memory accesses is significantly larger in the RC case (Table 4). In the cases of LB and BB this number is the same, and does not vary when the number of levels varies, as it is, in both cases, equal to 2M².
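The access counts of Table 4 can be reproduced with a simple model; the per-level term for RC below is an assumption on our part (each level reads and writes the full LL_j block once for the horizontal pass and once for the vertical pass), but it matches the reported figures, while LB and BB only read the original image and write the final outputs:

```python
def rc_accesses(M, L):
    # read + write for the horizontal pass, read + write for the vertical pass, per level
    return sum(4 * (M >> j) ** 2 for j in range(L))

def lb_bb_accesses(M):
    return 2 * M * M                      # read the image once, write the outputs once

print(rc_accesses(256, 3), lb_bb_accesses(256))   # 344064 131072, as in Table 4
```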

Table 1. Number of clock cycles.

M     256                                          512                                              1,024
L     3          4          5          6           3          4          5          6               3          4          5          6
RC    347,651    352,004    353,157    353,478     1,383,427  1,400,324  1,404,677  1,405,830       5,519,363  5,585,924  5,602,821  5,607,174
LB    150,024    150,988    151,280    151,380     592,904    596,364    597,328    597,620         2,357,256  2,370,316  2,373,776  2,374,740
BB    354,827    359,372    360,493    360,766     1,418,251  1,436,556  1,441,101  1,442,222       5,670,923  5,744,396  5,762,701  5,767,246

Table 2. Throughput (in frames/second).

M     256                        512                        1,024
L     3      4      5      6     3      4      5      6     3      4      5      6
RC    495    489    488    487   124    123    122    122   31     30     30     30
LB    757    752    751    750   191    190    190    190   48     47     47     47
BB    331    327    326    326   82     81     81     81    20     20     20     20

In LB and BB, the on-chip local memory of each level is accommodated in BRAMs (Fig. 21), generated by the Xilinx ISE Coregenerator, and in registers. The bit-width used is 16 bits. The Virtex-4 BRAMs used are 18 Kbit dual-port BRAMs. To obtain maximum throughput, buffers of the same type and of different levels can be grouped together in a single BRAM, if the space provided is enough. This way, during the vertical filtering of successive columns, R(j), C(j), buf1(j) and buf2(j) can feed the filter-bank simultaneously. At the same time, locations of buf1(j) and buf2(j) that have already been read are overwritten with the new intermediate results, to be read at the next lifting step. Moreover, R(j) can be read while R(j+1) is written with the output of a previous column's vertical filtering. Therefore, four dual-port BRAMs should be used in the cases of image sizes 256 and 512. Four additional dual-port BRAMs will be needed for size 1,024, to accommodate the larger buffers of level 0. The number of BRAMs remains the same for 3, 4, 5 and 6 levels, to guarantee high throughput and enough space for the larger buffers of the lower levels.

In BB, the buffers IPM and bufH of each level are implemented as two-word registers. Contrary to LB, the vertical filtering of successive columns is no longer successive. Thus, the constraints that guarantee maximum throughput are not so strict for BB. During vertical filtering, three filter inputs should be read simultaneously, thus at least two dual-port BRAMs should be used. To provide enough memory space, 2, 3 and 6 BRAMs should be used for image sizes 256, 512 and 1,024, respectively. These choices, which also respect the minimum of two BRAMs, remain the same for 3, 4, 5 and 6 levels, since the larger buffers of the lower levels are the ones that determine the memory space needed.

5. Energy

In this section, we provide energy measurements for the implementations. First, we provide on-chip energy results, excluding the image memory. Afterwards, assuming an off-chip image memory and considering off-chip power estimates, we calculate the total energy required for the completion of the transform.

5.1. On-Chip Energy

To estimate the on-chip power consumption of each schedule, we have used the XPower tool offered by Xilinx ISE v.8.1. The estimated on-chip power dissipation of the RC is 214 mW, which does not notably alter as M or L vary. This is due to two factors: (a) the RC does not involve on-chip buffers, whose number would depend on the given parameters M and L, and (b) the variation in the number of slices for different pairs of these parameters is trivial (Table 3). This is no longer the case for the other two, where BRAMs are used, their number being directly dependent on M (Fig. 21), and where the number of slices varies to a larger extent, relative to the given parameters (Table 3). Thus, in the case of BB, the power consumption starts at 241 mW (M=256) and stretches to 268 mW (M=1024). The power consumption of LB is at relatively higher levels, starting at 264 mW (M=256) and reaching 289 mW (M=1024).

Table 3. Number of FPGA slices.

M     256                            512                            1,024
L     3      4      5      6         3      4      5      6         3      4      5      6
RC    280    293    309    327       286    293    316    327       296    301    320    329
LB    2,659  2,792  2,805  2,881     2,851  2,860  2,901  2,967     2,962  2,965  2,970  3,001
BB    2,646  2,972  3,145  3,503     2,728  2,989  3,146  3,510     2,781  3,013  3,207  3,597

Table 4. Total number of accesses (contains both read and write accesses) of the image memory.

M       256                                        512                                            1,024
L       3          4          5          6         3          4          5          6             3          4          5          6
RC      344,064    348,160    349,184    349,440   1,376,256  1,392,640  1,396,736  1,397,760     5,505,024  5,570,560  5,586,944  5,591,040
LB/BB   131,072    131,072    131,072    131,072   524,288    524,288    524,288    524,288       2,097,152  2,097,152  2,097,152  2,097,152

For LB and BB this number is the same, and does not vary as L varies, since it is, in both cases, equal to 2M².

The smaller number of slices that the LB uses, compared to BB, is overshadowed by its requirement for more BRAMs, and, therefore, the LB ends up with a higher power consumption.

The on-chip energy consumed for the execution of the transform, relative to M and L, is presented in Table 5. At a first glance, one notices the considerably larger amount of energy related to the BB, compared to the other two. This is due to the large number of cycles associated with the BB (Table 1). The number of cycles is once more the dominant factor in the energy calculations for the LB. That is, the significantly smaller number of cycles related to LB (Table 1) results in lower energy than RC, even though the RC is the least power-hungry of all.

Figure 21. Number of BRAMs (for LB and BB, as a function of the image size M).

5.2. Total Energy

We have presented power and energy results without considering the power associated with the image memory accesses. This power depends on whether the memory is on-chip or off-chip. Using an off-chip image memory is normally the only option, especially when working with large images and/or not very large FPGA devices. In that case, the power cost per memory access is significantly high, and depends on technology-related characteristics of the given memory.

We will consider a Synchronous DRAM as the off-chip image memory. Note here that the energy consumed at the off-chip interconnect is not included in our calculations of total energy. The energy of the off-chip interconnect would normally be proportional to the distance between the SDRAM and the FPGA device.

We have used the SDRAM system power calculator available from Micron Technologies [21] to calculate the off-chip power associated with a Micron 64 Mb x16 SDRAM. This tool bases the power calculation on a combination of SDRAM device specifications and usage conditions in the system environment. It, therefore, considers reliable estimates of the static and dynamic power to calculate the total power dissipation of the memory.

The choice of SDRAM with respect to its clock frequency is based on the following two assumptions. Firstly, we have assumed that the image memory will operate at the same frequency as the rest of the system. As we have already mentioned, we have done this in order to make our comparison as generic as possible, since the image memory is sometimes on-chip. Secondly, we will operate the entire system at the frequency dictated by the on-chip design. Therefore, we can use an available SDRAM with a clock frequency that is greater than or equal to the frequency of the on-chip system.

As we have stated in Section 4.1, the RC, the LB and the BB have frequencies of 172.4, 113.6 and 117.6 MHz, respectively. We have, therefore, considered an SDRAM of clock frequency 183 MHz (speed grade=−55) for the case of RC, and 133 MHz (speed grade=−75) for LB and BB.

According to the estimates of the system power calculator, the off-chip power varies from 715.7 mW to 720.3 mW for the case of RC, as the percentage of memory write/read cycles varies for different values of L and M.
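As a rough cross-check of the energy figures that follow (Tables 5 and 6), energy can be estimated as power multiplied by execution time, with the time taken from the cycle counts of Table 1 and the clock frequencies of Section 4.1; the assumption that the total energy is simply the sum of the on-chip and off-chip contributions is ours, but it reproduces the reported values closely.

```python
# Example: RC at M = 256, L = 3
t_frame   = 347_651 / 172.4e6               # ~2.02 ms per frame
e_on_chip = 0.214 * t_frame                 # ~0.43 mJ, cf. Table 5
e_total   = e_on_chip + 0.7157 * t_frame    # ~1.88 mJ with the off-chip SDRAM, cf. Table 6
print(e_on_chip * 1e3, e_total * 1e3)       # values in mJ
```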

Table 5. On-chip energy consumption (in mJ).

M     256                               512                               1,024
L     3       4       5       6         3       4       5       6         3        4        5        6
RC    0.432   0.437   0.438   0.439     1.717   1.738   1.743   1.745     6.850    6.933    6.954    6.959
LB    0.349   0.351   0.351   0.352     1.377   1.385   1.388   1.388     5.995    6.028    6.037    6.039
BB    0.727   0.736   0.738   0.739     3.002   3.040   3.050   3.052     12.918   13.09    13.127   13.138

Note that the variation is not as large, for different values of M, as one would expect from Table 4. This is because, as M increases, the total number of cycles increases as well (Table 1), so the above percentages are maintained at similar levels.

Due to the high cost per memory access that characterizes the off-chip case, the large number of accesses of the RC (Table 4) renders this schedule extremely demanding in terms of power. Compared to the RC, the off-chip power of the LB and the BB is expected to be at lower levels, due to the lower number of memory accesses (Table 4). The off-chip power estimates vary between 549.7 mW and 559.7 mW for LB, and between 346 mW and 348.8 mW for BB. The lower off-chip power associated with the BB, compared to LB, is due to the higher percentage of memory write/read cycles over the total execution cycles in LB. This is due to the fact that the number of clock cycles is much larger for BB (Table 1), while both the BB and the LB need M² read memory accesses and M² write memory accesses. Therefore, the distribution of memory accesses is rather dense in LB, compared to BB. However, due to the significantly smaller number of clock cycles required for its completion, the LB maintains the smallest energy when it comes to total consumption, as Table 6 shows.

The large number of memory accesses of the BB keeps the total energy consumption of BB at high levels, as was the case with the on-chip energy. As a result, the total energy required for the BB to be executed is very close to that of the RC (Table 6), even though the power is significantly lower in BB.

Using an on-chip image memory would be in favor of the RC, since the power associated with every memory access would be much lower. In this case, the BB would most certainly consume a lot more total energy than RC, as was the trend in Table 5. The LB would still maintain the lowest total energy.

5.3. Appropriate Optimizations

We have investigated the performance of the three schedules under the assumption of using a single-port image memory and a single filter-bank which is shared among all levels and stages of the transform. We will now briefly discuss which optimizations would be appropriate for each schedule to further improve its performance. Specifically, the objectives would be to increase (a) the parallelism in data processing and (b) the locality of data accesses.

In RC, as we have already stated, the lack of parallelism is due to the fact that the single-port image memory is the one and only source of the filter-bank's inputs. Therefore, using a multi-port image memory would multiply the performance by a factor equal to the number of memory ports.

Table 6. Total energy consumption (in mJ).

M     256                               512                               1,024
L     3       4       5       6         3       4       5       6         3        4        5        6
RC    1.876   1.899   1.904   1.906     7.486   7.576   7.599   7.604     29.908   30.266   30.356   30.379
LB    1.079   1.082   1.084   1.084     4.286   4.300   4.304   4.305     17.606   17.661   17.675   17.679
BB    1.778   1.795   1.799   1.800     7.206   7.274   7.291   7.295     29.733   30.018   30.089   30.106

An off-chip SDRAM is considered.

On the contrary, just increasing the number of filter-banks while sticking with a single-port memory would not help, since there would not be enough on-chip information to process. As far as the locality of data accesses in RC is concerned, it could be improved by incorporating a single on-chip line-buffer in the traditional bufferless architecture of Fig. 12. The inputs of the filter-bank would be read from this line-buffer in all levels. Therefore, its size should be equal to the largest dimension of the original image, to accommodate the large number of coefficients of the first level.

In the cases of LB and BB, the locality of data accesses is already satisfactory, since image memory accesses are restricted to reading the original image and writing the final outputs. As far as the parallelism is concerned, since so much information exists in the on-chip buffers, the more filter-banks are available to process it, the better. Thus, in this case increasing the number of filter-banks would vastly increase the parallelism in data processing, even if the image memory remains single-port.

6. Conclusion

The three major 5/3 lifting 2D DWT computation schedules have been compared in terms of throughput, area, memory and energy requirements on the Virtex-4 FPGA family. The conclusions of this work are the following.

The RC has by far the lowest hardware cost. Not only does it involve no on-chip buffering, but also the FPGA slices used are significantly fewer than in the cases of the other two. Due to its simplicity, it enjoys the highest frequency. However, the insufficient level of parallelism associated with RC increases the number of cycles required for the schedule to be completed. As a result, when it comes to throughput, it is outperformed by LB. Moreover, the reduced memory-access locality offered by RC raises the number of memory accesses and the total energy to the highest level among the three schedules.

The LB requires the lowest number of cycles among the three schedules, thanks not only to the parallelism achieved by using multi-port on-chip buffers, but also to its streaming behavior. Due to the small number of cycles, this schedule also enjoys the highest throughput and the lowest on-chip and total energy consumption. Due to the use of on-chip buffers, the LB increases the memory-access locality, compared to RC, minimizing the number of image memory accesses.

The BB increases the memory-access locality, compared to RC, and achieves the same number of memory accesses as LB. To do this, the BB uses a smaller number of BRAMs than LB. However, it consumes more FPGA slices than LB for most sets of parameters. The BB is associated with the highest complexity, due to the lack of streaming behavior in the filtering operations. As a result, a significantly larger number of clock cycles is needed for the BB schedule to be completed. The large number of cycles results in high energy requirements: it is by far the most expensive when it comes to on-chip energy, and it is almost as expensive as RC when the total energy is considered, even though the memory accesses of BB are considerably fewer. The BB has the lowest throughput, due to its control complexity, as well as to the frequency at which it operates, which is not as high as in the RC case.

Future work would involve extending the current comparative analysis by broadening the range of the comparison parameters. For instance, using dual-port image memory and more filters would be considered.

References

1. ISO/IEC FCD15444-1: 2000, "JPEG 2000 image coding system," 2000.
2. ISO/IEC JTC1/SC29/WG11, FCD 14496-1, "Coding of moving pictures and audio," 1998.
3. S. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, 1989, pp. 674–693.
4. I. Daubechies and W. Sweldens, "Factoring wavelet transforms into lifting steps," Journal of Fourier Analysis and Applications, vol. 4, no. 3, 1998, pp. 247–269.
5. K. Andra, C. Chakrabarti, and T. Acharya, "A VLSI architecture for lifting-based forward and inverse wavelet transform," IEEE Transactions on Signal Processing, vol. 50, no. 4, 2002, pp. 966–977.
6. R. Calderbank, I. Daubechies, W. Sweldens, and B.-L. Yeo, "Wavelet Transforms That Map Integers To Integers," Applied and Computational Harmonic Analysis, vol. 5, no. 3, 1998, pp. 332–369.
7. S. Dewitte and J. Cornelis, "Lossless Integer Wavelet Transform," IEEE Signal Processing Letters, vol. 4, no. 6, 1997, pp. 158–160.
8. M. D. Adams and F. Kossentini, "Reversible integer-to-integer wavelet transforms for image compression: performance evaluation and analysis," IEEE Transactions on Image Processing, vol. 9, no. 6, 2000, pp. 1010–1024.
9. C. Chrysafis and A. Ortega, "Line-based, reduced memory, wavelet image compression," IEEE Transactions on Image Processing, vol. 9, no. 3, 2000, pp. 378–389.
10. G. Lafruit, L. Nachtergaele, B. Vanhoof, and F. Catthoor, "The Local Wavelet Transform: a memory-efficient, high-speed architecture optimized to a Region-Oriented Zero-Tree coder," Integrated Computer-Aided Engineering, vol. 7, no. 2, 2000, pp. 89–103.

11. Cast, Inc., http://www.cast-inc.com.
12. Amphion Semiconductor, Ltd., http://www.amphion.com.
13. G. Dillen, B. Georis, J.-D. Legat, and O. Cantineau, "Combined Line-Based Architecture for the 5–3 and 9–7 Wavelet Transform of JPEG2000," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 9, 2003, pp. 944–950.
14. N. D. Zervas, G. P. Anagnostopoulos, V. Spiliotopoulos, Y. Andreopoulos, and C. E. Goutis, "Evaluation of design alternatives for the 2-D discrete wavelet transform," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 12, 2001, pp. 1246–1262.
15. M. Weeks and M. Bayoumi, "Discrete Wavelet Transform: Architectures, Design and Performance Issues," Journal of VLSI Signal Processing, vol. 35, no. 2, 2003, pp. 155–178.
16. C. Chakrabarti, M. Vishwanath, and R. M. Owens, "Architectures for wavelet transforms: A survey," Journal of VLSI Signal Processing, vol. 14, no. 2, 1996, pp. 171–192.
17. Y. Andreopoulos, P. Schelkens, G. Lafruit, K. Masselos, and J. Cornelis, "High-level cache modeling for 2-D discrete wavelet transform implementations," Journal of VLSI Signal Processing (special issue on Signal Processing Systems), vol. 34, no. 3, 2003, pp. 209–226.
18. K. Masselos, Y. Andreopoulos, and T. Stouraitis, "Performance comparison of two-dimensional discrete wavelet transform computation schedules on a VLIW digital signal processor," IEE Proceedings Vision, Image & Signal Processing, vol. 153, no. 2, 2006, pp. 173–180.
19. C.-T. Huang, P.-C. Tseng, and L.-G. Chen, "Analysis and VLSI Architecture for 1-D and 2-D Discrete Wavelet Transform," IEEE Transactions on Signal Processing, vol. 53, no. 4, 2005, pp. 1575–1586.
20. M. Angelopoulou, K. Masselos, P. Cheung, and Y. Andreopoulos, "A Comparison of 2-D Discrete Wavelet Transform Computation Schedules on FPGAs," in IEEE International Conference on Field-Programmable Technology, 2006, pp. 181–188.
21. Micron Technologies, "SDRAM System-Power Calculator," http://www.micron.com/support/designsupport/tools.

Maria E. Angelopoulou received the M.Eng. degree from the Department of Electrical and Computer Engineering, University of Patras, Greece in 2005. Since October 2005, she has been a PhD student in the Department of Electrical and Electronic Engineering, Imperial College London, UK. Her research interests include reconfigurable architectures, and image and video processing. She received several awards from the Greek State Scholarships' Foundation and the Technical Chamber of Greece during her undergraduate studies. She is a member of the IEEE and of the Technical Chamber of Greece.

Peter Y. K. Cheung (M'85–SM'04) received the B.S. degree with first class honors from Imperial College of Science and Technology, University of London, London, U.K. in 1973. Since 1980, he has been with the Department of Electrical and Electronic Engineering, Imperial College, where he is currently a Professor of digital systems and deputy head of the department. He runs an active research group in digital design, attracting support from many industrial partners. Before joining Imperial College he worked for Hewlett Packard. His research interests include VLSI architectures for signal processing, asynchronous systems, reconfigurable computing using FPGAs, and architectural synthesis. He was elected as one of the first Imperial College Teaching Fellows in 1994 in recognition of his innovation in teaching.

Konstantinos Masselos received a first degree in Electrical Engineering from the University of Patras, Greece in 1994 and an MSc degree in VLSI Systems Engineering from the University of Manchester Institute of Science and Technology, United Kingdom in 1996. In April 2000, he received a PhD degree in Electrical and Computer Engineering from the University of Patras, Greece. His PhD research was related to high-level low-power design methodologies for multimedia applications realized on different architectural platforms. From 1997 until 1999 he was associated as a visiting researcher with the Interuniversity Micro Electronics Centre (IMEC) in Leuven, Belgium, where he was involved in research related to the ACROPOLIS multimedia pre-compiler. Until 2004, he was with INTRACOM S.A., Greece, where he was involved in the realization of wireless communication systems. In 2005, he joined the Department of Electrical Engineering and Electronics of Imperial College London as a lecturer. Since 2006, he is an Assistant Professor in the Department of Computer Science and Technology of the University of Peloponnese, Greece and a visiting lecturer at Imperial College. His main interests include compiler optimizations, high-level power optimization, FPGAs and reconfigurable hardware, and efficient implementations of DSP algorithms. He is a member of the IEEE.

Yiannis Andreopoulos (M'00) obtained the Electrical Engineering Diploma and the MSc in Signal Processing Systems from the University of Patras, Greece in 1999 and 2000, respectively. He obtained the PhD in Applied Sciences from the University of Brussels (Belgium) in May 2005, where he defended a thesis on scalable video coding and complexity modeling for multimedia systems. During his thesis work he participated in and was supported by the EU IST project MASCOT, a Future and Emerging Technologies (FET) project. During his post-doctoral work at the University of California Los Angeles (US) he performed research on cross-layer optimization of wireless media systems, video streaming, and theoretical aspects of rate-distortion-complexity modeling for multimedia systems. Since Oct. 2006, he is a Lecturer at Queen Mary University of London (UK). During 2002–2003, he made several decisive contributions to the ISO/IEC JTC1/SC29/WG11 (Motion Picture Experts Group, MPEG) committee in the early exploration on scalable video coding, which has now moved into the standardization phase. In 2007, he won the "Most-Cited Paper" award from the Elsevier EURASIP journal Signal Processing: Image Communication, based on the number of citations his 2004 article "In-band motion compensated temporal filtering" received within a three-year period.