
EURASIP Journal on Applied Signal Processing 2001:3, 147–162 © 2001 Hindawi Publishing Corporation

Motion Compensation on DCT Domain

Ut-Va Koc, Lucent Technologies Bell Labs, 600 Mountain Avenue, Murray Hill, NJ 07974, USA. Email: [email protected]

K. J. Ray Liu, Department of Electrical and Computer Engineering and Institute for Systems Research, University of Maryland, College Park, MD 20742, USA. Email: [email protected]

Received 21 May 2001 and in revised form 21 September 2001

Alternative fully DCT-based architectures have been proposed in the past to address the shortcomings of the conventional hybrid motion-compensated DCT structures traditionally chosen as the basis of implementation of standard-compliant codecs. However, no prior effort has been made to ensure interoperability of these two drastically different architectures, so that fully DCT-based video codecs are fully compatible with the existing video coding standards. In this paper, we establish the criteria for matching conventional codecs with fully DCT-based codecs. We find that the key to this interoperability lies in the heart of the implementation of motion compensation modules performed in the spatial and transform domains at both the encoder and the decoder. Specifically, if the spatial-domain motion compensation is compatible with the transform-domain motion compensation, then the states in both the coder and the decoder will keep track of each other even after a long series of P-frames. Otherwise, the states will diverge in proportion to the number of P-frames between two I-frames. This sets an important criterion for the development of any DCT-based motion compensation schemes. We also discuss and develop some DCT-based motion compensation schemes as important building blocks of fully DCT-based codecs. For the case of subpixel motion compensation, DCT-based approaches allow more accurate interpolation without any increase in computation. Furthermore, the sparseness of DCT coefficients after quantization significantly decreases the number of calculations required for motion compensation. Coupled with the DCT-based algorithms, it is possible to realize fully DCT-based codecs that overcome the disadvantages of conventional hybrid codecs.

Keywords and phrases: video codec, video coding, MPEG compatibility, motion compensation, DCT.

1. INTRODUCTION

In most international video coding standards such as CCITT H.261 [1], MPEG-1 [2], MPEG-2 [3], as well as the proposed HDTV standard, the Discrete Cosine Transform (DCT) and block-based motion estimation are the essential elements to achieve spatial and temporal compression, respectively. Most implementations of a standard-compliant codec adopt the conventional motion-compensated DCT video coder and decoder structures shown in Figures 1a and 1b, respectively. The feedback loop in the coder for temporal prediction consists of a DCT, an Inverse DCT (IDCT), a spatial-domain motion compensator (SD-MC), and a spatial-domain motion estimator (SD-ME), which is usually the full search block matching approach (BKM). In addition to the complexity added to the overall architecture, this feedback loop limits the throughput of the coder and becomes the bottleneck of a real-time high-end video codec. A compromise is to remove the loop and perform open-loop motion estimation based upon original instead of reconstructed images, at the sacrifice of coder performance [4, 5].

The presence of the IDCT block inside the feedback loop of the conventional video coder design comes from the fact that currently available motion estimation algorithms can only estimate motion in the spatial domain rather than directly in the DCT domain. In the case of the conventional decoder, motion compensation is carried out in the spatial domain after the compressed stream has been converted back by the IDCT block to the uncompressed reconstructed frames. This implies the requirement for such a decoder to handle all the pixels in real time, even for a stream encoded at a very high compression rate. This is undesirable.

The alternative fully DCT-based video coder and decoder architectures, as shown in Figures 2a and 2b, respectively, have been proposed and explored in [5, 6, 7, 8], among others, to address the shortcomings of the conventional codec structures.

[Figure 1(a): Conventional hybrid motion-compensated DCT video coder.]
[Figure 1(b): Conventional hybrid motion-compensated DCT video decoder.]
[Figure 2(a): Fully DCT-based motion-compensated video coder.]
[Figure 2(b): Fully DCT-based motion-compensated video decoder.]

Figure 1: Conventional hybrid motion-compensated DCT video codec: motion estimation/compensation are performed in the spatial domain.

Figure 2: Fully DCT-based motion-compensated video codec: motion estimation/compensation are completed in the transform (DCT) domain.

In the fully DCT-based video codec, motion estimation and compensation are performed on the compressed bit stream in the transform (DCT) domain, and thus these functional blocks are called the transform-domain motion estimator (TD-ME) and the transform-domain motion compensator (TD-MC), respectively, to distinguish them from their spatial-domain counterparts. The resultant fully DCT-based motion-compensated video codec structure enjoys several advantages over the conventional hybrid motion-compensated DCT video codec structure:

• Fewer coder components and lower complexity: removal of the DCT-IDCT pair from the feedback loop of the fully DCT-based coder reduces the total number of components required in the feedback loop and thus the complexity of the complete coder.

• Higher throughput rate: the feedback loop of a video coder requires processing at the frame rate so that the previous frame data can be stored in the frame memory and be available for coding the next incoming frame. Traditionally, this loop has four components plus the spatial-domain motion estimation and compensation unit and thus creates the bottleneck for encoding large frame sizes in real time. In the conventional coder, the whole frame must be processed by both the DCT-IDCT pair and the Q-Q^{-1} pair before the next incoming frame. In the DCT-based structure, the whole frame must be processed by only the Q-Q^{-1} pair. This results in a less stringent requirement on the processing speed of the feedback loop components. Alternatively, this may increase the throughput rate of the coder and thus allow processing larger frame sizes as the technology keeps improving the processing speed of these components. This high-throughput advantage becomes increasingly important as advances in optical networking technology permit transmission of high-quality production-grade video over broadband networks in real time at affordable costs.

• Lower computational complexity of DCT-based motion estimation and compensation approaches: due to the decorrelation property of the DCT exploited in most video standards, most energy tends to cluster in a few DCT coefficients, especially the DC terms, with the rest being zeros after quantization. This characteristic is particularly beneficial to the DCT-based approach since no computation is needed for the majority of DCT coefficients, which are zero [9].

• Joint optimization of DCT-based components: a fast lattice-structured DCT coder generates dual outputs (DCT and DST) which can be utilized by the DCT-based motion estimation algorithms.

• Extendibility to a transcoder structure: an optimal transcoder modifies the encoded video bit stream in the DCT domain directly to fit usage requirements (such as frame rate conversion, frame size conversion, etc.) different from the usage requirement originally planned for. The fully DCT-based structure handles video data completely in the DCT domain and therefore can be easily extended to provide a transcoder function by cascading a DCT-based decoder, with certain simplifications and modifications required by the end usage. For example, the DCT coder at the front of a DCT-based coder and the IDCT decoder of a DCT-based decoder can be removed.

• Additional information processing: DCT coefficients carry certain information which can be utilized, for example, for image segmentation in the DCT domain [10]. The DCT-based codec structure facilitates such use of DCT coefficients.

• Compatibility with existing standards: one of the main objectives of most motion-compensated DCT video compression standards is to ensure proper interoperability between an encoder and a decoder supplied by two different vendors. The fully DCT-based structure encodes the intraframes and motion-compensated residuals in DCT fundamentally in the same way as the hybrid structure does. The encoded bit stream can be made fully compatible with the existing video coding standards under certain conditions, as discussed later in this paper.

For compatibility with existing standards, a decoder must be able to track closely the state of the encoder transmitting the encoded bit stream to ensure interoperability, because of the recursive nature of the motion-compensated standards, especially for systems having many interframes to achieve high compression. In this way, the decoder will be able to reproduce the images in as high fidelity as possible. However, despite the above advantages of fully DCT-based codecs, no effort has been spent on the important issue of how the conventional hybrid codecs and the fully DCT-based codecs can match each other without causing degradation in video quality. In Section 2, we will demonstrate that

• the fully DCT-based codecs and the conventional hybrid codecs are mathematically compatible, in the sense that the state of a DCT-based decoder can track the DCT-based encoder state in the same manner as the hybrid counterparts;

• if an encoder-decoder pair of different architectural types meets the matching condition, then we can mix different types of encoders and decoders without having the decoders reach divergent states after a long series of P-frames.

Furthermore, we will also discuss DCT-based motion compensation schemes employed in these fully DCT-based codecs in Section 3.

2. MATCHING FULLY DCT-BASED CODECS WITH CONVENTIONAL HYBRID CODECS

In order to gain insight into the main differences between the conventional codec and the fully DCT-based one, we are going to derive the mathematical formulation for both architectures. The conventional hybrid coder is redrawn in Figure 3 with the SD-ME block split into two functional blocks (ME for motion estimation and MC for motion compensation) and the frame memory (Z) explicitly depicted for the storage of the previous reconstructed frame.

Figure 3: Modeling diagram for conventional hybrid motion-compensated DCT video coder.

The estimated motion vector field v̂_t for the current frame is estimated from both the current frame x_t and the previous reconstructed frame y_{t−1}. The encoded bit stream for the compressed interframe residual is denoted as VLC{s_t}, where VLC{·} is the variable length coder and s_t is the quantized DCT coefficients of the interframe motion-compensated residual. Therefore, the conventional hybrid coder (also called Spatial Domain Encoder, or SE for short) can be described mathematically as an iterative function:

• Interframe encoding (P-frame) for SE:

    s_t = Q{DCT{x_t − MC(y_{t−1}, v̂_t)}},
    y_t = MC(y_{t−1}, v̂_t) + IDCT{Q^{−1}{s_t}},   (1)

where y_t is the state of the coder stored in the frame memory, and the estimated motion vector is

    v̂_t = ME(y_{t−1}, x_t).   (2)

• Intraframe encoding (I-frame) for SE:

    s_0 = Q{DCT{x_0}},
    y_0 = h_0 = IDCT{Q^{−1}{Q{DCT{x_0}}}}.   (3)

The above formulation considers only one group of pictures, with the frame x_0 encoded as an I-frame and the rest of the frames x_t as P-frames. Without loss of generality, B-frames are not considered here. Then s_t and v̂_t are entropy coded into the bit stream sent to a decoder.

The fully DCT-based coder structure is redrawn in Figure 4, where the TD-ME block is also split, explicitly for formulating the encoding process, into two sections: the motion estimation (ME^D) and the motion compensation (MC^D).

Figure 4: Modeling diagram for fully DCT-based motion-compensated video coder.

Note that the superscript D in Figure 4 and equations (4), (5), and (6) below is used to distinguish the fully DCT-based structure (called Transform Domain Encoder, or TE) from the SE. Similar to the formulation of the SE, we consider only one group of pictures with the first frame coded as an I-frame and the rest as P-frames. The recursive function describing the TE structure is listed as follows:

• Interframe encoding (P-frame) for TE:

    s_t^D = Q{DCT{x_t} − MC^D(y_{t−1}^D, v̂_t^D)},
    y_t^D = MC^D(y_{t−1}^D, v̂_t^D) + Q^{−1}{s_t^D},   (4)

where s_t^D is the quantized DCT coefficients of the interframe motion-compensated residual, y_{t−1}^D is the state of the coder stored in the frame memory, and the estimated motion vector is

    v̂_t^D = ME^D(y_{t−1}^D, DCT{x_t}).   (5)

• Intraframe encoding (I-frame) for TE:

    s_0^D = Q{DCT{x_0}},
    y_0^D = h_0^D = Q^{−1}{Q{DCT{x_0}}}.   (6)

In most motion-compensated video compression standards, the conventional hybrid decoder (Spatial Domain Decoder, or SD for short) is usually cited for its conceptual simplicity. The SD is depicted in Figure 5, where s_t reappears at the output of the VLD (Variable Length Decoder) because variable length coding is considered a reversible process, that is, VLD{VLC{α}} = α.

Figure 5: Modeling diagram for conventional hybrid motion-compensated DCT video decoder.

The formulation for the SD is listed as follows:

• Interframe decoding (P-frame) for SD:

    x̂_t = IDCT{Q^{−1}{s_t}} + MC(x̂_{t−1}, v̂_t),
    z_t = x̂_t,   (7)

where x̂_t is the reconstructed frame image at the decoder and z_t will be stored in the frame memory as the state of the decoder.

• Intraframe decoding (I-frame) for SD:

    z_0 = x̂_0 = w_0 = IDCT{Q^{−1}{s_0}},   (8)

where s_0 = Q{DCT{x_0}}.

Based on the DCT-based motion compensation schemes discussed above, the Transform Domain Decoder (TD), a fully DCT-based decoder, can be constructed [8, 11] as shown in Figure 6 with all the signals and components explicitly labeled.

Figure 6: Modeling diagram for fully DCT-based motion-compensated video decoder.

The TD can be modeled as follows:

• Interframe decoding (P-frame) for TD:

    x̂_t^D = IDCT{Q^{−1}{s_t^D} + MC^D(z_{t−1}^D, v̂_t^D)},
    z_t^D = MC^D(z_{t−1}^D, v̂_t^D) + Q^{−1}{s_t^D},   (9)

where x̂_t^D is the reconstructed frame image at the decoder and z_t^D will be stored in the frame memory as the state of the decoder.

• Intraframe decoding (I-frame) for TD:

    x̂_0^D = IDCT{w_0^D} = IDCT{Q^{−1}{s_0^D}},
    z_0^D = w_0^D = Q^{−1}{s_0^D},   (10)

where s_0^D = Q{DCT{x_0}}.

No matter how we match the spatial- or transform-domain encoders (SE or TE) with the spatial- or transform-domain decoders (SD or TD), the reconstructed intraframes (t = 0) are related to the original frames as follows:

    x̂_0 = IDCT{Q^{−1}{Q{DCT{x_0}}}}.   (11)

However, for interframes, the reconstructed frames will have subtle differences with different matching pairs. As discussed below, the subtle difference in the states of the encoder and the decoder may have a divergent effect on the reconstructed images after encoding a long series of P-frames.

2.1. Matching SE with SD

If we send the encoded bit stream from an SE to an SD, then we can show that, as long as the encoder-decoder pair have matching DCT/IDCT, Q/Q^{−1}, and MC implementations,

    w_t = h_t,  z_t = y_t,   (12)

where z_t (= x̂_t) and y_t are the decoder and encoder states, respectively. In other words, the decoder can always track the state of the encoder even after a long series of P-frames. However, in practice, it is difficult to implement matching components at both the encoder and the decoder. As a result, a new intraframe is usually sent after a limited number of P-frames to reset the state of the codecs before they diverge too far. The reconstructed frames can be shown to be related to the original images as below:

    x̂_t = IDCT{Q^{−1}{Q{DCT{x_t − MC(x̂_{t−1}, v̂_t)}}}} + MC(x̂_{t−1}, v̂_t).   (13)
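The SE-SD state-tracking relation (12) can be checked with a small numerical sketch. Everything below is an illustrative assumption rather than the paper's implementation: 1-D "frames" of length N, the DCT-II kernel of eq. (48) later in this paper, a uniform quantizer of step q, and integer-pel MC modeled as a circular shift. With identical DCT/IDCT, Q/Q^{−1}, and MC on both sides, the decoder state z_t equals the encoder state y_t exactly.

```python
# Illustrative sketch of the SE (eqs. (1)-(3)) and SD (eqs. (7)-(8)) loops.
# Assumptions (not from the paper): 1-D frames, circular-shift MC, uniform
# quantizer of step q, DCT-II kernel with D^T D = (2/N) I_N.
import math

N, q = 8, 0.1

def C(k):
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

D = [[(2.0 / N) * C(k) * math.cos((k * math.pi / N) * (m + 0.5))
      for m in range(N)] for k in range(N)]

def dct(x):   # forward transform X = D x
    return [sum(D[k][m] * x[m] for m in range(N)) for k in range(N)]

def idct(X):  # inverse x = (N/2) D^T X, using D^T D = (2/N) I_N
    return [(N / 2.0) * sum(D[k][m] * X[k] for k in range(N)) for m in range(N)]

def Q(X):     # uniform quantizer
    return [round(c / q) for c in X]

def Qinv(s):  # dequantizer
    return [c * q for c in s]

def MC(y, v): # integer-pel motion compensation: circular shift by v pixels
    return [y[(i - v) % N] for i in range(N)]

def se_encode(frames, mvs):
    """SE: first frame as I-frame (eq. (3)), the rest as P-frames (eq. (1))."""
    stream = [Q(dct(frames[0]))]
    y = idct(Qinv(stream[0]))                    # y_0
    states = [list(y)]
    for x, v in zip(frames[1:], mvs):
        pred = MC(y, v)
        s = Q(dct([xi - pi for xi, pi in zip(x, pred)]))
        y = [pi + ri for pi, ri in zip(pred, idct(Qinv(s)))]   # eq. (1)
        stream.append(s)
        states.append(list(y))
    return stream, states

def sd_decode(stream, mvs):
    """SD: z_0 by eq. (8), then z_t = IDCT{Q^-1{s_t}} + MC(z_{t-1}, v_t)."""
    z = idct(Qinv(stream[0]))
    states = [list(z)]
    for s, v in zip(stream[1:], mvs):
        pred = MC(z, v)
        z = [pi + ri for pi, ri in zip(pred, idct(Qinv(s)))]
        states.append(list(z))
    return states

x0 = [float((3 * i) % 7) for i in range(N)]
frames = [x0, MC(x0, 1), MC(x0, 3)]   # later frames are pure shifts of x0
mvs = [1, 2]
stream, enc_states = se_encode(frames, mvs)
dec_states = sd_decode(stream, mvs)
track_ok = all(zt == yt for zt, yt in zip(dec_states, enc_states))
```

Because the decoder applies the same MC, Q^{−1}, and IDCT in the same order as the encoder's feedback loop, `track_ok` holds, matching eq. (12); with mismatched components the states drift, which is the subject of Sections 2.3 and 2.4.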

2.2. Matching TE with TD

Similarly, if we send the encoded bit stream from a TE to a TD, then we can prove that, as long as the encoder-decoder pair have matching DCT/IDCT, Q/Q^{−1}, and MC^D implementations,

    w_t^D = h_t^D,  z_t^D = y_t^D,   (14)

where z_t^D and y_t^D are the decoder and encoder states, respectively. Therefore, the decoder can always track the state of the encoder even after a long series of P-frames, in the same way as the SE-SD pair. It should be noted that, unlike the states of the SE-SD pair, which are the reconstructed frame images, the states of this TE-TD pair are the quantized DCT coefficients, usually having many zeros in the high-frequency DCT domain and thus requiring less storage space than the SE-SD pair.

The reconstructed frames are related to the original images as below:

    x̂_t^D = IDCT{z_t^D},   (15)

where

    z_t^D = Q^{−1}{Q{DCT{x_t} − MC^D(z_{t−1}^D, v̂_t^D)}} + MC^D(z_{t−1}^D, v̂_t^D).   (16)

2.3. Matching TE with SD

When the encoded bit stream from a TE is decoded by an SD, that is,

    s_t = s_t^D,  v̂_t = v̂_t^D,   (17)

it can be shown that

    w_t = IDCT{h_t^D}.   (18)

The reconstructed frames are related to the original images as below:

    x̂_t = IDCT{Q^{−1}{Q{DCT{x_t} − MC^D(y_{t−1}^D, v̂_t^D)}}} + MC(x̂_{t−1}, v̂_t).   (19)

However, the contents of the frame memories (z_t and y_t^D) of the TE encoder and the SD decoder are quite different: the frame memory of the TE holds the quantized DCT coefficients of the reconstructed frames, whereas the SD frame memory stores the reconstructed images. Now we need to show the following theorem.

Theorem 1. If the SD decoder and the TE encoder satisfy the matching MC-MC^D condition described as follows:

    MC(IDCT{y_{t−1}^D}, v̂_t^D) = IDCT{MC^D(y_{t−1}^D, v̂_t^D)},
    IDCT{a + b} = IDCT{a} + IDCT{b},   (20)

then both frame memory contents can maintain a fixed relationship in order for the SD decoder to track the state of the TE encoder for accurately decoding a long series of P-frames in high fidelity, that is,

    z_t = IDCT{y_t^D}.   (21)

Proof. For intraframes (t = 0),

    z_0 = IDCT{Q^{−1}{Q{DCT{x_0}}}} = IDCT{h_0^D} = IDCT{y_0^D}.   (22)

For t = 1, z_1 = IDCT{Q^{−1}{s_1^D}} + MC(z_0, v̂_1^D), and

    IDCT{y_1^D} = IDCT{Q^{−1}{s_1^D} + MC^D(y_0^D, v̂_1^D)}
                = IDCT{Q^{−1}{s_1^D}} + IDCT{MC^D(y_0^D, v̂_1^D)}
                = z_1 + IDCT{MC^D(y_0^D, v̂_1^D)} − MC(z_0, v̂_1^D)
                = z_1   (23)

if IDCT{MC^D(y_0^D, v̂_1^D)} = MC(IDCT{y_0^D}, v̂_1^D). The first equality holds by the distributive property of the IDCT.

Now assume that z_t = IDCT{y_t^D}. Then

    IDCT{y_{t+1}^D} = IDCT{Q^{−1}{s_{t+1}^D} + MC^D(y_t^D, v̂_{t+1}^D)}
                    = IDCT{Q^{−1}{s_{t+1}^D}} + IDCT{MC^D(y_t^D, v̂_{t+1}^D)}
                    = z_{t+1} + IDCT{MC^D(y_t^D, v̂_{t+1}^D)} − MC(z_t, v̂_{t+1}^D)
                    = z_{t+1}   (24)

if IDCT{MC^D(y_t^D, v̂_{t+1}^D)} = MC(IDCT{y_t^D}, v̂_{t+1}^D).

The first requirement is the matching condition of MC-MC^D, simply stating that the conventional motion-compensated frame (MC) must be the same as the IDCT (image) of the DCT-based motion-compensated frame (MC^D). The second condition says that the distributive property of the IDCT must also hold even in a finite-length and fixed-point implementation, though it holds in theory.

When the first requirement cannot be satisfied, we find that the encoder and the decoder may have progressively divergent states as more P-frames are inserted between two I-frames. We may define the difference between the encoder and decoder states as δ_t for the tth P-frame as follows:

    δ_t ≜ IDCT{y_t^D} − z_t.   (25)

In other words, δ_t is defined as the encoder state minus the decoder state. We further assume that the second condition holds and that the spatial-domain and transform-domain motion compensators produce results differing by a fixed amount ∆ for each pixel, for the sake of simplicity in the analysis:

    IDCT{MC^D(a, v)} − MC(IDCT{a}, v) = ∆.   (26)

We know that δ_0 = IDCT{y_0^D} − z_0 = 0. Also,

    δ_1 = IDCT{y_1^D} − z_1
        = IDCT{MC^D(y_0^D, v̂_1^D)} − MC(IDCT{y_0^D}, v̂_1^D)   (27)
        = ∆.
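Theorem 1 can be illustrated numerically with a matching MC^D built, per eq. (20), as MC^D(a, v) = DCT{MC(IDCT{a}, v)}. The setup below is a toy sketch (1-D frames, circular-shift MC, uniform quantizer; all illustrative assumptions), not the paper's implementation; with a matching pair, the SD state stays within floating-point error of IDCT{y_t^D} over a run of P-frames.

```python
# Numerical illustration of Theorem 1: an SD decoder tracks a TE encoder when
# MC^D is matched to MC via MC^D(a, v) = DCT{MC(IDCT{a}, v)} (eq. (20)).
# Toy assumptions: 1-D frames, circular-shift MC, uniform quantizer of step q.
import math

N, q = 8, 0.1

def C(k):
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

D = [[(2.0 / N) * C(k) * math.cos((k * math.pi / N) * (m + 0.5))
      for m in range(N)] for k in range(N)]

def dct(x):
    return [sum(D[k][m] * x[m] for m in range(N)) for k in range(N)]

def idct(X):
    return [(N / 2.0) * sum(D[k][m] * X[k] for k in range(N)) for m in range(N)]

def Q(X):
    return [round(c / q) for c in X]

def Qinv(s):
    return [c * q for c in s]

def MC(y, v):
    return [y[(i - v) % N] for i in range(N)]

def MCD(a, v):  # matching transform-domain motion compensation, eq. (20)
    return dct(MC(idct(a), v))

def te_encode(frames, mvs):
    """TE per eqs. (4)-(6); the state y^D lives in the DCT domain."""
    stream = [Q(dct(frames[0]))]
    Y = Qinv(stream[0])                                   # y_0^D, eq. (6)
    states = [list(Y)]
    for x, v in zip(frames[1:], mvs):
        pred = MCD(Y, v)
        s = Q([Xi - Pi for Xi, Pi in zip(dct(x), pred)])  # eq. (4)
        Y = [Pi + Ri for Pi, Ri in zip(pred, Qinv(s))]
        stream.append(s)
        states.append(list(Y))
    return stream, states

def sd_decode(stream, mvs):
    """SD per eqs. (7)-(8); the state z lives in the spatial domain."""
    z = idct(Qinv(stream[0]))
    states = [list(z)]
    for s, v in zip(stream[1:], mvs):
        z = [pi + ri for pi, ri in zip(MC(z, v), idct(Qinv(s)))]
        states.append(list(z))
    return states

x0 = [float((5 * i) % 9) for i in range(N)]
mvs = [1, 2, 1]
frames = [x0, MC(x0, 1), MC(x0, 3), MC(x0, 4)]
stream, te_states = te_encode(frames, mvs)
sd_states = sd_decode(stream, mvs)
# delta_t of eq. (25): encoder state (brought back to the image domain) minus
# decoder state; with a matching pair it stays at numerical noise level.
drift = max(abs(a - b)
            for Y, z in zip(te_states, sd_states)
            for a, b in zip(idct(Y), z))
```

Here `drift` is the maximum |δ_t| over all frames and pixels; with the matching MC^D it is limited to floating-point round-off, the numerical analogue of eq. (21).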

Similarly,

    δ_2 = IDCT{MC^D(y_1^D, v̂_2^D)} − MC(z_1, v̂_2^D)
        = IDCT{MC^D(y_1^D, v̂_2^D)} − MC(IDCT{y_1^D} − δ_1, v̂_2^D).   (28)

Usually the commonly used spatial-domain motion compensation algorithms are distributive:

    MC(a + b, v) = MC(a, v) + MC(b, v).   (29)

Since ∆ is assumed to be fixed independent of pixel position,

    δ_2 = IDCT{MC^D(y_1^D, v̂_2^D)} − MC(IDCT{y_1^D}, v̂_2^D) + MC(δ_1, v̂_2^D)
        = ∆ + ∆ = 2∆.   (30)

In general,

    δ_t = IDCT{MC^D(y_{t−1}^D, v̂_t^D)} − MC(z_{t−1}, v̂_t^D)
        = ∆ + (t − 1)∆ = t∆.   (31)

This suggests that the state difference will diverge in proportion to the number of P-frames between two I-frames. If ∆ is positive, then the decoded picture will become dimmer and dimmer; otherwise, the decoded picture will become brighter and brighter.

2.4. Matching SE with TD

A number of papers (see [8, 11, 12]) discuss the implementation of a DCT-based decoder taking an SE-encoded bit stream, but none of them has addressed the issue of divergent states arising if their DCT-based decoder is not matched properly with the conventional hybrid encoder.

Consider the case when the encoded bit stream from an SE is decoded by a TD, that is,

    s_t^D = s_t,  v̂_t^D = v̂_t.   (32)

We can show that

    h_t = IDCT{w_t^D}.   (33)

The reconstructed frames are constructed from the original images accordingly:

    x̂_t^D = IDCT{Q^{−1}{Q{DCT{x_t − MC(y_{t−1}, v̂_t)}}} + MC^D(z_{t−1}^D, v̂_t^D)}.   (34)

In the same way as in the case of the TE-SD pair, we are going to show the following theorem.

Theorem 2. If the TD decoder and the SE encoder satisfy the matching MC-MC^D condition described in the following:

    MC(IDCT{z_{t−1}^D}, v̂_t) = IDCT{MC^D(z_{t−1}^D, v̂_t)},
    IDCT{a + b} = IDCT{a} + IDCT{b},   (35)

then both frame memory contents can maintain a fixed relationship in order for the TD decoder to track the state of the SE encoder and accurately decode P-frames continuously in high fidelity:

    y_t = IDCT{z_t^D}.   (36)

Proof. For intraframes (t = 0), y_0 = IDCT{Q^{−1}{Q{DCT{x_0}}}} = x̂_0 = IDCT{z_0^D}. For t = 1, y_1 = h_1 + MC(IDCT{z_0^D}, v̂_1), and

    IDCT{z_1^D} = IDCT{w_1^D + MC^D(z_0^D, v̂_1)}
                = IDCT{w_1^D} + IDCT{MC^D(z_0^D, v̂_1)}
                = y_1 + IDCT{MC^D(z_0^D, v̂_1)} − MC(y_0, v̂_1)
                = y_1   (37)

if IDCT{MC^D(z_0^D, v̂_1)} = MC(IDCT{z_0^D}, v̂_1).

Now assume that y_t = IDCT{z_t^D}. Then

    IDCT{z_{t+1}^D} = IDCT{w_{t+1}^D + MC^D(z_t^D, v̂_{t+1})}
                    = IDCT{w_{t+1}^D} + IDCT{MC^D(z_t^D, v̂_{t+1})}
                    = y_{t+1} + IDCT{MC^D(z_t^D, v̂_{t+1})} − MC(y_t, v̂_{t+1})
                    = y_{t+1}   (38)

if IDCT{MC^D(z_t^D, v̂_{t+1})} = MC(IDCT{z_t^D}, v̂_{t+1}).

Notice that the first requirement of a matching MC-MC^D pair is the same as the matching condition for the TE-SD pair. This implies that if we can build a matching MC-MC^D pair satisfying the first requirement, then we can build a TE and a TD so that we can mix encoder-decoder pairs in any combination without reaching divergent states.

Similar to the case of matching TE with SD, we can derive how much the decoder state diverges from the encoder state in terms of the number of P-frames between two I-frames if the first condition cannot be met. Define the state difference δ'_t for the tth P-frame as follows:

    δ'_t ≜ IDCT{z_t^D} − y_t.   (39)

Please note that δ'_t is defined as the decoder state minus the encoder state, opposite to δ_t in the case of matching TE-SD. Similarly, we assume that the second condition holds and that the spatial-domain and transform-domain motion compensators produce results differing by a fixed amount ∆ for each pixel, for the sake of simplicity in the analysis, as in the TE-SD case:

    IDCT{MC^D(a, v)} − MC(IDCT{a}, v) = ∆.   (40)

Therefore, δ'_0 = IDCT{z_0^D} − y_0 = 0. Also,

    δ'_1 = IDCT{z_1^D} − y_1
         = IDCT{MC^D(z_0^D, v̂_1)} − MC(IDCT{z_0^D}, v̂_1)   (41)
         = ∆.
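Theorem 2 can be illustrated with the same toy machinery: an SE encoder feeding a TD decoder whose MC^D is matched via MC^D(a, v) = DCT{MC(IDCT{a}, v)}. As before, the 1-D frames, circular-shift MC, and uniform quantizer are illustrative assumptions, not the paper's setup.

```python
# Numerical illustration of Theorem 2: a TD decoder tracks an SE encoder when
# MC^D is matched to MC via MC^D(a, v) = DCT{MC(IDCT{a}, v)} (eq. (35)).
# Toy assumptions: 1-D frames, circular-shift MC, uniform quantizer of step q.
import math

N, q = 8, 0.1

def C(k):
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

D = [[(2.0 / N) * C(k) * math.cos((k * math.pi / N) * (m + 0.5))
      for m in range(N)] for k in range(N)]

def dct(x):
    return [sum(D[k][m] * x[m] for m in range(N)) for k in range(N)]

def idct(X):
    return [(N / 2.0) * sum(D[k][m] * X[k] for k in range(N)) for m in range(N)]

def Q(X):
    return [round(c / q) for c in X]

def Qinv(s):
    return [c * q for c in s]

def MC(y, v):
    return [y[(i - v) % N] for i in range(N)]

def MCD(a, v):  # matching transform-domain motion compensation
    return dct(MC(idct(a), v))

def se_encode(frames, mvs):
    """SE per eqs. (1)-(3); the state y lives in the spatial domain."""
    stream = [Q(dct(frames[0]))]
    y = idct(Qinv(stream[0]))
    states = [list(y)]
    for x, v in zip(frames[1:], mvs):
        pred = MC(y, v)
        s = Q(dct([xi - pi for xi, pi in zip(x, pred)]))
        y = [pi + ri for pi, ri in zip(pred, idct(Qinv(s)))]
        stream.append(s)
        states.append(list(y))
    return stream, states

def td_decode(stream, mvs):
    """TD per eqs. (9)-(10); the state z^D lives in the DCT domain."""
    Z = Qinv(stream[0])                       # z_0^D, eq. (10)
    states = [list(Z)]
    for s, v in zip(stream[1:], mvs):
        Z = [Pi + Ri for Pi, Ri in zip(MCD(Z, v), Qinv(s))]   # eq. (9)
        states.append(list(Z))
    return states

x0 = [float((5 * i) % 9) for i in range(N)]
mvs = [1, 2, 1]
frames = [x0, MC(x0, 1), MC(x0, 3), MC(x0, 4)]
stream, se_states = se_encode(frames, mvs)
td_states = td_decode(stream, mvs)
# Theorem 2 predicts y_t = IDCT{z_t^D}; with the matching MC^D the gap stays
# at floating-point noise level over the whole run of P-frames.
drift = max(abs(a - b)
            for y, Z in zip(se_states, td_states)
            for a, b in zip(y, idct(Z)))
```

Note that the decoder state Z is sparse DCT-domain data, while the encoder state y is spatial, yet eq. (36) keeps them in lockstep.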

The equality δ'_1 = δ_1 = ∆ means that we can expect the same divergent amount in the codec states for the SE-TD case as for the TE-SD case. Similarly,

    δ'_2 = IDCT{MC^D(z_1^D, v̂_2)} − MC(y_1, v̂_2)
         = IDCT{MC^D(z_1^D, v̂_2)} − MC(IDCT{z_1^D} − ∆, v̂_2).   (42)

Because of the distributive property of spatial-domain motion compensation (MC(a + b, v) = MC(a, v) + MC(b, v)) and the assumption of ∆ being fixed independent of pixel position,

    δ'_2 = IDCT{MC^D(z_1^D, v̂_2)} − MC(IDCT{z_1^D}, v̂_2) + MC(∆, v̂_2)
         = ∆ + ∆ = 2∆.   (43)

In general,

    δ'_t = IDCT{MC^D(z_{t−1}^D, v̂_t)} − MC(y_{t−1}, v̂_t)
         = ∆ + (t − 1)∆ = t∆.   (44)

[Figure 7: Simulation result of the "Flower Garden" sequence for matching TE-SD with ∆ = 0.5; encoder state minus decoder state versus frame number, measured slope = 0.5.]

This suggests that the state difference will diverge in proportion to the number of P-frames between two I-frames, in the opposite direction to the diverging state difference for the case of matching TE with SD. If ∆ is positive, then the decoded picture will become brighter and brighter.
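The linear divergence predicted by (31) and (44) can be reproduced with the same toy codec sketch used above (1-D frames, circular-shift MC, uniform quantizer; all illustrative assumptions). A mismatched MC^D is modeled by adding a constant ∆ to every pixel of the motion-compensated prediction, exactly the perturbation of eqs. (26) and (40).

```python
# Reproducing the linear state divergence of eqs. (31) and (44) with a toy
# codec sketch (assumptions, not the paper's setup: 1-D frames, circular-shift
# MC, uniform quantizer). The mismatched MC^D adds DELTA to every pixel, i.e.,
# IDCT{MC^D(a, v)} - MC(IDCT{a}, v) = DELTA.
import math

N, q, DELTA = 8, 0.1, 0.5

def C(k):
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

D = [[(2.0 / N) * C(k) * math.cos((k * math.pi / N) * (m + 0.5))
      for m in range(N)] for k in range(N)]

def dct(x):
    return [sum(D[k][m] * x[m] for m in range(N)) for k in range(N)]

def idct(X):
    return [(N / 2.0) * sum(D[k][m] * X[k] for k in range(N)) for m in range(N)]

def Q(X):
    return [round(c / q) for c in X]

def Qinv(s):
    return [c * q for c in s]

def MC(y, v):
    return [y[(i - v) % N] for i in range(N)]

def MCD_bad(a, v):  # mismatched transform-domain MC, off by DELTA per pixel
    return dct([p + DELTA for p in MC(idct(a), v)])

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

x0 = [float((3 * i) % 7) for i in range(N)]
T = 5
mvs = [1] * T
frames = [MC(x0, t) for t in range(T + 1)]   # uniformly shifting sequence

# --- TE (mismatched MC^D) encoding, SD decoding: delta_t = t * DELTA --------
stream0 = Q(dct(frames[0]))
Y = Qinv(stream0)                   # TE state (DCT domain)
z = idct(Qinv(stream0))             # SD state (spatial domain)
te_sd_gap = []
for x, v in zip(frames[1:], mvs):
    pred = MCD_bad(Y, v)
    s = Q(sub(dct(x), pred))
    Y = add(pred, Qinv(s))                    # TE update, eq. (4)
    z = add(MC(z, v), idct(Qinv(s)))          # SD update, eq. (7)
    te_sd_gap.append(sum(sub(idct(Y), z)) / N)   # mean delta_t, eq. (25)

# --- SE encoding, TD (mismatched MC^D) decoding: gap = -t * DELTA -----------
y = idct(Qinv(stream0))             # SE state (spatial domain)
Z = Qinv(stream0)                   # TD state (DCT domain)
se_td_gap = []
for x, v in zip(frames[1:], mvs):
    pred = MC(y, v)
    s = Q(dct(sub(x, pred)))
    y = add(pred, idct(Qinv(s)))              # SE update, eq. (1)
    Z = add(MCD_bad(Z, v), Qinv(s))           # TD update, eq. (9)
    se_td_gap.append(sum(sub(y, idct(Z))) / N)   # encoder minus decoder state
```

With ∆ = 0.5, the encoder-minus-decoder gap grows as 0.5, 1.0, 1.5, ... in the TE-SD case and as −0.5, −1.0, −1.5, ... in the SE-TD case, the two slopes reported for Figures 7 and 8.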

2.5. Simulation results

We implement two types of encoders (SE and TE) and two types of decoders (SD and TD). In order to make a fair comparison, we use the full search block matching approach for both encoders to find the motion vector estimates. Furthermore, in order to verify (31) and (44), we deliberately add a constant ∆ to the transform-domain motion compensation process so that the motion-compensated frames from the spatial-domain and transform-domain motion compensators differ by that constant ∆. Simulation is performed on the "Flower Garden" sequence. The simulation results for the cases of matching TE-SD and SE-TD are plotted in Figures 7 and 8, respectively. Both figures show that the codec state differences for both cases grow linearly with respect to the number of P-frames, as predicted by (31) and (44). For the case of TE-TD, the TD decoder can keep track of the TE encoder state, as shown in Figure 9.

[Figure 8: Simulation result of the "Flower Garden" sequence for matching SE-TD with ∆ = 0.5.]

3. DCT-BASED MOTION COMPENSATION SCHEMES

Manipulation of compressed video data in the DCT domain has been recognized as an important component in many advanced video applications (cf. [8, 11, 13, 14, 15, 16, 17]). In a video bridge, where multiple sources of compressed video are combined and retransmitted in a network, techniques for manipulation and composition of compressed video streams entirely in the DCT domain eliminate the need to build a decoding/encoding pair. Furthermore, manipulation in the DCT domain provides flexibility to match heterogeneous Quality of Service requirements with different network or user resources, such as prioritization of signal components from low-order DCT coefficients to fit low-end communication resources. Finally, many manipulation functions can be performed in the DCT domain more efficiently than in the spatial domain [14], due to a much lower data rate and removal of the decoding/encoding pair. However, all the earlier works have focused mainly on manipulation at the decoder side.

To serve the purpose of building a fully DCT-based motion-compensated video coder, our aim is to develop techniques of motion compensation in the DCT domain without converting back to the spatial domain before motion compensation. In [14], a method of pixelwise (integer-pel) translation in the DCT domain is proposed for extracting a DCT block out of four neighboring DCT blocks at an arbitrary position. Though addressing a different scenario, this method can be applied after modification to integer-pel

[Figure 9: Simulation result of the "Flower Garden" sequence for matching TE-TD with ∆ = 0.5.]

motion compensation in the DCT domain. For subpel motion compensation, we derive an equivalent form of bilinear interpolation in the DCT domain and then show that it is possible to perform other interpolation functions to achieve more accurate and visually better approximation in the DCT domain without increasing the complexity.

3.1. Integer-pel DCT-based motion compensation

As illustrated in Figure 10a, after motion estimation, the current block C of size N × N in the current frame I_t can be best predicted from the block displaced from the current block position by the estimated motion vector (d_u, d_v) in the spatial domain. This motion estimate determines which four contiguous predefined DCT blocks are chosen for the prediction of the current block, out of the eight surrounding DCT blocks and the block at the current block position. To extract the displaced DCT block in the DCT domain, a direct method is used to obtain separately from these four contiguous blocks four subblocks which can be combined together to form the final displaced DCT block, as shown in Figure 10b, with the upper-left, lower-left, upper-right, and lower-right blocks from the previous frame I_{t−1} labeled as B_1, B_2, B_3, and B_4, respectively [14]. Subblocks S_k are extracted in the spatial domain from these four blocks by pre-multiplication and post-multiplication with the windowing/shifting matrices H_k and V_k:

    S_k = H_k B_k V_k, for k = 1, ..., 4,   (45)

where H_k and V_k are the N × N windowing/shifting matrices defined as

    H_1 = [ 0  I_{h1} ],   V_1 = [ 0      0 ],
          [ 0  0     ]          [ I_{v1} 0 ]

    H_2 = [ 0      0 ],   V_2 = [ 0      0 ],
          [ I_{h2} 0 ]          [ I_{v2} 0 ]

    H_3 = [ 0  I_{h3} ],   V_3 = [ 0  I_{v3} ],
          [ 0  0     ]          [ 0  0     ]

    H_4 = [ 0      0 ],   V_4 = [ 0  I_{v4} ].   (46)
          [ I_{h4} 0 ]          [ 0  0     ]

Here I_n is the n × n identity matrix, that is, I_n = diag{1, ..., 1}, and n is determined by the height/width of the corresponding subblock. These pre-multiplication and post-multiplication matrix operations can be visualized in Figure 10c, where the overlapped grey areas represent the extracted subblock. These four subblocks are then summed to form the desired translated block B̂_ref.

If we define the DCT operation on an N × N matrix B as

    DCT{B} = D B D^T,   (47)

where the (k, m) element of D is the DCT-II kernel

    D(k, m) = (2/N) C(k) cos((kπ/N)(m + 1/2)), for k, m = 0, ..., N − 1,   (48)

then D^T D = (2/N) I_N. The formation of the DCT of B̂_ref in the DCT domain can be described by this equation:

    DCT{B̂_ref} = (N/2)² Σ_{k=1}^{4} DCT{H_k} DCT{B_k} DCT{V_k}.   (49)

This corresponds to pre- and post-multiplication of the DCT-transformed H_k and V_k with the DCT of B_k, since the DCT is a unitary orthogonal transformation and is guaranteed to be distributive over matrix multiplications. The DCT of the motion-compensated residual (displaced frame difference, or DFD) for the current block C is, therefore,

    DCT{DFD} = DCT{B̂_ref} − DCT{C}.   (50)

DCT{H_k} and DCT{V_k} can be precomputed and stored in memory. Furthermore, many high-frequency coefficients of DCT{B_k} or displacement estimates are zeros (i.e., sparse and block-aligned reference blocks), making the actual number of computations in (49) small. In [14], simulation results show that the DCT-domain approach is faster than the spatial-domain approach by about 10% to 30%. Further simplification is also possible: it can be seen from Figure 10b that

    H_U = H_1 = H_3,  H_L = H_2 = H_4,
    V_L = V_1 = V_2,  V_R = V_3 = V_4.   (51)

Therefore, only four windowing/shifting matrices need to be accessed from memory instead of eight.

In [8, 18], further savings in the computation of the windowing/shifting matrices are made by using a fast DCT. It is reported that a 47% reduction in computational complexity with fast DCT over the brute-force method (without the assumption of sparseness), and a 68% reduction when only the top-left 4 × 4 subblocks are nonzero, can be achieved with the use of fast DCT.


Figure 10: (a) Prediction of current block in current frame from four contiguous DCT blocks selected among nine neighboring blocks in previous frame based upon the estimated displacement vector for current block. (b) Schematic diagram of how a pixelwise translated DCT block is extracted from four contiguous DCT blocks. (c) Decomposition of integer-pel DCT-based translation as four matrix multiplication operations. (d) Decomposition of half-pel DCT-based translation as four matrix multiplication operations.

3.2. Subpixel DCT-based motion compensation

For the case of subpixel motion, interpolation is used to predict interpixel values. According to the MPEG standards, bilinear interpolation is recommended for its simplicity in implementation and its effectiveness in prediction [2, 3], though it is well known that a range of other interpolation functions, such as cubic, spline, Gaussian, and Lagrange interpolations, can provide better approximation accuracy and more pleasant visual quality (see [19, 20, 21, 22]). This complexity argument is true if the interpolation operation is performed in the spatial domain, but in the DCT domain it is possible to employ better interpolation functions than the bilinear interpolation without any additional computational load.

Figure 11: Illustration of extraction of the subpel displaced block x2(n) from two adjacent 1-D blocks x1a(n) and x1b(n) with the bilinear interpolation.

3.2.1 Interpolation filter

For simplicity of derivation, we start with the one-dimensional half-pel bilinear interpolation and then proceed to the two-dimensional case of quarter-pel accuracy with other interpolation functions. Consider two one-dimensional adjacent blocks, x1a(n) and x1b(n) for n = 0, ..., N − 1, as shown in Figure 11. We want to extract a block {x2(n); n = 0, ..., N − 1} that is displaced u pixels to the right of x1a(0), where u is supposed to be an odd multiple of 0.5 (i.e., half-pel motion). Therefore, we can show that

    x2(n) = (1/2)[x1a(N + n − i) + x1a(N + n − i + 1)],   0 ≤ n ≤ i − 2,
    x2(n) = (1/2)[x1a(N − 1) + x1b(0)],                   n = i − 1,            (52)
    x2(n) = (1/2)[x1b(n − i) + x1b(n − i + 1)],           i ≤ n ≤ N − 1,

where i = N − ⌊u⌋. In matrix form,

    x2 = GBL(i) x1a + GBR(i) x1b,                                               (53)

where x2, x1a, and x1b are the column vectors of x2(n), x1a(n), and x1b(n), respectively, and GBL(i) and GBR(i) are defined as follows:

    GBL(i) = (1/2) ( [ 0  I_i ] + [ 0  I_{i−1} ] ),
                   ( [ 0  0   ]   [ 0  0       ] )
                                                                                (54)
    GBR(i) = (1/2) ( [ 0        0 ] + [ 0          0 ] ).
                   ( [ I_{N−i}  0 ]   [ I_{N−i+1}  0 ] )

In the DCT domain,

    DCT{x2} = DCT{GBL(i)} DCT{x1a} + DCT{GBR(i)} DCT{x1b}.                      (55)

Here, GBL(i) and GBR(i) can be regarded as bilinear interpolation filter matrices which act as a linear filter or transform. Therefore, GBL(i) and GBR(i) can be replaced by any FIR filter or interpolation function of finite duration (preferably with length much smaller than the block size N).

3.2.2 Bilinear interpolated subpixel motion compensation

For the 2-D case, if (u, v) is the displacement of the reconstructed block Bˆref measured from the upper left corner of the block B1, then for hU = N − ⌊u⌋ and vL = N − ⌊v⌋, bilinear interpolation gives

    DCT{Bˆref} = (N/2)^2 Σ_{k=1}^{4} DCT{Hk} DCT{Bk} DCT{Vk},                   (56)

where

    H1 = H3 = HU = GBL(hU),      H2 = H4 = HL = GBR(hU),
                                                                                (57)
    V1 = V2 = VL = GBL(vL)^T,    V3 = V4 = VR = GBR(vL)^T.

Here

    [ GBL(hU)  GBR(hU) ] = [ 0 ··· 0  0.5  0.5  0 ··· 0 ]
                           [          ·.   ·.           ]                       (58)
                           [ 0 ··· 0  ···  0  0.5  0.5  ]

that is, each row carries the two half-pel weights 0.5, 0.5, shifted one position per row. Once again, GBL(·) and GBR(·) can be precomputed and stored in memory as in the case of integer-pel motion compensation, and thus the extra computational load for doing bilinear interpolation is eliminated.

3.2.3 Cubic interpolated subpixel motion compensation

Three interpolation functions, namely the cubic, cubic spline, and bilinear interpolations, are plotted in Figure 12a. As can be seen, the bilinear interpolation has the shortest filter length and the cubic spline has the longest ripple, but the cubic spline has the smallest approximation error among the three (cf. [22]). To compromise between filter length and approximation accuracy, we choose the cubic interpolation in the simulation. By choosing the resolution of the filter as half a pixel, the bilinear interpolation is fhb(n) = [0.5, 1, 0.5] and the cubic interpolation is fhc(n) = [−0.0625, 0, 0.5625, 1.0000, 0.5625, 0, −0.0625]. From Figure 12b, it is clear that the contributions at the half-pel position from all the pixel values are summed up and give rise to the bilinear filter matrices GBL(·) and GBR(·). In a similar way, as in Figure 12c, the cubic filter matrices GCL(·) and GCR(·) can be defined as

    GCL(i) = [ 0  −0.0625 I_{i+1} ] + [ 0  0.5625 I_i ]
             [ 0   0              ]   [ 0  0          ]

           + [ 0  0.5625 I_{i−1} ] + [ 0  −0.0625 I_{i−2} ],
             [ 0  0              ]   [ 0   0              ]
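As a sanity check of (52)-(54), the following sketch (hypothetical N = 8 and u = 2.5, chosen only for illustration, using NumPy) constructs GBL(i) and GBR(i) and compares the matrix-form extraction against direct bilinear interpolation of the two concatenated blocks:

```python
import numpy as np

N, u = 8, 2.5                       # hypothetical block size and half-pel shift
i = N - int(np.floor(u))            # i = N - floor(u), as in eq. (52)

def top_right(k):                   # [[0, I_k], [0, 0]]
    M = np.zeros((N, N)); M[:k, N-k:] = np.eye(k); return M

def bottom_left(k):                 # [[0, 0], [I_k, 0]]
    M = np.zeros((N, N)); M[N-k:, :k] = np.eye(k); return M

# eq. (54): bilinear interpolation filter matrices
GBL = 0.5*(top_right(i) + top_right(i - 1))
GBR = 0.5*(bottom_left(N - i) + bottom_left(N - i + 1))

rng = np.random.default_rng(1)
x = rng.random(2*N)                 # two adjacent 1-D blocks laid end to end
x1a, x1b = x[:N], x[N:]

x2 = GBL @ x1a + GBR @ x1b          # eq. (53)

# direct bilinear interpolation at half-pel positions floor(u) + n + 0.5
f = int(np.floor(u))
ref = 0.5*(x[f:f+N] + x[f+1:f+N+1])
assert np.allclose(x2, ref)
```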


Figure 12: (a) Plots of different interpolation functions. (b), (c), and (d) depict how to form a pre- or post-multiplication matrix for half-pel or even quarter-pel DCT-based motion compensation.

    GCR(i) = [  0                 0 ] + [ 0               0 ]
             [ −0.0625 I_{N−i−1}  0 ]   [ 0.5625 I_{N−i}  0 ]

           + [ 0                 0 ] + [  0                 0 ],                (59)
             [ 0.5625 I_{N−i+1}  0 ]   [ −0.0625 I_{N−i+2}  0 ]

where GCL(·) and GCR(·) can be precomputed and stored. Therefore, the computational complexity remains the same as for both the integer-pel and the half-pel bilinear interpolated DCT-based motion compensation methods. The reconstructed DCT block and the corresponding motion-compensated residual can be obtained in a similar fashion:

    DCT{Bˆref} = (N/2)^2 Σ_{k=1}^{4} DCT{Hk} DCT{Bk} DCT{Vk},
                                                                                (60)
    DCT{DFD} = DCT{Bˆref} − DCT{C},

where

    H1 = H3 = HU = GCL(hU),      H2 = H4 = HL = GCR(hU),
                                                                                (61)
    V1 = V2 = VL = GCL(vL)^T,    V3 = V4 = VR = GCR(vL)^T.
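A similar check applies to the cubic filter matrices (again with a hypothetical N = 8, u = 3.5, and random data, using NumPy): the matrices built from the four cubic taps [−0.0625, 0.5625, 0.5625, −0.0625] reproduce direct cubic FIR interpolation at the half-pel positions:

```python
import numpy as np

N, u = 8, 3.5                       # hypothetical block size and half-pel shift
i = N - int(np.floor(u))

def top_right(k):                   # [[0, I_k], [0, 0]]
    M = np.zeros((N, N)); M[:k, N-k:] = np.eye(k); return M

def bottom_left(k):                 # [[0, 0], [I_k, 0]]
    M = np.zeros((N, N)); M[N-k:, :k] = np.eye(k); return M

# cubic filter matrices, eq. (59); taps -0.0625, 0.5625, 0.5625, -0.0625
GCL = (-0.0625*top_right(i + 1) + 0.5625*top_right(i)
       + 0.5625*top_right(i - 1) - 0.0625*top_right(i - 2))
GCR = (-0.0625*bottom_left(N - i - 1) + 0.5625*bottom_left(N - i)
       + 0.5625*bottom_left(N - i + 1) - 0.0625*bottom_left(N - i + 2))

rng = np.random.default_rng(2)
x = rng.random(2*N)                 # two adjacent 1-D blocks laid end to end
x1a, x1b = x[:N], x[N:]

x2 = GCL @ x1a + GCR @ x1b

# direct cubic interpolation at half-pel positions floor(u) + n + 0.5
f = int(np.floor(u))
ref = (-0.0625*x[f-1:f-1+N] + 0.5625*x[f:f+N]
       + 0.5625*x[f+1:f+1+N] - 0.0625*x[f+2:f+2+N])
assert np.allclose(x2, ref)
```

Replacing the two bilinear matrices by these four-tap matrices is the only change relative to the half-pel bilinear case, which is why the precomputed-matrix cost stays the same.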

This idea can be extended to other interpolation functions, such as the sharped Gaussian [20], and to quarter-pel accuracy.

3.3. Interpolation by DCT/DST

Discrete cosine transforms of type I (DCT-I) and type II (DCT-II) have successfully been applied to discrete interpolation applications (cf. [23, 24, 25]). Interpolation using the DST or the more general W transform has also been studied over the years (cf. [26, 27]). Interpolation using the DCT/DST is found to surpass the usual DFT interpolation method, especially for sinusoidal inputs, and, by taking the boundary condition into account, very accurate interpolated values can be generated [25]. In the following, we relate DCT-I interpolation to the Nyquist sampling theorem and show that, by means of DCT-I interpolation, the DCT-II coefficients of a half-pel shifted block can be obtained directly from the DCT-I coefficients of the original block. In this way, we can build a DCT-based motion compensation block without the need to convert the DCT coefficients back to pixels before motion compensation, as required in the conventional approach.

3.3.1 DCT-I interpolated sequence

The procedure of interpolation using DCT-I is as follows:

(1) Compute the modified DCT-I, defined as

    Y(m) = DCT-I{y(n)} = (2/N) K_m Σ_{n=0}^{N} K_n y(n) cos(mnπ/N),  for m = 0, ..., N,   (62)

where

    K_m = 1/2 for m = 0 or N,   K_m = 1 otherwise.                              (63)

(2) Append zeros to the DCT-I sequence Y(m) to form another DCT-I sequence Z(m):

    Z(m) = Y(m) for m = 0, ..., N;   Z(m) = 0 for m = N + 1, ..., MN.           (64)

(3) Obtain the interpolated sequence z(n) by calculating the modified inverse DCT-I transform of the zero-padded Z(m):

    z(n) = DCT-I^{−1}{Z(m)} = Σ_{m=0}^{MN} Z(m) cos(mnπ/MN),  for n = 0, ..., MN.   (65)

The above interpolation procedure using DCT-I can be shown to be equivalent to upsampling of the bandlimited signal reconstructed from the sequence y(n) by a pair of sinc-like η functions. By defining

    Qc(n, ν) ≜ (2/N) K_n Σ_{m=0}^{N} K_m cos(mnπ/N) cos(mνπ/N),                 (66)

the interpolated sequence is

    z(n) = Σ_{n′=0}^{N} y(n′) Qc(n′, n/M).                                      (67)

Notice that

    Qc(n, ν) = (K_n/N) [ Σ_{m=0}^{N} K_m cos((n − ν)mπ/N) + Σ_{m=0}^{N} K_m cos((n + ν)mπ/N) ]
             = (K_n/N) [ η(n − ν) + η(n + ν) ],                                 (68)

where

    η(x) ≜ Σ_{m=0}^{N} K_m cos(mπx/N).                                          (69)

It can be shown that

    η(x) = (1/2) sin(πx) · cos(πx/2N)/sin(πx/2N) ≈ N sinc(x)  if πx ≪ 2N.       (70)

Since the orthogonality relation is

    (2/N) K_m Σ_{n=0}^{N} K_n cos(m′nπ/N) cos(mnπ/N) = δ(m − m′),               (71)

we can show that

    Qc(n, ν) = δ(n − ν)  if ν is an integer.                                    (72)

Therefore, z(nM) = y(n) for n = 0, ..., N. This satisfies the requirement of being an interpolation function.

From the Nyquist sampling theorem, a continuous bandlimited signal f(t) can be perfectly reconstructed from its sampled sequence f(nT) with sampling interval T by a series of sinc functions (see [28, 29]):

    f(t) = Σ_{n=−∞}^{∞} f(nT) sinc(t/T − n).                                    (73)

The reconstructed f(t) can then be resampled at a different sampling rate, T1 = T/M, to generate an interpolated sequence f(mT1):

    f(mT1) = f(mT/M) = Σ_{n=−∞}^{∞} f(nT) sinc(m/M − n).                        (74)

Therefore, interpolation using DCT-I as expressed in (67) is the truncated version of (74) for a symmetric sequence f(nT).
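The three-step procedure of (62)-(65) can be sketched directly from the definitions (a brute-force O(N²) implementation for clarity, with a hypothetical test sequence, using NumPy); the final assertion checks the interpolation property z(nM) = y(n) implied by (72):

```python
import numpy as np

N, M = 8, 2                          # hypothetical block size, upsampling factor

def dct1(y):                         # modified DCT-I, eqs. (62)-(63)
    L = len(y) - 1
    n = np.arange(L + 1)
    Kn = np.where((n == 0) | (n == L), 0.5, 1.0)
    return np.array([(2.0/L)*(0.5 if m in (0, L) else 1.0)
                     * np.sum(Kn*y*np.cos(m*n*np.pi/L))
                     for m in range(L + 1)])

def idct1(Z):                        # modified inverse DCT-I, eq. (65)
    L = len(Z) - 1
    m = np.arange(L + 1)
    return np.array([np.sum(Z*np.cos(m*i*np.pi/L)) for i in range(L + 1)])

y = np.cos(0.7*np.arange(N + 1))     # any smooth length-(N+1) test sequence
Y = dct1(y)                          # step (1)
Z = np.concatenate([Y, np.zeros(M*N - N)])   # step (2), eq. (64)
z = idct1(Z)                         # step (3), length M*N + 1

assert np.allclose(z[::M], y)        # interpolation property z(nM) = y(n)
```

The intermediate samples z(n) for n not a multiple of M are the DCT-I interpolated values used in the next subsection.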


Figure 13: Pictures from the Infrared Car and Miss America sequences are subsampled and displaced by a half-pel motion vector with different motion compensation methods. The MSE-per-pixel values are obtained by comparing the original unsampled pixel values with the predicted pixel values of the motion compensated residuals. Zero-order interpolation means replication of sampled pixels as the predicted pixel values.

3.3.2 DCT-II of DCT-I interpolated half-pel motion compensated block

Given a sequence of length N + 1, {y(n); n = 0, ..., N}, the half-pel shifted sequence {w(n); n = 0, ..., N − 1} is generated by sampling the DCT-I interpolated sequence z(n) of length 2N + 1 such that

    w(i) = z(2i + 1)  for i = 0, ..., N − 1.                                    (75)

The DCT-II coefficients W(k) of w(i) are found to have a simple relationship with the modified DCT-I coefficients Y(m) of the original sequence y(n), as follows:

    W(k) ≜ DCT-II{w(i)} = (2/N) C(k) Σ_{i=0}^{N−1} w(i) cos[(kπ/N)(i + 1/2)]
         = C(k) Σ_{m=0}^{N} Y(m) [δ(m + k) + δ(m − k)],  for k = 0, ..., N − 1. (76)

Therefore,

    W(0) = √2 Y(0);   W(k) = Y(k)  for k = 1, ..., N − 1.                       (77)

With this simple relationship, once the modified DCT-I coefficients of a length-(N + 1) sequence are obtained, the DCT-II of the DCT-I interpolated half-pel shifted block can easily be obtained via (77).
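The relationship (75)-(77) can likewise be verified numerically (hypothetical N = 8 and a smooth test sequence, brute-force transforms for clarity, using NumPy):

```python
import numpy as np

N = 8                                 # hypothetical block size

def dct1(y):                          # modified DCT-I, eq. (62)
    L = len(y) - 1
    n = np.arange(L + 1)
    Kn = np.where((n == 0) | (n == L), 0.5, 1.0)
    return np.array([(2.0/L)*(0.5 if m in (0, L) else 1.0)
                     * np.sum(Kn*y*np.cos(m*n*np.pi/L))
                     for m in range(L + 1)])

def idct1(Z):                         # modified inverse DCT-I, eq. (65)
    L = len(Z) - 1
    m = np.arange(L + 1)
    return np.array([np.sum(Z*np.cos(m*i*np.pi/L)) for i in range(L + 1)])

def dct2_1d(w):                       # DCT-II with the kernel of eq. (48)
    L = len(w)
    i = np.arange(L)
    return np.array([(2.0/L)*(1/np.sqrt(2) if k == 0 else 1.0)
                     * np.sum(w*np.cos(k*np.pi/L*(i + 0.5)))
                     for k in range(L)])

y = np.sin(0.3*np.arange(N + 1)) + 1.0    # test sequence of length N + 1
Y = dct1(y)

# DCT-I interpolate by M = 2 and keep odd samples: w(i) = z(2i + 1), eq. (75)
z = idct1(np.concatenate([Y, np.zeros(N)]))
w = z[1::2]

W = dct2_1d(w)
# eq. (77): W(0) = sqrt(2) Y(0), W(k) = Y(k) for k = 1, ..., N - 1
assert np.allclose(W[0], np.sqrt(2)*Y[0])
assert np.allclose(W[1:], Y[1:N])
```

In other words, the DCT-II of the half-pel shifted block is read off the DCT-I coefficients up to a √2 scaling of the DC term, with no return to the pixel domain.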

3.3.3 Simulation results

Simulations are performed on the Infrared Car and Miss America sequences to demonstrate the effectiveness of our bilinear and cubic motion compensation methods.

The first set of simulations subsamples each picture It(i, j) from the sequences (i.e., y(i, j) = It(2i, 2j)), and then this shrunken picture y(i, j) is displaced by a half-pel motion vector (arbitrarily chosen as (2.5, 1.5)) with both the bilinear and the cubic interpolated motion compensation methods. The mean square error per pixel is computed as MSE = (Σ_{i,j} [x̂(i, j) − x(i, j)]²)/N² by treating the original unsampled pixels as the reference picture x(i, j) = It(2i + 1, 2j + 1), where x̂(i, j) is the pixel value predicted from y(i, j). As shown in Figure 13, the zero-order interpolation is also simulated for comparison. The zero-order interpolation, also called sample-and-hold interpolation, simply takes the original pixel value as the predicted half-pel pixel value [21]. As can be seen in Figure 13, both the bilinear and cubic methods have much lower MSE values than the zero-order method, and the cubic method also performs much better than its bilinear counterpart without increased computational load.

Figures 14 and 15 show the results of another set of simulations in which the subpixel DCT-based motion compensation algorithms generate motion compensated residuals of the sequences "Infrared Car" and "Miss America," respectively, based on the displacement estimates of the full search block matching algorithm, where the residuals are used to compute the MSE and BPS values for comparison.

Figure 14: Pictures from the Infrared Car sequence are subsampled and displaced by a half-pel motion vector with different motion compensation methods. The MSE-per-pixel values are obtained by comparing the original unsampled pixel values with the predicted pixel values of the motion compensated residuals. Zero-order interpolation means replication of sampled pixels as the predicted pixel values.
It can be seen that the cubic interpolation approach achieves lower MSE and BPS values than the bilinear interpolation.

4. CONCLUSION

Despite the conceptual simplicity of the conventional hybrid DCT motion compensated codec structures adopted for the implementation of most DCT-based video coding standards, an alternative DCT-based video codec structure has been proposed in the past to address a number of shortcomings inherent in the conventional hybrid architecture. However, no prior research has been directed at what criteria can allow these two drastically different architectures to interoperate with each other in accordance with the standards.

In this paper, we establish the criteria for matching conventional codecs with fully DCT-based codecs such that compatibility with existing standards and interoperability with each other are guaranteed. We find that the key to this interoperability lies in the heart of the implementation of motion compensation modules performed in the spatial and transform domains at both the encoder and the decoder. Specifically, if the spatial-domain motion compensation is compatible with the transform-domain motion compensation, then the states in both the coder and the decoder will keep track of each other even after a long series of P-frames. Otherwise, the states will diverge in proportion to the number of P-frames between two I-frames. This sets an important criterion for the development of any DCT-based motion compensation schemes.

Following this criterion, we also discuss and develop some DCT-based motion compensation schemes as important building blocks of fully DCT-based codecs. For the case of subpixel motion compensation, DCT-based approaches allow more accurate interpolation without any increase in computation. Furthermore, the scarcity of DCT coefficients after quantization significantly decreases the number of calculations required for motion compensation. Coupled with the DCT-based motion estimation algorithms, it is possible to realize fully DCT-based codecs that overcome the disadvantages of conventional hybrid codecs.

Figure 15: Pictures from the Miss America sequence are subsampled and displaced by a half-pel motion vector with different motion compensation methods. The MSE-per-pixel values are obtained by comparing the original unsampled pixel values with the predicted pixel values of the motion compensated residuals. Zero-order interpolation means replication of sampled pixels as the predicted pixel values.

REFERENCES

[1] CCITT Recommendation H.261, Video Codec for Audiovisual Services at p × 64 kbit/s, CCITT, August 1990.
[2] ISO/IEC 11172 (MPEG-1), Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s, Geneve, Switzerland, 1993.
[3] ISO/IEC 13818 (MPEG-2, ITU-T H.262), Generic Coding of Moving Pictures and Associated Audio, Geneve, Switzerland, 1994.
[4] J. S. McVeigh and S.-W. Wu, “Comparative study of partial closed-loop versus open-loop motion estimation for coding of HDTV,” in Proc. IEEE Workshop on Visual Signal Processing and Communications, New Brunswick, September 1994, pp. 63–68.
[5] H. Li, A. Lundmark, and R. Forchheimer, “Image sequence coding at very low bitrates: a review,” IEEE Trans. Image Processing, vol. 3, no. 5, pp. 589–609, 1994.
[6] U. V. Koc and K. J. R. Liu, “DCT-based motion estimation,” IEEE Trans. Image Processing, vol. 7, no. 7, pp. 948–965, 1998.
[7] U. V. Koc and K. J. R. Liu, “Interpolation-free subpixel motion estimation techniques in DCT domain,” IEEE Trans. Circuits and Systems for Video Technology, vol. 8, no. 4, pp. 460–487, 1998.
[8] N. Merhav and V. Bhaskaran, “Fast algorithms for DCT-domain image downsampling and for inverse motion compensation,” IEEE Trans. Circuits and Systems for Video Technology, vol. 7, no. 3, pp. 468–476, 1997.
[9] M. Song, A. Cai, and J. A. Sun, “Motion estimation in DCT domain,” in Proc. International Conference on Communication Technology (ICCT), 1996, vol. 2, pp. 670–674.
[10] S. Ji and H. W. Park, “Moving object segmentation in DCT-based compressed video,” Electronics Letters, vol. 36, no. 21, pp. 1769–1770, 2000.
[11] J. Song and B.-L. Yeo, “A fast algorithm for DCT-domain inverse motion compensation based on shared information in a macroblock,” IEEE Trans. Circuits and Systems for Video Technology, vol. 10, no. 5, pp. 767–775, 2000.
[12] R. Kleihorst and F. Cabrera, “Implementation of DCT-domain motion estimation and compensation,” in IEEE Workshop on Signal Processing Systems, October 1998, pp. 53–62.
[13] S.-F. Chang and D. G. Messerschmitt, “A new approach to decoding and compositing motion-compensated DCT-based images,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, April 1993, vol. V, pp. 421–424.
[14] S.-F. Chang and D. G. Messerschmitt, “Manipulation and compositing of MC-DCT compressed video,” IEEE Journal on Selected Areas in Communications, vol. 13, no. 1, pp. 1–11, 1995.
[15] Y. Y. Lee and J. W. Woods, “Video post-production with compressed images,” SMPTE Journal, vol. 103, pp. 76–84, February 1994.
[16] B. C. Smith and L. Rowe, “Algorithms for manipulating compressed images,” IEEE Computer Graphics and Applications, vol. 13, no. 5, pp. 34–42, 1993.
[17] J.-B. Lee and B. G. Lee, “Transform domain filtering based on pipelining structure,” IEEE Trans. Signal Processing, vol. 40, pp. 2061–2064, August 1992.
[18] N. Merhav and V. Bhaskaran, “A fast algorithm for DCT-domain inverse motion compensation,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Atlanta, May 1996, vol. IV, pp. 2307–2310.
[19] R. W. Schafer and L. R. Rabiner, “A digital signal processing approach to interpolation,” Proceedings of the IEEE, vol. 61, no. 6, pp. 692–702, 1973.
[20] W. F. Schreiber, Fundamentals of Electronic Imaging Systems—Some Aspects of Image Processing, Springer-Verlag, 3rd edition, 1993.
[21] A. K. Jain, Fundamentals of Digital Image Processing, Prentice-Hall, 1989.
[22] H. Hou and H. C. Andrews, “Cubic splines for image interpolation and digital filtering,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 26, no. 6, pp. 508–517, 1978.
[23] Z. Wang, “Interpolation using type I discrete cosine transform,” Electronics Letters, vol. 26, no. 15, pp. 1170, 1990.
[24] J. I. Agbinya, “Interpolation using the discrete cosine transform,” Electronics Letters, vol. 28, no. 20, pp. 1927, 1992.
[25] C.-Y. Hsu and S.-M. Chen, “Discrete interpolation using the discrete cosine transform with the mapping of the boundary conditions,” IEEE Signal Processing Letters, vol. 2, no. 10, pp. 185, 1995.
[26] Z. Wang and L. Wang, “Interpolation using the fast sine transform,” Signal Processing, vol. 26, no. 2, pp. 131–137, 1992.
[27] Z. Wang, G. A. Jullien, and W. C. Miller, “The generalized discrete W transform and its application to interpolation,” Signal Processing, vol. 36, no. 1, pp. 99–109, 1994.
[28] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Prentice Hall, New Jersey, 1989.
[29] S. P. Kim and W.-Y. Su, “Direct image resampling using block transform coefficients,” Signal Processing: Image Communication, vol. 5, no. 3, pp. 259–272, 1993.

Ut-Va Koc is a Distinguished Member of Technical Staff (DMTS) at Bell Labs Research, Lucent Technologies, Murray Hill, New Jersey. He has published numerous peer-reviewed papers and book chapters on signal processing in multimedia and communications. Dr. Koc has been active as a reviewer for various journals and conferences, an editor for the EURASIP Journal on Applied Signal Processing, and a guest co-chair of various international conferences. He received the Ph.D. degree from the University of Maryland, College Park, and the B.S. degree from the National Chiao Tung University.

K. J. Ray Liu received the B.S. degree from the National Taiwan University and the Ph.D. degree from UCLA, both in electrical engineering. He is a Professor in the Electrical and Computer Engineering Department and the Institute for Systems Research of the University of Maryland, College Park. His research interests span broad aspects of signal processing algorithms and architectures; image/video compression, coding, and processing; wireless communications; security; and medical and biomedical technology, in which he has published over 200 papers. Dr. Liu has received numerous awards, including the 1994 National Science Foundation Young Investigator Award; the IEEE Signal Processing Society's 1993 Senior Award (Best Paper Award); the IEEE Vehicular Technology Conference Best Paper Award, Amsterdam, 1999; and the George Corcoran Award in 1994 for outstanding contributions to electrical engineering education and the Outstanding Systems Engineering Faculty Award in 1996 in recognition of outstanding contributions in interdisciplinary research, both from the University of Maryland.
Dr. Liu is Editor-in-Chief of the EURASIP Journal on Applied Signal Processing, and has been an Associate Editor of IEEE Transactions on Signal Processing, a Guest Editor of special issues on Multimedia Signal Processing of Proceedings of the IEEE, a Guest Editor of a special issue on Signal Processing for Wireless Communications of IEEE Journal on Selected Areas in Communications, a Guest Editor of a special issue on Multimedia Communications over Networks of IEEE Signal Processing Magazine, a Guest Editor of a special issue on Multimedia over IP of IEEE Trans. on Multimedia, and an editor of Journal of VLSI Signal Processing Systems. He currently serves as the Chair of the Multimedia Signal Processing Technical Committee of the IEEE Signal Processing Society and the Series Editor of the Marcel Dekker series on signal processing and communications.