
Motion Compensated Deinterlacer Analysis and Implementation


2008:126 CIV MASTER'S THESIS


Johan Helander

Luleå University of Technology MSc Programmes in Engineering Arena, Media, Music and Technology Department of Computer Science and Electrical Engineering Division of Signal Processing

2008:126 CIV - ISSN: 1402-1617 - ISRN: LTU-EX--08/126--SE

Master’s Thesis

Supervisor: Magnus Hoem Examiner: Magnus Lundberg Nordenvaad

Telestream AB & Department of Computer Science and Electrical Engineering, Signal Processing Group, Luleå University of Technology

Preface

This Master’s Thesis was carried out by me during the autumn term of 2007 and the beginning of 2008 at Telestream AB’s office at Rådmansgatan 49, Stockholm. It is part of the Master of Science program Arena Media, Music and Technology at Luleå University of Technology (LTU). Because of my education and interest in signal processing in media applications, the proposed topic was very well suited. The reader is assumed to have basic knowledge of signal processing, such as sampling, quantization, aliasing and so on. I would like to thank Telestream AB for their warm welcome during this period. I would especially like to thank Magnus Hoem, CEO of Telestream AB, for the opportunity to carry this thesis through; Nils Andgren, Telestream AB, for help and support through important thoughts and discussions; and Kennet Eriksson, Telestream AB, for supplying test sequences. Finally, I would like to thank Maria Andersson for her great support in illustrating the majority of the figures contained in this Master’s Thesis.

Abstract

In the early days of television, as Cathode Ray Tube (CRT) screens became brighter, the level of flicker caused by progressive scanning became more noticeable. This is because the human visual system is sensitive to large-area flicker. Interlaced scanning was invented in 1932 as a remedy for this difficulty. In contrast to progressive scanning, where every line is drawn in sequence, interlaced scanning alternates the lines of a frame in half a frame interval, called a field. The conversion process from interlaced scan to progressive scan is called deinterlacing. In this thesis, two deinterlacing methods, which use motion information from the video sequence, were used in conjunction to obtain an improved result. Thus, the process of finding the true motion in the sequence also had to be analyzed. The analysis was done on artificially generated test sequences as well as true video sequences. The result was measured using the Mean Square Error between a progressive input sequence and the deinterlaced output sequence. This measurement was compared to a much simpler deinterlacing algorithm and showed large improvements, primarily in the sense of reduced aliasing. However, in some cases the deinterlacer produced severe artifacts causing picture degradation.

Contents

1 Introduction
  1.1 Background
  1.2 Problem description
  1.3 Purpose
  1.4 Limitations

2 Prerequisites
  2.1 Digital image and video signals
    2.1.1 Two-dimensional sampling
    2.1.2 Temporal sampling
    2.1.3 Temporal alias
  2.2 Generalized Sampling Theorem
  2.3 Deinterlacing
    2.3.1 Line Averaging
  2.4 Temporal redundancy
    2.4.1 Block-matching motion estimation
    2.4.2 Motion compensation

3 Motion estimation on interlaced video
  3.1 The 3-D Recursive-Search Block-Matcher
    3.1.1 Variable 3-D Recursive-Search Block-Matcher
  3.2 Time-Recursive motion estimation
  3.3 GST based motion estimation
    3.3.1 Applying a first-order linear interpolation

4 Motion compensated deinterlacing
  4.1 Time-Recursive deinterlacing
  4.2 GST based deinterlacing
    4.2.1 The GST interpolation filter
    4.2.2 Applying a first-order linear interpolation
  4.3 GST deinterlacing with Recursive GST motion estimation
  4.4 Robust GST deinterlacing
  4.5 Two-dimensional extension

5 Results
  5.1 Evaluation of the RGST motion estimator and GST deinterlacer
    5.1.1 Performance on artificial test sequences
    5.1.2 Performance on true video sequences
    5.1.3 Subjective analysis

6 Discussion
  6.1 Problems and improvements
    6.1.1 Edge difficulties
    6.1.2 Singularity problem
    6.1.3 Improved fall-back method
    6.1.4 Improved interpolation
    6.1.5 Further robustness
    6.1.6 Two dimensional GST interpolation
    6.1.7 Chrominance deinterlacing

7 References

1 Introduction

1.1 Background
Progressive scanning, also known as sequential scanning, is the method for displaying, storing or transmitting moving images, frames, where every horizontal line of each frame is drawn in sequence (see Figure 1). In the early days of television, as Cathode Ray Tube (CRT) screens became brighter, the level of flicker caused by progressive scanning became more noticeable. This is because the human visual system is sensitive to large-area flicker. An increased frame rate would have solved this perception problem, but would also have consumed a larger amount of bandwidth. Also, CRT screens at that time limited the number of frames that could be displayed per second. In 1932 the interlace technology was invented by Radio Corporation of America engineer Randall C. Ballard [1], partly as a remedy for the flicker problem. The technique improves the picture quality by removing large-area flicker without consuming any extra bandwidth. In contrast to progressive scanning, where every line is drawn in sequence, interlaced scanning alternates the lines of a frame in half a frame interval, called a field. Consequently, two fields form one frame. One field contains all the odd lines of the image while the other field contains all the even lines of the image, as shown in Figure 2. The afterglow of the phosphor of the CRT screen, in combination with the persistence of vision, results in the two fields being perceived as a continuous image. Hence, interlace makes it possible to view full horizontal resolution with half the bandwidth which would be required


Figure 1. A sequence of progressive frames. Figure 2. A sequence of interlaced fields.

for a progressive scan image, while maintaining the necessary refresh rate to prevent large-area flicker. Because of the achieved compromise between quality and required bandwidth, interlacing was used exclusively until the introduction of computer monitors in the 1970s.

1.2 Problem description
As described, traditional CRT screens are natively designed for interlaced scanning. Recent display technologies like Liquid Crystal Display (LCD) and Plasma Display Panel (PDP), on the other hand, require progressive scan video to display correctly. The conversion process from interlaced scan to progressive scan is called deinterlacing. This can be done in numerous ways, but the focus here will be on a motion compensated method. Motion compensation uses estimated motion vectors to realize an improved result. Interlaced scanning complicates this estimation, though, as well as several other image-processing tasks.

 Assuming the response time of these technologies could be made fast enough, interlace scan on such a display would result in a halving of brightness due to half of the pixels remaining black every other field. Consequently, conversion from interlaced to progressive scan is necessary.  See section 2.4 for further details on motion compensation, motion estimation and motion vectors.


1.3 Purpose
The main purpose of this thesis was to achieve high-quality deinterlacing of interlaced video signals. Since this process can gain from motion compensation, the process of motion estimation on interlaced video also had to be investigated.

1.4 Limitations
A block-matching motion estimator can be divided into a matching criterion and a search strategy. Because search strategies already constitute a massive line of research, they were not the focus of this thesis. For straightforward and smooth analysis, Matlab was used as the tool for implementing the researched methods. Only grayscale video was taken into account, to simplify the analysis and reduce computation time.

2 Prerequisites

2.1 Digital image and video signals Digital processing requires the signal to be digitally represented through sampling and quantization. This can be further studied in [2].

2.1.1 Two-dimensional sampling One-dimensional sampling can be extended to two-dimensional sampling by adding an additional dimension. A gray-scale image can then be digitally represented on a sample lattice by sampling the continuous intensity values along two orthogonal axes x and y (see Figure 3).

Figure 3. Matlab mesh plot of an image containing a bicycle.


Figure 4. Vertical-temporal diagram of a progressive sample lattice.

2.1.2 Temporal sampling
When a sequence of images is observed, it is perceived as a continuous scene by the human visual system [3]. To be able to represent moving images, this is exploited in video technology by capturing, transmitting and storing image sequences, i.e. video. This is done just like sampling of static images, but adding an additional dimension, time. For a moving scene, the two-dimensional array of samples is repeatedly collected at discrete time instances, arriving at a three-dimensional array of intensity samples (see Figure 4). The intensity value of a pixel at position x on the sample lattice in image number n can then be expressed by the function fp[x, n] (p for progressive).
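As an illustration (not from the thesis; the array layout and names are assumptions), the three-dimensional array of intensity samples maps directly onto a NumPy array:

```python
# Minimal sketch of a grayscale video as a 3-D array of intensity
# samples fp[x, n]; dimensions and names are illustrative only.
import numpy as np

frames, height, width = 3, 4, 6
video = np.zeros((frames, height, width), dtype=np.uint8)

def fp(video, x, y, n):
    """Intensity at spatial position (x, y) in image number n."""
    return video[n, y, x]

video[1, 2, 3] = 200          # set one sample in image n = 1
print(fp(video, x=3, y=2, n=1))   # prints 200
```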

2.1.3 Temporal alias
Consider a white bar moving in the horizontal direction between temporal samples. This motion will produce a variation of intensity over time when observing a specific position x. If the motion of the bar is small, the waveform representing the variation of intensity over time will be accurately sampled. If, on the other hand, the movement of the edge is large enough, that same x in the next frame misses the bar and the waveform can no longer be reconstructed. Nyquist-Shannon’s Sampling Theorem [2] is no longer valid because the Nyquist frequency has been violated. This is, for two reasons, not as bad as it may seem at first. The first reason is that the human visual system is exceptionally tolerant of temporal aliasing. Even if there is temporal aliasing present in an image sequence, the brain can interpret and understand how the objects in the image are moving [3]. The second reason temporal aliasing can be overlooked is that practical sampling cannot be made instantaneously. In a capture device, each frame requires a certain exposure time, i.e. sample aperture. The sample aperture acts as a low-pass filter blurring the fast-moving details. The effect does not remove all temporal aliasing but will, in conjunction with the first reason, render an acceptable moving image sequence. However, it is important to consider temporal aliasing when dealing with motion compensated video processing.
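The violation of the temporal Nyquist limit can be illustrated numerically; in the sketch below (frequencies chosen purely for illustration), an intensity variation of 9 Hz sampled at 10 frames per second is indistinguishable from a 1 Hz variation:

```python
# Numerical illustration of temporal aliasing: a 9 Hz intensity
# variation sampled at 10 frames per second aliases down to 1 Hz.
import numpy as np

f_true = 9.0                  # temporal frequency of the variation (Hz)
f_s = 10.0                    # frame rate (Hz); Nyquist would need > 18 Hz
n = np.arange(8)              # frame indices

samples = np.cos(2 * np.pi * f_true * n / f_s)
alias = np.cos(2 * np.pi * abs(f_true - f_s) * n / f_s)

assert np.allclose(samples, alias)   # sample-for-sample identical
```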

2.2 Generalized Sampling Theorem
The sampling theorem states that a bandwidth-limited signal with maximum frequency fs/2 can be exactly reconstructed if this signal is sampled with a frequency higher than fs. In 1956, Yen [4] showed a generalization of this theorem (GST). He proved that any bandwidth-limited signal with maximum frequency fs/2 can be exactly reconstructed from N independent sets of samples, each sampled with frequency fs/N. This theorem will later be used for motion estimation and deinterlacing.

2.3 Deinterlacing
Interlace can be described as a form of spatio-temporal subsampling by a factor of two. The vertical-temporal sample lattice for interlace can be seen in Figure 5. The field f[x, n] is defined for y mod 2 = n mod 2 only. Deinterlacing aims at removing the subsample artifacts by interpolating the missing lines. The output of a deinterlacer can be defined as:

fout[x, n] = f[x, n]    if y mod 2 = n mod 2
             fi[x, n]   otherwise                                    (1)

where fi[x, n] denotes the interpolated pixels.
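The output structure above can be sketched in a few lines (the frame-sized field layout, with valid lines satisfying y mod 2 = n mod 2, and the function names are illustrative assumptions):

```python
# Sketch of Equation (1): the deinterlaced output keeps the existing
# lines of field n (y mod 2 == n mod 2) and fills the missing lines
# with interpolated pixels f_i from any interpolator of this chapter.
import numpy as np

def deinterlace(field, n, interpolate):
    out = np.empty_like(field)
    for y in range(field.shape[0]):
        if y % 2 == n % 2:
            out[y] = field[y]               # existing line: copy
        else:
            out[y] = interpolate(field, y)  # missing line: f_i[x, n]
    return out

# Example: crude "interpolation" by simple line repetition.
field = np.arange(8).reshape(4, 2)
out = deinterlace(field, 0, lambda f, y: f[y - 1])
```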

 A film camera, video camera etc.


Figure 5. Vertical-temporal diagram of an interlaced sample lattice.

At first glance deinterlacing may seem a straightforward application of general sample rate conversion theory [5], i.e. vertical up-sampling by a factor of two. However, such up-sampling is only valid if the signal satisfies the sampling theorem. This is not fulfilled for most video signals, since filtering prior to the sampling is omitted. Due to the video capture technique, the pre-filtering would have to be accomplished in the optical path, which is not a practical solution [5], partially because of the complexity of such a system. There is also a perceptual reason: the temporal frequencies at the retina of an observer have an unknown relation to the spatial content. High frequencies, due to object motion, are mapped to zero frequencies at the retina when the viewer tracks the object. Therefore, optical pre-filtering is not optimal and would degrade the image quality for the viewer. As a result, a method other than vertical up-sampling has to be used for deinterlacing. Due to the discussed practical and theoretical problems, many deinterlacing algorithms have been proposed. Until the end of the 1970s, linear deinterlacing methods were the common approach. From the early 1980s, it was suggested that nonlinear methods could outperform linear methods. Next, motion compensation was proposed as a method to further improve deinterlacing of images with motion, but was initially considered too complex for consumer applications.

As temporal sampling of a continuous moving image requires the signal to be made discrete directly from the optical signal (in contrast to sampling of sound, for instance, where the signal is made electronic before sampling), the pre-filtering can only be completed in the optical path.


Consequently, deinterlacing algorithms can be categorized into non-motion compensated methods and motion compensated methods. Furthermore, non-motion compensated methods can be divided into linear and nonlinear methods. An overview of the most relevant deinterlacing proposals can be found in [5] and [6]. The deinterlace method focused on in this Master’s Thesis was mainly chosen due to its theoretically optimal performance. However, as confirmed in [7], the method also seems superior when measured in a subjective sense.

2.3.1 Line Averaging
One of the simplest deinterlace methods is Line Averaging, also called “Bob”. This is a linear spatial filter, which simply takes the average of the surrounding lines to interpolate the missing line. It can be described by:

fi[x, n] = ( f[x − uy, n] + f[x + uy, n] ) / 2                       (2)

where uy is the unit vector in the vertical direction. However, this filter introduces aliasing. Generally, all spatial deinterlacing filters balance between aliasing and resolution.
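Equation (2) translates directly into code; a minimal sketch follows (border lines are repeated, an assumed edge policy the text does not specify):

```python
# Sketch of Line Averaging ("Bob", Equation (2)): each missing line is
# the mean of the existing lines directly above and below.
import numpy as np

def line_average(field, n):
    out = field.astype(float).copy()
    h = field.shape[0]
    for y in range(h):
        if y % 2 != n % 2:                       # missing line
            above = field[y - 1] if y > 0 else field[y + 1]
            below = field[y + 1] if y < h - 1 else field[y - 1]
            out[y] = (above.astype(float) + below.astype(float)) / 2
    return out

field = np.zeros((4, 3))
field[0], field[2] = 10, 30      # field n = 0: lines 0 and 2 exist
out = line_average(field, 0)     # line 1 becomes (10 + 30) / 2 = 20
```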

2.4 Temporal redundancy In a typical scene represented by a sequence of images, there will be a great deal of similarity between surrounding images of the same sequence. The similarities between images are found by means of motion estimation and can then be exploited by using motion compensation.

2.4.1 Block-matching motion estimation Generally, motion estimation using block-matching tries to find the correlated portions of images applying a search strategy and a match criterion. This is done by dividing the images into smaller portions, called macro blocks. The motion is found between the macro block from the previous image and the macro block in the current image. The displacement between the macro blocks can be described by a motion vector pointing from the origin block in the previous image to

the translated block in the current image. Each macro block in the image is assigned its own motion vector. The difficulty lies in finding the matching block. What makes it difficult is the lack of sufficient temporal samples. If the temporal sampling frequency obeyed Nyquist, it would be a straightforward task to track an object from one image to the next. But as stated earlier, the temporal sampling of all common imaging systems is much less than the Nyquist frequency. Consequently, if we know the position of an object in one image of the sequence, we have no idea where it will be in the next. The search method defines in which positions to look for the correlated pixels. One of the simplest methods is Full Search. It shifts a macro block in the current image over a set of candidate vectors to the previous image. The candidate set includes all possible positions within a defined search area. Because of the massive amount of candidate vectors, this strategy is very computationally intensive. However, if there is a match within the search area, the Full Search method will find it. The match criterion defines the amount of correlation between macro blocks. There are a number of possible criteria, though Minimum Mean Square Error (MMSE) and Minimum Absolute Difference (MAD) are commonly used. In the case of progressive images, the Mean Square Error (MSE) and Sum of Absolute Difference (SAD) error functions can be defined as:

εMSE = (1 / (M N)) Σ_{x∈B} ( f[x, n] − f̂[x, n] )²                    (3)

εSAD = Σ_{x∈B} | f[x, n] − f̂[x, n] |                                 (4)

where f[x, n] is the existing progressive image, f̂[x, n] is the estimated image, and M and N are the vertical and horizontal sizes of macro block B in pixels. The motion vector for a macro block can consequently be found by minimizing the error function over all candidate vectors, giving the MMSE and MAD criteria.
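The Full Search strategy with the SAD criterion can be sketched as follows (block size and search radius are arbitrary illustrative choices):

```python
# Sketch of Full Search block matching with the SAD criterion of
# Equation (4). Block size and search radius are illustrative.
import numpy as np

def full_search(prev, curr, top, left, block=8, radius=4):
    """Return the motion vector (dy, dx) minimising the SAD between the
    macro block at (top, left) in 'curr' and candidates in 'prev'."""
    b = curr[top:top + block, left:left + block].astype(int)
    best, best_vec = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > prev.shape[0] \
                    or x + block > prev.shape[1]:
                continue                          # candidate outside image
            cand = prev[y:y + block, x:x + block].astype(int)
            sad = int(np.abs(b - cand).sum())
            if best is None or sad < best:
                best, best_vec = sad, (dy, dx)
    return best_vec

rng = np.random.default_rng(1)
prev = rng.integers(0, 256, (32, 32))
curr = np.roll(prev, (2, 1), axis=(0, 1))         # global shift of (2, 1)
```

For the interior block at (8, 8) the matcher recovers the displacement back into the previous image, (−2, −1).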


2.4.2 Motion compensation The use of the knowledge of the displacement of an object in successive frames is called motion compensation. It is a nonlinear technique for describing an image in terms of a translated reference image. Motion compensation can be utilized in several video applications, e.g. video compression, conversion, noise reduction and deinterlacing. In video compression for example, the temporal redundancy is eliminated using motion compensation, resulting in a reduction of stored and transmitted data.
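Motion compensated prediction can be sketched for integer vectors as a per-block translation of the previous frame (clipping at the borders is an assumed policy, not from the thesis):

```python
# Sketch of motion compensation: predict the current frame by
# translating each macro block of the previous frame over its motion
# vector (dy, dx). Integer vectors only; clipped borders.
import numpy as np

def motion_compensate(prev, vectors, block=8):
    h, w = prev.shape
    pred = np.empty_like(prev)
    for by in range(h // block):
        for bx in range(w // block):
            dy, dx = vectors[by][bx]
            y, x = by * block, bx * block
            sy = min(max(y + dy, 0), h - block)   # clipped source row
            sx = min(max(x + dx, 0), w - block)   # clipped source column
            pred[y:y + block, x:x + block] = prev[sy:sy + block,
                                                  sx:sx + block]
    return pred

rng = np.random.default_rng(2)
prev = rng.integers(0, 256, (16, 16))
vectors = [[(0, 0), (0, 0)], [(0, 0), (-2, -1)]]
pred = motion_compensate(prev, vectors)
```

In compression, only the residual between the true current frame and this prediction needs to be coded, which is how the temporal redundancy is removed.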

3 Motion estimation on interlaced video

3.1 The 3-D Recursive-Search Block-Matcher
Different applications demand different types of motion estimators. In video compression, motion compensation helps reduce the amount of stored and transmitted data; the estimated motion vectors do not necessarily have to reflect the true motion of objects. In motion compensated deinterlacing, however, it is important that the vectors describe the true motion. The use of non-true motion vectors can result in severe artifacts such as misplaced object parts. Interlacing complicates the estimation of true motion vectors. As the video signal contains spatial aliasing components, unreliable estimated motion vectors cannot always be prevented. In the motion estimation and motion compensated deinterlacing algorithms described here, the 3-D Recursive-Search Block-Matcher [8] is used. This algorithm uses a small number of candidate vectors per macro block and provides quarter-pixel accuracy. Also, due to an inherent smoothness constraint, it yields very coherent motion vector fields that closely correspond to the true motion of objects. Thus, this method is suitable for motion compensated frame rate conversion and motion compensated deinterlacing. The candidate vector is selected from a candidate set S defined as:


S = { (d1 + U1), (d2 + U2), d3 }                                      (5)

where d1, d2 and d3 are previously calculated motion vectors from the spatial and spatio-temporal surroundings illustrated in Figure 6. U1 and U2 are update vectors which allow new motion vectors to be selected. They are selected from an update vector set Uset:

. (6)

It is proposed that the update vectors are updated on block basis according to:

(7)

where b is the output of a block counter, L is a look-up table function which provides a vector from Uset, and a is the number of update vectors in Uset. Preferably, a is not a factor of the number of blocks in the image, to prevent a relation between the update vector and the spatial position in the image. Consequently, each macro block uses three candidate vectors, of which only one is an updated candidate. The candidate vector that yields the smallest error value from the matching criterion is selected as the final motion vector for

Figure 6. Positions, relative to the current macro block, from which the motion vectors d1, d2 and d3 are taken in the 3-D Recursive-Search Block-Matcher.

that macro block. The matching criterion is described in the following sections.
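One selection step of the candidate evaluation can be sketched as below (a simplified, integer-pel illustration with an SAD match criterion; the real matcher works at quarter-pixel accuracy and generates its update vectors via the look-up mechanism of Equations (6)-(7)):

```python
# Sketch of one candidate-selection step of the 3-D Recursive-Search
# Block-Matcher (Equation (5)): only three candidates are evaluated,
# two spatial predictions plus updates and one spatio-temporal one.
import numpy as np

def sad(prev, curr, top, left, vec, block=8):
    h, w = prev.shape
    sy = min(max(top + vec[0], 0), h - block)     # clipped source position
    sx = min(max(left + vec[1], 0), w - block)
    a = curr[top:top + block, left:left + block].astype(int)
    b = prev[sy:sy + block, sx:sx + block].astype(int)
    return int(np.abs(a - b).sum())

def select_vector(prev, curr, top, left, d1, d2, d3, u1, u2, block=8):
    candidates = [(d1[0] + u1[0], d1[1] + u1[1]),  # d1 + U1
                  (d2[0] + u2[0], d2[1] + u2[1]),  # d2 + U2
                  d3]                              # d3, unmodified
    return min(candidates,
               key=lambda c: sad(prev, curr, top, left, c, block))
```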

3.1.1 Variable 3-D Recursive-Search Block-Matcher
Alternatively, the macro blocks from which the spatial and spatio-temporal predictions are taken can be varied. This was experimentally confirmed to provide a more responsive motion vector field, although some smoothness may be sacrificed. Here, the distance from which the prediction vectors are taken depends on the previously calculated motion vector. If the velocity of the previously calculated motion vector at the current macro block position is small, the distance to the prediction macro block position will also be small. This method is used to realize a faster convergence when an object appears in the frame. If the object is moving with a small velocity, the prediction macro block positions should be in the near neighborhood to realize a high correlation between the motion vectors. Similarly, if the object is moving with a large velocity, the motion vectors should be taken from a prediction macro block position further away from the current macro block. This way, the current motion vector can converge faster to the velocity of the appearing object. The candidate set is defined as:

(8)

where the motion vectors d1…d4 are previously calculated motion vectors from the spatial and spatio-temporal surroundings illustrated in Figure 7. U1 is selected from the update set Uset1:

(9)

 U2 selected from U set2 :

(10)

 U3 selected from U set3 :

Figure 7. Positions, relative to the current macro block, from which the motion vectors d1, d2, d3 and d4 are taken in the Variable 3-D Recursive-Search Block-Matcher.

(11)

 and U4 selected from U set4 :

. (12)

Each update vector is then updated for every macro block according to:

(13)

where L1…4 is a look-up table function for its corresponding update set Uset1…4. The distances, illustrated as arrows in Figure 7, between the current macro block and the macro block from which the prediction vector is taken depend on the magnitude of the motion vector in the previous field n−1. As illustrated, these distances are added to or subtracted from the origin of the prediction position, resulting in a prediction position adjusted to the motion in the sequence. This distance can be described as:

(14)


where Round() rounds to the nearest integer value and q is a suitable coefficient applied to the previously calculated motion vector in the previous field.

3.2 Time-Recursive motion estimation
In 1990, Wang et al. [9] proposed a high-quality time-recursive deinterlacing concept. This algorithm uses motion compensation and can be combined with the 3-D Recursive-Search Block-Matcher. It uses the previously deinterlaced field instead of the previous field to estimate the motion. If the previous field is properly deinterlaced, it contains all vertical details, allowing better motion estimation. Motion estimation is preferably completed over the shortest possible time interval because:

▪ A longer interval requires storage of intermediate fields, which increases the cost of the system.
▪ In case of acceleration, longer intervals are less accurate.
▪ Covering and uncovering regions grow with an increasing time interval of motion estimation.

If the motion vector is known, the difference between the estimated and the current existing field n can be minimized. Equivalently, a candidate vector can be found that minimizes the difference between the estimated and the current existing field n. The minimization results in the finally selected motion vector. The matching criterion for the Time-Recursive motion estimator is:

(15)

where fout[x, n − 1] is the previously deinterlaced field. If c is a non-integer vector, i.e. a vector with sub-pixel accuracy, interpolation in fout[x, n − 1] will be needed to accomplish the estimation. As shown in Equation 15, the previous output frame, which is the previously deinterlaced field, is required for motion estimation and consists of both original and interpolated samples. Since the motion estimation


is partly based on interpolated samples, time recursion is introduced into the motion estimation process. The Time-Recursive motion estimation process is illustrated in Figure 8.

Figure 8. Vertical-temporal diagram of Time-Recursive motion estimation for a fractional velocity.
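The Time-Recursive matching criterion of Equation (15) can be sketched for integer candidate vectors as below (names and border policy are illustrative; sub-pixel candidates would additionally require interpolation in the previously deinterlaced frame):

```python
# Sketch of the Time-Recursive matching criterion (Equation (15)):
# existing lines of the current field n are compared against the
# motion compensated, previously deinterlaced frame.
import numpy as np

def tr_match_error(curr_field, prev_deint, top, left, vec, n, block=8):
    h, w = prev_deint.shape
    sy = min(max(top + vec[0], 0), h - block)     # clipped source position
    sx = min(max(left + vec[1], 0), w - block)
    err = 0
    for y in range(top, top + block):
        if y % 2 != n % 2:
            continue                  # skip the missing lines of field n
        cur = curr_field[y, left:left + block].astype(int)
        ref = prev_deint[sy + (y - top), sx:sx + block].astype(int)
        err += int(np.abs(cur - ref).sum())
    return err
```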

3.3 GST based motion estimation
The GST based motion estimation method described here was first introduced by Delogne et al. [10] and Vandendorpe et al. [11]. It is an advanced motion estimation algorithm based upon a generalization of the sampling theorem and can be further studied in [12]. Motion estimation can be realized by comparing existing samples from the current field n with samples generated from the previous field n−1 and the pre-previous field n−2. Samples from the previous field are shifted over an estimated motion vector to the current time instance, field n, resulting in a set of motion compensated samples. Furthermore, samples from the pre-previous field are shifted over twice the motion vector to the current time instance, creating another set of samples. If these two sets of samples are independent, the generalization of the sampling theorem can be applied to reconstruct the signal as if it were sampled with the frame sampling frequency (twice the field sampling frequency). Consequently, the existing samples from field n can be estimated from the samples of the previous and pre-previous fields (see Figure 9). If both sets of samples are dependent, which is true for all integer-odd vertical velocities, the GST cannot be used. However, for integer-odd vertical velocities,


Figure 9. Vertical-temporal diagram of GST based motion estimation for a fractional velocity.
Figure 10. Vertical-temporal diagram of GST based motion estimation for an integer-odd velocity.

motion estimation can be done with the current field and the previous field only (see Figure 10). The main disadvantage of this method is the assumption of uniform motion over a two-field period; this assumption loses its correctness if acceleration is present in the video. Also, regions that appear or disappear due to covering and uncovering cause unreliable motion vectors in those areas. The matching criterion for the GST based motion estimator is defined as:

εGST = Σ_{x∈B} | f̂[x, n] − f[x, n] |                                 (16)

where the estimate f̂[x, n] is the shifted sample from the previous fields. For sub-pixel motion estimation, an appropriate interpolation

filter has to be applied to be able to reconstruct the existing samples from the two sets of samples. According to [12] and [13], the estimate can be expressed as the modified dual convolution sum:

(17)

where h1 and h2 are the interpolation filters with the appropriate impulse responses modeling the shift due to vertical motion. δy is the vertical motion fraction, defined as:

. (18)

Hence, |δy| < 1.0. e is the nearest even integer value of the motion vector component, defined as:

. (19)

Consequently, the integer component of the motion candidate vector is handled by shifting samples from the previous and pre-previous fields to the current field. The fractional part of the motion candidate vector is realized with the interpolation filters h1 and h2. For clarity, only the vertical part of the candidate vector is handled; the horizontal motion can be solved with sample rate conversion theory and is therefore set to zero. Since it is convenient to derive the filter coefficients in the z-domain, Equation (17) is transformed into:

F̂odd(z, n) = ( F(z, n−2) H1(z) + F(z, n−1) H2(z) )odd                (20)

assuming the current field is odd, where (F)odd denotes the odd field of F. If the complete progressive frame at n−2, Fp(z, n−2), exists, the following equation holds:

28 3. Motion estimation on interlaced video

Fodd(z, n−2) = ( Fp(z, n−2) )odd.                                     (21)

Field n −1 can then be reconstructed by shifting samples over the motion vector from frame n − 2 , applying the desired interpolator and extracting the desired field samples. Thus,

Feven(z, n−1) = ( Fp(z, n−2) H(z) )even                               (22)

where H(z) describes the motion over one field period and the desired interpolation in the z-domain. Similarly, field n can be reconstructed by shifting the samples from Fp(z, n−2) over twice the motion vector. Consequently,

Fodd(z, n) = ( Fp(z, n−2) H(z) H(z) )odd = ( Fp(z, n−2) H²(z) )odd.   (23)

Using the following characteristics:

( F(z) H(z) )even = Feven(z) Heven(z) + Fodd(z) Hodd(z)               (24)

( F(z) H(z) )odd = Feven(z) Hodd(z) + Fodd(z) Heven(z)                (25)

Equation (22) can be rewritten into:

Feven(z, n−1) = Feven(z, n−2) Heven(z) + Fodd(z, n−2) Hodd(z)         (26)

which can be expressed as:

Feven(z, n−2) = ( Feven(z, n−1) − Fodd(z, n−2) Hodd(z) ) / Heven(z).  (27)

Also, Equation (23) can be rewritten:


Fodd(z, n) = Fodd(z, n−2) ( H²even(z) + H²odd(z) ) + Feven(z, n−2) · 2 Hodd(z) Heven(z).   (28)

Substituting Equation 27 for Feven ( z,n − 2) in Equation 28 gives:

Fodd(z, n) = H1(z) Fodd(z, n−2) + H2(z) Feven(z, n−1)                 (29)

with

H1(z) = H²even(z) − H²odd(z)  and  H2(z) = 2 Hodd(z).                 (30)
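The derivation (26)-(30) can be checked numerically by treating each z-transform as an array of polynomial coefficients. The interpolator taps (δy, 1 − δy) below are an assumption made purely for illustration (the thesis fixes its interpolator in Section 3.3.1); the assertion confirms that Equations (29)-(30) reproduce the odd field obtained directly from Fp(z, n−2)H²(z):

```python
# Numerical check of Equations (29)-(30), modelling each z-transform
# as a polynomial coefficient array (index parity = line parity).
import numpy as np

rng = np.random.default_rng(0)
f = rng.normal(size=16)                   # progressive line Fp(z, n-2)
delta = 0.25
h = np.array([delta, 1.0 - delta])        # assumed interpolator H(z)

h_even = np.array([h[0], 0.0])            # even-index taps of H(z)
h_odd = np.array([0.0, h[1]])             # odd-index taps of H(z)

# Field n-1 = even part of Fp*H; field n = odd part of Fp*H^2.
g1 = np.convolve(f, h)
g2 = np.convolve(f, np.convolve(h, h))
g1_even = np.where(np.arange(g1.size) % 2 == 0, g1, 0.0)
g2_odd = np.where(np.arange(g2.size) % 2 == 1, g2, 0.0)
f_odd = np.where(np.arange(f.size) % 2 == 1, f, 0.0)

# Equation (30): H1 = Heven^2 - Hodd^2, H2 = 2 Hodd.
H1 = np.convolve(h_even, h_even) - np.convolve(h_odd, h_odd)
H2 = 2.0 * h_odd

# Equation (29): Fodd(z, n) = H1(z) Fodd(z, n-2) + H2(z) Feven(z, n-1).
rhs = np.convolve(H1, f_odd) + np.convolve(H2, g1_even)
assert np.allclose(rhs, g2_odd)
```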

3.3.1 Applying a first-order linear interpolation
The linear interpolator was selected as the interpolation method because of its favorable price-performance ratio. The first-order linear interpolator is modeled by:

(31) and

(32) resulting in:

. (33)

As an example, consider a velocity of δy = 1/4 pixels per field. The inverse z-transform provides the spatio-temporal expression for the estimated sample f̂[y, n]:

f̂[y, n] = a1 f[y+1, n−1] + a2 f[y, n−2] − a3 f[y+2, n−2]             (34)

where a1 = 2(1 − δy), a2 = δ²y and a3 = (1 − δy)².

4 Motion compensated deinterlacing

4.1 Time-Recursive deinterlacing
If a perfectly deinterlaced image and accurate motion vectors are available, sample rate conversion theory can be used to interpolate the samples to deinterlace the current field. The output of the Time-Recursive deinterlacer is defined as:

(35)

and is illustrated in Figure 11. If the motion candidate dy is non-integer, interpolation in fout[x, n−1] is needed. As can be seen in Figure 11, the interpolated samples depend on previous original samples as well as previously interpolated samples. As a result, errors originating from one output frame can propagate into successive output frames. This is inherent to the time recursive method, and is thus both an advantage and a disadvantage. If the motion estimation is correct, time recursion results in stability along the motion trajectory. If the motion estimation is incorrect, deinterlacing can result in an erroneously interpolated image, which can propagate into subsequent frames.


Figure 11. Vertical-temporal diagram of Time-Recursive deinterlacing for a fractional velocity.

4.2 GST based deinterlacing

When using the GST for deinterlacing, two independent sets of samples are used, as in GST motion estimation. However, in the deinterlacing process, the existing samples from the current field n can be exploited. Hence, only one set of samples from the previous field n−1 needs to be shifted to the current time instance to be able to reconstruct the current field at the frame sampling frequency. This is illustrated in Figure 12. Since only original samples of the current field and the previous field are used (no interpolated samples from the previous field) for motion estimation and deinterlacing, errors will not propagate to successive fields. The deinterlaced output is defined as:

f_out[x,n] = f[x,n] for existing lines, f_GST[x,n] for velocities giving independent sample sets, and f_alt[x,n] otherwise   (36)

where f_alt[x,n] is an alternative fall-back deinterlacing method for vertical velocities resulting in dependent sample sets, i.e. odd velocities. This method can, for example, be Line Averaging.

4.2.1 The GST interpolation filter

The GST interpolation filter output, f_GST[x,n], is calculated as in Equation (37) below. In contrast to GST motion estimation, in GST deinterlacing the velocities leading to an integer-odd motion vector will need a fall-back algorithm.


Figure 12. Vertical-temporal diagram of GST based deinterlacing for a fractional velocity.

   =  δ   − +  + f GST [x,n] ∑h1 k, y  f x (2k 1)uy ,n  k (37)     δ   − − −  = − ∑h2 m, y  f x e 2muy ,n 1, k,m {..., 1,0,1,2,3,...} m where

(38) and

. (39)

Again, the interpolator is assumed to calculate samples for vertical motion only. In the case of an odd field, i.e. when the current field contains only the odd scanning lines, Equation (37) can be simplified into:

. (40)

If a progressive previous frame f_p[x, n−1] is available, the current missing field can be expressed as the convolution:


f_even[y,n] = Σ_q h[q] f_p[y − q, n−1].   (41)

In the z-domain, the missing samples can be calculated as:

F_even(z,n) = H_even(z) F_even(z,n−1) + H_odd(z) F_odd(z,n−1)   (42)

and

F_odd(z,n) = H_even(z) F_odd(z,n−1) + H_odd(z) F_even(z,n−1)   (43)

which can be rewritten as:

F_odd(z,n−1) = (F_odd(z,n) − H_odd(z) F_even(z,n−1)) / H_even(z).   (44)

Equations (42) and (44) result in:

F_even(z,n) = H_even(z) F_even(z,n−1) + H_odd(z) (F_odd(z,n) − H_odd(z) F_even(z,n−1)) / H_even(z)   (45)

= (H_odd(z) / H_even(z)) F_odd(z,n) + H_even(z) F_even(z,n−1) − (H_odd^2(z) / H_even(z)) F_even(z,n−1)   (46)

= (H_odd(z) / H_even(z)) F_odd(z,n) + (H_even(z) − H_odd^2(z) / H_even(z)) F_even(z,n−1)   (47)

= H_1(z) F_odd(z,n) + H_2(z) F_even(z,n−1)   (48)

with

H_1(z) = H_odd(z) / H_even(z) and H_2(z) = H_even(z) − H_odd^2(z) / H_even(z).   (49)


4.2.2 Applying a first-order linear interpolation

The first-order linear interpolator is modeled by:

H_odd(z) = δ_y z^(-1)   (50)

and

H_even(z) = 1 − δ_y   (51)

resulting in:

H_1(z) = δ_y z^(-1) / (1 − δ_y) and H_2(z) = (1 − δ_y) − δ_y^2 z^(-2) / (1 − δ_y).   (52)

As an example, consider a velocity of δ_y = 1/4 pixels per field. Then, the deinterlaced samples can be calculated as:

F_even(z,n) = ((1/4) z^(-1) / (3/4)) F_odd(z,n) + (3/4 − (1/16) z^(-2) / (3/4)) F_even(z,n−1)   (53)

= (1/3) z^(-1) F_odd(z,n) + (3/4) F_even(z,n−1) − (1/12) z^(-2) F_even(z,n−1).   (54)

The inverse z-transform provides the spatio-temporal expression for the missing sample f_even[y,n]:

f_even[y,n] = (1/3) f_odd[y+1, n] + (3/4) f_even[y, n−1] − (1/12) f_even[y+2, n−1].   (55)

This is visualized in Figure 13. Writing Equation (55) in a generalized form gives:

f_even[y,n] = a_1 f_odd[y+1, n] + a_2 f_even[y, n−1] + a_3 f_even[y+2, n−1].   (56)

In Table 1, the calculated coefficients a_1, a_2 and a_3 are shown for different sub-pixel velocities.


Figure 13. Visualization of the GST deinterlacer with a first-order linear interpolation and a vertical velocity of 0.25 pixels per field.

δ_y    a_1     a_2     a_3
0      0       1       0
0.25   0.333   0.750   −0.083
0.50   1       0.500   −0.500
0.75   3       0.250   −2.250

Table 1. GST filter coefficients for different sub-pixel velocities.
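The coefficients in Table 1 follow directly from Equations (52)-(56): a_1 = δ_y/(1 − δ_y), a_2 = 1 − δ_y and a_3 = −δ_y^2/(1 − δ_y). A small Python sketch (the function name and the singularity guard are illustrative assumptions) reproduces them:

```python
# Taps of the GST deinterlacing filter for a first-order linear
# interpolator, as derived in Equations (52)-(56); reproduces Table 1.
def gst_deint_taps(dy):
    """Return (a1, a2, a3) for the sub-pixel vertical velocity dy.

    dy = 1 is the odd-integer singularity where the GST has no solution
    and a fall-back deinterlacer must be used instead.
    """
    if abs(1.0 - dy) < 1e-9:
        raise ValueError("odd-integer velocity: use the fall-back method")
    a1 = dy / (1.0 - dy)          # weights f_odd[y+1, n]
    a2 = 1.0 - dy                 # weights f_even[y, n-1]
    a3 = -dy * dy / (1.0 - dy)    # weights f_even[y+2, n-1]
    return a1, a2, a3
```

Note how a_1 grows without bound as δ_y approaches 1 (a_1 = 3 already at δ_y = 0.75), which is the boosting behavior discussed in Section 5.1.1.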

4.3 GST deinterlacing with Recursive GST motion estimation

The main weakness of the Time-Recursive motion estimation and deinterlacing approach is the error propagation. The main weakness of the GST motion estimator is the uniform motion constraint, as motion is generally not uniform. These drawbacks can be eliminated by combining the methods. The Time-Recursive motion estimator uses a previously deinterlaced field, which is generated from the pre-previous deinterlaced field, and so on. This chain can be broken by using only original samples for deinterlacing. This is realized by using the GST based deinterlacing method to calculate the previous deinterlaced field. This new combined method is called Recursive GST (RGST) motion estimation [13] and is illustrated together with the GST deinterlacer in Figure 14.


Figure 14. The RGST motion estimator together with the GST deinterlacer.

4.4 Robust GST deinterlacing

In [14], Ciuhu et al. introduced a robust deinterlacing algorithm based on the GST. For velocities resulting in an integer-odd vertical motion vector, the GST provides no solution. Furthermore, even for velocities close to an integer-odd vector, the GST algorithm becomes sensitive to small inaccuracies in the estimated motion. The robust solution is based on the observation that the GST can be applied to deinterlace a video signal not only using the current and the previous field, but equally well with samples taken from the current and the next field. Thus, Equation (37) is valid also if the motion compensated samples from field n−1 are replaced with samples from field n+1. Using both alternatives results in two calculated output samples that are theoretically the same (see Figure 15), provided that the motion is uniform over two fields. If the two samples differ, the motion vector can be assumed to be unreliable. The difference between them can thus be used as a quality indicator for the interpolated pixel, allowing us to distinguish areas where a protection method is needed. The first calculated output can be determined by:

(57)

and the second according to:


Figure 15. (a) GST interpolation using samples from the current and the previous field and (b) samples from the current and the next field. Both solutions are theoretically the same.

(58)

The quality indicator σ_GST can be calculated as:

σ_GST = |f_prev[x,n] − f_next[x,n]|   (59)

which can be used to fade between the average of the two outputs, in case they are considered reliable, and a fall-back option, e.g. Line Averaging, otherwise:

f_GST[x,n] = (σ_LA (f_prev[x,n] + f_next[x,n]) + σ_GST (f[x − u_y, n] + f[x + u_y, n])) / (2 (σ_GST + σ_LA))   (60)

where

σ_LA = |f[x − u_y, n] − f[x + u_y, n]|.   (61)
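The per-pixel fade of Equations (59)-(61) can be sketched as follows. The assignment of the weights (σ_LA on the GST average, σ_GST on the Line-Average fall-back) is an assumption consistent with fading towards the fall-back when σ_GST grows:

```python
def robust_gst_sample(f_prev, f_next, f_up, f_down):
    """Robust GST output for one pixel, following the fade of
    Equation (60) as reconstructed here.

    f_prev, f_next : GST interpolations using fields n-1 and n+1
    f_up, f_down   : vertically neighbouring existing samples
                     f[x - u_y, n] and f[x + u_y, n]
    """
    sigma_gst = abs(f_prev - f_next)   # disagreement of the two GST outputs
    sigma_la = abs(f_up - f_down)      # vertical detail seen by Line Average
    s = sigma_gst + sigma_la
    if s == 0.0:                       # both indicators vanish: outputs agree
        return 0.5 * (f_prev + f_next)
    return (sigma_la * (f_prev + f_next) + sigma_gst * (f_up + f_down)) / (2.0 * s)
```

When σ_GST = 0 the result reduces to the GST average, and when σ_LA = 0 it reduces to the Line-Average fall-back, as intended.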

4.5 Two-dimensional extension To utilize the Time-Recursive motion estimator and GST deinterlacer in a practical application, these discussed methods have to be

extended to handle horizontal motion. This is discussed by Ciuhu et al. in [14]. Extending the Time-Recursive motion estimator into two dimensions is straightforward. Because the previous field is deinterlaced, the linear interpolation in the horizontal direction (due to sub-pixel accuracy) can be extended to bilinear interpolation to calculate the sample value at a position in the frame. Furthermore, the GST deinterlacer can be extended into two dimensions without any severe effort. The vertical GST interpolation filter is restricted to vertically surrounding samples. Thus, samples located at a position which the filter cannot handle (i.e. a horizontally sub-pixel position) need to be adjusted to a horizontal integer position. This is done through a horizontal interpolation filter, which calculates the sample at the horizontal integer position from the motion compensated samples of the previous field (see Figure 16). Thus, first the horizontal motion is handled by interpolation in the horizontal dimension, then the vertical motion is handled by applying the vertical GST interpolator. The GST interpolator is therefore not a two-dimensional filter, but can handle motion in two dimensions.
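The horizontal adjustment step can be sketched as a first-order linear interpolation onto the integer grid (a minimal illustration; the function name and the requirement that the caller keeps x − dx inside the field width are assumptions):

```python
import numpy as np

def horizontal_to_integer(prev_field, y, x, dx):
    """Linearly interpolate the motion compensated sample of the previous
    field at the horizontally fractional position x - dx onto the integer
    grid, so that the purely vertical GST filter can be applied next.
    The caller must ensure 0 <= x - dx <= width - 1 (no bounds handling)."""
    xs = x - dx
    x0 = int(np.floor(xs))
    frac = xs - x0
    return (1.0 - frac) * prev_field[y, x0] + frac * prev_field[y, x0 + 1]
```

After this step every motion compensated sample lies on an integer column, and the one-dimensional GST interpolator of Section 4.2 can be applied per column.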

Figure 16. Extension of the GST deinterlacer into two dimensions.

5 Results

5.1 Evaluation of the RGST motion estimator and GST deinterlacer

5.1.1 Performance on artificial test sequences

During the analysis and implementation of the described algorithms, artificially created test sequences were used. These test sequences contained cutouts from still images moving with a specified velocity, with an artificially generated interlaced scanning. One frame of the cutout test sequence from the image Boat (before interlacing) can be seen in Figure 17. Because the motion vectors were known in advance, the correctness of the motion estimator could be evaluated, which was of great help during implementation. As a measure of the precision of the estimated motion vectors, the MSE when the

Figure 17. A cutout from the image Boat used as a test sequence for evaluation.


Figure 18. MSE chart for the RGST deinterlacer in conjunction with the GST deinterlacer, for an artificially created test sequence moving vertically with specified motion.

motion estimator was used in conjunction with the deinterlacer was studied. The MSE is defined similarly to Equation (3):

ε_MSE = (1 / (P·M·N)) Σ_n Σ_{x∈W_n} (f_out[x,n] − f[x,n])^2   (62)

where M and N are the vertical and horizontal sizes of the measurement window W_n in pixels and P is the number of frames in the test sequence. The MSE is thus an average over the pixels in the measurement window and the frames in the sequence. The RGST motion estimator was evaluated with the Robust GST deinterlacer and the Variable 3-D Recursive-Search Block-Matcher. The evaluation was performed on the 25-field test sequence Boat with different velocities. The results of the analysis can be seen in Figure 18. As can be seen there, the MSE increases as the velocity approaches the integer-odd velocity d_y = 1. Looking at the coefficients in Table 1 reveals that samples near an odd-integer velocity become significantly boosted. This is also noted in [13], where the frequency responses of the four different filters are shown. This boosting characteristic of the GST filter causes serious degradation of the image if the motion vector is incorrect. As a consequence, motion estimation according to the Time-Recursive method becomes harder to accomplish. Although error propagation is not possible through the GST deinterlacer, errors in the deinterlaced image influence the behavior of the motion estimator. Therefore, convergence to the true motion vectors is slower. This is confirmed when analyzing Figure 19. In this figure, the MSE of a true


Figure 19. MSE for the Car sequence aligned with the estimated vertical median motion.

video sequence, Car, is aligned side-by-side with the median of the estimated vertical motion component of the same sequence. As can be observed, the MSE increases when the velocity is in the region of d_y = 1. A solution to this problem is discussed in Section 6.1.2.
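The per-sample MSE of Equation (62) amounts to a mean over all pixels and frames; a minimal sketch, assuming the whole frame is used as measurement window:

```python
import numpy as np

def mse_per_sample(f_out, f_ref):
    """Equation (62): squared error between the deinterlaced output and
    the progressive reference, averaged over all pixels and all P frames.

    f_out, f_ref : arrays of shape (P, M, N)
    """
    diff = np.asarray(f_out, dtype=np.float64) - np.asarray(f_ref, dtype=np.float64)
    return float(np.mean(diff ** 2))
```

Dividing by P·M·N through `np.mean` gives the per-sample figure plotted in Figure 18; summing per-frame means instead gives the per-frame curves of Figures 24-27.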

5.1.2 Performance on true video sequences

When the RGST motion estimator had been tested on artificially created test sequences and showed satisfying results, the RGST motion estimator and GST deinterlacer were tested on true video sequences. Here, the results from four of these sequences are shown. Images from these four sequences can be seen in Figures 20, 21, 22 and 23. In the Car sequence, the camera starts panning when tracking a racing car driving on a track. In the IKEA sequence, the camera is fixed and a bed in the centre of the image is rotating. In the Underwater sequence, a man is moving slowly while the camera is panning slightly. In the New York sequence, the camera is flying over


Figure 20. Image from Car test sequence.

Figure 21. Image from IKEA test sequence.

Figure 22. Image from Underwater test sequence. Figure 23. Image from NewYork test sequence.

the city, tracking a building. As a reference, the simple Line Average deinterlacing method is also shown for each sequence. The MSE comparisons are shown in Figures 24, 25, 26 and 27. As can be seen in the figures, the motion compensated GST filter outperforms the spatial Linear Average filter in most cases. In the Underwater sequence, however, both methods perform equally. The assumed explanation for this is that the Underwater sequence does not contain any significant amount of temporal aliasing (due to mainly small velocities). Thus, it cannot take advantage of the enhanced precision offered by the GST interpolator. Averaging the MSE’s over

The performance comparison would probably be more fair if results from other motion compensated methods were presented as well. However, since Matlab implementations of other motion compensated methods are hard to obtain, this was not possible. Nor could the sequences used in the references be obtained, so the results from this analysis could not be compared to the implementations in the references.


Figure 24. GST and Linear Average MSE comparison, Car sequence.

Figure 25. GST and Linear Average MSE comparison, IKEA sequence.

Figure 26. GST and Linear Average MSE comparison, Underwater sequence.


Figure 27. GST and Linear Average MSE comparison, NewYork sequence.

the frames in the sequences and summing these MSE’s gives the following values:

Total MSE RGST: 51.83
Total MSE Line Average: 97.65

Table 2. Sum of the sequences’ MSE’s averaged over the frames.

5.1.3 Subjective analysis

Nevertheless, video quality is a subjective matter and too much confidence should not be put in the MSE. Thus, some visualizations of the deinterlaced images are shown here. On the bonnet of the car in Figure 28, artifacts from GST deinterlacing can be seen. These artifacts are due to incorrectly estimated motion vectors, and they become even more obvious if the estimation results in a boosted velocity. In Figure 29, the GST deinterlacer is compared to the Line Average deinterlacer with respect to aliasing. The high frequency components on the edge of the bed are almost free from aliasing with the GST deinterlacer. With the Linear Average method, however, the aliasing becomes obvious.


Figure 28. Artifact of the GST deinterlacer resulting from incorrect motion estimation in the Car sequence.


Figure 29. Comparison between the (a) GST deinterlacer and (b) the Linear Average deinterlacer on the IKEA sequence with respect to aliasing.

6 Discussion

6.1 Problems and improvements

6.1.1 Edge difficulties

For macro blocks at the edge of the horizontal-vertical sample lattice, neighboring macro blocks may be absent. Furthermore, a vector pointing from outside the sample lattice will result in an incorrect GST interpolation. In the best case, the deinterlacing will be salvaged by the fall-back deinterlacing algorithm. However, the estimated motion vectors in these areas can easily be incorrect, which should be taken into account. One solution might be to weight the motion vectors depending on their position in the sample lattice.

6.1.2 Singularity problem

In [13], Bellers et al. propose a solution to the earlier noticed problem with boosted samples as velocities approach the singularity d_y = 1. The impact of the vertical-temporal filters that boost samples significantly can be reduced by combining the GST deinterlacer with the Time-Recursive deinterlacer. As shown in Figure 30, small pixel distances between the motion compensated sample from the previous field and the existing sample in the current field can be avoided by partly compensating for the motion according to the Time-Recursive deinterlacing method and partly by using the GST deinterlacing method. Results from [13] clearly show better values using this mixed approach. Nevertheless, note that error propagation is not completely removed with this method.


Figure 30. (a) GST deinterlacing for a boosted velocity. (b) Mixed GST and Time-Recursive deinterlacing.

6.1.3 Improved fall-back method

The fall-back method used here for the odd-integer singularity velocities is the Line Average algorithm. However, since this method introduces severe aliasing in the deinterlaced image, a different fall-back method should be considered.

6.1.4 Improved interpolation

In the GST deinterlacer, a first-order linear interpolator is applied. This is a good price-performance interpolator. Yet, to achieve an even better result, another interpolator could be used.

6.1.5 Further robustness

To eliminate the GST deinterlacing artifacts due to incorrect motion estimation shown in Section 5.1.3, further robustness in the motion estimator has to be achieved. In [15], a solution is presented based on a combination of the Time-Recursive motion estimation algorithm and the GST motion estimation algorithm.

6.1.6 Two dimensional GST interpolation

As seen in Figure 31, the samples contributing to the interpolation are not optimally distributed. The circle marks a region within two pixels distance from the interpolated pixel. In this region, there are pixels not being used in


the interpolation, and the filter is therefore sub-optimal. In [13], an improved inseparable 2-D GST filter is introduced, where the interpolation is done in both the vertical and the horizontal direction at once. This deinterlacer can also help the motion estimator estimate the correct motion vectors for diagonal motion.

Figure 31. The distribution of the pixels used in the GST interpolation filter for a fractional motion in both vertical and horizontal direction.

6.1.7 Chrominance deinterlacing

Here, the motion estimation and deinterlacing were performed entirely on the intensity component of the video signal. To achieve an output with color representation, the chroma components also have to be processed. Because the chroma components are often sub-sampled, this may not be a straightforward task. This is a subject for further studies.

7 References

[1] R. C. Ballard, “Television System”, U.S. Patent 2152234, July 19, 1932.

[2] J. H. McClellan, R. W. Schafer and M. A. Yoder, Signal Processing First, Pearson Prentice Hall, 2003, ISBN 0-13-120265-0.

[3] P. D. Symes, Video Compression, McGraw-Hill, 1998, ISBN 0-07-063344-4.

[4] J. L. Yen, “On Nonuniform Sampling of Bandwidth-Limited Signals”, IRE Transactions on Circuit Theory, vol. 3, issue 4, pp. 251-257, December 1956, ISSN 0098-4094.

[5] G. de Haan and E. B. Bellers, “Deinterlacing - An Overview”, Proceedings of the IEEE, vol. 86, issue 9, pp. 1839-1857, September 1998, ISSN 0018-9219.

[6] E. B. Bellers and G. de Haan, “Advanced de-interlacing techniques”, in Proceedings of ProRISC/IEEE Workshop on Circuits, Systems and Signal Processing, Mierlo, The Netherlands, pp. 7-17, November 1996.

[7] M. Zhao and G. de Haan, “Subjective evaluation of de-interlacing techniques”, in Proceedings of SPIE - Image and Video Communications and Processing 2005, San Jose, USA, vol. 5685, issue 2, pp. 683-691, International Society for Optical Engineering, March 2005, ISSN 0277-786X.


[8] G. de Haan and P. W. A. C. Biezen, “Sub-pixel motion estimation with 3-D recursive search block-matching”, Signal Processing: Image Communication, vol. 6, issue 3, pp. 229-239, Elsevier Science Publishers B.V., June 1994, ISSN 0923-5965.

[9] F. M. Wang, D. Anastassiou and A. N. Netravali, “Time-Recursive Deinterlacing for IDTV and Pyramid Coding”, Signal Processing: Image Communication, vol. 2, issue 3, pp. 365-374, Elsevier Science Publishers B.V., October 1990, ISSN 0923-5965.

[10] P. Delogne, L. Cuvelier, B. Maison, B. Van Caillie and L. Vandendorpe, “Improved Interpolation, Motion Estimation and Compensation for Interlaced Pictures”, IEEE Transactions on Image Processing, vol. 3, issue 5, pp. 482-491, September 1994, ISSN 1057-7149.

[11] L. Vandendorpe, L. Cuvelier, B. Maison, P. Quelez and P. Delogne, “Motion-compensated conversion from interlaced to progressive formats”, Signal Processing: Image Communication, vol. 6, issue 3, pp. 193-211, Elsevier Science Publishers B.V., June 1994, ISSN 0923-5965.

[12] G. de Haan and E. B. Bellers, “Advanced motion estimation and motion compensated de-interlacing”, Journal of the SMPTE, vol. 106, issue 11, pp. 777-786, November 1997.

[13] E. B. Bellers and G. de Haan, “New Algorithm for Motion Estimation on Interlaced Video”, in Proceedings of SPIE - Visual Communication and Image Processing 1998, San Jose, USA, vol. 3309, issue 1, pp. 111-121, The International Society for Optical Engineering, January 1998, ISSN 0277-786X.

[14] C. Ciuhu and G. de Haan, “A two-dimensional generalized sampling theory and application to de-interlacing”, in Proceedings of SPIE - Visual Communication and Image Processing 2004, vol. 5308, pp. 700-711, The International Society for Optical Engineering, January 2004, ISBN 9780819452115.


[15] C. Ciuhu and G. de Haan, “Motion Estimation on Interlaced Video”, in Proceedings of SPIE - The International Society for Optical Engineering, San Jose, USA, vol. 5685, issue 2, pp. 718-729, International Society for Optical Engineering, January 2005, ISSN 0277-786X.
