On Computational Complexity of Motion Estimation Algorithms in MPEG-4 Encoder



Muhammad Shahid
This thesis report is presented as part of the degree of Master of Science in Electrical Engineering.

Blekinge Institute of Technology, 2010

Supervisor: Tech. Lic. Andreas Rossholm, ST-Ericsson
Examiner: Dr. Benny Lövström, Blekinge Institute of Technology

Abstract

Video encoding in mobile devices is a computationally demanding feature that requires well designed and well developed algorithms. The optimal solution requires trade-offs in the encoding process, e.g. in motion estimation, between low complexity on the one hand and high perceptual quality and efficiency on the other. This thesis works on reducing the complexity of motion estimation algorithms used for MPEG-4 video encoding, taking the SLIMPEG motion estimation algorithm as reference. Inherent properties of video, such as spatial and temporal correlation, have been exploited to devise and test new motion estimation techniques. Four motion estimation algorithms have been proposed, and their computational complexity and encoding quality have been evaluated. The resulting encoded video quality has been compared against the standard Full Search algorithm, while the reduction in computational complexity of the improved algorithms is compared against SLIMPEG, which is already about 99 % more efficient than Full Search in terms of computational complexity. The fourth proposed algorithm, Adaptive SAD Control, offers a mechanism for dynamically choosing the trade-off between computational complexity and encoding quality.

Acknowledgements

It is a matter of great pleasure to express my deepest gratitude to my advisors Dr. Benny Lövström and Andreas Rossholm for all their guidance, support and encouragement throughout my thesis work. It was a great opportunity to do research work at ST-Ericsson under the marvelous supervision of Andreas Rossholm. The counseling provided by Benny Lövström was of great value to me in writing up this manuscript.

I cannot forget to mention the help I received from Fredrik Nillson and Jimmy Rubin of ST-Ericsson in setting up the working environment and starting up the ST-E algorithm. I owe my successes in life so far to all of my family members, for their magnificent kindness and love!


Contents

Abstract
Acknowledgements
Contents

1 Introduction

2 Basics of Digital Video
  2.1 Color Spaces
  2.2 Video Quality
  2.3 Representation of Digital Video
  2.4 Applications
    2.4.1 Internet
    2.4.2 Video Storage
    2.4.3 Television
    2.4.4 Games and Entertainment
    2.4.5 Video Telephony

3 Video Compression Fundamentals
  3.1 CODEC
  3.2 A Video CODEC
  3.3 Video Coding Standards
    3.3.1 MPEG-1
    3.3.2 MPEG-2
    3.3.3 MPEG-4
    3.3.4 MPEG-7
    3.3.5 MPEG-21
    3.3.6 H.261
    3.3.7 H.263
    3.3.8 H.263+
    3.3.9 H.264
  3.4 MPEG-4
  3.5 Syntax

4 Motion Estimation and its Implementation
  4.1 Block Matching
  4.2 Motion Estimation Algorithms
    4.2.1 Full Search
    4.2.2 Three-Step Search
    4.2.3 Diamond Search
    4.2.4 SLIMPEG

5 Rate Distortion Optimization and Bjontegaard Delta PSNR
  5.1 Measurement of Distortion
  5.2 Bjontegaard Delta PSNR

6 Simulation, Results and Discussion
  6.1 SAD as a Comparison Metric
  6.2 Proposed Techniques
    6.2.1 Spatial Correlation Algorithm
    6.2.2 Temporal Correlation Algorithm
    6.2.3 Adaptive SAD Control
  6.3 Simulations with different video sequences
    6.3.1 Football Sequence
    6.3.2 Foreman Sequence
    6.3.3 Claire Sequence

7 Conclusion and Future Work

List of Figures
List of Tables
Bibliography

Chapter 1

Introduction

Since the advent of the first digital video coding standard in 1984 from the International Telecommunication Union (ITU), the technology has seen great progress. The two main standard setting bodies in this regard are the ITU and the International Organization for Standardization (ISO). The recommendations of the ITU include standards such as H.261/262/263/264, which focus on applications in the area of telecommunication. The Moving Picture Experts Group (MPEG) of ISO has released standards such as MPEG-1/-2/-4, which focus on applications in the computer and consumer electronics area. The standards defined by both of these groups have some parts in common, and some work has been performed as a joint venture. The field of video compression has been continuously developing, with enhancements of previous versions of the standards and the introduction of new recommendations. The MPEG-4 standard is followed in this thesis work. It can be safely said that video compression is a top requirement in any multimedia storage and transmission scenario: the video is encoded in some form before being sent or stored, and then decoded at the receiving end or when it is viewed. Besides the presence of digital video in television and on CD/DVD, cellular phones will probably be the next place where video content is heavily used. The limited storage capacity of mobile devices dictates the need for efficient video compression tools. Video encoding in mobile devices has developed from a high-end feature into something that is taken for granted. Nevertheless, it is a computationally demanding feature that requires well designed and well developed algorithms. Many different algorithms need to be evaluated in order to come closer to the optimal solution. As early as 1929, Ray Davis Kell described a form of video compression for which he obtained a patent [1]. Given the fact that a video is actually a series of pictures transmitted at some designated speed, Kell's patent gave rise to the idea of transmitting the difference between successive images instead of sending the whole image. However,

it took a long time for the idea to be implemented in practice, yet it is still a keystone of many video compression standards today. Connected to this idea is the concept of motion estimation, which tries to exploit the temporal correlation present between video frames. It predicts the motion found in the current frame using already encoded frames. Hence, the residual frame contains much less energy than the actual frame, and the motion vectors together with the residual frame can be encoded at a much lower bit rate than that required to encode a regular frame. Motion estimation may require a tremendous amount of computational work inside the video coding process. There are certain algorithms employed for motion estimation. The basic class of these is called Full Search algorithms; it gives optimal performance but is computationally very time consuming. To deal with this computational issue, many sub-optimal fast search algorithms have been designed, and this thesis focuses on some of them in an attempt to improve the performance of one of them. The SLIMPEG motion estimation algorithm is taken as the reference here, and inherent video properties such as spatial and temporal correlation have been used to devise techniques aimed at less complex yet performance-oriented motion estimation algorithms. The rest of the report is organized as follows: Chapter 2 and Chapter 3 deal with the fundamentals of digital video and video compression, respectively. Implementation aspects of motion estimation are explored in Chapter 4, ending with the introduction of the SLIMPEG motion estimation algorithm. Rate distortion and delta PSNR are the contents of Chapter 5. Results of the main contribution are provided and discussed in Chapter 6. Chapter 7 contains the conclusion and some hints about future work in the field.


Chapter 2

Basics of Digital Video

A video image is obtained by capturing a 2D plane view of a 3D scene. Digital video is then a sequence of spatially and temporally sampled frames. The spatio-temporal sampling unit, usually called a pixel (picture element), can be represented by a digital value describing its color and brightness. The more sampling points taken to form the video frame, the higher is usually the visual quality, at the cost of higher storage requirements. The video frame is usually rectangular in shape. The smoothness of a video is determined by the rate at which its frames are presented in succession. A video with a frame rate of thirty frames per second looks fairly smooth for most purposes. A general comparison of the appearance of a video as determined by its frame rate is given in Table 2.1 [2].

Table 2.1: Video frame rates [2]

Frame rate                     Appearance
Below 10 frames per second     'Jerky', unnatural appearance to movement
10-20 frames per second        Slow movement appears OK; rapid movement is clearly jerky
20-30 frames per second        Movement is reasonably smooth
50-60 frames per second        Movement is very smooth
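To illustrate why the number of sampling points and the frame rate drive the storage requirement mentioned above, the short sketch below computes the raw (uncompressed) size of a video clip. The resolution, frame rate, clip length and the 3 bytes-per-pixel (8-bit RGB) assumption are hypothetical example values, not figures from this thesis.

```python
def raw_video_size_bytes(width, height, fps, seconds, bytes_per_pixel=3):
    """Uncompressed size of a clip, using bytes_per_pixel per sample (e.g. 3 for 8-bit RGB)."""
    return width * height * bytes_per_pixel * fps * seconds

# Hypothetical example: a 10-second CIF (352 x 288) clip at 30 frames per second.
size = raw_video_size_bytes(352, 288, 30, 10)
print(f"{size / 1e6:.1f} MB uncompressed")   # roughly 91 MB for just 10 seconds
```

In practice the chrominance components are usually subsampled (as discussed in the color space section below), which already reduces this raw figure before any compression is applied.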

2.1 Color Spaces

A pixel may be represented by just one number (grey scale image) or by multiple numbers (colored image). A particular scheme used for representing colors is called a color space. Two of the most common schemes are RGB (red/green/blue) and YCrCb (luminance/red chrominance/blue chrominance). In the RGB color space, each pixel is represented by three numbers indicating the relative proportions of the three colors. Each number is usually formed by eight bits, so one pixel requires twenty-four bits for its complete representation. Psycho-visual experiments have shown that the human visual system is less sensitive to color than to luminance. This fact is exploited in the YCrCb color space, where the luminance is concentrated in the Y component and the color information is contained in the remaining components. The two color spaces are related by a transformation, so one representation can be converted into the other. For details, please see [2].
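As an illustration of the relationship between the two color spaces, the sketch below converts an 8-bit RGB pixel to YCbCr using rounded ITU-R BT.601 full-range coefficients. This is one commonly used definition of the transform, offered here only as an example; it is not necessarily the exact variant used in this thesis or in [2].

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit R, G, B values to Y, Cb, Cr (rounded BT.601 full-range coefficients)."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b          # luminance
    cb = -0.169 * r - 0.331 * g + 0.500 * b + 128    # blue chrominance
    cr =  0.500 * r - 0.419 * g - 0.081 * b + 128    # red chrominance
    return y, cb, cr

print(rgb_to_ycbcr(255, 0, 0))   # a pure red pixel: modest luma, Cr well above the 128 midpoint
```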

2.2 Video Quality

Video quality is an important parameter and is by nature a subjective issue, since it is ultimately judged by humans. There are many objective criteria for measuring video quality that correlate to some extent with human experience, e.g. PSNR. However, they may not fully satisfy the demands of the subjective experience of a human observer: experiments show that a picture with a lower PSNR may look visually better than one with a higher PSNR. It should also be noted that visual experience varies from person to person, which raises the need for alternatives that cover the needs of both objective and subjective tests. An objective test that matches the human visual experience well will give acceptable results.
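For reference, the sketch below computes PSNR for 8-bit frames in the standard way, from the mean squared error between an original and a reconstructed frame. This is the usual textbook definition rather than code taken from the thesis; the test frames are randomly generated for illustration only.

```python
import numpy as np

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two 8-bit frames of equal size."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical frames
    return 10.0 * np.log10(max_value ** 2 / mse)

# Hypothetical example: compare a frame with a slightly noisy copy of itself.
frame = np.random.randint(0, 256, (288, 352), dtype=np.uint8)
noisy = np.clip(frame + np.random.randint(-2, 3, frame.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(frame, noisy):.1f} dB")
```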

2.3 Representation of Digital Video

Before the video is ready for coding, it is often converted to one of the intermediate formats. The central one is the Common Intermediate Format, CIF, with a frame resolution of 352 x 288 pixels. Table 2.2 gives information about some standard intermediate formats.

Table 2.2: Intermediate formats [2]

Format                Luminance resolution (horz. x vert.)
Sub-QCIF              128 x 96
Quarter CIF (QCIF)    176 x 144
CIF                   352 x 288
4CIF                  704 x 576

2.4 Applications

There has been an exponential growth in the applications of digital video, and the technology continues to develop rapidly. Some examples of widely used digital video applications are given in the following subsections.

2.4.1 Internet

It can be safely said that the internet today hosts most digital video applications, ranging from short video clips to full-length movies, and from a small video chat to a corporate video conference. Remote teaching and learning, video telephony and video sharing have all been made possible by digital video. The state-of-the-art video broadcasting phenomenon YouTube presents billions of videos to viewers worldwide using the benefits of digital video technology.

2.4.2 Video Storage

Digital video has reshaped the way videos are stored. CD/DVD-ROM and Blu-ray Disc have almost wiped out the classic film and tape storage media. These new storage discs come with huge advantages in capacity, portability and durability. The latest of them is the Blu-ray Disc, with a storage capacity of 25 GB in a single layer and up to 50 GB in a dual layer [3].

2.4.3 Television

Satellite television channels across the planet have created a global village by virtue of digital video. There are literally thousands of television channels operating in various areas of the world, and the number is still increasing. News, current affairs shows and popular drama serials gather huge numbers of viewers.


2.4.4 Games and Entertainment

Video-intensive games and movies have gained enormous popularity, and these are again applications of digital video. Nowadays we see an increasing trend in the popularity of 3D animated movies, which is a big success for digital video. Take the example of 'Avatar', a blockbuster 3D film and one of the most popular movies of the current era.

2.4.5 Video Telephony

Digital video has played an important role in adding video to voice communication over the telephone. At both governmental and private levels, video conferencing is replacing the need to travel far to attend meetings in one place. Skype is probably the brand leader in this field.


Chapter 3

Video Compression Fundamentals

It has been observed that the size of an ordinary digitized video signal is far greater than the usual storage capacity and transmission bandwidth available. This shows the need for systems capable of compressing video. As an example, one channel of ITU-R 601 television (at 30 fps) requires a bit rate of 216 Mbps for broadcasting in its uncompressed form; a 4.7 GB DVD can store only 87 seconds of uncompressed video at this bit rate. There is therefore a clear need for a mechanism that makes this data fit media of limited storage or transmission capacity. Hence comes compression, but with the drawback of some loss of visual quality. An effective compression system is, in general, lossy in nature.
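As a back-of-the-envelope check (not spelled out in the thesis text, but following directly from the ITU-R 601 4:2:2 sampling structure), the 216 Mbps figure comes from sampling the luminance component at 13.5 MHz and each of the two chrominance components at 6.75 MHz, with 8 bits per sample:

\[
(13.5 + 6.75 + 6.75)\,\text{MHz} \times 8\,\tfrac{\text{bits}}{\text{sample}} = 216\ \text{Mbit/s}.
\]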

3.1 CODEC

The term CODEC represents a combination of two systems capable of encoding (compressing) and decoding (decompressing). A typical codec is shown in Figure 3.1. The encoder compresses the original signal, a process called source coding. After some further signal processing, the signal reaches the point of decompression at the source decoder.

According to information theory, there is statistical redundancy in an ordinary data signal. This principle is utilized in Huffman coding, and such a CODEC is known as an entropy CODEC. However, entropy encoders alone do not perform well on images and videos; source models need to be deployed before entropy coding can be applied to such data.
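As a minimal illustration of how entropy coding exploits statistical redundancy, the sketch below builds a Huffman code for a toy symbol sequence. It is a generic textbook construction, not code from the thesis, and the input string is purely hypothetical.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code (symbol -> bit string) from a sequence of symbols."""
    heap = [[weight, i, [sym, ""]] for i, (sym, weight) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]          # extend codewords in the lighter subtree with 0
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]          # and those in the heavier subtree with 1
        heapq.heappush(heap, [lo[0] + hi[0], next_id] + lo[2:] + hi[2:])
        next_id += 1
    return dict(pair for pair in heap[0][2:])

data = "aaaabbbccd"                           # a toy source with skewed symbol statistics
codes = huffman_code(data)
print(codes)                                  # frequent symbols receive shorter codewords
bits = sum(len(codes[s]) for s in data)
print(bits, "bits instead of", 2 * len(data), "with a fixed 2-bit code")
```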


Figure 3.1: Source coder, channel coder, channel [2].

There are some properties of video that are taken into consideration and exploited in such source models. These properties include the spatial and temporal redundancy present amongst pixels in video frames. Moreover, psycho-visual experiments have shown that the human visual system is more sensitive to lower frequencies, so in the encoding process some high frequencies can be safely discarded. Codecs are often designed to emphasize certain aspects of the media to be encoded, or of their use. For example, a digital video (using a DV codec) of a sports event, such as baseball or soccer, needs to encode motion well but not necessarily exact colors, while a video of an art exhibit needs to encode color and surface texture well. With regard to quality, there are two kinds of codecs. In order to achieve a good level of compression, most codecs degrade the original quality of the signal and are known as lossy codecs; codecs which preserve the original quality of the signal are known as lossless codecs [10]. Some examples of coding techniques are presented here. In Differential Pulse Code Modulation (DPCM), each pixel is predicted from already transmitted pixels, and the prediction error, i.e. the difference between the actual pixel and its prediction, is transmitted. Transform coding changes the domain of the frame signal; this change makes it possible to round off insignificant coefficients, and a lossy compression is achieved. Transform coding has a great deal of application in various video compression techniques. Another technique is motion compensated predictive coding, which is the emphasis of this thesis. In a similar way to DPCM, a model of the actual frame of a video is obtained by prediction based on an already encoded frame. This model is then subtracted from the original frame to obtain a residual frame, which contains much less energy than the original frame [2].
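To make the DPCM idea concrete, the sketch below predicts each sample from the previously reconstructed one and codes only the quantized prediction error. The quantization step and the row of pixel values are hypothetical illustration choices, not parameters from the thesis.

```python
def dpcm_encode(samples, step=4):
    """Encode a 1-D signal: quantize each sample's difference from the previous reconstruction."""
    errors, prediction = [], 0
    for x in samples:
        e = int(round((x - prediction) / step))      # quantized prediction error
        errors.append(e)
        prediction = prediction + e * step           # decoder-side reconstruction, reused as the next prediction
    return errors

def dpcm_decode(errors, step=4):
    out, prediction = [], 0
    for e in errors:
        prediction = prediction + e * step
        out.append(prediction)
    return out

row = [100, 102, 104, 104, 103, 110, 120, 121]       # hypothetical row of pixel values
enc = dpcm_encode(row)
print(enc)                # small error values are cheaper to entropy-code than raw pixels
print(dpcm_decode(enc))   # close to the original row, within the quantization step
```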

3.2 A Video CODEC

Video signals are constructed as a sequence of still images, better known as video frames.


Figure 3.2: Video CODEC with prediction [2].

These frames can be compressed using intra-frame coding techniques, but the resulting compression is not good enough for video. This fact, together with the temporal redundancy present in a video sequence, drives the need for inter-frame encoding. A prediction of the actual video frame, based on a previous frame, is subtracted from the actual frame to form what is called the residual frame. The residual frame is then encoded by the frame codec. A block diagram of such a video coder is shown in Figure 3.2. Encoding the residual frame involves transforming it; the transform coefficients are quantized, and entropy coding is then applied for transmission or storage. At the decoder end, the reverse of these steps is applied to recover the data [2].
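The sketch below outlines this inter-frame loop in simplified form: the previous reconstructed frame serves as the prediction, and only a quantized residual is "transmitted". Plain quantization of the pixel residual stands in for the real transform and entropy coder, and all names and values are illustrative rather than taken from the thesis or from any MPEG-4 reference code.

```python
import numpy as np

def encode_frame(current, reference, qstep=8):
    """Inter-frame step: residual against the reference frame, coarsely quantized."""
    residual = current.astype(np.int16) - reference.astype(np.int16)
    return np.round(residual / qstep).astype(np.int16)     # stands in for transform + quantization

def decode_frame(coded_residual, reference, qstep=8):
    """Decoder (and the encoder's local reconstruction): add the dequantized residual to the reference."""
    rec = reference.astype(np.int16) + coded_residual * qstep
    return np.clip(rec, 0, 255).astype(np.uint8)

# Hypothetical two-frame sequence: the second frame is the first one shifted slightly.
frame0 = np.random.randint(0, 256, (288, 352), dtype=np.uint8)
frame1 = np.roll(frame0, 2, axis=1)

coded = encode_frame(frame1, frame0)
recon = decode_frame(coded, frame0)
print("nonzero residual samples:", np.count_nonzero(coded))
print("max reconstruction error:", int(np.abs(recon.astype(int) - frame1.astype(int)).max()))
```

In a real encoder the residual would also be motion compensated before transformation, which is exactly where the motion estimation algorithms studied in this thesis come in.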
