<<

H.265 (HEVC) BITSTREAM TO H.264 (MPEG 4 AVC)

BITSTREAM TRANSCODER

by

DEEPAK HINGOLE

Presented to the Faculty of the Graduate School of

The University of Texas at Arlington in Partial Fulfillment

of the Requirements

for the Degree of

MASTER OF SCIENCE IN ELECTRICAL ENGINEERING

THE UNIVERSITY OF TEXAS AT ARLINGTON

December 2015

Copyright © by Deepak Hingole 2015

All Rights Reserved

ACKNOWLEDGEMENTS

I would like to express my heartfelt gratitude to my advisor Dr. K. R. Rao for his unwavering support, encouragement, supervision and valuable inputs throughout this research work. He has been a constant source of inspiration for me to pursue this research work.

I would also like to extend my gratitude to my colleagues at Adobe Systems Incorporated for their invaluable industrial insight and experience, which helped me understand and grow in the field of digital processing.

Additionally I would like to thank Dr. Schizas and Dr. Dillon for serving as members of my graduate committee.

A big thank you to Vasavee, Rohith, Maitri, Shwetha, Srikanth and Uma, my Multimedia Processing Lab mates, for providing valuable suggestions during the course of my research work.

Last but not least, I would like to thank my parents, my siblings and my close friends for believing in me and supporting me in this undertaking. I wish for your continued support in the future.

November 25, 2015

ABSTRACT

H.265 (HEVC) BITSTREAM TO H.264 (MPEG 4 AVC)

BITSTREAM TRANSCODER

Deepak Hingole, MS

The University of Texas at Arlington, 2015

Supervising Professor: K. R. Rao

With every new video coding standard, the general rule of thumb has been to maintain the same video quality at a bit rate reduced by about 50% compared to the previous standard.

H.265 is the latest video coding standard, with support for encoding a wide range of resolutions, from low resolution to beyond High Definition, i.e., 4k or 8k. H.265, also known as HEVC, was preceded by H.264, a very well established and widely used standard in industry that finds applications in broadcast, storage and multimedia streaming. Currently almost all devices, including low-power handheld mobile devices, have the capability to decode H.264 encoded bitstreams.

HEVC achieves high coding efficiency at the cost of increased implementation complexity, and not all devices have hardware powerful enough to process (decode) an HEVC bitstream. In order for HEVC coded content to be played on devices that support only H.264, transcoding of the HEVC bitstream to an H.264 bitstream is necessary.

Different transcoding architectures will be investigated and an easy-to-implement scheme will be studied as part of this research.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
LIST OF ILLUSTRATIONS
LIST OF TABLES
Chapter 1 Introduction
1.1. Basics of Video Compression
1.2. Need for Video Compression
1.3. Video Coding Standards
1.4. Thesis Outline
Chapter 2 Overview Of H.264
2.1. Introduction
2.2. Profiles and Levels in H.264
2.2.1. Profiles in H.264
2.2.1.1. Baseline profile
2.2.1.2. Main profile
2.2.1.3. Extended profile
2.2.1.4. High profile
2.2.2. Levels in H.264
2.3. H.264 Encoder
2.4. H.264 Decoder
Chapter 3 Overview Of HEVC
3.1 Introduction
3.2 Profiles and levels in H.265
3.3 H.265 Encoder and Decoder
3.3.1 Coding Tree Units (CTU) and Coding Tree Block (CTB)
3.3.2 Coding Units (CU) and Coding Blocks (CB)
3.3.3 Prediction Units (PU) and Prediction Blocks (PB)
3.3.4 Transform Units (TU) and Transform Blocks (TB)
3.3.5 Motion Vector Signaling
3.3.6 Motion Compensation
3.3.7 Intra-picture prediction
3.3.8 Quantization Control
3.3.9 Entropy Coding
3.3.10 In-loop Deblocking Filtering
3.3.11 Sample Adaptive Offset (SAO)
3.4 High-Level Syntax Architecture
3.4.1 Parameter Set Structure
3.4.2 NAL unit syntax structure
3.4.3 Slices
3.4.4 Supplemental Enhancement Information (SEI) and Video Usability Information (VUI)
3.5 Parallel Processing Features
3.5.1 Tiles
3.5.2 Wavefront Parallel Processing (WPP)
3.5.3 Dependent slices
Chapter 4 Transcoding
4.1 Introduction
4.2 Transcoding Architectures
4.2.1. Open Loop Transcoding Architecture
4.2.2. Closed-loop Transcoding Architecture
4.2.3. Cascaded Pixel-domain Architecture
4.2.4. Motion Compensation in the DCT Domain
4.3. Choice of Transcoding Architecture
Chapter 5 Results
5.1 Quality Metrics For Cascaded Implementation
5.2 Peak-Signal-to-Noise-Ratio (PSNR) versus Quantization Parameter (QP)
5.3 Bitrate versus Quantization Parameter
5.4 Rate Distortion (R-D) Plot
Chapter 6 Conclusion and Future Work
APPENDIX A Test Sequences
APPENDIX B Test Conditions
Test Environment
APPENDIX C Acronyms
REFERENCES
BIOGRAPHICAL INFORMATION

LIST OF ILLUSTRATIONS

Figure 1—1: I-, P- and B- frames
Figure 1—2: Evolution of video coding standards
Figure 2—1: Different profiles in H.264 with distribution of various coding tools among the profiles
Figure 2—2: H.264 Encoder block diagram
Figure 2—3: Nine prediction modes for 4×4 Luma block
Figure 2—4: H.264 Decoder block diagram
Figure 3—1: 4:2:0 Subsampling
Figure 3—2: Typical HEVC video encoder (with decoder modeling elements shaded in light gray)
Figure 3—3: HEVC Decoder block diagram
Figure 3—4: Modes and directional orientations for intra-picture prediction
Figure 3—5: Subdivision of a picture into slices
Figure 3—6: Subdivision of a picture into tiles
Figure 3—7: Illustration of Wavefront Parallel Processing
Figure 4—1: Open loop, partial decoding to DCT coefficients, then requantize
Figure 4—2: Closed loop, drift compensation for requantized data
Figure 4—3: Cascaded decoder-encoder architecture
Figure 4—4: Frame-based comparison of open-loop, closed-loop and cascaded pixel-domain architectures
Figure 4—5: General block diagram for proposed transcoding scheme
Figure 5—1: PSNR (dB) versus QP for akiyo_cif.y4m
Figure 5—2: PSNR (dB) versus QP for city_cif.y4m
Figure 5—3: PSNR (dB) versus QP for crew_cif.y4m
Figure 5—4: PSNR (dB) versus QP for flower_cif.y4m
Figure 5—5: PSNR (dB) versus QP for football_cif.y4m
Figure 5—6: Bitrate (kbps) versus QP for akiyo_cif.y4m
Figure 5—7: Bitrate (kbps) versus QP for city_cif.y4m
Figure 5—8: Bitrate (kbps) versus QP for crew_cif.y4m
Figure 5—9: Bitrate (kbps) versus QP for flower_cif.y4m
Figure 5—10: Bitrate (kbps) versus QP for football_cif.y4m
Figure 5—11: R-D plot for akiyo_cif.y4m
Figure 5—12: R-D plot for city_cif.y4m
Figure 5—13: R-D plot for crew_cif.y4m
Figure 5—14: R-D plot for flower_cif.y4m
Figure 5—15: R-D plot for football_cif.y4m

LIST OF TABLES

Table 2-1 Levels in H.264
Table 3-1 Level limits for the Main profile in HEVC
Table 5-1 akiyo_cif.y4m sequence quality metrics
Table 5-2 city_cif.y4m sequence quality metrics
Table 5-3 crew_cif.y4m sequence quality metrics
Table 5-4 flower_cif.y4m sequence quality metrics
Table 5-5 football_cif.y4m sequence quality metrics

Chapter 1

Introduction

1.1. Basics of Video Compression

Like many other recent technological developments, the emergence of video and image coding in the mass market is due to the convergence of a number of areas. Cheap and powerful processors, fast network access, the ubiquitous Internet and a large-scale research and standardization effort have all contributed to the development of image and video coding technologies [1].

Video can be thought of as a series of images displayed at a constant interval. This constant interval, also known as the frame rate or frames per second (FPS), is an important factor in video technology [2].

The objective of any compression scheme is to represent the data in a compact form. Representation of data in a reduced number of bits is achieved through the exploitation of various redundancies present in the data. In the case of video, we have spatial and temporal redundancies apart from statistical and perceptual redundancies. Spatial redundancy can be thought of as a block of pixels in a video frame bearing similarities with its neighboring blocks. Similarly, temporal redundancy can be thought of as a frame bearing similarities with the frames that arrive before and/or follow after the current frame of interest.

A picture or frame belongs to one of the I-picture, P-picture or B-picture categories. An I-picture, or intra-predicted frame, is one in which the current frame is predicted without referring to any other frame. P-pictures and B-pictures are inter-coded using motion-compensated prediction from a reference frame. P-pictures make use of one reference frame (the P-picture or I-picture preceding the current P-picture), whereas B-pictures make use of two reference frames (the P- and/or I-pictures before and after the current frame). The difference between the predicted frame and the actual frame carries less information and is coded to achieve compression. The three types of frames are shown in figure 1-1 [2].

Figure 1—1: I-, P- and B- frames
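To make the inter-prediction idea concrete, the following minimal sketch (illustrative only; the frame contents, block size and motion vector are invented for the example) predicts a P-frame block from a reference frame and shows that the residual left to code is tiny:

```python
import numpy as np

def predict_block(reference, top, left, mv, size=16):
    # mv = (dy, dx) points from the current block to its match in the reference
    dy, dx = mv
    return reference[top - dy: top - dy + size, left - dx: left - dx + size]

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64)).astype(np.int16)   # decoded reference frame
cur = np.roll(ref, shift=(2, 3), axis=(0, 1))           # current frame: shifted copy

block = cur[16:32, 16:32]                               # block to be coded
pred = predict_block(ref, 16, 16, mv=(2, 3))            # motion-compensated prediction
residual = block - pred                                 # this is what gets coded
print("residual energy:", int((residual ** 2).sum()))   # 0 for a pure translation
```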

1.2. Need for Video Compression

Is compression really necessary once transmission and storage capacities have increased to a sufficient level to cope with uncompressed video? It is true that both storage and transmission capacities continue to increase. However, an efficient and well-designed video compression system gives very significant performance advantages for visual communication at both low and high transmission bandwidths.

1.3. Video Coding Standards

There have been several video coding standards introduced by organizations such as the International Telecommunication Union – Telecommunication Standardization Sector (ITU-T), the Moving Picture Experts Group (MPEG) and the Joint Collaborative Team on Video Coding (JCT-VC). Each standard is an improvement over the previous one.

With every standard, the general rule of thumb has been to retain the same video quality while reducing the bit rate by 50%. Figure 1-2 shows the evolution of video coding standards over the years.

Figure 1—2: Evolution of video coding standards

1.4. Thesis Outline

Chapter 2 presents an overview of H.264, also known as MPEG-4 Part 10/AVC. In a similar fashion, an overview of H.265, also known as High Efficiency Video Coding (HEVC), is given in Chapter 3. Chapter 4 highlights the need for transcoding, explores different transcoding architectures and chooses one of them as the preferred transcoding scheme. Chapter 5 summarizes the results of the proposed scheme, followed by Chapter 6, which discusses how well the proposed scheme performed, what conclusions can be drawn from it and future areas of research in the same direction.

Chapter 2

Overview Of H.264

2.1. Introduction

H.264/MPEG-4 Part 10 (AVC) was introduced in 2003 and was developed by the Joint Video Team (JVT), consisting of the Video Coding Experts Group (VCEG) of the International Telecommunication Union – Telecommunication Standardization Sector (ITU-T) and the Moving Picture Experts Group (MPEG) of the International Organization for Standardization / International Electrotechnical Commission (ISO/IEC) [4].

H.264 can support various interactive (video telephony) and non-interactive applications (broadcast, streaming, storage) as it facilitates a network-friendly video representation [7]. It builds on previous coding standards such as MPEG-1, MPEG-2, MPEG-4 Part 2, H.261, H.262 and H.263 [6][8] and adds many other coding tools and techniques which give it superior quality and compression efficiency.

Like previous motion-based codecs, it uses the following basic principles of video compression [5]:

• Transform for reduction of spatial correlation

• Quantization for control of bitrate

• Motion compensated prediction for reduction of temporal correlation

• Entropy coding for reduction of statistical correlation.

The improved coding efficiency of H.264 can be attributed to the additional coding tools and the new features. Listed below are some of the new and improved techniques used in H.264 [7]:

• Adaptive intra-picture prediction

• Small block size transform with integer precision

• Multiple reference pictures and generalized B-frames

• Variable block sizes

• Quarter-pel precision for motion compensation

• Content adaptive in-loop deblocking filter

• Improved entropy coding by introduction of context adaptive binary arithmetic coding (CABAC) and context adaptive variable length coding (CAVLC)

The increase in coding efficiency and compression ratio comes at the cost of greater complexity in the encoder and decoder algorithms of H.264, as compared to previous coding standards. In order to provide error resilience for transmission of information over the network, H.264 supports the following techniques [7]:

• Flexible macroblock (MB) ordering

• Switched slices

• Arbitrary slice order

• Redundant slices

• Data partitioning

• Parameter setting

2.2. Profiles and Levels in H.264

Profiles and levels specify conformance points for implementing the standard in an interoperable way across various applications that have similar functional requirements. A profile defines a set of coding tools, whereas a level places constraints on certain key parameters of the bitstream, corresponding to decoder processing load and memory capabilities [13].

2.2.1. Profiles in H.264

A profile defines a set of coding tools or algorithms that can be used in generating a conforming bitstream [13].

The profiles defined for H.264 can be listed as follows [10]:

1. Baseline profile

2. Main profile

3. Extended profile

4. High Profiles defined in the FRExts amendment

Figure 2-1 illustrates the coding tools for the various profiles of H.264.

Figure 2—1: Different profiles in H.264 with distribution of various coding tools among the profiles

2.2.1.1. Baseline profile

The tools included in the baseline profile are I (intra coded) and P (predictive coded) slice coding and the enhanced error resilience tools of flexible MB ordering, arbitrary slices and redundant slices. It also supports CAVLC. The baseline profile is intended for low-delay applications, applications demanding low processing power and high packet loss environments. This profile has the least coding efficiency among the three profiles.

2.2.1.2. Main profile

The coding tools included in the main profile are I, P and B (bi-directionally predictive coded) slices, interlace coding, CAVLC and CABAC. The tools not supported by the main profile are the error resilience tools, data partitioning and switched intra (SI) coded and switched predictive (SP) coded slices. This profile aims to achieve the highest possible coding efficiency.

2.2.1.3. Extended profile

This profile has all the tools included in the baseline profile. As illustrated in figure 2-1, this profile also includes B, SP and SI slices, data partitioning, interlace frame and field coding, picture adaptive frame/field coding and MB adaptive frame/field coding. This profile provides better coding efficiency than the baseline profile, though the additional tools result in increased complexity.

2.2.1.4. High profile

In September 2004, the first amendment of the H.264/MPEG-4 AVC video coding standard was released [10]. A new set of coding tools was introduced as part of this amendment, termed the "Fidelity Range Extensions" (FRExts). The aim of FRExts is to achieve significant improvement in coding efficiency for higher-fidelity material. The application areas for the FRExts tools are professional production and high-definition (HD) TV/DVD.

The FRExts amendment defines four new profiles; their discussion is beyond the scope of this document.

2.2.2. Levels in H.264

Level restrictions are established in terms of maximum sample rate, maximum picture size, maximum bit rate, minimum compression ratio and the capacities of the decoded picture buffer (DPB) and the coded picture buffer (CPB), which holds compressed data prior to its decoding for data flow management purposes [13].

In H.264/AVC, 16 levels are specified. The levels defined in H.264 are listed in Table 2-1. Level '1b' was added in the FRExts amendment.

Table 2-1 Levels in H.264

2.3. H.264 Encoder

Figure 2-2 illustrates the block diagram of the H.264 encoder. The H.264 encoder works on MBs with motion compensation, like most previous-generation codecs. Video is formed by a series of picture frames; each frame is an image that is split into blocks. The block sizes can vary in H.264.

Figure 2—2: H.264 Encoder block diagram

The encoder may perform intra-coding or inter-coding for the MBs of a given picture. Intra coded frames are encoded and decoded independently. They do not need any reference frames. Hence they provide access points to the coded sequence where decoding can start.

There are a total of nine optional prediction modes for each 4×4 luma block, four modes for a 16×16 luma block and four modes for the chroma components. Figure 2-3 illustrates the nine prediction modes for a 4×4 luma block [12].

Figure 2—3: Nine prediction modes for 4×4 Luma block

Inter-coding uses inter-prediction of a given block from previously decoded pictures. The aim of inter-coding is to reduce temporal redundancy by making use of motion vectors, which give the displacement of a particular block between the current frame and a reference frame.

The prediction residuals are obtained, which then undergo transformation to remove spatial correlation in the block. The transform coefficients thus obtained undergo quantization. The motion vectors obtained from inter-prediction are combined with the quantized transform coefficient information. They are then entropy encoded using schemes such as CAVLC or CABAC to reduce statistical redundancies [6].

There is a local decoder within the H.264 encoder. This local decoder performs the operations of inverse quantization and inverse transform to obtain the residual signal in the spatial domain. The prediction signal is added to the residual signal to reconstruct the input frame. This reconstructed frame is fed into the deblocking filter to remove blocking artifacts at the block boundaries. The output of the deblocking filter is then fed to the inter/intra prediction blocks to generate prediction signals.
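The transform/quantize path and the local decoder loop can be illustrated with a minimal sketch; a floating-point DCT stands in here for H.264's integer transform, and the quantizer step size is an arbitrary choice for the example:

```python
import numpy as np

def dct_matrix(n=4):
    # Orthonormal DCT-II basis (H.264 actually uses an integer approximation)
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.cos((2 * i + 1) * k * np.pi / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m

C = dct_matrix(4)
residual = np.array([[5, 3, 1, 0],      # toy 4x4 prediction residual
                     [4, 2, 0, -1],
                     [3, 1, -1, -2],
                     [2, 0, -2, -3]], dtype=float)

coeff = C @ residual @ C.T              # forward 2-D transform
qstep = 4.0                             # quantizer step, grows with QP
level = np.round(coeff / qstep)         # quantization: the only lossy step
recon = C.T @ (level * qstep) @ C       # local decoder: dequantize + inverse
print("max reconstruction error:", np.abs(recon - residual).max())
```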

2.4. H.264 Decoder

The H.264 decoder is similar in operation to the local decoder of the H.264 encoder. Figure 2-4 illustrates the H.264 decoder block diagram [12]. An encoded bitstream is the input to the decoder. Entropy decoding (CABAC or CAVLC) takes place on the bitstream to obtain the transform coefficients. These coefficients are then inverse scanned and inverse quantized. This gives residual block data in the transform domain. An inverse transform is performed to obtain the data in the spatial domain, resulting in 4×4 blocks of residual signal. Depending on whether a block is inter-predicted or intra-predicted, an appropriate prediction signal is added to the residual signal. For an inter-coded block, a prediction block is constructed depending on the motion vectors, reference frames and previously decoded pictures. This prediction block is added to the residual block to reconstruct the video frames. These reconstructed frames then undergo deblocking before they are stored for future use in prediction or displayed.

Figure 2—4: H.264 Decoder block diagram

Chapter 3

Overview Of HEVC

3.1 Introduction

H.264 is widely used for many applications, including broadcast of high definition (HD) TV signals over satellite, cable and terrestrial transmission systems, video content acquisition and editing systems, camcorders, security applications, Internet and mobile network video, Blu-ray Discs, and real-time conversational applications such as video chat, video conferencing and telepresence systems. However, an increasing diversity of services, the growing popularity of HD video and the emergence of beyond-HD formats (e.g., 4k×2k or 8k×4k resolution) are creating even stronger needs for coding efficiency superior to H.264/MPEG-4 AVC's capabilities. The need is even stronger when higher resolution is accompanied by stereo or multiview capture and display [13].

High Efficiency Video Coding (HEVC) is the latest video coding standard. It challenges the state-of-the-art H.264/AVC standard, which is well established in the industry, by being able to reduce the bit rate by 50% while retaining the same video quality. HEVC is designed to address existing applications of H.264/MPEG-4 AVC and to focus on two key issues: increased video resolution and increased use of parallel processing architectures. It primarily targets consumer applications, as pixel formats are limited to 4:2:0 8-bit and 4:2:0 10-bit.

3.2 Profiles and levels in H.265

Only three profiles targeting different application requirements, called the Main, Main 10 and Main Still Picture profiles, were finalized by January 2013. Minimizing the number of profiles provides a maximum amount of interoperability between devices, and is further justified by the fact that traditionally separate services, such as broadcast, mobile and streaming, are converging to the point where most devices should become usable to support all of them. The three profiles consist of the coding tools and high-level syntax described in the preceding sections, while imposing the following restrictions [13]:

1) Only 4:2:0 chroma sampling is supported, as shown in figure 3-1.

2) When an encoder encodes a picture using multiple tiles, it cannot also use wavefront parallel processing, and each tile must be at least 256 luma samples wide and 64 luma samples tall.

3) In the Main and Main Still Picture profiles, only a video precision of 8 bits per sample is supported, while the Main 10 profile supports up to 10 bits per sample.

4) In the Main Still Picture profile, the entire bitstream must contain only one coded picture (and thus inter-picture prediction is not supported).

Figure 3—1: 4:2:0 Subsampling

Thirteen levels are defined in the first version of the standard, as shown in Table 3-1, ranging from levels that support only relatively small picture sizes, such as a luma picture size of 176×144 (sometimes called quarter common intermediate format, QCIF), to picture sizes as large as 7680×4320 (often called 8k×4k). The picture width and height are each required to be less than or equal to √(8 × MaxLumaPS), where MaxLumaPS is the maximum luma picture size as shown in Table 3-1 (to avoid the problems that extreme picture shapes could cause for decoders) [13].

Table 3-1 Level limits for the Main profile in HEVC

3.3 H.265 Encoder and Decoder

The video coding layer of HEVC employs the same hybrid approach (inter-/intra-picture prediction and 2-D transform coding) used in all video compression standards since H.261. Figure 3-2 depicts the block diagram of a hybrid video encoder that could create a bitstream conforming to the HEVC standard [13], whereas Figure 3-3 depicts the block diagram of the HEVC decoder.

Figure 3—2: Typical HEVC video encoder (with decoder modeling elements shaded in light gray)

Figure 3—3: HEVC Decoder block diagram

Various features involved in hybrid video coding using HEVC are discussed in the subsections that follow.

3.3.1 Coding Tree Units (CTU) and Coding Tree Block (CTB)

The MB, containing a 16×16 block of luma samples and, in the usual case of 4:2:0 color sampling, two corresponding 8×8 blocks of chroma samples, is the core of the coding layer in H.264. The analogous structure in HEVC is the coding tree unit (CTU), which has a size selected by the encoder and can be larger than a traditional MB [13]. The CTU consists of a luma CTB, the corresponding chroma CTBs and syntax elements. The size L×L of a luma CTB can be chosen as L = 16, 32, or 64 samples, with the larger sizes typically enabling better compression. HEVC then supports a partitioning of the CTBs into smaller blocks using a tree structure and quadtree-like signaling [13][14].
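A toy sketch of the quadtree idea follows (the variance threshold is an invented stand-in for the rate-distortion decision a real encoder makes):

```python
import numpy as np

def split_ctb(block, top=0, left=0, min_size=8, thresh=100.0):
    """Recursively quadtree-split a CTB while the block is 'busy'.
    Returns leaf coding blocks as (y, x, size) tuples."""
    size = block.shape[0]
    if size <= min_size or block.var() < thresh:
        return [(top, left, size)]                    # leaf: one coding block
    h = size // 2
    leaves = []
    for dy in (0, h):
        for dx in (0, h):
            leaves += split_ctb(block[dy:dy + h, dx:dx + h],
                                top + dy, left + dx, min_size, thresh)
    return leaves

rng = np.random.default_rng(1)
ctb = rng.integers(0, 256, (64, 64)).astype(float)    # one 64x64 luma CTB
cbs = split_ctb(ctb)
print(len(cbs), "coding blocks; first:", cbs[0])
```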

3.3.2 Coding Units (CU) and Coding Blocks (CB)

The quadtree syntax of the CTU specifies the size and positions of its luma and chroma CBs. The root of the quadtree is associated with the CTU. Hence, the size of the luma CTB is the largest supported size for a luma CB. The splitting of a CTU into luma and chroma CBs is signaled jointly. One luma CB and ordinarily two chroma CBs, together with associated syntax, form a coding unit (CU). A CTB may contain only one CU or may be split to form multiple CUs, and each CU has an associated partitioning into prediction units (PUs) and a tree of transform units (TUs) [13].

3.3.3 Prediction Units (PU) and Prediction Blocks (PB)

The decision whether to code a picture area using inter-picture or intra-picture prediction is made at the CU level. A PU partitioning structure has its root at the CU level. Depending on the basic prediction-type decision, the luma and chroma CBs can then be further split in size and predicted from luma and chroma prediction blocks (PBs). HEVC supports variable PB sizes from 64×64 down to 4×4 samples [13].

3.3.4 Transform Units (TU) and Transform Blocks (TB)

The prediction residual is coded using block transforms. A TU tree structure has its root at the CU level. The luma CB residual may be identical to the luma transform block (TB) or may be further split into smaller luma TBs. The same applies to the chroma TBs. Integer basis functions similar to those of a discrete cosine transform (DCT) are defined for the square TB sizes 4×4, 8×8, 16×16 and 32×32. For the 4×4 transform of luma intra-picture prediction residuals, an integer transform derived from a form of discrete sine transform (DST) is alternatively specified. The alternative 4×4 transform matrix is

H = [ 29   55   74   84
      74   74    0  -74
      84  -29  -74   55
      55  -84   74  -29 ]
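The matrix above is applied separably, as in the sketch below (floating-point inversion is used for clarity; the actual codec works in integer arithmetic with shifts and offsets):

```python
import numpy as np

# 4x4 DST-like matrix H from the text (unnormalized integer basis)
H = np.array([[29,  55,  74,  84],
              [74,  74,   0, -74],
              [84, -29, -74,  55],
              [55, -84,  74, -29]], dtype=np.int64)

residual = np.array([[1, 2, 1, 0],      # toy 4x4 intra prediction residual
                     [2, 3, 2, 1],
                     [1, 2, 1, 0],
                     [0, 1, 0, 0]], dtype=np.int64)

coeff = H @ residual @ H.T              # separable 2-D forward transform
Hinv = np.linalg.inv(H.astype(float))
recon = Hinv @ coeff @ Hinv.T           # inverse transform recovers the residual
print(np.round(recon).astype(int))
```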

3.3.5 Motion Vector Signaling

Advanced motion vector prediction (AMVP) is used, including derivation of several most probable candidates based on data from adjacent PBs and the reference picture. A merge mode for motion vector (MV) coding can also be used, allowing the inheritance of MVs from temporally or spatially neighboring PBs. Moreover, compared to H.264/MPEG-4 AVC, improved skipped and direct motion inference are also specified [13].

3.3.6 Motion Compensation

Quarter-sample precision is used for the MVs, and 7-tap or 8-tap filters are used for interpolation of fractional-sample positions (compared to six-tap filtering of half-sample positions followed by linear interpolation for quarter-sample positions in H.264/MPEG-4 AVC). Similar to H.264/MPEG-4 AVC, multiple reference pictures are used. For each PB, either one or two motion vectors can be transmitted, resulting in uni-predictive or bi-predictive coding, respectively. As in H.264/MPEG-4 AVC, a scaling and offset operation may be applied to the prediction signal(s) in a manner known as weighted prediction [13].
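As an illustration of fractional-sample interpolation, the sketch below applies HEVC's 8-tap half-sample luma filter (coefficients as given in the HEVC specification, normalized by 64) along one row of samples; the sample values are invented for the example:

```python
import numpy as np

HALF_PEL = np.array([-1, 4, -11, 40, 40, -11, 4, -1])   # HEVC 8-tap half-pel filter

def interp_half(samples, pos):
    """Half-sample value between samples[pos] and samples[pos + 1]."""
    window = samples[pos - 3: pos + 5].astype(np.int64)  # 8 integer samples
    return (window * HALF_PEL).sum() / 64.0              # normalize filter gain

row = np.array([10, 12, 15, 20, 30, 40, 42, 41, 39, 38], dtype=np.int64)
print(interp_half(row, 4))   # value midway between row[4]=30 and row[5]=40
```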

3.3.7 Intra-picture prediction

The decoded boundary samples of adjacent blocks are used as reference data for spatial prediction in regions where inter-picture prediction is not performed. Intra-picture prediction supports 33 directional modes (compared to eight such modes in H.264/MPEG-4 AVC), plus planar (surface fitting) and DC (flat) prediction modes. The selected intra-picture prediction modes are encoded by deriving most probable modes (e.g., prediction directions) based on those of previously decoded neighboring PBs.

Figure 3-4 shows the various modes used in intra-picture prediction [13].

Figure 3—4: Modes and directional orientations for intra-picture prediction.

3.3.8 Quantization Control

As in H.264/MPEG-4 AVC, uniform reconstruction quantization (URQ) is used in HEVC, with quantization scaling matrices supported for the various transform block sizes [13].

3.3.9 Entropy Coding

Unlike H.264, which allows both CAVLC and CABAC, HEVC uses only CABAC as its entropy coding scheme. The CABAC scheme used in H.265 is similar to the one in H.264/MPEG-4 AVC, but has undergone several improvements to increase its throughput speed (especially for parallel-processing architectures) and its compression performance, and to reduce its context memory requirements [13].

3.3.10 In-loop Deblocking Filtering

A deblocking filter similar to the one used in H.264/MPEG-4 AVC is operated within the inter-picture prediction loop. However, the design is simplified in regard to its decision-making and filtering processes, and is made more friendly to parallel processing [13].

3.3.11 Sample Adaptive Offset (SAO)

A nonlinear amplitude mapping is introduced within the inter-picture prediction loop after the deblocking filter. Its goal is to better reconstruct the original signal amplitudes by using a look-up table that is described by a few additional parameters that can be determined by histogram analysis at the encoder side [13].

3.4 High-Level Syntax Architecture

A number of design aspects new to the HEVC standard improve flexibility for operation over a variety of applications and network environments and improve robustness to data losses. However, the high-level syntax architecture used in the H.264/MPEG-4 AVC standard has generally been retained, including the following features [13].

3.4.1 Parameter Set Structure

Parameter sets contain information that can be shared for the decoding of several regions of the decoded video. The parameter set structure provides a robust mechanism for conveying data that are essential to the decoding process. The concepts of sequence and picture parameter sets from H.264/MPEG-4 AVC are augmented by a new video parameter set (VPS) structure [13].

3.4.2 NAL unit syntax structure

Each syntax structure is placed into a logical data packet called a network abstraction layer (NAL) unit. Using the content of a two-byte NAL unit header, it is possible to readily identify the purpose of the associated payload data [13].

3.4.3 Slices

A slice is a data structure that can be decoded independently from other slices of the same picture, in terms of entropy coding, signal prediction and residual signal reconstruction. A slice can either be an entire picture or a region of a picture. One of the main purposes of slices is resynchronization in the event of data losses. In the case of packetized transmission, the maximum number of payload bits within a slice is typically restricted, and the number of CTUs in the slice is often varied to minimize the packetization overhead while keeping the size of each packet within this bound [13].

Figure 3-5 depicts the subdivision of a picture into slices.

Figure 3—5: Subdivision of a picture into slices

3.4.4 Supplemental Enhancement Information (SEI) and Video Usability Information (VUI) metadata

The syntax includes support for various types of metadata known as SEI and VUI. Such data provide information about the timing of the video pictures, the proper interpretation of the color space used in the video signal, 3-D stereoscopic frame packing information, other display hint information, and so on [13].

3.5 Parallel Processing Features

Four new features are introduced in the HEVC standard to enhance the parallel processing capability or modify the structuring of slice data for packetization purposes. Each of them may have benefits in particular application contexts, and it is generally up to the implementer of an encoder or decoder to determine whether and how to take advantage of these features [2][13].

3.5.1 Tiles

HEVC has an option of partitioning a picture into rectangular, independently decodable regions called tiles. Their main purpose is parallel processing. Tiles can also be used for random access to local regions in video pictures. Tiles provide parallelism at a coarser (picture/sub-picture) level of granularity, and no sophisticated synchronization of threads is necessary for their use [2].

Figure 3-6 depicts the subdivision of a picture into tiles.

Figure 3—6: Subdivision of a picture into tiles

3.5.2 Wavefront Parallel Processing (WPP)

This is a new feature in HEVC which, when enabled, allows a slice to be divided into rows of CTUs. The processing of each row can be started only after certain decisions in the previous row have been made. WPP provides parallelism within slices. Figure 3-7 shows how WPP works.

Figure 3—7: Illustration of Wavefront Parallel Processing

3.5.3 Dependent slices

Dependent slices allow data associated with a particular wavefront entry point or tile to be carried in a separate NAL unit, and thus allow fragmented packetization of the data with lower latency than if it were all coded in one slice [2][13].

Chapter 4

Transcoding

4.1 Introduction

Video transcoding [15][16] is the process of converting video from one format to another. A format is basically defined by characteristics such as bit rate, frame rate, spatial resolution, etc. One of the earliest applications of transcoding was to adapt the bit rate of a precompressed bitstream to the available channel bandwidth. Hence transcoding is undertaken to meet the demands of constrained bandwidths and terminal capabilities [15]. Transcoding also provides interoperability between different networks, devices and content representation formats.

Transcoding can be of various types [15]. Some of them are bit rate transcoding, to facilitate more efficient transport of video; spatial and temporal resolution reduction transcoding, for use in mobile devices with limited display and processing power; and error-resilience transcoding, to achieve higher resilience of the original bitstream to transmission errors.

To achieve optimum results by transcoding, the following criteria have to be fulfilled:

1) The quality of the transcoded bitstream should be comparable to the one obtained by direct decoding and re-encoding of the output stream.

2) The information contained in the input stream should be used as much as possible to avoid multigenerational deterioration.

3) The process should be cost efficient, low in complexity and achieve the highest quality possible.

4.2 Transcoding Architectures

There are different standard transcoding architectures for changing the bit rate or spatial resolution and for format conversion. A few of them are discussed in the sections that follow.

4.2.1. Open Loop Transcoding Architecture

Figure 4-1 shows an open-loop system. In the open-loop system, the bit stream is variable-length decoded (VLD) to extract the variable-length code words corresponding to the quantized DCT coefficients, as well as MB data corresponding to the motion vectors and other MB-level information. In this scheme, the quantized coefficients are inverse quantized and then simply requantized to satisfy the new output bit rate. Finally, the requantized coefficients and stored MB-level information are variable-length coded (VLC). Regardless of the techniques used to achieve the reduced rate, open-loop systems are relatively simple since no frame memory is required and there is no need for an IDCT. In terms of quality, better coding efficiency can be obtained by the requantization approach since the variable-length codes used for the requantized data will be more efficient. However, open-loop architectures are subject to drift [16].

Figure 4—1: Open loop, partial decoding to DCT coefficients, then requantize
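The heart of the open-loop scheme is just a requantization of each coefficient block, as in this minimal sketch (the step sizes and levels are arbitrary example values):

```python
import numpy as np

def requantize(levels, qstep_in, qstep_out):
    """Open-loop transcoding of one coefficient block: inverse quantize the
    incoming levels, then requantize with a coarser step for the lower target
    rate. No frame memory and no IDCT, but drift is not corrected."""
    coeff = levels * qstep_in              # inverse quantization
    return np.round(coeff / qstep_out)     # requantization (qstep_out > qstep_in)

levels_in = np.array([[12, -3, 1, 0],
                      [ 4,  2, 0, 0],
                      [-1,  0, 0, 0],
                      [ 0,  0, 0, 0]])
print(requantize(levels_in, qstep_in=4.0, qstep_out=10.0))
```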

4.2.2. Closed-loop Transcoding Architecture

In general, the reason for drift is the loss of high-frequency information. Figure 4-2 shows a closed-loop system. The closed-loop system aims to eliminate the mismatch between predictive and residual components by approximating the cascaded decoder-encoder architecture [17].

Figure 4—2: Closed loop, drift compensation for requantized data

This simplified scheme requires only one reconstruction loop with one DCT and one IDCT. Apart from the slight inaccuracy introduced by this approximation, the architecture is mathematically equivalent to a cascaded decoder-encoder approach.

4.2.3. Cascaded Pixel-domain architecture

Figure 4-3 shows the cascaded decoder-encoder architecture. The main structural difference between the cascaded decoder-encoder architecture, also known as the cascaded pixel-domain architecture, and the closed-loop scheme is that reconstruction in the cascaded pixel-domain architecture is performed in the spatial domain, thereby requiring two reconstruction loops with one DCT and two IDCTs.

Figure 4—3: Cascaded decoder-encoder architecture

4.2.4. Motion Compensation in the DCT Domain

The closed-loop architecture described in section 4.2.2 provides an effective transcoding structure in which the MB reconstruction is performed in the DCT domain. However, since the memory stores spatial-domain pixels, the additional DCT/IDCT is still needed. This can be avoided by utilizing the compressed-domain methods for MC proposed by Chang and Messerschmitt [23]. In this way, it is possible to reconstruct reference frames without decoding to the spatial domain; several architectures describing this reconstruction process in the compressed domain have been proposed [24]-[26]. It was found that decoding completely in the compressed domain could yield quality equivalent to spatial-domain decoding [24]. However, this was achieved with floating-point matrix multiplication and proved to be quite costly.

Different transcoding architectures for spatial and temporal resolution reduction, such as motion vector mapping, DCT-domain down-conversion, conversion of MB type, motion vector re-estimation and residual re-estimation, are discussed in [16].

4.3. Choice of Transcoding Architecture

The cascaded pixel-domain transcoding architecture gives optimum results in terms of complexity, quality and cost. The cascaded pixel-domain transcoder offers greater flexibility in the sense that it can be used for bit rate transcoding, for spatial/temporal resolution downscaling and for other coding parameter changes as well. Since transcoding between standards must take into consideration the different coding characteristics of H.265 and H.264, flexibility is a key issue.

Figure 4—4: Frame-based comparison of open-loop, closed-loop and cascaded pixel-domain architectures

It is evident from figure 4-4 that the open-loop architecture suffers from severe drift, and the quality of the simplified closed-loop architecture is very close to that of the cascaded pixel-domain architecture [16].

According to [17], the cascaded pixel-domain scheme is considered an ideal transcoder since it comprises one full decoder and one full encoder. Another benefit of this approach is that decoding is usually fast, since it does not involve motion estimation; predictions can be made for frames based on variable length decoding (VLD) of the motion vectors from the encoded bitstream. The quality of the transcoded video in turn depends on the input to the encoder stage: the better the input to the encoding stage of the transcoder, the better the final video quality. This satisfies the criteria for the optimum transcoder discussed in section 4.1.
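In practice the cascade can be driven directly from the reference software, as in the minimal sketch below: fully decode the HEVC stream with HM's TAppDecoder, then re-encode the raw YUV with JM's lencod. The binary names and flags follow the HM and JM reference software used in this work; the file names and the contents of encoder.cfg are placeholders for this setup.

```python
import subprocess

def transcode(hevc_bitstream, yuv_temp, jm_config):
    # HM reference decoder: -b input bitstream, -o reconstructed YUV output
    subprocess.run(["TAppDecoder", "-b", hevc_bitstream, "-o", yuv_temp],
                   check=True)
    # JM reference encoder: -d reads encoder.cfg, which names the input YUV,
    # resolution, QP and the output H.264 bitstream
    subprocess.run(["lencod", "-d", jm_config], check=True)

transcode("akiyo_cif.bin", "akiyo_rec.yuv", "encoder.cfg")
```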

Figure 4-5 shows the general block diagram for this proposed transcoding scheme.

Figure 4—5: General block diagram for proposed transcoding scheme

Chapter 5

Results

5.1 Quality Metrics For Cascaded Implementation

Table 5-1 akiyo_cif.y4m sequence quality metrics

QP | Metric | HM encoder wrt original (dB) | JM encoder wrt original (dB) | Transcoded wrt HM output (dB) | Transcoder wrt original (dB) | Original (HM) bitrate (kbps) | Transcoder output bitrate (kbps)
22 | Y-PSNR | 44.5998 | 43.867 | 44.215 | 43.0355 | 1825.712 | 713.23
22 | U-PSNR | 46.1159 | 43.658 | 45.158 | 43.0486 | |
22 | V-PSNR | 47.4654 | 44.893 | 46.141 | 43.5127 | |
22 | YUV-PSNR | 45.1475125 | 43.969125 | 44.573625 | 43.0967875 | |
27 | Y-PSNR | 41.6746 | 40.994 | 40.87 | 39.837 | 1156.44 | 381.94
27 | U-PSNR | 43.3141 | 40.282 | 41.416 | 39.9474 | |
27 | V-PSNR | 44.702 | 41.525 | 42.529 | 40.1852 | |
27 | YUV-PSNR | 42.2579625 | 40.971375 | 41.145625 | 39.894325 | |
32 | Y-PSNR | 38.4861 | 37.59 | 37.326 | 36.2346 | 727.588 | 224.61
32 | U-PSNR | 40.9056 | 36.726 | 37.559 | 36.9227 | |
32 | V-PSNR | 42.3115 | 39.042 | 40.097 | 36.6982 | |
32 | YUV-PSNR | 39.2667125 | 37.6635 | 37.7015 | 36.3785625 | |
37 | Y-PSNR | 35.1775 | 34.296 | 34.19 | 32.5155 | 459.048 | 134.95
37 | U-PSNR | 38.9167 | 32.548 | 32.986 | 33.8233 | |
37 | V-PSNR | 40.5028 | 36.876 | 37.923 | 33.8848 | |
37 | YUV-PSNR | 36.3105625 | 34.4 | 34.506125 | 32.8501375 | |

Table 5-2 city_cif.y4m sequence quality metrics

QP | Metric | HM encoder wrt original (dB) | JM encoder wrt original (dB) | Transcoded wrt HM output (dB) | Transcoder wrt original (dB) | Original (HM) bitrate (kbps) | Transcoder output bitrate (kbps)
22 | Y-PSNR | 41.5384 | 40.092 | 39.625 | 39.7951 | 5256.828 | 1572.55
22 | U-PSNR | 45.044 | 42.399 | 44.579 | 40.9913 | |
22 | V-PSNR | 46.709 | 44.115 | 46.248 | 40.7581 | |
22 | YUV-PSNR | 42.622925 | 40.88325 | 41.072125 | 40.065 | |
27 | Y-PSNR | 37.2802 | 36.339 | 35.878 | 35.8954 | 3259.876 | 710.23
27 | U-PSNR | 42.2834 | 39.708 | 42.115 | 37.3842 | |
27 | V-PSNR | 44.203 | 41.883 | 44.094 | 37.0019 | |
27 | YUV-PSNR | 38.77095 | 37.453125 | 37.684625 | 36.2198125 | |
32 | Y-PSNR | 33.3606 | 32.568 | 32.52 | 32.0698 | 1827.208 | 328.17
32 | U-PSNR | 40.2032 | 37.638 | 39.644 | 33.8711 | |
32 | V-PSNR | 42.1804 | 40.559 | 42.519 | 33.2926 | |
32 | YUV-PSNR | 35.3184 | 34.200625 | 34.660375 | 32.4478125 | |
37 | Y-PSNR | 29.9036 | 29.178 | 30.004 | 28.7262 | 913.304 | 153.96
37 | U-PSNR | 38.735 | 36.52 | 38.825 | 30.8487 | |
37 | V-PSNR | 41.027 | 39.548 | 41.983 | 29.9378 | |
37 | YUV-PSNR | 32.39795 | 31.392 | 32.604 | 29.1429625 | |

Table 5-3 crew_cif.y4m sequence quality metrics

QP | Metric | HM encoder wrt original (dB) | JM encoder wrt original (dB) | Transcoded wrt HM output (dB) | Transcoder wrt original (dB) | Original (HM) bitrate (kbps) | Transcoder output bitrate (kbps)
22 | Y-PSNR | 43.0851 | 41.802 | 42.438 | 41.1578 | 2779.76 | 1639.75
22 | U-PSNR | 45.3917 | 43.468 | 45.554 | 40.8753 | |
22 | V-PSNR | 45.3494 | 43.101 | 44.67 | 41.621 | |
22 | YUV-PSNR | 43.6564625 | 42.172625 | 43.1065 | 41.1803875 | |
27 | Y-PSNR | 39.7001 | 38.359 | 39.108 | 37.7123 | 1629.764 | 775.62
27 | U-PSNR | 42.8602 | 40.438 | 42.174 | 37.0713 | |
27 | V-PSNR | 42.1531 | 39.235 | 40.705 | 38.087 | |
27 | YUV-PSNR | 40.4017375 | 38.728375 | 39.690875 | 37.6790125 | |
32 | Y-PSNR | 36.4592 | 34.727 | 35.678 | 34.2097 | 917.952 | 334.5
32 | U-PSNR | 40.78 | 37.369 | 38.857 | 33.2896 | |
32 | V-PSNR | 39.7165 | 35.119 | 36.308 | 34.6532 | |
32 | YUV-PSNR | 37.4064625 | 35.10625 | 36.154125 | 34.150125 | |
37 | Y-PSNR | 33.3714 | 31.437 | 32.705 | 31.0932 | 494.356 | 140.06
37 | U-PSNR | 39.2579 | 35.039 | 36.419 | 30.4227 | |
37 | V-PSNR | 38.0154 | 32.147 | 33.32 | 31.7606 | |
37 | YUV-PSNR | 34.6877125 | 31.976 | 33.246125 | 31.0928125 | |

Table 5-4 flower_cif.y4m sequence quality metrics

QP | Metric | HM encoder wrt original (dB) | JM encoder wrt original (dB) | Transcoded wrt HM output (dB) | Transcoder wrt original (dB) | Original (HM) bitrate (kbps) | Transcoder output bitrate (kbps)
22 | Y-PSNR | 42.8073 | 41.095 | 40.971 | 40.24 | 7189.492 | 3839.43
22 | U-PSNR | 43.6829 | 41.668 | 42.272 | 41.0877 | |
22 | V-PSNR | 43.7016 | 41.566 | 42.529 | 38.018 | |
22 | YUV-PSNR | 43.0285375 | 41.2255 | 41.328375 | 40.0682125 | |
27 | Y-PSNR | 38.19 | 36.483 | 36.153 | 35.8933 | 5152.432 | 2189.5
27 | U-PSNR | 39.4775 | 36.914 | 37.707 | 36.8757 | |
27 | V-PSNR | 39.9636 | 37.611 | 39.075 | 33.74 | |
27 | YUV-PSNR | 38.5726375 | 36.677875 | 36.7125 | 35.7469375 | |
32 | Y-PSNR | 33.6551 | 31.917 | 31.572 | 31.5062 | 3511.9 | 1140.71
32 | U-PSNR | 36.2778 | 31.384 | 32.761 | 32.7165 | |
32 | V-PSNR | 37.2267 | 34.551 | 36.718 | 29.9209 | |
32 | YUV-PSNR | 34.4293875 | 32.179625 | 32.363875 | 31.459325 | |
37 | Y-PSNR | 29.306 | 27.72 | 27.595 | 27.5471 | 2221.812 | 534.44
37 | U-PSNR | 33.9998 | 27.95 | 29.283 | 29.0038 | |
37 | V-PSNR | 35.4845 | 32.793 | 35.366 | 26.2062 | |
37 | YUV-PSNR | 30.6650375 | 28.382875 | 28.777375 | 27.561575 | |

Table 5-5 football_cif.y4m sequence quality metrics

QP | Metric | HM encoder wrt original (dB) | JM encoder wrt original (dB) | Transcoded wrt HM output (dB) | Transcoder wrt original (dB) | Original (HM) bitrate (kbps) | Transcoder output bitrate (kbps)
22 | Y-PSNR | 42.4278 | 40.93 | 41.233 | 39.8323 | 4586.968 | 3714.19
22 | U-PSNR | 44.5653 | 42.432 | 43.68 | 40.4773 | |
22 | V-PSNR | 45.0987 | 43.07 | 44.571 | 40.4451 | |
22 | YUV-PSNR | 43.02885 | 41.38525 | 41.956125 | 39.989525 | |
27 | Y-PSNR | 38.5341 | 36.947 | 37.317 | 35.9325 | 2954.876 | 2141.81
27 | U-PSNR | 41.3666 | 38.82 | 40.077 | 36.8278 | |
27 | V-PSNR | 42.14 | 39.989 | 41.687 | 36.8699 | |
27 | YUV-PSNR | 39.3389 | 37.561375 | 38.20825 | 36.1615875 | |
32 | Y-PSNR | 34.6902 | 32.877 | 33.58 | 32.1351 | 1785.22 | 1151.46
32 | U-PSNR | 38.8097 | 35.46 | 36.745 | 33.3498 | |
32 | V-PSNR | 39.8703 | 37.557 | 39.434 | 33.5022 | |
32 | YUV-PSNR | 35.85265 | 33.784875 | 34.707375 | 32.457825 | |
37 | Y-PSNR | 31.1695 | 29.117 | 30.374 | 28.7168 | 1016.188 | 563.92
37 | U-PSNR | 36.8086 | 32.296 | 33.361 | 30.0739 | |
37 | V-PSNR | 38.186 | 35.469 | 37.372 | 30.2616 | |
37 | YUV-PSNR | 32.75145 | 30.308375 | 31.622125 | 29.0795375 | |
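For reference, PSNR is computed per plane as 10·log10(255²/MSE), and the YUV-PSNR rows above combine the plane values as (6·Y + U + V)/8, which reproduces the tabulated numbers. A minimal sketch:

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between two 8-bit planes: 10*log10(peak^2 / MSE)."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def yuv_psnr(y, u, v):
    # Combined metric used in the tables above: (6*Y + U + V) / 8
    return (6.0 * y + u + v) / 8.0

print(yuv_psnr(44.5998, 46.1159, 47.4654))   # 45.1475125, as in Table 5-1
```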

5.2 Peak-Signal-To-Noise-Ratio (PSNR) versus Quantization Parameter (QP)

[Plot: PSNR (dB) versus QP for akiyo_cif.y4m; curves: JM encoder wrt original and Transcoder wrt original (values as in Table 5-1)]

Figure 5—1: PSNR (dB) versus QP for akiyo_cif.y4m

[Plot: PSNR (dB) versus QP for city_cif.y4m; curves: JM encoder wrt original and Transcoder wrt original (values as in Table 5-2)]

Figure 5—2: PSNR (dB) versus QP for city_cif.y4m

[Plot: PSNR (dB) versus QP for crew_cif.y4m; curves: JM encoder wrt original and Transcoder wrt original (values as in Table 5-3)]

Figure 5—3: PSNR (dB) versus QP for crew_cif.y4m

[Plot: PSNR (dB) versus QP for flower_cif.y4m; curves: JM encoder wrt original and Transcoder wrt original (values as in Table 5-4)]

Figure 5—4: PSNR (dB) versus QP for flower_cif.y4m

[Plot: PSNR (dB) versus QP for football_cif.y4m; curves: JM encoder wrt original and Transcoder wrt original (values as in Table 5-5)]

Figure 5—5: PSNR (dB) versus QP for football_cif.y4m

5.3 Bitrate versus Quantization Parameter

[Plot: Bitrate (kbps) versus QP for akiyo_cif.y4m; curves: JM encoder wrt original and Transcoder wrt HM reconstructed]

Figure 5—6: Bitrate (kbps) versus QP for akiyo_cif.y4m

[Plot: Bitrate (kbps) versus QP for city_cif.y4m; curves: JM encoder wrt original and Transcoder wrt HM reconstructed]

Figure 5—7: Bitrate (kbps) versus QP for city_cif.y4m

[Plot: Bitrate (kbps) versus QP for crew_cif.y4m; curves: JM encoder wrt original and Transcoder wrt HM reconstructed]

Figure 5—8: Bitrate (kbps) versus QP for crew_cif.y4m

[Plot: Bitrate (kbps) versus QP for flower_cif.y4m; curves: JM encoder wrt original and Transcoder wrt HM reconstructed]

Figure 5—9: Bitrate (kbps) versus QP for flower_cif.y4m

[Plot: Bitrate (kbps) versus QP for football_cif.y4m; curves: JM encoder wrt original and Transcoder wrt HM reconstructed]

Figure 5—10: Bitrate (kbps) versus QP for football_cif.y4m

5.4 Rate Distortion (R-D) Plot

[Plot: PSNR (dB) versus bitrate (kbps) for akiyo_cif.y4m; curves: JM encoder wrt original and Transcoder wrt original]

Figure 5—11: R-D plot for akiyo_cif.y4m

[Plot: PSNR (dB) versus bitrate (kbps) for city_cif.y4m; curves: JM encoder wrt original and Transcoder wrt original]

Figure 5—12: R-D plot for city_cif.y4m

[Plot: PSNR (dB) versus bitrate (kbps) for crew_cif.y4m; curves: JM encoder wrt original and Transcoder wrt original]

Figure 5—13: R-D plot for crew_cif.y4m

[Plot: PSNR (dB) versus bitrate (kbps) for flower_cif.y4m; curves: JM encoder wrt original and Transcoder wrt original]

Figure 5—14: R-D plot for flower_cif.y4m

[Plot: PSNR (dB) versus bitrate (kbps) for football_cif.y4m; curves: JM encoder wrt original and Transcoder wrt original]

Figure 5—15: R-D plot for football_cif.y4m

Chapter 6

Conclusion and Future Work

The objective of this thesis is to implement a transcoding scheme that makes it possible for a device with H.264 support to play H.265 encoded bitstreams. It can be verified from the results in chapter 5 that the main purpose of an optimal transcoder, implementing the transcoding while obtaining video of similar quality, has been met.

As expected, PSNR decreased and bitrate increased for the transcoder when compared with the JM encoder working directly on the original raw video. This is due to the fact that re-encoding was performed on a reconstructed video which already deviated from the original, since it had been processed through the HM encoder and decoder. The quality of the video depended upon how well the HM encoded video was decoded before being given as input to the transcoder.

The time complexity of this implementation is high, since full decoding followed by full encoding is performed. Motion estimation contributes most of the time spent in the re-encoding phase. Various optimization techniques can be implemented to address this constraint.

This thesis was based on format conversion from one standard to another. In coming years we will have devices capable of HEVC playback, so another area of transcoding that can be explored is spatial resolution reduction of HEVC bitstreams for use on mobile displays.

HEVC supports 35 intra prediction modes; these can be mapped to one of the nine intra prediction modes in AVC, thereby avoiding the need to decode the video all the way down to the spatial domain and re-encode it.
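A hypothetical sketch of that mapping idea: bucket HEVC's 33 angular modes (plus planar and DC) onto the nine AVC 4×4 modes by nearest prediction direction. The anchor modes (10 = horizontal, 18 = diagonal, 26 = vertical) are from the HEVC design, but the bucket boundaries below are illustrative guesses, not taken from either standard:

```python
# AVC 4x4 intra modes: 0 V, 1 H, 2 DC, 3 DDL, 4 DDR, 5 VR, 6 HD, 7 VL, 8 HU
AVC = {"V": 0, "H": 1, "DC": 2, "DDL": 3, "DDR": 4, "VR": 5, "HD": 6,
       "VL": 7, "HU": 8}

def map_intra_mode(hevc_mode: int) -> int:
    if hevc_mode in (0, 1):                     # HEVC planar/DC -> AVC DC
        return AVC["DC"]
    # HEVC angular modes 2..34; buckets around the anchors are a guess
    buckets = [(6, AVC["HU"]), (13, AVC["H"]), (16, AVC["HD"]),
               (20, AVC["DDR"]), (23, AVC["VR"]), (29, AVC["V"]),
               (32, AVC["VL"]), (34, AVC["DDL"])]
    for upper, avc_mode in buckets:
        if hevc_mode <= upper:
            return avc_mode
    return AVC["DC"]

print(map_intra_mode(26))   # HEVC vertical -> 0 (AVC vertical)
print(map_intra_mode(10))   # HEVC horizontal -> 1 (AVC horizontal)
```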

APPENDIX A

Test Sequences


A 1: akiyo_cif.y4m

A 2: city_cif.y4m


A 3: crew_cif.y4m

A 4: flower_cif.y4m


A 5: football_cif.y4m


APPENDIX B

Test Conditions


The code revision of the reference software for the HEVC encoder and decoder, i.e., HM, used for this research is HM 16.7 [41].

The code revision of the reference software for the H.264 encoder and decoder, i.e., JM, used for this research is JM 19.0 [42].

All the work was done on a system with the following configuration:

• Operating System: Windows 10 Home Edition

• Processor: Intel(R) Core(TM) i7-3537U @ 2.00GHz 2.50GHz

• RAM: 8.00 GB

• System type: 64-bit Operating System, x64-based processor

Test Environment

H.265 encoded streams were generated using the reference HEVC encoder configured for the Main profile in intra mode. The quantization parameter was incremented in steps of 5, ranging from 22 to 37, for all the test sequences, and measurements such as PSNR, encoding time and bitrate were recorded. Similarly, the H.264 encoder was used with the High profile. A total of 60 frames at 30 frames per second were encoded in both cases.
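A sketch of the batch driver behind these runs follows. The TAppEncoder options shown (-c, -i, -q, -f, -fr, -b) come from the HM command line, but the configuration file name and the raw-YUV inputs (converted from the .y4m sources) are assumptions of this sketch:

```python
sequences = ["akiyo_cif", "city_cif", "crew_cif", "flower_cif", "football_cif"]
for seq in sequences:
    for qp in range(22, 38, 5):                    # QP = 22, 27, 32, 37
        cmd = ["TAppEncoder", "-c", "encoder_intra_main.cfg",
               "-i", f"{seq}.yuv",                 # raw YUV converted from .y4m
               "-q", str(qp), "-f", "60", "-fr", "30",
               "-b", f"{seq}_qp{qp}.bin"]
        print(" ".join(cmd))                       # or subprocess.run(cmd)
```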


APPENDIX C

Acronyms


AMVP – Advanced Motion Vector Prediction

AVC - Advanced Video Coding

CABAC – Context Adaptive Binary Arithmetic Coding

CAVLC – Context Adaptive Variable Length Coding

CB – Coding Block

CIF – Common Intermediate Format

CPB – Coded Picture Buffer

CTB – Coding Tree Block

CTU – Coding Tree Unit

CU – Coding Unit

dB – decibel

DCT – Discrete Cosine Transform

DPB – Decoded Picture Buffer

DST – Discrete Sine Transform

FPS – Frames Per Second

FRExts – Fidelity Range Extensions

HD – High Definition

HEVC – High Efficiency Video Coding

HHR – Half Horizontal Resolution

IDCT – Inverse Discrete Cosine Transform

ISO – International Organization for Standardization

ITU-T – International Telecommunication Union – Telecommunication Standardization Sector

JCT-VC – Joint Collaborative Team on Video Coding

JVT – Joint Video Team

kbps – kilobits per second


MB - Macroblock

MC – Motion Compensation

MPEG – Moving Pictures Experts Group

MV – Motion Vector

NAL – Network Abstraction Layer

PB – Prediction Block

PSNR – Peak Signal-to-Noise Ratio

PU – Prediction Unit

QCIF – Quarter Common Intermediate Format

QP – Quantization Parameter

SAO – Sample Adaptive Offset

SD – Standard Definition

SEI - Supplemental Enhancement Information

SI – Switched Intra

SP – Switched Predictive

TB – Transform Block

TU – Transform Unit

VLC – Variable Length Coding

VLD – Variable Length Decoding

VPS – Video Parameter Set

VUI – Video Usability Information

WPP – Wavefront Parallel Processing


REFERENCES

[1] Iain E. G. Richardson, "Video Codec Design: Developing Image and Video Compression Systems", Wiley, 2002.

[2] Jayesh Dubhashi, "Complexity Reduction of Motion Estimation in HEVC", M.S. Thesis, EE Department, UT Arlington, Dec. 2014.

[3] K. R. Rao, Do Nyeon Kim and Jae Jeong Hwang, "Video Coding Standards: AVS China, H.264/MPEG-4 Part 10, HEVC, VP6, DIRAC and VC-1", Springer, 2014.

[4] ITU-T Recommendation H.264 – Advanced Video Coding for Generic Audio-Visual Services.

[5] Joint Video Team (JVT), ITU-T website: http://www.itu.int/en/ITU-T/studygroups/com16/video/Pages/jvt.aspx

[6] S. Kwon, A. Tamhankar and K. R. Rao, "Overview of the H.264/MPEG-4 Part 10", Journal of Visual Communication and Image Representation, vol. 17, is. 9, pp. 186-215, April 2006.

[7] T. Wiegand and G. J. Sullivan, "The H.264 video coding standard", IEEE Signal Processing Magazine, vol. 24, pp. 148-153, March 2007.

[8] K. R. Rao and J. J. Hwang, "Techniques and Standards for Image/Video/Audio Coding", Prentice-Hall, 1996.

[9] A. Puri et al., "Video coding using the H.264/MPEG-4 AVC compression standard", Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.

[10] D. Marpe and T. Wiegand, "H.264/MPEG4-AVC Fidelity Range Extensions: Tools, Profiles, Performance, and Application Areas", Proc. IEEE International Conference on Image Processing 2005, vol. 1, pp. I-596, 11-14 Sept. 2005.

[11] J. Ostermann et al., "Video coding with H.264/AVC: Tools, Performance, and Complexity", IEEE Circuits and Systems Magazine, vol. 4, issue 1, pp. 7-28, First Quarter 2004.

[12] Iain E. G. Richardson, "H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia", Wiley, 2003.

[13] G. J. Sullivan et al., "Overview of the High Efficiency Video Coding (HEVC) standard", IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668, Dec. 2012.

[14] H. Samet, "The quadtree and related hierarchical data structures", Computing Surveys, vol. 16, no. 2, pp. 187-260, June 1984.

[15] J. Xin, C. W. Lin and M. T. Sun, "Digital video transcoding", Proceedings of the IEEE, vol. 93, issue 1, pp. 84-97, January 2005.

[16] A. Vetro, C. Christopoulos and H. Sun, "Video transcoding architectures and techniques: an overview", IEEE Signal Processing Magazine, vol. 20, issue 2, pp. 18-29, March 2003.

[17] P. Assunção and M. Ghanbari, "Post-processing of MPEG-2 coded video for transmission at lower bit-rates", Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Atlanta, GA, 1996, pp. 1998-2001.

[18] J. L. Wu, S. J. Huang, Y. M. Huang, C. T. Hsu and J. Shiu, "An efficient JPEG to MPEG-1 transcoding algorithm", IEEE Trans. Consumer Electronics, vol. 42, pp. 447-457, Aug. 1996.

[19] N. Memon and R. Rodilia, "Transcoding GIF images to JPEG-LS", IEEE Trans. Consumer Electronics, vol. 43, pp. 423-429, Aug. 1997.

[20] N. Feamster and S. Wee, "An MPEG-2 to H.263 transcoder", Proc. SPIE Conf. Voice, Video and Data Communications, Boston, MA, Sept. 1999.

[21] H. Kato, H. Yanagihara, Y. Nakajima and Y. Hatori, "A fast motion estimation algorithm for DV to MPEG-2 conversion", Proc. IEEE Int. Conf. Consumer Electronics, Los Angeles, CA, June 2002, pp. 140-141.

[22] W. Lin, D. Bushmitch, R. Mudumbai and Y. Wang, "Design and implementation of a high-quality DV50-MPEG2 software transcoder", Proc. IEEE Int. Conf. Consumer Electronics, Los Angeles, CA, June 2002, pp. 142-143.

[23] S. F. Chang and D. G. Messerschmitt, "Manipulation and compositing of MC-DCT compressed video", IEEE J. Selected Areas in Communications, vol. 13, pp. 1-11, Jan. 1995.

[24] H. Sun, A. Vetro, J. Bao and T. Poon, "A new approach for memory-efficient ATV decoding", IEEE Trans. Consumer Electronics, vol. 43, pp. 517-525, Aug. 1997.

[25] J. Wang and S. Yu, "Dynamic rate scaling of coded digital video for IVOD applications", IEEE Trans. Consumer Electronics, vol. 44, pp. 743-749, Aug. 1998.

[26] P. Assunção and M. Ghanbari, "A frequency-domain video transcoder for dynamic bit-rate reduction of MPEG-2 bitstreams", IEEE Trans. Circuits and Systems for Video Technology, vol. 8, pp. 953-967, Dec. 1998.

[27] Sreejana Sharma, "Transcoding of H.264 bitstream to MPEG-2 bitstream", M.S. Thesis, EE Department, UT Arlington, May 2007.

[28] HEVC and VP9 video codecs - try them yourself: http://vcodex.blogspot.com/2014/05/hevc-and-vp9-video-codecs-try-them.

[29] HEVC resources: http://www.vcodex.com/h265.html

[30] How to use HM software: https://codesequoia.wordpress.com/2012/11/04/make-your-first-hevc-stream/ and http://kohtoomaung.blogspot.com/p/blog-page_10.html

[31] Exercise on Running: http://www.cs.tut.fi/~vc/Exercises/Exercise%201.

[32] H.264 reference software: http://iphome.hhi.de/suehring/tml/

[33] YUV video sequences: http://trace.eas.asu.edu/yuv/

[34] HEVC wavefront parallel processing: http://www.parabolaresearch.com/blog/2013-12-01-hevc-wavefront-animation.html

[35] HEVC and its extensions: https://www.itu.int/en/ITU-T/C-I/interop/13022015/Documents/Presentations/Keynote_P2-Gary-Sullivan.pdf

[36] HEVC walkthrough by Vcodex: https://vimeo.com/65819014

[37] HEVC analyzers: http://vcodex.blogspot.com/2014/08/hevc-analysers.html

[38] "Relax, it's only HEVC": http://www.worldbroadcastingunions.org/wbuarea/library/docs/isog/presentations/2012B/2.4%20Bross%20HHI.pdf

[39] HEVC reference software: https://hevc.hhi.fraunhofer.de/

[40] Test sequences: https://media.xiph.org/video/derf/

[41] HM 16.7 location: https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/HM-16.7+SCM-5.4/

[42] JM 19.0 location: http://iphome.hhi.de/suehring/tml/download/

61

BIOGRAPHICAL INFORMATION

Deepak Hingole was born in Nanded, Maharashtra, India in 1989. After completing his schooling at Yeshwant Mahavidyalaya, Nanded in 2006, he went on to obtain his Bachelor of Technology in Electronics Engineering from Veermata Jijabai Technological Institute (VJTI), Mumbai (an autonomous institute affiliated to the University of Mumbai) in 2011.

He worked as a Software Engineer at HCL Technologies Limited, Bangalore, India from 2011 to 2013. In Fall 2013, Deepak enrolled at the University of Texas at Arlington to pursue his Master of Science in Electrical Engineering.

Deepak’s main areas of interest are Video Processing and Embedded Systems.

While attending the university, Deepak joined the Multimedia Processing Lab, which is supervised by Dr. K. R. Rao.

Deepak worked as a Video Systems Research Intern at Adobe Systems Incorporated in Summer 2015. He also assisted Dr. Rao as a Graduate Teaching Assistant (GTA) in Fall 2014, Spring 2015 and Fall 2015 for courses on Multimedia Processing and Discrete Transforms.
