COMPUTATIONALLY EFFICIENT BASIC UNIT RATE CONTROL FOR H.264/AVC

Thesis

Submitted to

The School of Engineering of the

UNIVERSITY OF DAYTON

In Partial Fulfillment of the Requirements for

The Degree of

Master of Science in Electrical and Computer Engineering

By

Tanner Ryan Adams

UNIVERSITY OF DAYTON

Dayton, Ohio

December, 2013

Name: Adams, Tanner Ryan

APPROVED BY:

Eric J. Balster, Ph.D.
Advisor Committee Chairman
Assistant Professor, Department of Computer and Electrical Engineering

Frank A. Scarpino, Ph.D.
Committee Member
Professor Emeritus, Department of Computer and Electrical Engineering

Raul Ordonez, Ph.D. Committee Member Professor, Department of Computer and Electrical Engineering

John G. Weber, Ph.D.
Associate Dean
School of Engineering

Tony E. Saliba, Ph.D.
Dean, School of Engineering
& Wilke Distinguished Professor

© Copyright by

Tanner Ryan Adams

All rights reserved

2013

ABSTRACT

COMPUTATIONALLY EFFICIENT BASIC UNIT RATE CONTROL FOR H.264/AVC

Name: Adams, Tanner Ryan
University of Dayton

Advisor: Dr. Eric J. Balster

Video compression has come a long way in the past 15 years. Due to the rise in bandwidth usage, HD video, and other digital media, new and improved methods are required to ease the transfer and storage of digital video content. H.264 is the latest and most complex advanced video coding standard to date. The complexity is caused by H.264's vast number of encoding options. While complex, this large number of coding options is what makes H.264 perform better than any other standard. Because of these options, compressed video files can vary greatly in the number of bits used. To solve this dilemma, rate control schemes have been introduced to control data throughput. This thesis presents a computationally efficient rate control algorithm. The algorithm's low complexity, in turn, facilitates hardware implementation.

For my Parents and Grandparents

ACKNOWLEDGMENTS

I would like to thank my friends and family for their support throughout my entire life. I would also like to thank the following people for making my graduate school experience possible:

• Thank you Chris McGuinness and Marc Hoffman for helping me to better understand my work and for putting up with my constant questions.

• Thank you to the entire ADDA lab for all of your advice during graduate school.

• Thank you to Kerry Hill, Al Scarpelli, and the Air Force Research Laboratory for making this experience possible.

• Thank you to Dr. Frank Scarpino and Dr. Raul Ordonez for serving on my thesis committee.

• Special thanks to Dr. Eric Balster for being one of the greatest professors and advisors I have ever had.

TABLE OF CONTENTS

ABSTRACT ...... iii

DEDICATION ...... iv

ACKNOWLEDGMENTS ...... v

LIST OF FIGURES ...... viii

LIST OF TABLES ...... xi

I. Introduction ...... 1

1.1 Overview of Video Coding ...... 1
1.2 H.264 Coding Standard ...... 1
1.2.1 MPEG-2 Visual ...... 2
1.2.2 H.263 ...... 2
1.2.3 MPEG-4 Visual ...... 3
1.3 H.264 Syntax ...... 3
1.4 H.264 Process ...... 4
1.5 H.264 Prediction ...... 5
1.5.1 Intra Prediction ...... 6
1.5.2 Inter Prediction ...... 8
1.6 Transform, Quantization and Coding ...... 10
1.6.1 Transform ...... 10
1.6.2 DC Transform ...... 11
1.6.3 Quantization ...... 12
1.7 Coding ...... 13
1.7.1 Exp-Golomb Coding ...... 13
1.7.2 Context Adaptive Variable Length Coding, CAVLC ...... 15
1.7.3 Context-based Adaptive Binary Arithmetic Coding, CABAC ...... 15
1.8 Profiles ...... 16
1.9 Rate Control ...... 16
1.10 Motive and Organization ...... 17

II. Proposed Rate Control Algorithm ...... 18

2.1 Introduction ...... 18
2.2 Baseline MB Level Rate Control ...... 18
2.3 Proposed BU Rate Control Scheme ...... 21
2.4 Experimental Results ...... 26
2.5 Conclusion ...... 30

III. Intra Prediction ...... 31

3.1 Overview of Intra Prediction ...... 31
3.2 Experimental Results ...... 33

IV. Conclusion and Future Work ...... 37

4.1 Conclusions ...... 37
4.2 Future Work ...... 37

BIBLIOGRAPHY ...... 39

APPENDICES

A. Testing Data for α and β ...... 41

B. Proposed Rate Control Testing ...... 49

C. Intra Prediction Mode Data ...... 56

LIST OF FIGURES

1.1 Levels of H.264 ...... 3

1.2 Motion estimation ...... 4

1.3 H.264 encoding process ...... 5

1.4 Macroblock types and sources ...... 7

1.5 Spatial prediction order ...... 7

1.6 Intra and inter prediction flow diagram ...... 8

1.7 Partition sizes of macroblocks ...... 9

1.8 Predicting from past and future frames ...... 9

1.9 Forward transform and quantization for Luma and Chroma ...... 10

1.10 Residual block coding order, 4:2:0 sampling ...... 12

1.11 Bitstream organization ...... 14

1.12 Profiles for H.264 ...... 17

2.1 Design for Baseline MB level rate control ...... 19

2.2 Flow Chart for Baseline MB Level Rate Control ...... 20

2.3 Design for Baseline BU level rate control ...... 22

2.4 Design for BU level rate control ...... 23

2.5 Testing for Foreman at 2 Mbps ...... 25

2.6 Testing for Stockholm at 15 Mbps ...... 25

2.7 Arithmetic operations for both systems ...... 26

2.8 Foreman test sequence at 2 Mbps ...... 27

2.9 Flower test sequence at 7.5 Mbps ...... 28

2.10 Stockholm test sequence at 15 Mbps ...... 29

3.1 Available blocks for intra prediction ...... 31

3.2 Prediction modes for 8x8 and 4x4 intra prediction ...... 33

3.3 Encoding time for first three modes against all nine modes for Foreman ...... 34

3.4 PSNR for first three modes against all nine modes for Foreman ...... 34

3.5 Encoding time for first three modes against all nine modes for Flower ...... 35

3.6 PSNR for first three modes against all nine modes for Flower ...... 36

A.1 Testing for Foreman at 1 Mbps ...... 42

A.2 Testing for Foreman at 3 Mbps ...... 43

A.3 Testing for Flower at 5 Mbps ...... 44

A.4 Testing for Flower at 7.5 Mbps ...... 45

A.5 Testing for Flower at 10 Mbps ...... 46

A.6 Testing for Stockholm at 10 Mbps ...... 47

A.7 Testing for Stockholm at 20 Mbps ...... 48

B.1 Testing for Foreman at 1 Mbps ...... 50

B.2 Testing for Foreman at 3 Mbps ...... 51

B.3 Testing for Flower at 5 Mbps ...... 52

B.4 Testing for Flower at 10 Mbps ...... 53

B.5 Testing for Stockholm at 10 Mbps ...... 54

B.6 Testing for Stockholm at 20 Mbps ...... 55

C.1 Encoding time for first three modes against all nine modes for Foreman ...... 57

C.2 PSNR for first three modes against all nine modes for Foreman ...... 58

C.3 Encoding time for first three modes against all nine modes for Flower ...... 59

C.4 PSNR for first three modes against all nine modes for Flower ...... 60

LIST OF TABLES

1.1 Intra Prediction Modes ...... 6

1.2 Exp-Golomb Coding for code num ...... 14

2.1 Arithmetic Complexity Calculations ...... 26

2.2 Rate Distortion Comparison ...... 28

2.3 Comparison ...... 29

3.1 Intra Prediction Possibilities ...... 32

CHAPTER I

Introduction

1.1 Overview of Video Coding

Digital media has undergone extensive changes and continues to grow to this day. With these changes came the rise of digital video signals. The main drivers for the shift towards digital video include commercial factors, legislation, social changes, and technological advances [1].

Technologies like HD television, internet streaming, video conferencing, and cell phones all deal with digital video, but each application space has different requirements in bandwidth, power, and processing capability. These technologies created the demand for the video coding we see today.

Video coding is the process of compressing a video stream, transmitting or storing that stream, and then decompressing it at the receiver. Video coding is useful because it reduces the amount of data that needs to be transmitted or stored, thus using less bandwidth or storage [1].

1.2 H.264 Coding Standard

Video coding has many different applications that stretch across many platforms. For that reason, standardized video coding formats have been introduced because of the potential benefits such as [1]:

• Encoders and decoders from different manufacturers won't have any compatibility issues.

• Platforms can be built such that many different applications, such as video and audio codecs, interact consistently.

• Techniques and algorithms used in a specific video coding standard are well defined and less likely to fall under patent infringement.

The H.264 video coding standard was jointly published by the International Telecommunications Union (ITU) and the International Standards Organisation (ISO) in 2003. Other names for the video coding standard are MPEG-4 Part 10 and Advanced Video Coding (AVC). H.264 is able to produce better results than any other standard before it, which means that it can compress video into a smaller space, which in turn takes up less storage or bandwidth [1]. For completeness, we briefly describe some video coding standards that pre-date H.264, namely MPEG-2 Visual, H.263, and MPEG-4 Visual.

1.2.1 MPEG-2 Visual

MPEG-2 Visual was standardized in 1994 and is mainly used in digital television and HDTV. MPEG-2 is designed to be a superset of MPEG-1, using the same coding techniques with a few added features for better compression. Some of these features include prediction modes for interlaced video and scalable video coding. For additional information on MPEG-2 Visual, refer to [2].

1.2.2 H.263

H.263 was standardized in 1996 and later updated to H.263v2 (H.263+). H.263 supports a set of standard picture resolutions; the luminance is sampled at these resolutions while the chrominance is downsampled by two in the horizontal

Figure 1.1: Levels of H.264

and vertical directions. H.263+ was introduced in 1998 and extended the coding capabilities with the addition of 12 optional coding modes. For more information on H.263, refer to [3].

1.2.3 MPEG-4 Visual

MPEG-4 Visual was standardized in 1999; it uses a discrete cosine transform and is loosely based on H.263. Since its release it has gone through two additional revisions. MPEG-4 Visual implements the MPEG-4 Systems Description Language (MSDL), which is an attempt to standardize video "tools", or fully defined algorithms. For more information on MPEG-4 Visual, refer to [4].

1.3 H.264 Syntax

Due to the complexity of H.264, a general overview of terms is discussed. Figure 1.1 shows how a given frame is broken down into sections. A consecutive number of frames is called a group of pictures (GOP). The size of the GOP, i.e., how many frames are contained in a GOP, is user defined and is usually between 10-100. A GOP is broken down into individual frames. A frame can then be separated into one or more slices. A slice is comprised of one or more macroblocks (MB). A MB is a 16x16-pixel region of a frame and is the fundamental unit of compression within the H.264 standard [1]. MB's are used for motion estimation and motion compensation.

Figure 1.2: Motion estimation

Motion estimation involves finding another 16x16-pixel region in a previously encoded frame that closely resembles the current MB region. This process is demonstrated in Figure 1.2. The difference between the current MB location and the best matched MB location is called a motion vector. Every MB that is predicted from a previously coded MB in a different frame will have a motion vector associated with it. Motion compensation is the process of taking the matched region and subtracting it from the current MB. This forms a residual block which is then encoded and transmitted along with the motion vector.
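The block-matching search described above can be sketched as follows. This is an illustrative full search with a sum-of-absolute-differences (SAD) cost and a small search window; the cost function, window size, and function names are assumptions for illustration, not the thesis encoder's implementation.

```python
# Minimal full-search block matching sketch. Frames are lists of pixel rows.
def sad(ref, cur, rx, ry, cx, cy, size=16):
    """Sum of absolute differences between a candidate region in the
    reference frame and the current MB."""
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(size) for i in range(size))

def motion_search(ref, cur, cx, cy, window=4, size=16):
    """Return the motion vector (dx, dy) of the best match for the MB
    whose top-left corner is (cx, cy) in the current frame."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= w - size and 0 <= ry <= h - size:
                cost = sad(ref, cur, rx, ry, cx, cy, size)
                if cost < best_cost:
                    best_cost, best = cost, (dx, dy)
    return best
```

If the current frame is a shifted copy of the reference, the search finds a zero-cost match at the true displacement; the residual block for that match is then all zeros, which is the ideal case for motion compensation.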

1.4 H.264 Process

The H.264 encoding process can be categorized into four different processes which produce the bitstream. These processes are prediction, transformation, quantization, and entropy encoding. The H.264 encoding processes are shown in Figure 1.3.

Figure 1.3: H.264 encoding process

Prediction is the process of reducing the entropy by using the similarities between spatial and/or temporal image samples. Transformation is the process of further spatial decorrelation of residual pixels by representing them in a different domain. The output samples of the transformation process are referred to as coefficients. The resulting coefficients are then quantized, which removes insignificant values and leaves only a smaller representation of the transform coefficients, effectively compressing the data. The final process, entropy encoding, converts the data into an H.264 bitstream. Once the bitstream has been created it is then transmitted or stored. The inverse of each process is then performed to decode the bitstream. In general, the decoded video sequence is not the same as the original video sequence due to quantization error [1].

1.5 H.264 Prediction

H.264 has three possible prediction sources for a macroblock, namely I macroblocks, P macroblocks, and B macroblocks. An I macroblock (I MB) predicts pixel values using intra prediction from samples in the same frame. A P macroblock (P MB) predicts pixel values from a previously coded frame, which may come before or after the current frame in display order. A B macroblock (B MB) predicts pixel values from one or two previously coded frames, typically one 'past' and one 'future' frame. Figure 1.4 shows the sources for the three prediction types [1]. A frame can contain multiple slices, and each slice can be an I, P, or B slice. Within those slices, a MB can use multiple prediction types depending on the declared slice type.

1.5.1 Intra Prediction

Intra prediction relies on spatial correlation to make a prediction in the current MB. Figure 1.5 shows which previously encoded MB's are used for the spatial prediction, with X being the current 4x4 or 8x8 partition. If X is a 16x16 MB, D is not used for prediction. The I MB can be partitioned into 4x4, 8x8, or 16x16 samples for prediction. Both 4x4 and 8x8 partitions have nine different prediction modes while 16x16 blocks only have four modes, which are defined in Table 1.1. This gives 4x4 and 8x8 blocks better accuracy than 16x16 blocks, but at the cost of encoding more bits.

Table 1.1: Intra Prediction Modes

8x8 and 4x4                       16x16
Mode 0 (Vertical)                 Mode 0 (Vertical)
Mode 1 (Horizontal)               Mode 1 (Horizontal)
Mode 2 (DC)                       Mode 2 (DC)
Mode 3 (Diagonal Down-Left)       Mode 3 (Plane)
Mode 4 (Diagonal Down-Right)
Mode 5 (Vertical-Right)
Mode 6 (Horizontal-Down)
Mode 7 (Vertical-Left)
Mode 8 (Horizontal-Up)
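The first three 4x4 modes in Table 1.1 can be sketched as follows. This is an illustrative rendering of vertical, horizontal, and DC prediction from the reconstructed neighboring samples (assumed available); the function name and interface are hypothetical, not the encoder's actual code.

```python
# Illustrative 4x4 intra prediction for Modes 0-2 of Table 1.1.
# `top` holds the 4 reconstructed samples above the block and `left`
# the 4 reconstructed samples to its left.
def intra_4x4_predict(mode: int, top: list, left: list) -> list:
    if mode == 0:  # Mode 0 (Vertical): copy the top row downward
        return [list(top) for _ in range(4)]
    if mode == 1:  # Mode 1 (Horizontal): copy the left column across
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:  # Mode 2 (DC): rounded average of all eight neighbors
        dc = (sum(top) + sum(left) + 4) // 8
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0-2 are sketched here")
```

The encoder subtracts the predicted block from the actual block and keeps the mode whose residual costs the fewest bits, which is why the extra directional modes of 4x4 and 8x8 blocks buy accuracy at the price of signaling overhead.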

Figure 1.4: Macroblock types and sources

Figure 1.5: Spatial prediction order

Figure 1.6: Intra and inter prediction flow diagram

1.5.2 Inter Prediction

Inter prediction is the process of predicting pixel values from a frame that has already been encoded. These previously encoded frames are stored for future reference when using inter prediction. Inter prediction is achieved by selecting a prediction region, generating a prediction block, and subtracting it from the original to form a residual. This process is shown in Figure 1.6. The offset between the current MB and the predicted MB is then used as the motion vector. Both the predicted samples and motion vectors are then encoded and transmitted/stored. The current MB can be split into four partitions, and each of those four partitions can then be separated into three smaller partitions, which is shown in Figure 1.7. The numbers within the partitions represent the order in which they are encoded. Each partition has a motion vector associated with it. B MB's have two motion vectors associated with them because they predict from two reference frames. If one reference frame is a past frame and one is a future frame, then the motion vectors are predicted from neighboring motion vectors that have the same temporal direction. However, if both reference frames are past or future frames, then one of the motion vectors is encoded as a scaled motion vector [1]. Figure 1.8 shows the process of predicting motion vectors using past and future frames.

Figure 1.7: Partition sizes of macroblocks

Figure 1.8: Predicting from past and future frames

Figure 1.9: Forward transform and quantization for Luma and Chroma

1.6 Transform, Quantization and Coding

All predictive processing modes are lossless, meaning they can be reversed with no loss of information or quality. The following section covers how the prediction data is transformed, quantized, and coded to produce the compressed data. Quantized pixel data is then converted to a bitstream using one of two techniques, i.e., context adaptive variable length coding (CAVLC) or context adaptive binary arithmetic coding (CABAC) [1].

1.6.1 Transform

Figure 1.9 gives an overview of the steps taken to transform and quantize luma and chroma blocks. Transforms are used for further reduction in entropy to facilitate compression. H.264 utilizes the Discrete Cosine Transform (DCT), which is applied to 4-pixel by 4-pixel subsections, or blocks, of each MB. The DCT of a sample block is given by Equation 1.1,

$$Y = AXA^T, \qquad (1.1)$$

where $X$ is the residual data, $Y$ is the coefficient matrix, and

$$A = \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix}, \qquad a = \frac{1}{2}, \quad b = \sqrt{\tfrac{1}{2}}\cos\left(\frac{\pi}{8}\right) = 0.6532\ldots, \quad c = \sqrt{\tfrac{1}{2}}\cos\left(\frac{3\pi}{8}\right) = 0.2706\ldots

Equation 1.1 requires an approximation of the irrational numbers $b$ and $c$. Due to this, each row of $A$ is scaled and rounded to the nearest integer, producing

$$C = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}. \qquad (1.2)$$

After some simplification, the integer-based transform used in H.264 is

$$Y = [CXC^T] \otimes S, \qquad (1.3)$$

where $\otimes$ denotes element-by-element multiplication and

$$S = \begin{bmatrix} \frac{1}{4} & \frac{1}{2\sqrt{10}} & \frac{1}{4} & \frac{1}{2\sqrt{10}} \\ \frac{1}{2\sqrt{10}} & \frac{1}{10} & \frac{1}{2\sqrt{10}} & \frac{1}{10} \\ \frac{1}{4} & \frac{1}{2\sqrt{10}} & \frac{1}{4} & \frac{1}{2\sqrt{10}} \\ \frac{1}{2\sqrt{10}} & \frac{1}{10} & \frac{1}{2\sqrt{10}} & \frac{1}{10} \end{bmatrix}.$$

1.6.2 DC Transform

DC coefficients of a MB are further transformed when the selected MB is a 16x16 intra prediction or when the chroma data is being transformed. Figure 1.10 shows the process and encoding order for a 16x16 intra predicted block for luma and chroma. DC represents the 'low frequency' position at the (0,0) coefficient while the AC coefficients represent the remaining quantized transformed coefficients. The -1 for the DC coefficients is optional depending on the type and size of the current MB. Once the 4x4 block is created, it is transformed using the Hadamard transform, shown in Equation 1.4, while the 2x2 chroma DC transform is shown in Equation 1.5.

$$Y_4 = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} [W_4] \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \qquad (1.4)$$

$$Y_2 = \frac{1}{2}\left(\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} [W_2] \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\right) \qquad (1.5)$$

Figure 1.10: Residual block coding order, 4:2:0 sampling

where W are the DC coefficients of luma or chroma and Y is the output after transformation [1].
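The 2x2 chroma DC transform of Equation 1.5 can be sketched directly; the ½ scaling here follows the reconstructed equation above, and the function name is illustrative.

```python
# Sketch of the 2x2 chroma DC transform (Equation 1.5): Y2 = 1/2 (H W2 H)
# with H = [[1, 1], [1, -1]]. W holds the four chroma DC coefficients.
def chroma_dc_transform(w):
    h = [[1, 1], [1, -1]]
    hw = [[sum(h[i][k] * w[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    hwh = [[sum(hw[i][k] * h[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    return [[v / 2 for v in row] for row in hwh]
```

For example, four identical chroma DC values collapse into a single nonzero output sample, which is exactly the decorrelation this second-stage transform is for.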

1.6.3 Quantization

Quantization is defined as

$$Y = \mathrm{sign}(X)\,\frac{|X|}{QP}, \qquad (1.6)$$

where $X$ represents the signal to be quantized, $Y$ represents the same signal with a reduced range of values, and $QP$ is the quantization parameter. The higher the value of $QP$, the larger the change will be from the original value to the final value. The values of $QP$ can range from 1 to 51, where 1 represents low compression but high quality and 51 represents high compression but low quality. The purpose of quantizing a signal with values $X$ is to reduce the range of values to $Y$. This makes it possible to represent $Y$ with fewer bits, since it uses a smaller range of values. H.264 uses scalar quantization, which maps one sample of the input to one sample of the output [1].
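A minimal sketch of the scalar quantizer in Equation 1.6 follows. The rounding to an integer (via floor division) is an added assumption, since the text defines only the sign/magnitude division; the dequantizer shows why the step is lossy.

```python
# Illustrative scalar quantizer based on Equation 1.6:
# Y = sign(X) * |X| / QP, truncated to an integer (assumption).
def quantize(x: int, qp: int) -> int:
    """Map one input sample to one quantized output sample."""
    sign = -1 if x < 0 else 1
    return sign * (abs(x) // qp)

def dequantize(y: int, qp: int) -> int:
    """Approximate inverse; the quantization error makes this lossy."""
    return y * qp
```

For instance, quantizing -37 with QP = 8 gives -4, and dequantizing returns -32: the error of 5 is the quantization error that makes the decoded video differ from the original.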

1.7 Coding

Figure 1.11 shows the elements required to create a valid H.264 bitstream. The Network Abstraction Layer (NAL) is the uppermost hierarchical element and handles most control parameters. The Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) are parts of the NAL that control certain parameters of the coding process. The next level of bits is called the slice layer, which handles some parameters associated with that slice and the slice data, which begins with an Instantaneous Decoder Refresh (IDR) access unit. The IDR will always occur before an I-frame and signifies that no frame previous to the current I-frame will be used as a reference. The slice data is comprised of MB data, which is made up of intra or inter prediction data and the compressed data which has been transformed, quantized, and coded [1].

Along with the hierarchical organization, H.264 has multiple coding techniques used to create an H.264 bitstream. These methods are

• Fixed length coding

• Exponential-Golomb variable length coding

• CAVLC

• CABAC

1.7.1 Exp-Golomb Coding

Exp-Golomb coding involves assigning varying bit lengths depending on how frequently a given pattern occurs. Common patterns are assigned small binary codes while uncommon patterns are assigned larger binary codes. This technique is what allows the data to be expressed in a compressed form [1]. Table 1.2 gives an example of Exp-Golomb coding using the variable code num.

13 Figure 1.11: Bitstream organization

Table 1.2: Exp-Golomb Coding for code num

code num    Codeword
0           1
1           010
2           011
3           00100
4           00101
5           00110
6           00111
7           0001000
8           0001001
...         ...

Exp-Golomb codewords always have the following structure [1]:

$$C_w = (Z_p\,1\,Inf). \qquad (1.7)$$

The codeword consists of a zero prefix $Z_p$ of $M$ zeros, where $M$ is 0 or a positive integer calculated from Equation 1.8, followed by a 1, and ending in an $M$-bit information field calculated using Equation 1.9. Exp-Golomb codes are constructed logically and can be decoded algorithmically without look-up tables, where [1]

$$M = \lfloor \log_2[code\_num + 1] \rfloor, \qquad (1.8)$$

and

$$Inf = code\_num + 1 - 2^M. \qquad (1.9)$$
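Equations 1.7-1.9 translate almost directly into code. The sketch below builds and parses unsigned Exp-Golomb codewords without any look-up table; function names are illustrative.

```python
# Sketch of unsigned Exp-Golomb coding following Equations 1.7-1.9:
# an M-zero prefix, a 1, then the M-bit INFO field.
from math import floor, log2

def exp_golomb_encode(code_num: int) -> str:
    """Return the codeword (Zp 1 Inf) for code_num as a bit string."""
    m = floor(log2(code_num + 1))          # Equation 1.8
    info = code_num + 1 - 2 ** m           # Equation 1.9
    suffix = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + suffix

def exp_golomb_decode(bits: str) -> int:
    """Invert the codeword structure algorithmically."""
    m = bits.index("1")                    # count the leading zeros
    info = int(bits[m + 1 : 2 * m + 1] or "0", 2)
    return 2 ** m - 1 + info
```

Encoding 0 through 7 reproduces the codewords of Table 1.2, and decoding any codeword recovers the original code num, confirming that no table is needed.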

1.7.2 Context Adaptive Variable Length Coding, CAVLC

The primary role of CAVLC is to encode residual data using variable length codes (VLCs). A block of coefficients is scanned using a zigzag or field scan and converted into a series of VLCs. The correct VLC table is chosen by looking at the statistics of neighboring previously encoded MB's, i.e., the number of non-zero coefficients in neighboring blocks and the magnitude of recently coded coefficients. This is what makes CAVLC context adaptive. For more information on CAVLC, refer to [5].

1.7.3 Context-based Adaptive Binary Arithmetic Coding, CABAC

CABAC is an optional encoder that uses entropy coding to produce a valid H.264 bitstream. With CABAC, a syntax element is converted to a series of bits, each of which corresponds to a single binary decision. The statistics of already encoded syntax elements are then used to update probability models. CABAC is more complicated than CAVLC, but it provides better compression. For more information on CABAC, refer to [6].

1.8 Profiles

Due to different computational demands, H.264 provides different profiles. Profiles place limitations on the algorithmic capabilities of the decoder. H.264 defines three profiles: baseline, main, and extended. Constrained baseline, a subset of baseline, is also defined. The baseline profile is used for low-delay video applications such as video conferencing. The main profile is used in standard-definition digital television. The extended profile is used for video streaming applications. Figure 1.12 shows an example of H.264 profiles and their capabilities. For more information regarding H.264, refer to [7].

1.9 Rate Control

The number of bits produced when encoding a macroblock is not constant [1]. If all encoding parameters remain unchanged, variations in motion and detail cause the bitrate to vary greatly. Many applications require a constant or near-constant bit rate. Managing the encoder bitrate is achieved by determining the rate and/or the buffer fullness of the encoder while sending feedback. The QP can then be changed accordingly on three different rate control levels, namely frame, basic unit, or MB. These levels determine how frequently the QP will be changed. Frame level rate control techniques change the QP once per frame. This causes large fluctuations in the achieved bitrate but is computationally inexpensive [8, 9]. MB level rate control changes the QP every MB and tends to be the most accurate rate control method but requires the most computation time [10-12]. A Basic Unit (BU) is defined as being larger than a MB in size but smaller than a frame. Typically a BU is a complete row of MB's in a given frame. This method of rate control offers the best tradeoff between accuracy and computational load [13]. A rate control scheme can also be categorized as a single pass or multiple pass algorithm. A single pass scheme never re-encodes MB's while multiple pass schemes can re-encode MB's two or more times [14].
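As a rough illustration of the granularity tradeoff between the three levels (not a calculation from the thesis), the number of QP updates per frame at each level can be counted, assuming 16x16 MBs and one BU per MB row as described above:

```python
# How often each rate control level updates QP for one W x H frame,
# assuming 16x16 MBs and a BU equal to one row of MBs.
def qp_updates_per_frame(width: int, height: int) -> dict:
    mbs_wide = width // 16
    mbs_high = height // 16
    return {
        "frame": 1,                         # one update per frame
        "basic_unit": mbs_high,             # one update per MB row
        "macroblock": mbs_wide * mbs_high,  # one update per MB
    }
```

For a 1280x720 frame this gives 1, 45, and 3600 updates respectively, which is why BU level control sits between the cheap-but-coarse frame level and the accurate-but-expensive MB level.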

Figure 1.12: Profiles for H.264

1.10 Motive and Organization

Because a static QP can lead to drastically varying bitrates, a rate control scheme is introduced. This rate control algorithm varies the QP in order to achieve a target bit rate while maintaining a low complexity to facilitate real-time implementation. The thesis is divided into three additional chapters. Chapter II covers the proposed rate control algorithm and the parameters associated with it. Chapter III covers the implementation of all nine intra prediction modes. Finally, Chapter IV covers the conclusion and future work concerning the proposed rate control algorithm.

CHAPTER II

Proposed Rate Control Algorithm

2.1 Introduction

Rate control plays a key role in certain video compression applications. One problem considered here is the possibility of implementing a rate control system in a real-time environment. For this to be possible, the computational load must be limited to achieve a throughput requirement. Methods like [14-16] prove to be computationally expensive and, in some cases, MB's have to be encoded multiple times. These qualities are undesirable in a real-time environment, which is why a new rate control scheme is proposed in this thesis.

The method that was implemented prior to the proposed method, namely the baseline MB level rate control, is used as a reference for comparison. This method is covered in depth in the following section.

2.2 Baseline MB Level Rate Control

A MB level rate control system that uses the variance of a MB, along with other calculated values, to determine the change in $Q_P[i]$ is used as a means of comparison for the newly proposed rate control algorithm. The following rate control method was implemented prior to the proposed method and was later improved upon. Figure 2.1 shows what needs to be calculated before $\Delta Q_P[i]$ can be determined for the next MB.

Figure 2.1: Design for Baseline MB level rate control

A flow chart illustrating the baseline MB level rate control is shown in Figure 2.2. The flow chart shows the processing order of the already implemented rate control system. The overall change in $Q_P$ for the next MB in a frame is determined by $\Delta Q_P[i]$, where $i$ denotes the current MB index. $\Delta Q_P[i]$ is calculated in Equation 2.1,

$$\Delta Q_P[i] = Q_P[i+1] - Q_P[i], \qquad (2.1)$$

where $Q_P[i]$ is the value of the quantization parameter for MB $i$, and $Q_P[i+1]$ is given by

$$Q_P[i+1] = \frac{\bar{Q}_P[i] + Q_{P_{VAR}}[i] + Q_{P_{BIT}}[i]}{5}, \qquad (2.2)$$

where $Q_{P_{VAR}}[i]$ is determined based on the variance value of the current MB and $Q_{P_{BIT}}[i]$ is determined based on the comparison of target bits, $B_T$, and actual bits used, $B[i]$, per MB. $\bar{Q}_P[i]$ is calculated by

$$\bar{Q}_P[i] = \frac{1}{5}\sum_{k=i-4}^{i} Q_P[k]. \qquad (2.3)$$

Figure 2.2: Flow Chart for Baseline MB Level Rate Control

$Q_{P_{VAR}}[i]$ is calculated as follows,

$$Q_{P_{VAR}}[i] = \begin{cases} 2 & V(x,y) < 250 \\ 1 & 250 \le V(x,y) < 750 \\ -1 & 750 \le V(x,y) < 1500 \\ -2 & V(x,y) \ge 1500 \end{cases} \qquad (2.4)$$

where $V(x,y)$ is the variance of the current MB calculated by

$$V(x,y) = \frac{1}{B^2}\sum_{i=0}^{B-1}\sum_{j=0}^{B-1}\bigl(I(Bx+i,\,By+j) - \bar{I}(Bx,\,By)\bigr)^2, \qquad (2.5)$$

and

$$\bar{I}(x,y) = \frac{1}{B^2}\sum_{i=0}^{B-1}\sum_{j=0}^{B-1} I(x+i,\,y+j). \qquad (2.6)$$

$B = 16$, $I$ is the raw imagery values, and $x$ and $y$ are the column and row number of the current MB, respectively. $Q_{P_{BIT}}[i]$ is determined by one of two cases. Case 1 is when the target number of bits per MB is greater than or equal to the actual number of bits used. Case 2 is when the target number of bits per MB is less than the actual number of bits used. $Q_{P_{BIT}}[i]$ is then calculated by

Case 1: $B_T \ge B[i]$

$$Q_{P_{BIT}}[i] = \begin{cases} -2 & B_F + B_{MB} == -1 \\ -1 & B_F + B_{MB} == 0 \\ 0 & \text{else} \end{cases} \qquad (2.7)$$

Case 2: $B_T < B[i]$

$$Q_{P_{BIT}}[i] = \begin{cases} 0 & B_F == 0 \;\&\; B_{MB} == -1 \\ B_F + B_{MB} & \text{else} \end{cases} \qquad (2.8)$$

where

$$B_F = \begin{cases} 1 & C_t > T_f \\ 0 & \text{else} \end{cases} \qquad (2.9)$$

$C_t$ is the number of bits used for the current frame and $T_f$ is the target number of bits per frame. $B_{MB}[i]$ is determined as follows,

$$B_{MB}[i] = \begin{cases} -1 & B[i] - B[i-1] > 0 \\ 1 & B[i] - B[i-1] \le 0 \end{cases} \qquad (2.10)$$
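The baseline decision values of Equations 2.4-2.10 can be sketched as follows. This is an illustrative rendition assuming frames stored as plain lists of luma rows; variable names mirror the text, but this is not the thesis implementation.

```python
# Sketch of the baseline MB level decision values (Equations 2.4-2.10).
def mb_variance(frame, bx, by, B=16):
    """V(x, y) from Equations 2.5-2.6 for the MB at column bx, row by."""
    pixels = [frame[by * B + j][bx * B + i] for j in range(B) for i in range(B)]
    mean = sum(pixels) / (B * B)
    return sum((p - mean) ** 2 for p in pixels) / (B * B)

def qp_var(v):
    """Equation 2.4: map the MB variance to a QP adjustment."""
    if v < 250:
        return 2
    if v < 750:
        return 1
    if v < 1500:
        return -1
    return -2

def qp_bit(bt, b_i, b_f, b_mb):
    """Equations 2.7-2.8: bit-based QP adjustment from B_T, B[i], B_F, B_MB."""
    if bt >= b_i:                       # Case 1: target >= actual
        s = b_f + b_mb
        return -2 if s == -1 else (-1 if s == 0 else 0)
    if b_f == 0 and b_mb == -1:         # Case 2: target < actual
        return 0
    return b_f + b_mb
```

A flat (low-variance) MB gets a +2 nudge because it compresses cheaply at a higher QP, while a detailed MB or an over-budget bit count pushes the adjustment negative or positive accordingly.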

2.3 Proposed BU Rate Control Scheme

The proposed rate control scheme uses a BU rate control algorithm for simplicity. To limit the complexity, only the number of bits per BU is used to decide whether the QP should be changed. Figure 2.3 shows the flow chart for the proposed BU rate control scheme. This rate control scheme improves upon the existing baseline scheme in terms of PSNR, achieved bit rate error, and complexity.

To begin the encoding process, an estimation of the starting QP for the first I frame is used. Using a variation of [17], the starting QP is estimated as follows,

$$Q_P[0] = \begin{cases} 35 & bpp \le 0.1\lambda \\ 25 & 0.1\lambda < bpp \le 0.3\lambda \\ 20 & 0.3\lambda < bpp \le 0.6\lambda \\ 10 & \text{else} \end{cases} \qquad (2.11)$$

Figure 2.3: Design for Baseline BU level rate control

where λ is calculated to be 0.18. The value of bpp is calculated as follows,

$$bpp = \frac{B}{fHW}, \qquad (2.12)$$

where $B$ is the target bit rate in bits per second, $f$ is the frame rate in frames per second, and $H$ and $W$ are the height and width of the current frame, respectively. Figure 2.4 shows what needs to be calculated in order to determine the QP for the next BU. The rate control scheme is then broken down into two cases. Case 1 is when the current frame is a P frame. Case 2 is when the current frame is an I frame. To determine the change in QP for the $j$th BU in a P frame, the total bits used per BU is used in three different ways, as follows,

Case 1:

$$Q_{P_B}[j] = \begin{cases} -2 & BU[j] \le (T_{BU} - R_1) \\ -1 & (T_{BU} - R_1) < BU[j] \le T_{BU} \\ 0 & T_{BU} < BU[j] \le (T_{BU} + R_1) \\ 2 & BU[j] > (T_{BU} + R_1) \\ 0 & \text{else} \end{cases} \qquad (2.13)$$

Figure 2.4: Design for BU level rate control

where $Q_{P_B}[j]$ is one of three QP variables, $BU[j]$ is the number of bits used in the current BU, $T_{BU}$ is the target bits per BU, and $R_1$ is a percentage of $T_{BU}$. Both $T_{BU}$ and $R_1$ are calculated as follows,

$$T_{BU} = \frac{B}{fH_{MB}}, \qquad (2.14)$$

and

$$R_1 = \beta T_{BU}, \qquad (2.15)$$

where $H_{MB}$ is the height of the given frame in MB's, and $\beta$ is a model parameter that is determined through testing to be 0.3. The second way the bits are used is to determine the bit rate at each BU and how far it is from the target bit rate. This method is as follows,

$$Q_{P_R}[j] = \begin{cases} -2 & R_{BU}[j] \le (B - R_2) \\ -1 & (B - R_2) < R_{BU}[j] \le B \\ 0 & B < R_{BU}[j] \le (B + R_2) \\ 2 & R_{BU}[j] > (B + R_2) \\ 0 & \text{else} \end{cases} \qquad (2.16)$$

where $Q_{P_R}[j]$ is one of the three QP variables, $R_{BU}[j]$ is the calculated bit rate for the current BU,

and R2 is a percentage of the target bit rate. R2 is calculated as follows,

$$R_2 = \beta B. \qquad (2.17)$$

The calculation of the final QP variable looks at how far the calculated bit rate differs from the target bit rate and assigns a value to $Q_{P_F}[j]$ accordingly, as follows,

$$Q_{P_F}[j] = \begin{cases} -2 & Diff[j] \ge \alpha \\ -1 & \alpha > Diff[j] \ge 0 \\ 0 & 0 > Diff[j] \ge \frac{-\alpha}{2} \\ 2 & Diff[j] < -\alpha \\ 0 & \text{else} \end{cases} \qquad (2.18)$$

where $\alpha$ is a model parameter and $Diff[j]$ is what percentage the current BU bit rate is from the target. The best value of $\alpha$ is determined through testing to be 30. $Diff[j]$ is calculated as follows,

$$Diff[j] = \frac{T_{BU} - R_{BU}[j]}{T_{BU}}\,100. \qquad (2.19)$$

Testing is done to determine the optimal values of α and β. α was varied from 0 to 70 and β was varied from 0 to 0.7 for three video sequences. Figure 2.5 shows the testing done on the Foreman video sequence. The error between the target bit rate and the encoded bit rate is squared and plotted along with the varying values of α and β. The 'X' signifies the values that were chosen as the optimal values.

Figure 2.6 shows the testing done on the Stockholm video sequence. Once the three QP values have been determined, they are combined to compute the change in QP for the next BU in a frame. This method is as follows,

ΔQP[j] = (QPB[j] + QPR[j] + QPF[j]) / 4,               (2.20)

where ΔQP[j] is the change in QP for the next BU.
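Continuing the sketch above (hypothetical names), the combination step (2.20) is an average over a divisor of 4, so the raw adjustment may be fractional; how the encoder rounds it before applying is not restated here:

```python
def delta_qp(qp_b, qp_r, qp_f):
    """Change in QP for the next BU per (2.20): the three QP
    adjustment variables are summed and divided by 4."""
    return (qp_b + qp_r + qp_f) / 4

# All three measures agreeing "slightly under target":
# delta_qp(-1, -1, -1) -> -0.75
```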

To determine the change in QP for an I frame, the total bits used to encode the entire frame are analyzed as follows,

Case 2:

ΔQP[j] =
    −1,   BI[n] < TI
     1,   else                                         (2.21)

where BI[n] is the total number of bits used in the n-th I frame and TI is the target number of bits for an I frame. Using this method, compared to the original method, saves computation time, thus

Figure 2.5: Testing for Foreman at 2 Mbps

Figure 2.6: Testing for Stockholm at 15 Mbps

Figure 2.7: Arithmetic operations for both systems

Table 2.1: Arithmetic Complexity Calculations

              Multiplication          Division                Addition
Original      (HW·256 + 4)·GOP        (HW·4 + 3)·GOP          (HW·510 + 3)·GOP
Proposed      (5H/16)(GOP − 1) + 1    (6H/16)(GOP − 1) + 3    (6H/8)(GOP − 1) + 1

adding less latency to an encoder. The numbers of mathematical operations for the MB level rate control and the BU level rate control are shown in Table 2.1, where GOP represents how many frames are in each group of pictures. The data is plotted for given video sizes and is shown in Figure 2.7.

2.4 Experimental Results

For testing, the proposed rate control algorithm is implemented and compared against the baseline system. Testing is done with a GOP structure of IPPP, with the first frame in a GOP being an I

Figure 2.8: Foreman test sequence at 2 Mbps

frame and all other frames being P frames. The number of encoded frames is set to 200, the number of reference frames is set to 2, the frame rate is set to 25 FPS, and 16x16 MBs are used. Figure 2.8 shows the baseline and proposed schemes' bit rates at every GOP for the Foreman test sequence at 2 Mbps.

Figure 2.9 shows the testing for Flower at 7.5 Mbps, and Figure 2.10 shows the testing for Stockholm at 15 Mbps. Additional results are shown in Appendix B. More testing data involving the optimal values of α and β is shown in Appendix A. All testing plots show horizontal lines representing ±5% of the target bit rate, which is chosen as the acceptable error range. Table 2.2 compares the PSNR and

Table 2.3 compares the achieved bit rate of the two algorithms. An average PSNR gain of 0.54 dB is achieved with the new algorithm. The previously implemented system's average standard deviation

Figure 2.9: Flower test sequence at 7.5 Mbps

Table 2.2: Rate Distortion Comparison

Sequence     Bit Rate    Original PSNR (dB)    Proposed PSNR (dB)    PSNR Gain (dB)
Foreman      1 Mbps      43.04                 43.79                  0.75
Foreman      2 Mbps      50.57                 50.98                  0.41
Foreman      3 Mbps      57.96                 57.08                 −0.88
Flower       5 Mbps      39.38                 41.27                  1.89
Flower       7.5 Mbps    44.78                 47.00                  2.22
Flower       10 Mbps     49.00                 51.00                  2.00
Stockholm    10 Mbps     32.86                 33.72                  0.85
Stockholm    15 Mbps     34.41                 34.93                  0.52
Stockholm    20 Mbps     35.60                 36.15                  0.55

Figure 2.10: Stockholm test sequence at 15 Mbps

Table 2.3: Bit Rate Comparison

Sequence     Target      Achieved (Orig., Mbps)   Achieved (Prop., Mbps)   Bit Error (Orig., %)   Bit Error (Prop., %)
Foreman      1 Mbps      1.004                    1.03                      0.5                    3
Foreman      2 Mbps      2.00                     1.99                      0.5                   −0.5
Foreman      3 Mbps      3.01                     2.94                      0.33                  −1.8
Flower       5 Mbps      5.11                     5.01                      2.24                   0.22
Flower       7.5 Mbps    7.62                     7.58                      1.72                   1.11
Flower       10 Mbps     10.13                    10.11                     1.38                   1.12
Stockholm    10 Mbps     10.00                    10.17                     0                      1.72
Stockholm    15 Mbps     15.00                    14.94                     0                     −0.39
Stockholm    20 Mbps     19.80                    20.37                    −0.96                   1.87

in bit rate per GOP was calculated to be 493 kbps, while the newly implemented system's standard deviation is only 207 kbps.
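The per-GOP bit rate standard deviation quoted above can be computed as a population standard deviation over the measured GOP bit rates; a minimal sketch, with made-up sample values for illustration:

```python
import math

def bitrate_std_dev(gop_bitrates_kbps):
    """Population standard deviation of the per-GOP bit rate (kbps)."""
    n = len(gop_bitrates_kbps)
    mean = sum(gop_bitrates_kbps) / n
    var = sum((r - mean) ** 2 for r in gop_bitrates_kbps) / n
    return math.sqrt(var)

# Hypothetical per-GOP measurements around a 2000 kbps target:
# bitrate_std_dev([1800, 2200, 1900, 2100]) -> ~158.1
```

A lower value indicates the encoder holds each GOP closer to the target, which is the fluctuation reduction the BU level scheme is credited with here.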

2.5 Conclusion

The proposed algorithm is computationally efficient, such that little to no latency is added to existing compression systems. Using a BU level rate control helps to lower bit fluctuations compared to the MB level rate control scheme. Using a dynamic starting QP also helps to begin the encoding process closer to the target bit rate. The experiments show that the proposed scheme improves overall PSNR while still maintaining the target bit rate.

CHAPTER III

Intra Prediction

3.1 Overview of Intra Prediction

As mentioned in Chapter I, intra prediction relies on spatial correlation for determining pixel values. Using neighboring MBs from the current frame, a prediction is made and then subtracted from the original to create the residual. Figure 3.1 shows the block to be predicted in the current frame as well as which blocks have and have not been encoded using raster scan order [1]. For a MB predicted with intra prediction, there are three possible block sizes for the luma data, namely 16x16,

8x8, or 4x4. A single prediction block is produced for each chroma component. Table 3.1 gives an overview of the prediction possibilities with intra prediction. Choosing a smaller prediction size (4x4) tends to lead to better prediction. This means that the predicted 4x4 block tends to be

Figure 3.1: Available blocks for intra prediction

Table 3.1: Intra Prediction Possibilities

Block Size      Summary
16x16 (luma)    One 16x16 prediction block is produced from four possible prediction modes
8x8 (luma)      An 8x8 prediction block is produced per 8x8 block from nine possible prediction modes
4x4 (luma)      A 4x4 prediction block is produced per 4x4 block from nine possible prediction modes
Chroma          One prediction block is produced for each chroma component from four possible prediction modes

closer to the raw block, so the residual data will contain smaller values, meaning fewer bits. However, since each new 4x4 mode has to be signaled to the decoder, smaller prediction sizes often have a lower compression ratio. Conversely, larger blocks (16x16) tend to be less accurate, but because of the sample size they lead to higher compression ratios than 4x4 blocks [1]. Depending on the selected prediction size of the current MB, different prediction samples will be used to predict the current MB. If the prediction size is 16x16, the only samples used are above and to the left of the current MB. However, if the block size is 8x8 or 4x4, the samples are taken from above, directly to the left, above and to the left, above and to the right, or a combination of these, depending on which prediction mode is selected [1]. Figure 3.2 shows all nine possible prediction modes for 8x8 and 4x4 intra prediction. These modes attempt to predict the pattern of the current MB; if the current 4x4 or 8x8 block has uniform horizontal rows of pixels, then prediction mode 1 (horizontal) will most likely be selected as the best. Once the prediction block is determined, the sum of absolute error (SAE) is computed between the predicted block and the raw block data. The prediction mode that produces the lowest SAE is then chosen as the prediction type. The calculation for SAE is as follows,

SAE = Σ_{i=0}^{B−1} Σ_{j=0}^{B−1} (P(Bx + i, By + j) − I(Bx + i, By + j))²,    (3.1)

Figure 3.2: Prediction modes for 8x8 and 4x4 intra prediction

where B is the current block size, namely 8x8 or 4x4, P is the prediction determined from one of the prediction types, and I is the raw pixel data. Calculating all nine prediction types for every 4x4 and 8x8 block can be computationally expensive, so an early termination process is favorable. After each pixel is predicted, the prediction is subtracted from the raw pixel value and the running SAE is checked against the SAE of the best prediction mode found so far. If the current SAE value is below the SAE of the best prediction mode, the prediction continues. However, if the current SAE value becomes greater than the SAE of the best prediction mode, the current prediction is terminated early.
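The SAE accumulation of (3.1) together with the early-termination check can be sketched as follows. All names, the mode-to-prediction mapping, and the demo values are hypothetical, and the error term is the squared difference exactly as (3.1) is written:

```python
def sae_early_exit(pred, raw, b, best_sae):
    """Accumulate the error of (3.1) over a b-by-b block, aborting as
    soon as the running total exceeds the best SAE found so far.
    Returns the final SAE, or None if terminated early."""
    sae = 0
    for i in range(b):
        for j in range(b):
            sae += (pred[i][j] - raw[i][j]) ** 2
            if sae > best_sae:
                return None     # already worse than the best mode: stop
    return sae

def best_mode(raw, b, predictors):
    """Pick the prediction mode with the lowest SAE; `predictors`
    maps mode number -> b-by-b prediction block (assumed interface)."""
    best_sae, mode = float("inf"), None
    for m, pred in predictors.items():
        sae = sae_early_exit(pred, raw, b, best_sae)
        if sae is not None and sae < best_sae:
            best_sae, mode = sae, m
    return mode, best_sae

# Tiny demo: mode 0 is nearly exact; mode 1 is far off and aborts early.
raw2 = [[10, 10], [10, 10]]
preds = {0: [[10, 10], [10, 9]], 1: [[12, 12], [12, 12]]}
demo_mode, demo_sae = best_mode(raw2, 2, preds)
# demo_mode == 0, demo_sae == 1
```

In the demo, mode 1's very first pixel already contributes an error of 4, exceeding mode 0's total of 1, so its remaining three pixels are never evaluated.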

3.2 Experimental Results

Testing was done with only I-Frames and 4x4 blocks turned on, as well as with an IPPP sequence using only 4x4 blocks for I-Frames. Figure 3.3 shows the encoding time of the improved intra prediction against the intra prediction that only utilizes the first three prediction modes for the Foreman video sequence using only I-Frames. Overall for this test, the average PSNR gain was 0.15 dB while the average increase in encoding time was 0.15 seconds. Figure 3.4 shows the PSNR for the Foreman sequence with all nine intra prediction modes compared to the first three modes only. The Flower video sequence was also tested, and the encoding time results are shown in Figure 3.5. Only I-Frames and 4x4 blocks

Figure 3.3: Encoding time for first three modes against all nine modes for Foreman

Figure 3.4: PSNR for first three modes against all nine modes for Foreman

Figure 3.5: Encoding time for first three modes against all nine modes for Flower

were used for testing. Adding the additional intra prediction modes produced an average PSNR gain of 0.005 dB and increased encoding time by an average of 1.15 seconds. Figure 3.6 shows the PSNR for the Flower sequence with all nine intra prediction modes compared to the first three modes only.

More testing data for the intra prediction modes is shown in Appendix C.

Figure 3.6: PSNR for first three modes against all nine modes for Flower

CHAPTER IV

Conclusion and Future Work

4.1 Conclusions

In this thesis, a new rate control for H.264 video compression is implemented. The advantage of the proposed algorithm is that it operates on a basic unit level rather than a macroblock or frame level. It is also favorable in real-time applications due to the low complexity of the algorithm.

The proposed method does produce favorable results when comparing the PSNR to the baseline

MB level rate control. Additionally, this thesis presents the addition of all possible intra prediction modes, which adds more robustness to the H.264 encoder. Although only one mode, DC prediction, is strictly required, adding all prediction modes improves prediction accuracy. Better block predictions reduce the amount of data that needs to be transmitted or stored. The added benefit of all prediction modes can mainly be seen in videos with high motion content.

4.2 Future Work

Rate control algorithms are not required in H.264, but they do allow for better control when dealing with buffer fullness and limited bandwidth. As the capabilities of the software encoder increase, the proposed rate control algorithm might need additional tuning. Some improvements could be

made to the rate control algorithm, e.g., variable BU sizes, an improved algorithm, and a better dynamic

QP algorithm. Additionally, the proposed algorithm could be implemented in a real-time environment. Testing could then be done to determine the exact amount of latency the proposed algorithm adds. Additional profiles could be added to give the software encoder more flexibility. This would allow for better compression ratios with little to no degradation in video quality.

BIBLIOGRAPHY

[1] I. E. Richardson, The H.264 Advanced Video Compression Standard, 2nd ed. John Wiley and Sons, 2010.

[2] T. Sikora, "MPEG digital video-coding standards," IEEE Signal Processing Magazine, vol. 14, no. 5, pp. 82–100, 1997.

[3] K. Rijkse, "H.263: Video coding for low-bit-rate communication," IEEE Communications Magazine, vol. 34, no. 12, pp. 42–45, 1996.

[4] T. Sikora, "The MPEG-4 video standard verification model," IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, no. 1, pp. 19–31, 1997.

[5] M. Hoffman, E. Balster, and W. Turri, "High-throughput CAVLC architecture for real-time H.264 coding using reconfigurable devices," Journal of Real-Time Image Processing, 2013.

[6] D. Marpe, H. Schwarz, and T. Wiegand, "Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 620–636, 2003.

[7] T. Wiegand, G. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 560–576, 2003.

[8] A. Malewar, R. Singh, and V. Gadre, "Novel target bit rate estimation for improved frame level H.264 rate control," in 2010 International Conference on Signal Processing and Communications (SPCOM), 2010, pp. 1–5.

[9] M. Jiang, X. Yi, and N. Ling, "Improved frame-layer rate control for H.264 using MAD ratio," in Proceedings of the 2004 International Symposium on Circuits and Systems (ISCAS '04), vol. 3, 2004, pp. III-813–816.

[10] Z. He, Y. K. Kim, and S. Mitra, "Low-delay rate control for DCT video coding via ρ-domain source modeling," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 8, pp. 928–940, 2001.

[11] W. Yuan, S. Lin, Y. Zhang, W. Yuan, and H. Luo, "Optimum bit allocation and rate control for H.264/AVC," IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 6, pp. 705–715, 2006.

[12] J. Dong and N. Ling, "A model parameter and MAD prediction scheme for H.264 macroblock layer rate control," in 2008 IEEE International Symposium on Circuits and Systems (ISCAS 2008), 2008, pp. 628–631.

[13] Z. Li, F. Pan, K. Lim, X. Lin, and S. Rahardja, "Adaptive rate control for H.264," in 2004 International Conference on Image Processing (ICIP '04), vol. 2, 2004, pp. 745–748.

[14] S. Ma, W. Gao, and Y. Lu, "Rate-distortion analysis for H.264/AVC video coding and its application to rate control," IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 12, pp. 1533–1544, 2005.

[15] N. Eiamjumrus and S. Aramvith, "New rate control scheme based on Cauchy rate-distortion optimization model for H.264 video coding," in 2006 International Symposium on Intelligent Signal Processing and Communications (ISPACS '06), 2006, pp. 143–146.

[16] D.-K. Kwon, M.-Y. Shen, and C.-C. Kuo, "Rate control for H.264 video with enhanced rate and distortion models," IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 5, 2007.

[17] L. Czuni, G. Csaszar, and A. Licsar, "Estimating the optimal quantization parameter in H.264," in 18th International Conference on Pattern Recognition (ICPR 2006), vol. 4, 2006, pp. 330–333.

APPENDIX A

Testing Data for α and β

Shown below, in Figure A.1, is the testing data that was collected for Foreman to determine the optimal values of α and β at 1 Mbps. As described in Chapter II, the ’X’ signifies the values that were chosen as the optimal values.

Figure A.1: Testing for Foreman at 1 Mbps

Figure A.2 shows the testing data for Foreman at 3 Mbps.

Figure A.2: Testing for Foreman at 3 Mbps

Figure A.3 shows the testing data for Flower at 5 Mbps.

Figure A.3: Testing for Flower at 5 Mbps

Figure A.4 shows the testing data for Flower at 7.5 Mbps.

Figure A.4: Testing for Flower at 7.5 Mbps

Figure A.5 shows the testing data for Flower at 10 Mbps.

Figure A.5: Testing for Flower at 10 Mbps

Figure A.6 shows the testing data for Stockholm at 10 Mbps.

Figure A.6: Testing for Stockholm at 10 Mbps

Figure A.7 shows the testing data for Stockholm at 20 Mbps.

Figure A.7: Testing for Stockholm at 20 Mbps

APPENDIX B

Proposed Rate Control Testing

Additional testing of the proposed rate control algorithm was performed with further target bit rates for the Foreman, Flower, and Stockholm test sequences. Figure B.1 shows the testing done for Foreman at 1 Mbps.

Figure B.1: Testing for Foreman at 1 Mbps

Figure B.2: Testing for Foreman at 3 Mbps

Figure B.2 shows the testing done for Foreman at 3 Mbps.

Figure B.3: Testing for Flower at 5 Mbps

Figure B.3 shows the testing done for Flower at 5 Mbps.

Figure B.4: Testing for Flower at 10 Mbps

Figure B.4 shows the testing done for Flower at 10 Mbps.

Figure B.5: Testing for Stockholm at 10 Mbps

Figure B.5 shows the testing done for Stockholm at 10 Mbps.

Figure B.6: Testing for Stockholm at 20 Mbps

Figure B.6 shows the testing done for Stockholm at 20 Mbps.

APPENDIX C

Intra Prediction Mode Data

Additional results were produced for the implementation of all nine intra prediction modes, as mentioned in Chapter III. The testing was done on the Foreman and Flower video sequences using an IPPP frame sequence with the I-Frames using only 4x4 blocks. Figure C.1 shows the encoding time for Foreman using the first three and all nine intra prediction modes.

Figure C.1: Encoding time for first three modes against all nine modes for Foreman

Figure C.2 shows the PSNRs for the Foreman sequence using the first three and all nine intra prediction modes.

Figure C.2: PSNR for first three modes against all nine modes for Foreman

Figure C.3 shows the encoding time for the Flower sequence.

Figure C.3: Encoding time for first three modes against all nine modes for Flower

Figure C.4 shows the PSNRs for the Flower sequence.

Figure C.4: PSNR for first three modes against all nine modes for Flower
