H.265/HEVC Video Transmission Over 4G Cellular Networks

by

Aman Jassal

Dipl.Ing., École Supérieure d'Ingénieurs en Informatique et Génie des Télécommunications, 2008

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF APPLIED SCIENCE

in

The Faculty of Graduate and Postdoctoral Studies

(Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

January 2016

© Aman Jassal 2016

Abstract

Long Term Evolution has been standardized by the 3GPP consortium since 2008, with 3GPP Release 12, finalized in March 2015, being the latest iteration of LTE Advanced. High Efficiency Video Coding has been standardized by the Moving Picture Experts Group since 2012 and is the video compression technology targeted to deliver High-Definition video content to users. With video traffic projected to represent the lion's share of mobile traffic in the next few years, providing video and non-video users with high Quality of Experience is key to designing 4G systems and future 5G systems.

In this thesis, we present a cross-layer scheduling framework which delivers video content to video users by exploiting encoding features used by High Efficiency Video Coding, such as coding structures and motion compensated prediction. We determine which frames are referenced the most within the coded video bitstream, and hence which frames have higher utility for the High Efficiency Video Coding decoder located at the user's device, and we evaluate the performance of best effort and video users in 4G networks using finite buffer traffic models. We look into throughput performance for best effort users and packet loss performance for video users to assess Quality of Experience. Our results demonstrate that there is significant potential to improve the Quality of Experience of best effort and video users using our proposed Reference Aware Proportional Fair scheme compared to the baseline Proportional Fair scheme.

Preface

I hereby declare that I am the author of this thesis. This thesis is an original, unpublished work under the supervision of Dr. Cyril Leung. In this work, I played the primary role in designing and performing the research, doing data analysis and preparing the manuscript under the supervision of Dr. Cyril Leung.

Table of Contents

Abstract

Preface

Table of Contents

List of Tables

List of Figures

List of Acronyms

Acknowledgements

Dedication

1 Introduction

2 Basics of H.265/HEVC

2.1 Syntax Structures and Syntax Elements

2.2 Coding Structures and Reference Picture Lists

2.2.1 Coding Structures

2.2.2 Reference Picture Lists

2.3 Motion Compensated Prediction

2.4 Operation with Networking Layers

3 Cross-Layer Frame Reference Aware Scheduling Framework

3.1 Mathematical Formulation of the Shared Resource Allocation Problem

3.2 Solution to the proposed Shared Resource Allocation Problem

4 System Model

4.1 H.265/HEVC Video Content Generation

4.2 LTE-Advanced System Model

4.2.1 Network Model

4.2.2 Traffic Model

4.2.3 Channel Model

4.2.4 Feedback Model

5 Simulation Results and Analysis

5.1 Simulation Assumptions

5.2 Simulation Results and Discussion

5.2.1 Results for video users

5.2.2 Results for Best Effort users

6 Conclusions and Future Work

6.1 Contributions

6.2 Future Work

Bibliography

List of Tables

2.1 Generic NAL unit syntax, adapted from [3]

2.2 Reference Picture Sets for the Hierarchical-B Coding Structure of GOP-size 8

2.3 Reference Picture Lists for the Hierarchical-B Coding Structure of GOP-size 8

4.1 H.265/HEVC Table of Video Test Sequences

4.2 H.265/HEVC Parameters

4.3 FTP Traffic Model 1

4.4 H.265/HEVC Traffic Model

5.1 LTE-Advanced Parameters

5.2 Offered Load and corresponding Resource Utilization

List of Figures

2.1 Frame dependencies in the reference coding structure

2.2 Uni- and bi-predictive inter-prediction illustration from adjacent pictures, adapted from [4]

2.3 RTP Single NAL unit packet structure

2.4 H.265/HEVC system layer stack

4.1 Hexagonal Network Grid Layout

4.2 Wrap Around of Hexagonal Network

4.3 LTE Downlink PRB allocation illustration

5.1 Video users' active download time

5.2 Satisfied Video User Percentage

5.3 CRA LDU Loss Ratio

5.4 Average throughput for Best Effort users

5.5 Coverage throughput for Best Effort users

5.6 Illustration of the outer 10% of the coverage area

5.7 Average BE user throughput in Cell-Edge region

List of Acronyms

3GPP Third Generation Partnership Project.

ADT Active Download Time.

BE Best Effort.

CB Coding Block.

CDF Cumulative Distribution Function.

CQI Channel Quality Indicator.

CSI Channel State Information.

CVS Coded Video Sequence.

DASH Dynamic Adaptive Streaming over HTTP.

EESM Exponential Effective SNR Mapping.

FDD Frequency Division Duplex.

GOP Group of Pictures.

H.264/AVC Advanced Video Coding.

H.265/HEVC High Efficiency Video Coding.

HTTP Hypertext Transfer Protocol.

IETF Internet Engineering Task Force.

IP Internet Protocol.

ITU-R International Telecommunication Union Radiocommunication Sector.

JCT-VC Joint Collaborative Team on Video Coding.

KPIs Key Performance Indicators.

LDU Logical Data Unit.

LTE Long Term Evolution.

LTE-A LTE Advanced.

MANE Media Aware Network Element.

MIESM Mutual Information Effective SNR Metric.

MIMO Multiple Input Multiple Output.

MOS Mean Opinion Score.

MPEG Moving Picture Experts Group.

MU-MIMO Multi User Multiple Input Multiple Output.

NAL Network Abstraction Layer.

NGMN Next Generation Mobile Networks.

OFDMA Orthogonal Frequency Division Multiple Access.

OSI Open Systems Interconnection.

PB Prediction Block.

PLR Packet Loss Ratio.

PMI Precoding Matrix Indicator.

POC Picture Order Count.

PRB Physical Resource Block.

QAM Quadrature Amplitude Modulation.

QoE Quality of Experience.

QoS Quality of Service.

QPSK Quaternary Phase Shift Keying.

RBSP Raw Byte Sequence Payload.

RI Rank Indication.

RTP Real-time Transport Protocol.

RU Resource Utilization.

SINR Signal to Interference and Noise Ratio.

SNR Signal to Noise Ratio.

SRST Single RTP stream on a single media transport.

SU-MIMO Single User Multiple Input Multiple Output.

TCP Transmission Control Protocol.

UDP User Datagram Protocol.

UMTS Universal Mobile Telecommunications System.

VCL Video Coding Layer.

Wi-Fi Wireless Fidelity.

Acknowledgements

I would like to take this opportunity to express my utmost gratitude and sincerest thanks to my supervisor, Dr. Cyril Leung, who has given me great support, encouragement and guidance throughout my work and my M.A.Sc. program. My discussions with him were a constant source of inspiration and his insights helped make this research work more valuable. Without his invaluable knowledge and understanding in this research area, this thesis would never have been possible.

I would also like to thank Dr. Ahmed Saadani for his guidance and support throughout my engineering program and at Orange Labs, where he gave me the opportunity to do research work on 4G systems. My former colleagues, Mr. Sebastien Jeux and Dr. Sofia Martinez Lopez, and more generally the whole research community involved in research and standardization within the 3GPP, have had a great influence on me, and without their inspiration I would never have undertaken my program at the University of British Columbia.

All of the work that has been done in this thesis was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada under Grant RGPIN 1731-2013.

Dedication

To my parents and my sister

Chapter 1

Introduction

With the emergence of Long Term Evolution (LTE) and its subsequent iterations standardized by the Third Generation Partnership Project (3GPP) consortium, video services are fast becoming the dominant data services in 4G mobile networks, and mobile video traffic is projected to account for 72% of the total mobile data traffic by 2019 [1]. The transmission of video services over cellular networks is challenging due to the large bandwidth requirement, the low latency required for inter-operation and the effect of error propagation within the video sequence in the event of packet losses. The current dominant standard for video coding is Advanced Video Coding (H.264/AVC) [2], which is used to deliver a wide range of video services. However, H.264/AVC requires very high bandwidth for High-Definition (HD) content, which makes the delivery of HD video services impractical. Its successor, High Efficiency Video Coding (H.265/HEVC) [3], was standardized by the Moving Picture Experts Group (MPEG) in 2012 and is expected to reduce the bit rate by about 50% compared to H.264/AVC High Profile while maintaining comparable subjective quality [4]. Therefore H.265/HEVC is a more practical choice for delivering HD and Ultra High-Definition (UHD) video content to consumers over wired and wireless networks.

As we move towards 5G, one of the key targets that we need to achieve is to provide a more consistent user experience across the whole network as well as higher Quality of Experience (QoE) [5]. Cross-layer QoE-aware resource allocation schemes have been proposed for Orthogonal Frequency Division Multiple Access (OFDMA) systems [6], where the scheduling algorithm uses the Mean Opinion Score (MOS) as a measure of QoE.

Other attributes that the research community has focused on in order to improve the QoE of video users are the playback buffer status and the rebuffering time [7], [8]. One of the limitations of these works is their reliance on video traces generated for low-definition video sequences encoded using H.264/AVC, which are not representative of the targets that 5G networks are supposed to satisfy: 5G networks are instead aimed at delivering HD or UHD video services anywhere, at any time. Other works have considered H.265/HEVC video streaming over Wi-Fi networks and shown that the QoE of video sequences, as reflected by the MOS, is very sensitive to network impairments such as packet losses. Nightingale et al. [9] assumed that packet losses are random; however, in cellular networks this assumption is rarely valid, as the combination of traffic load, the characteristics of the video sequence and the individual user's link quality dictates the overall performance that can be achieved.

In this thesis, we focus on the use case of H.265/HEVC video transmission over 4G networks. Existing works have neither used the compression properties of H.265/HEVC, specifically by exploiting the temporal inter-dependence between frames within coding structures, nor evaluated how well video services can be delivered in 4G/beyond-4G networks with dynamic user arrivals. We use performance evaluation methodologies based on Key Performance Indicators (KPIs) that have been recommended by the Next Generation Mobile Networks (NGMN) Alliance for 5G networks [5].

The main novel contributions of this thesis are as follows:

1. The definition of a cross-layer scheduling framework exploiting frame referencing to deliver video content

2. The evaluation of capacity for the delivery of H.265/HEVC video services over beyond-4G networks

3. The joint assessment of the QoE of video users and Best Effort users

The remainder of this thesis is organized as follows. Chapter 2 outlines the basics of the H.265/HEVC standard that are relevant to this work. Chapter 3 presents the proposed cross-layer scheduling framework for video content transmission. The simulation model is presented in Chapter 4. Simulation results, analysis and discussions are provided in Chapter 5. Conclusions and future work are presented in Chapter 6.

Chapter 2

Basics of H.265/HEVC

In this chapter, we describe the features of the H.265/HEVC standard that are directly relevant to this thesis and to the problem formulation presented and developed in Chapter 3. Specifically, we present the high-level syntax used to represent the video data, the motion prediction techniques used for video compression, and the coding structures and reference picture lists used to perform motion compensated prediction in H.265/HEVC [3]. The main point to understand is that the encoder knows the specifics of the coding structure and has to provide the decoder with the information needed to reconstruct it. This is done through a given coding order (which is implicitly embedded in the order of the Logical Data Units, or LDUs, defined in Section 2.1) and through Reference Picture Sets and Reference Picture Lists (the former are explicitly transmitted and the latter are derived during the decoding process). In this chapter we explain how all of these features work.

2.1 Syntax Structures and Syntax Elements

H.265/HEVC uses so-called syntax structures to represent the encoded video data. An H.265/HEVC encoder generates syntax structures encapsulated inside logical data units called Network Abstraction Layer (NAL) units.

Table 2.1: Generic NAL unit syntax, adapted from [3]

    nal_unit( NumBytesInNalUnit ) {
        forbidden_zero_bit
        nal_unit_type
        nuh_layer_id
        nuh_temporal_id_plus1
        NumBytesInRbsp = 0
        for( i = 2; i < NumBytesInNalUnit; i++ )
            if( i + 2 < NumBytesInNalUnit && next_bits( 24 ) == 0x000003 ) {
                rbsp_byte[ NumBytesInRbsp++ ]
                rbsp_byte[ NumBytesInRbsp++ ]
                i += 2
                emulation_prevention_three_byte  /* equal to 0x03 */
            } else
                rbsp_byte[ NumBytesInRbsp++ ]
    }

An H.265/HEVC decoder decapsulates NAL units and consumes syntax structures to reconstruct a given picture¹. The sequence of NAL units can be viewed as a text written in a specific language with a syntax and semantics that the decoder can read and understand. The syntax is the set of words the decoder knows, and the semantics tell the decoder how the syntax is to be used. The information conveyed by the combination of the syntax and the semantics is recovered through the decoding process, which is fully specified in [3].

Table 2.1 shows the syntax structure of a generic NAL unit and the syntax elements it carries: forbidden_zero_bit, nal_unit_type, nuh_layer_id, nuh_temporal_id_plus1, rbsp_byte and emulation_prevention_three_byte. Syntax elements have associated descriptors which are used for parsing purposes; these are not covered in this thesis, and the interested reader is invited to refer to [4] (Chapter 5) for more details.

¹ In this thesis, we use the terms "picture" and "frame" interchangeably.

Every NAL unit carries NumBytesInNalUnit bytes, which break down into a 16-bit header made of 4 syntax elements and a payload, the Raw Byte Sequence Payload (RBSP), carrying NumBytesInRbsp bytes. The first syntax element is the 1-bit forbidden zero bit (forbidden_zero_bit), which must be equal to 0. The second syntax element is nal_unit_type, which is written over 6 bits and carries the type of the RBSP contained in the NAL unit. The values that it can take are specified in Table 7-1 of [3]; NAL unit types belong either to the Video Coding Layer (VCL) or to the non-VCL category. VCL types comprise all NAL units that contain coded video data, whereas non-VCL types contain parameter information. The third syntax element is the layer identifier, nuh_layer_id, which is written over 6 bits. Its value is always 0 for the single-layer bitstreams considered here, although other values can be specified by future ITU-T recommendations relating to scalable or 3D video coding extensions of [3]. The fourth and final syntax element of the header is the temporal identifier, nuh_temporal_id_plus1, which is written over 3 bits. Its value is typically 1, which means that there is only one temporal layer; we assume that this is the case throughout the thesis. The temporal identifier of the NAL unit, TemporalID, is obtained as:

$$\mathrm{TemporalID} = \texttt{nuh\_temporal\_id\_plus1} - 1 \tag{2.1}$$
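The 1/6/6/3-bit header layout described above can be illustrated with a short parsing sketch. The names NalUnitHeader and parse_nal_unit_header are hypothetical (they are not taken from [3] or from any decoder library), and the sketch assumes the two header bytes are already available, with any start code removed.

```python
from typing import NamedTuple


class NalUnitHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_unit_type: int
    nuh_layer_id: int
    nuh_temporal_id_plus1: int

    @property
    def temporal_id(self) -> int:
        # Equation (2.1): TemporalID = nuh_temporal_id_plus1 - 1
        return self.nuh_temporal_id_plus1 - 1


def parse_nal_unit_header(data: bytes) -> NalUnitHeader:
    """Split the first two bytes of a NAL unit into its 1/6/6/3-bit fields."""
    if len(data) < 2:
        raise ValueError("a NAL unit header is 2 bytes long")
    word = (data[0] << 8) | data[1]  # 16-bit header, most significant bit first
    return NalUnitHeader(
        forbidden_zero_bit=(word >> 15) & 0x1,
        nal_unit_type=(word >> 9) & 0x3F,
        nuh_layer_id=(word >> 3) & 0x3F,
        nuh_temporal_id_plus1=word & 0x7,
    )
```

For example, the header bytes 0x40 0x01 parse to nal_unit_type 32 (a Video Parameter Set in Table 7-1 of [3]), nuh_layer_id 0 and TemporalID 0.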

The payload of a NAL unit is the RBSP, denoted by the rbsp_byte syntax element, where rbsp_byte contains NumBytesInRbsp bytes and rbsp_byte[i] is the ith byte of the RBSP. Because there are various types of NAL units, the RBSP itself can be viewed as a syntax structure carrying syntax elements. For each nal_unit_type, the H.265/HEVC standard provides the description of the associated syntax structure. For instance, the RBSP of a Video Parameter Set has a dedicated syntax structure (Section 7.3.2.1 of [3]), and the RBSP of a Clean Random Access NAL unit has a dedicated syntax structure further broken into a slice segment header, slice segment data and trailing bits (Section 7.3.2.9 of [3]). To prevent the content of a NAL unit from emulating the start code prefix (0x000001) that delimits NAL units in a byte stream, the H.265/HEVC standard inserts a dedicated byte, emulation_prevention_three_byte (equal to 0x03), after any two consecutive zero bytes that would otherwise be followed by a byte value of 0x03 or less. The decoder discards these bytes when it extracts the RBSP. In this thesis, we assume that a bitstream is only made of generic VCL NAL units and, from this point onwards, a NAL unit will be referred to as a Logical Data Unit (LDU).
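The RBSP extraction loop of Table 2.1 can be sketched in Python as follows. The function name nal_unit_to_rbsp is hypothetical, and the input is assumed to be a single NAL unit without its start code.

```python
def nal_unit_to_rbsp(nal_unit: bytes) -> bytes:
    """Return the RBSP of a NAL unit, dropping emulation_prevention_three_byte.

    Mirrors the loop of Table 2.1: the 2-byte header is skipped, and whenever
    the next three bytes are 0x000003, the two zero bytes are kept while the
    0x03 byte is discarded.
    """
    num_bytes = len(nal_unit)
    rbsp = bytearray()
    i = 2  # skip the 16-bit NAL unit header
    while i < num_bytes:
        if i + 2 < num_bytes and nal_unit[i:i + 3] == b"\x00\x00\x03":
            rbsp += nal_unit[i:i + 2]  # two rbsp_byte values (0x00 0x00)
            i += 3                     # skip emulation_prevention_three_byte
        else:
            rbsp.append(nal_unit[i])
            i += 1
    return bytes(rbsp)
```

With the input bytes 0x40 0x01 0x00 0x00 0x03 0x01 0xAA, the function returns 0x00 0x00 0x01 0xAA: without the inserted 0x03, the payload bytes 0x00 0x00 0x01 would have been indistinguishable from a start code.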

2.2 Coding Structures and Reference Picture Lists

An H.265/HEVC bitstream is made up of several entities called Coded Video Sequences (CVSs). A CVS is the coded representation of a sequence of pictures which can be decoded using only pictures within that sequence. Similarly, a coded picture is the coded representation of a picture, which typically consists of multiple LDUs. A coded picture is embedded in a so-called access unit, which contains all the LDUs associated with that picture. In this section we present some of the tools used by the H.265/HEVC standard for motion compensated prediction: coding structures and reference picture lists.

2.2.1 Coding Structures

H.265/HEVC relies on temporal coding structures to perform its video compression task. A coding structure designates a set of consecutive pictures with clearly defined dependencies between pictures and a given coding order. Pictures depend on others for prediction, which can be done from one picture or from two pictures (called uni-prediction and bi-prediction, respectively). Coding structures define a coding order, which is different from the output order: the coding order is the order in which pictures are encoded, while the output order is the order in which pictures are displayed on the screen. Because of this difference, the H.265/HEVC standard uses a Picture Order Count (POC) to uniquely identify a given picture in output order. From this point onwards, and for the sake of convenience, we will refer to the picture whose POC is equal to n as poc_n; for specific values, we write poc0, poc1, and so on. The definition of a coding structure bears a strong similarity to that of a Group of Pictures (GOP) in H.264/AVC. In earlier video compression standards such as H.264/AVC, a GOP designates a set of consecutive pictures with clearly defined dependencies where the first picture is an intra-coded picture (or equivalently an I-Frame). The difference between a GOP and a coding structure is that the first picture in a coding structure does not have to be an I-Frame. When the pictures that belong to a coding structure only reference other pictures within the same coding structure for prediction purposes, the coding structure is called a closed GOP. The H.265/HEVC standard also allows cases where a picture within a coding structure references a picture from another coding structure, in which case the coding structure is called an open GOP.

Figure 2.1: Frame dependencies in the reference coding structure.

Throughout this chapter, we will use the hierarchical-B coding structure that was used by the Joint Collaborative Team on Video Coding (JCT-VC) for the Main Profile Random Access encoder configuration described in [10]. All figures and tables refer to that specific coding structure. For simplicity, throughout the remainder of this thesis, we will refer to this coding structure simply as the reference coding structure.

Fig. 2.1 depicts four illustrations of frame dependencies in the reference coding structure. Referenced pictures are denoted by (*), and arrows point from the referenced picture to all of its direct dependent pictures. Dependent pictures can be either before or after the referenced picture in display order. The reference coding structure is actually an open GOP coding structure and, by design, it operates with a GOP size of 8. The open side of the reference coding structure can be seen in Fig. 2.1 in the examples where poc0, poc4 and poc6 are the referenced pictures: they are referenced by pictures beyond the GOP boundary, since poc0, poc4 and poc6 are all referenced by poc16. The reference coding structure uses I-Frames and B-Frames. The coding order of this coding structure is defined as {poc_n, poc_{n-4}, poc_{n-6}, poc_{n-7}, poc_{n-5}, poc_{n-2}, poc_{n-3}, poc_{n-1}}, where n is the POC of the last picture of the GOP in output order. poc0 is a special case and constitutes a GOP on its own, since there are no pictures before poc0. Using this definition with n = 8, the GOP that follows poc0 is {poc8, poc4, poc2, poc1, poc3, poc6, poc5, poc7}. The reference coding structure is then applied periodically to the succeeding pictures throughout the video sequence. The encoder can change the coding structure if doing so yields better performance, but we assume that it remains unchanged throughout the encoding of a video sequence. The decoder at the receiver side extracts the information regarding the referenced pictures from the Reference Picture Lists, which we describe in the next section.
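The periodic coding order described above can be generated with the short Python sketch below. The helper name coding_order is hypothetical, and the sketch assumes for simplicity that the sequence consists of poc0 followed by complete GOPs of 8 pictures; a truncated final GOP is not handled.

```python
# Offsets relative to n (the last picture of a GOP in output order),
# listed in the coding order of the reference coding structure.
CODING_ORDER_OFFSETS = (0, -4, -6, -7, -5, -2, -3, -1)
GOP_SIZE = 8


def coding_order(num_pictures: int) -> list[int]:
    """POCs of a sequence in coding order (assumes 1 + 8 * m pictures)."""
    if num_pictures < 1 or (num_pictures - 1) % GOP_SIZE != 0:
        raise ValueError("expected poc0 followed by complete GOPs of 8 pictures")
    order = [0]  # poc0 forms a GOP on its own
    for n in range(GOP_SIZE, num_pictures, GOP_SIZE):
        order.extend(n + offset for offset in CODING_ORDER_OFFSETS)
    return order
```

For 17 pictures, coding_order(17) returns [0, 8, 4, 2, 1, 3, 6, 5, 7, 16, 12, 10, 9, 11, 14, 13, 15], so a picture is always coded after every picture it references in Table 2.3.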

2.2.2 Reference Picture Lists

Coding structures specify the coding order and the dependencies between a given set of pictures. The decoder does not have any knowledge of the coding structure that was used by the encoder; it must derive this information from the LDUs that carry the encoded video data. In this section, we explain how the encoder transmits the information regarding the dependencies between pictures.

At the receiver end, as a picture is decoded, it is either displayed on the screen or stored in the Decoded Picture Buffer until it is eventually output. Any picture located in the Decoded Picture Buffer can be reused as a reference for prediction. Pictures that are available for inter prediction are listed in a so-called Reference Picture Set. The Reference Picture Set is sent in the Sequence Parameter Set, and each picture indexed in it is explicitly identified by its POC value. Table 2.2 lists the different Reference Picture Sets defined for the reference coding structure that was used by the JCT-VC for the Main Profile Random Access encoder configuration described in [10]. Eight Reference Picture Sets are defined and, for a given picture poc_n, the corresponding referenced POCs are given. Since poc0 is the first picture of a video sequence, there can be no negative POC; therefore, if poc_i with i < 0 were to appear in a Reference Picture Set, that picture would simply not be included.

Table 2.2: Reference Picture Sets for the Hierarchical-B Coding Structure of GOP-size 8

Reference Picture Set | Reference POCs
0 | poc_{n-8}, poc_{n-10}, poc_{n-12}, poc_{n-16}
1 | poc_{n-4}, poc_{n-6}, poc_{n+4}
2 | poc_{n-2}, poc_{n-4}, poc_{n+2}, poc_{n+6}
3 | poc_{n-1}, poc_{n+1}, poc_{n+3}, poc_{n+7}
4 | poc_{n-1}, poc_{n-3}, poc_{n+1}, poc_{n+5}
5 | poc_{n-2}, poc_{n-4}, poc_{n-6}, poc_{n+2}
6 | poc_{n-1}, poc_{n-5}, poc_{n+1}, poc_{n+3}
7 | poc_{n-1}, poc_{n-3}, poc_{n-7}, poc_{n+1}
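The resolution of Table 2.2 into actual POCs can be sketched as follows. The mapping from a picture's position within its GOP (n mod 8) to the Reference Picture Set index is read off Table 2.3 and assumed to repeat for every GOP; all names are hypothetical, and, as in the text, only negative POCs are dropped (the end of the sequence is not checked).

```python
# Table 2.2: POC offsets (relative to n) of each Reference Picture Set.
RPS_OFFSETS = {
    0: (-8, -10, -12, -16),
    1: (-4, -6, +4),
    2: (-2, -4, +2, +6),
    3: (-1, +1, +3, +7),
    4: (-1, -3, +1, +5),
    5: (-2, -4, -6, +2),
    6: (-1, -5, +1, +3),
    7: (-1, -3, -7, +1),
}

# Table 2.3: Reference Picture Set index used by poc_n, keyed by n mod 8.
RPS_FOR_POSITION = {0: 0, 4: 1, 2: 2, 1: 3, 3: 4, 6: 5, 5: 6, 7: 7}


def reference_pocs(poc: int) -> list[int]:
    """POCs in the Reference Picture Set of poc_n, with negative POCs dropped."""
    if poc == 0:
        return []  # poc0 is an I-Frame: no Reference Picture Set
    offsets = RPS_OFFSETS[RPS_FOR_POSITION[poc % 8]]
    return [poc + d for d in offsets if poc + d >= 0]
```

For instance, reference_pocs(8) keeps only poc0, while reference_pocs(16) returns [8, 6, 4, 0] from the same Reference Picture Set 0.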

The LDUs of a given picture carry a header that specifies which Reference Picture Set to activate. H.265/HEVC uses two Reference Picture Lists for inter prediction, called List0 and List1. The decoder reconstructs these lists from the Reference Picture Sets that were supplied in the Sequence Parameter Set; this process is specified in Section 8.3.4 of [3]. The main difference between a Reference Picture Set and a Reference Picture List is that a Reference Picture List is the subset of the Reference Picture Set which is actually used for inter prediction. For uni-predicted frames (P-Frames), only List0 is activated, while for bi-predicted frames (B-Frames), both List0 and List1 are activated. Motion compensated prediction is then performed using the activated lists. The resulting prediction can be made either from one picture only or from a combination of pictures. Using these lists, the hierarchy between pictures can be recovered. Table 2.3 gives the Reference Picture Lists of the hierarchical-B coding structure of size 8 that was used by the JCT-VC for the Main Profile Random Access encoder configuration described in [10].

This is the reference coding structure that we use throughout this thesis for all our video sequences. For each picture, we provide the Reference Picture Set that is used and the POCs of the pictures in the Reference Picture Lists. The first picture of a coded video sequence is usually an I-Frame, and I-Frames do not use inter prediction. Therefore poc0 does not have any associated Reference Picture Set, and its Reference Picture Lists are empty. poc8 and poc16 both use the same Reference Picture Set; however, for poc8, three of the referenced pictures (with POCs −2, −4 and −8) do not exist, so poc8 only references poc0.

Table 2.3: Reference Picture Lists for the Hierarchical-B Coding Structure of GOP-size 8

POC | RPS used | List0 POCs | List1 POCs
0 | - | N/A | N/A
8 | 0 | 0 | 0
4 | 1 | 0, 8 | 8, 0
2 | 2 | 0, 4 | 4, 8
1 | 3 | 0, 2 | 2, 4
3 | 4 | 2, 0 | 4, 8
6 | 5 | 4, 2 | 8, 4
5 | 6 | 4, 0 | 6, 8
7 | 7 | 6, 4 | 8, 6
16 | 0 | 8, 6, 4, 0 | 8, 6, 4, 0
12 | 1 | 8, 6 | 16, 8
10 | 2 | 8, 6 | 12, 16
9 | 3 | 8, 10 | 10, 12
... | ... | ... | ...

By combining the information in Table 2.2 and Table 2.3, one can easily reconstruct the direct dependencies that we illustrated earlier in Fig. 2.1 for the reference coding structure.
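The entries of Table 2.3 are consistent with a simple ordering rule: List0 takes the referenced pictures that precede the current picture (closest first) followed by those that follow it, and List1 takes the same two groups in the opposite order. The sketch below assumes that rule, which follows the default list initialization order of [3], and assumes the number of active entries is 4 for Reference Picture Set 0 and 2 otherwise, as inferred from Table 2.3. It is a simplification: each picture appears at most once, as in Table 2.3, whereas the initialization process of [3] can repeat entries when fewer reference pictures exist than active list slots, and long-term pictures and list modification are ignored. The function name is hypothetical.

```python
def build_reference_lists(poc: int, refs: list[int],
                          num_active: int) -> tuple[list[int], list[int]]:
    """Simplified List0/List1 construction from the referenced POCs of poc_n."""
    before = sorted((r for r in refs if r < poc), reverse=True)  # closest first
    after = sorted(r for r in refs if r > poc)                   # closest first
    list0 = (before + after)[:num_active]
    list1 = (after + before)[:num_active]
    return list0, list1
```

For example, poc9 (Reference Picture Set 3, referenced POCs 8, 10, 12 and 16) yields List0 = [8, 10] and List1 = [10, 12], matching the last row of Table 2.3.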

2.3 Motion Compensated Prediction

There are two types of prediction used in video compression: intra(-frame) prediction and inter(-frame) prediction. Intra prediction is used for intra-coded frames (I-Frames), whereas inter prediction is used for all other frames, which can be uni-predicted frames (P-Frames) or bi-predicted frames (B-Frames). Inter prediction in H.265/HEVC relies on Motion Compensated Prediction to achieve efficient compression. The main idea behind inter prediction is that a given picture uses another picture as reference: the encoder searches that reference picture for the block that best matches the area being predicted and encodes the motion of that block between both pictures. In H.265/HEVC, a given picture may use one or two pictures as reference for inter prediction, which is achieved using the coding structures introduced in Section 2.2.1.

Figure 2.2: Uni- and bi-predictive inter-prediction illustration from adjacent pictures, adapted from [4]

Fig. 2.2 illustrates the concept of uni-predictive and bi-predictive inter prediction: poc_n performs uni-prediction from picture poc_{n-2} and bi-prediction from its adjacent pictures poc_{n-1} and poc_{n+1}. Note that bi-prediction does not require the reference pictures to be adjacent to poc_n: one CB of poc_n uses poc_{n-2} and poc_{n-1} for bi-prediction.
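In its simplest, unweighted form, bi-prediction averages the two motion compensated predictions sample by sample. The sketch below assumes integer samples already fetched from the two reference pictures and rounds halves upwards; it is a simplification, since [3] performs this averaging at a higher intermediate precision and also supports weighted prediction. The function name is hypothetical.

```python
def bi_predict(pred0: list[list[int]], pred1: list[list[int]]) -> list[list[int]]:
    """Average two motion compensated predictions sample by sample, rounding halves up."""
    return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
            for row0, row1 in zip(pred0, pred1)]
```

For example, predictions [[10, 20]] from List0 and [[13, 21]] from List1 combine into [[12, 21]].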

The H.265/HEVC standard operates on a block basis. The most basic block used in H.265/HEVC is called a Coding Block (CB). Each picture is partitioned into multiple CBs, and each CB is further partitioned into smaller blocks called Prediction Blocks (PBs). After the picture has been partitioned into PBs, the encoder performs prediction on a PB basis from the reference pictures whose POCs are given in the Reference Picture Lists.

The encoder searches the reference pictures, PB by PB, for the area that best matches the PB according to a rate-distortion criterion. Once it finds the area with the lowest rate-distortion cost, it encodes the shift as a motion vector together with the reference picture used, which is signalled as an index into the corresponding Reference Picture List. The motion vector is the displacement between the area corresponding to the PB and the area in the reference picture with the lowest rate-distortion cost. The basic idea behind rate-distortion optimization is that the encoder looks for the coding mode that jointly minimizes the loss of video quality, i.e., the distortion, and the number of bits required to encode that area, i.e., the rate. It is beyond the scope of this thesis to delve into rate-distortion algorithms and their specifics, and the interested reader is invited to refer to [11] and to [4] (Chapter 2) for more details on the application of rate-distortion optimization in video compression.
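A toy version of this search is sketched below as an exhaustive block matching that minimizes a cost J = D + λR. It is not the search used by the JCT-VC reference encoder: here the distortion D is the sum of absolute differences (SAD), the rate R is crudely approximated by |dx| + |dy| in place of actual motion vector bits, only integer-sample displacements are tried, and the function name and the value of λ (lam) are hypothetical.

```python
def motion_search(cur: list[list[int]], ref: list[list[int]], bx: int, by: int,
                  size: int, search_range: int, lam: float):
    """Full-search block matching minimizing J = SAD + lam * (|dx| + |dy|).

    (bx, by) is the top-left corner of the size x size block in cur.
    Returns ((dx, dy), cost) for the best displacement found.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the reference picture
            sad = sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
                      for j in range(size) for i in range(size))
            cost = sad + lam * (abs(dx) + abs(dy))
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

If the current picture is the reference picture shifted left by one sample, the search returns the displacement (1, 0) with cost λ, since the SAD is zero there and every other candidate has a SAD of at least 4 for a 2 x 2 block.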

2.4 Operation with Networking Layers

Video compression techniques such as H.264/AVC and H.265/HEVC operate at the application layer, which sits at the highest level in the Open Systems Interconnection (OSI) model [12]. The encoder generates LDUs which are then sent to the lower layers for transmission over packetized networks based on the Internet Protocol (IP). One of the commonly used solutions for delivering video content over IP networks is the Real-time Transport Protocol (RTP). The Internet Engineering Task Force (IETF) has formulated RFC 6184, which details the operation of RTP for delivering H.264/AVC content [13]. Similarly, the IETF has formulated a draft RFC for the operation of RTP for delivering H.265/HEVC content [14]. We now look into the specifics of RTP operation for delivering H.265/HEVC content. In this thesis, we assume that for all users we have a Single RTP stream on a single media transport (SRST) and that all LDUs are sent in RTP packets that use the Single NAL unit packet structure. Fig. 2.3 shows the structure of such an RTP packet. The PayloadHdr field is a bit-exact copy of the LDU header, and the DONL field is optional and carries the 16 least significant bits of the Decoding Order Number. We assume that the DONL field is not present.

The NAL unit payload data field is the payload of the LDU, and the last field is optional and included for the purpose of padding. We assume that all RTP packets have a padding field occupying 10 bytes. Given that the RTP specification for H.265/HEVC is still at draft level at the time of writing, we allow ourselves to make a modification and introduce a new field in the Single NAL unit packet structure: the RefCount field. Since the encoder knows exactly which coding structure is used to compress a video sequence, it can also keep track of the number of times a given picture is referenced within the video sequence and propagate that information to the RTP packets. We assume that the RefCount field occupies 2 bytes.

Figure 2.3: RTP Single NAL unit packet structure

Figure 2.4: H.265/HEVC system layer stack
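The text does not specify exactly how the encoder computes RefCount. One plausible interpretation, assumed in the sketch below, counts for each picture the number of pictures whose List0 or List1 contains it, each dependent picture being counted once. The data are the rows of Table 2.3, so the counts only cover the pictures listed there; the names are hypothetical.

```python
from collections import Counter

# Table 2.3: (List0 POCs, List1 POCs) of each listed picture.
REFERENCE_LISTS = {
    0: ([], []),
    8: ([0], [0]),
    4: ([0, 8], [8, 0]),
    2: ([0, 4], [4, 8]),
    1: ([0, 2], [2, 4]),
    3: ([2, 0], [4, 8]),
    6: ([4, 2], [8, 4]),
    5: ([4, 0], [6, 8]),
    7: ([6, 4], [8, 6]),
    16: ([8, 6, 4, 0], [8, 6, 4, 0]),
    12: ([8, 6], [16, 8]),
    10: ([8, 6], [12, 16]),
    9: ([8, 10], [10, 12]),
}


def reference_counts(reference_lists: dict[int, tuple[list[int], list[int]]]) -> Counter:
    """Number of pictures whose Reference Picture Lists contain each POC."""
    counts: Counter = Counter()
    for list0, list1 in reference_lists.values():
        for ref in set(list0) | set(list1):  # count each dependent picture once
            counts[ref] += 1
    return counts
```

Under this interpretation, poc8 is referenced by 10 of the listed pictures, poc0 and poc4 by 7 each and poc6 by 5, while the odd POCs in the table are never referenced.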

For live streaming services, RTP is used in conjunction with the User Datagram Protocol (UDP) to supply packets to IP. Another solution, developed by MPEG for buffered streaming services, is Dynamic Adaptive Streaming over HTTP (DASH). DASH performs video streaming over the Hypertext Transfer Protocol (HTTP) using adaptive bit rate and is codec-agnostic. Since this solution is based on HTTP, packets are supplied to IP using the Transmission Control Protocol (TCP). IP packets can then be supplied to different wireless access technologies, such as LTE or Wireless Fidelity (Wi-Fi). Fig. 2.4 illustrates how the protocol stacks are set up. In this thesis, we focus on delivering video streaming services to cellular users. We assume the use of RTP and UDP to supply packets over IP, the modified Single NAL unit packet structure for the RTP payload, and LTE-A as the wireless access technology.
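To show what this stack costs per LDU, the header overhead can be tallied as below. The IPv4 (20 bytes, no options), UDP (8 bytes) and fixed RTP (12 bytes, no CSRCs or extensions) header sizes are standard minimum values. The 2-byte PayloadHdr, 2-byte RefCount and 10-byte padding follow the assumptions stated above. IPv6, header compression and the LTE-A layer-2 headers are not modelled, so the result is only a rough sketch under these assumptions.

```python
# Header sizes in bytes (minimal IPv4 header, no RTP CSRCs or extensions)
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_FIXED_HEADER = 12
# Fields of the modified Single NAL unit packet assumed in the text
PAYLOAD_HDR = 2
REF_COUNT = 2
PADDING = 10


def overhead_bytes() -> int:
    """Bytes added on top of the NAL unit payload data for one LDU."""
    return IPV4_HEADER + UDP_HEADER + RTP_FIXED_HEADER + PAYLOAD_HDR + REF_COUNT + PADDING


def overhead_fraction(nal_payload_bytes: int) -> float:
    """Share of the IP packet occupied by headers, RefCount and padding."""
    total = nal_payload_bytes + overhead_bytes()
    return overhead_bytes() / total


if __name__ == "__main__":
    for size in (100, 1000, 1400):
        print(size, overhead_bytes(), round(overhead_fraction(size), 4))
```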

Chapter 3

Cross-Layer Frame Reference Aware Scheduling Framework

In the previous chapter, we presented the features of the H.265/HEVC standard that are relevant for video compression. These include the coding structures, syntax structures and syntax elements used to encode video content. They also include motion compensated prediction, which allows more bandwidth-efficient encoding, and reference picture lists, which help the decoder track which pictures to use as references during motion prediction. Building on these features, we define a cross-layer scheduling framework that delivers video content according to the dependencies between frames. In this chapter, we propose a mathematical formulation of the shared resource allocation problem for delivering video content and derive the optimal solution to this problem.


3.1 Mathematical Formulation of the Shared Resource Allocation Problem

Let $S$ be the set of users actively sharing resources, and let the channel capacity of user $k$ in time-slot $n$ be denoted by $C_k(n)$. Kelly [15] provided a mathematical formulation of the shared resource allocation problem that has been widely used by the research community to tackle rate control problems in communication networks. This shared resource allocation problem, which we call SRAP, is formulated as the following constrained optimization problem and is solved at the beginning of every time-slot $n$.

SRAP:

$$\text{maximize} \quad F(\vec{r}(n)) \triangleq \sum_{k \in S} U_k(r_k(n)) \tag{3.1}$$

$$\text{subject to} \quad r_k(n) < C_k(n), \quad r_k(n) \geq 0, \quad k \in S \tag{3.2}$$

$F$ is the objective function that we are trying to maximize, $U_k(r_k(n))$ denotes the utility function of user $k$ and $r_k(n)$ is the average throughput of user $k$ up to time-slot $n$. Constraint (3.2) ensures that the rate of the user does not exceed the channel capacity $C_k(n)$ that user $k$ experiences during time-slot $n$. Assume that the objective function $F$ in (3.1) is strictly concave and differentiable and that the feasible region in (3.2) is compact. Nonlinear Programming Theory [16] then guarantees that an optimal solution to SRAP exists, and Kelly provided an explicit optimal solution to this problem using Lagrangian methods [15].


In wireless networks, the channel capacity and the number of users actively sharing resources vary with time, due to the random nature of the wireless channel and of the network's traffic. As a result, the optimal solution to SRAP also varies with time. Hosein [17] proposed a solution to SRAP by observing that finding the optimal solution amounts to finding the user that maximizes the gradient of the objective function. Hosein developed his solution by introducing update equations that use exponential smoothing filters to keep track of each user's throughput:

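$$r_k(n+1) = \begin{cases} \left(1 - \frac{1}{\tau}\right) r_k(n) + \frac{1}{\tau}\, d_k(n) & \text{if user } k \text{ is served,} \\ \left(1 - \frac{1}{\tau}\right) r_k(n) & \text{otherwise.} \end{cases} \tag{3.3}$$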

where $d_k(n)$ is the throughput of user $k$ estimated for time-slot $n$ in bits per second, $\tau > 1$ is the time constant of the exponential smoothing filter and $r_k(n)$ is the average throughput of user $k$ up to time-slot $n$. Because the objective function is strictly concave, Hosein showed that it suffices to find the direction, i.e. the user, that maximizes the gradient of the objective function. If we denote this user by $k^*$, then

$$k^* = \arg\max_{k \in S} \{\nabla F(\vec{r})\}. \tag{3.4}$$

As an example, suppose the utility function $U_k$ of each user $k$ is the logarithm of that user's rate, $\log(r_k)$. The maximum gradient direction, i.e. the user maximizing the gradient of the objective function, is then given by

$$k^* = \arg\max_{k \in S} \left\{ \frac{d_k(n)}{r_k(n)} \right\}. \tag{3.5}$$

(3.5) is the well-known Proportional Fair metric, widely used for scheduling in cellular networks such as the Universal Mobile Telecommunications System (UMTS) and LTE. An alternative way of arriving at this result is as follows². The utility function $U_k$ of each user $k$ is the logarithm of the rate of user $k$, and each user's rate is updated according to (3.3). Assume that user $i$ is selected at time-slot $n$. The new total utility is then

$$\sum_{\substack{k \in S \\ k \neq i}} \log\left((1-\tau^{-1})\, r_k(n)\right) + \log\left((1-\tau^{-1})\, r_i(n) + \tau^{-1} d_i(n)\right). \tag{3.6}$$

Adding and subtracting $\log\left((1-\tau^{-1})\, r_i(n)\right)$ in Eq. (3.6) extends the sum to all users, and Eq. (3.6) becomes

$$\sum_{k \in S} \log\left((1-\tau^{-1})\, r_k(n)\right) + \log\left(\frac{(1-\tau^{-1})\, r_i(n) + \tau^{-1} d_i(n)}{(1-\tau^{-1})\, r_i(n)}\right). \tag{3.7}$$

Since $1-\tau^{-1} = (\tau-1)/\tau$, Eq. (3.7) simplifies to

$$\sum_{k \in S} \log\left((1-\tau^{-1})\, r_k(n)\right) + \log\left(1 + \frac{1}{\tau - 1}\,\frac{d_i(n)}{r_i(n)}\right). \tag{3.8}$$

In Eq. (3.8), the first sum does not depend on which user is served. Since $\tau > 1$, the second term is strictly increasing in $d_i(n)/r_i(n)$. The overall utility is therefore maximized by serving the user $i$ that maximizes $d_i(n)/r_i(n)$, which is the Proportional Fair metric.

Hosein [17] also proposed the use of barrier methods to account for Quality of Service (QoS) constraints. In nonlinear programming, barrier methods force the solutions of an optimization problem to remain in the interior of the feasible region. Penalty methods are an alternative: they keep the solutions within a certain area of the feasible region by imposing large penalties on solutions that lie outside of that area. In this thesis, we propose to use barrier functions to deliver video content by exploiting frame references. For a detailed discussion of penalty and barrier methods, the interested reader is referred to Chapter 13 of [16].

²The author of this simple and elegant proof is Dr. Cyril Leung.
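The update (3.3) and the argument of (3.6)-(3.8) can be checked numerically with a short Python sketch. For random rates, we serve each user in turn and sum the log-utilities after the update; the best choice should be the same user that the Proportional Fair metric (3.5) selects. The function names and the value $\tau = 100$ are arbitrary illustrative choices.

```python
import math
import random


def update_rates(r, d, served, tau):
    """Exponential smoothing update (3.3) of every user's average throughput."""
    return [
        (1 - 1 / tau) * r_k + (d_k / tau if k == served else 0.0)
        for k, (r_k, d_k) in enumerate(zip(r, d))
    ]


def pf_user(r, d):
    """Proportional Fair choice (3.5): the user maximising d_k / r_k."""
    return max(range(len(r)), key=lambda k: d[k] / r[k])


def best_user_by_log_utility(r, d, tau):
    """Serve each user in turn; keep the one giving the largest sum of log rates (3.6)."""
    def total_utility(i):
        return sum(math.log(x) for x in update_rates(r, d, i, tau))
    return max(range(len(r)), key=total_utility)


if __name__ == "__main__":
    rng = random.Random(1)
    tau = 100.0  # illustrative time constant
    for _ in range(1000):
        r = [rng.uniform(0.1, 10.0) for _ in range(8)]   # average throughputs
        d = [rng.uniform(0.1, 50.0) for _ in range(8)]   # instantaneous rates
        assert pf_user(r, d) == best_user_by_log_utility(r, d, tau)
    print("PF metric and log-utility maximisation agree")
```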

To deliver video content, we extend the formulation of SRAP to account for frame reference awareness and call this new problem SRAP-FRA. We introduce a new constraint on the frame reference count of user $k$, $c_k(n)$, to account for the fact that the network does not hold transmission queues of infinite size. This also rules out the scenario where a video user watches an infinitely long video sequence. This aspect is modelled through Finite-Buffer traffic models, which are discussed in greater detail in Chapter 4. Like SRAP, SRAP-FRA is solved at the beginning of every time-slot $n$. SRAP-FRA is expressed as follows.

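SRAP-FRA:

$$\text{maximize} \quad F(\vec{r}(n), \vec{c}(n)) \triangleq \sum_{k \in S} U_k(r_k(n), c_k(n)) \tag{3.9}$$

$$\text{subject to} \quad r_k(n) < C_k(n), \quad r_k(n) \geq 0, \tag{3.10}$$

$$c_k(n) < C'_k(n), \quad c_k(n) \geq 0. \tag{3.11}$$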

Here, $C'_k(n)$ is the constraint on the number of frame references that the transmission queue of user $k$ can hold at any given time-slot $n$, and $c_k(n)$ is the average number of frame references that user $k$ has received up to time-slot $n$. $U_k(r_k(n), c_k(n))$ is the combined utility function of user $k$ that we introduce for our frame reference aware scheduling framework. Our scheduling framework must track, for each user, whether its transmission queue holds any frame that is referenced within the video sequence that user $k$ is watching, and take scheduling decisions based on that. Essentially, we are building a scheduling framework that sends video users the content their decoder needs to perform its task as efficiently as possible, while incurring as little playback delay as possible. To that end, we use barrier functions and express the combined utility function of each user $k$ as

$$U_k(r_k(n), c_k(n)) = U_{k,1}(r_k(n)) + U_{k,2}(c_k(n)), \tag{3.12}$$

where

$$U_{k,1}(r_k(n)) \triangleq \log(r_k(n)), \qquad U_{k,2}(c_k(n)) \triangleq -\lambda \exp\left(-\mu\,(c_k(n) - c_{\min})\right). \tag{3.13}$$

In (3.13), $U_{k,2}$ is a generalized expression of a barrier function, and $\lambda$ and $\mu$ are positive-valued parameters that adjust the penalty for leaving the feasible region. The penalty $-U_{k,2}$ equals $\lambda$ when $c_k(n) = c_{\min}$ and grows exponentially, at a rate set by $\mu$, as $c_k(n)$ drops below $c_{\min}$. Hosein [17] proposed the use of such functions for delivering QoS. However, nothing in the literature indicates that this type of approach is the optimal way of accounting for QoE constraints, and other approaches and methodologies should be investigated to address such issues. Our motivation for using a barrier-function-based approach is to provide a simple scheduling framework.

In parallel to the update equation of the rate of user $k$, we also introduce an exponentially smoothed update equation to keep track of the frame reference count of user $k$:

$$c_k(n+1) = \begin{cases} \left(1 - \frac{1}{T}\right) c_k(n) + \frac{1}{T}\, t_k(n) & \text{if user } k \text{ is served,} \\ \left(1 - \frac{1}{T}\right) c_k(n) & \text{otherwise,} \end{cases} \tag{3.14}$$

where $c_k(n)$ is the frame reference count of user $k$ at the beginning of time-slot $n$, $c_{\min}$ is the minimum number of frame references that we force the system to provide to each video user, $T > 1$ is the time constant of the exponential smoothing filter and $t_k(n)$ is the number of frame references transmitted to user $k$ at time-slot $n$. Given the additive form of the combined utility function (3.12), SRAP-FRA can be rewritten as

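SRAP-FRA:

$$\text{maximize} \quad F(\vec{r}(n), \vec{c}(n)) \triangleq \sum_{k \in S} \left[ U_{k,1}(r_k(n)) + U_{k,2}(c_k(n)) \right] \tag{3.15}$$

$$\text{subject to} \quad r_k(n) < C_k(n), \quad r_k(n) \geq 0, \tag{3.16}$$

$$c_k(n) < C'_k(n), \quad c_k(n) \geq 0. \tag{3.17}$$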

This is the cross-layer scheduling framework that we propose to solve in this thesis; we derive its solution in the following section.
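Below is a minimal Python sketch of the barrier utility (3.13) and the reference-count update (3.14). The parameter values $\lambda = 25$, $\mu = 1$, $T = 25$ and $c_{\min} = 50$ are the ones listed in the simulation assumptions of Chapter 5. The function names are hypothetical. The printed values illustrate that the penalty equals $\lambda$ at $c_k(n) = c_{\min}$ and grows quickly as the smoothed count falls below $c_{\min}$. They also show that the count of a user who is never served decays geometrically.

```python
import math

LAM, MU, T, C_MIN = 25.0, 1.0, 25.0, 50.0  # values listed in the simulation assumptions


def barrier_utility(c_k: float, lam: float = LAM, mu: float = MU, c_min: float = C_MIN) -> float:
    """U_{k,2}(c_k) = -lam * exp(-mu * (c_k - c_min)), Eq. (3.13)."""
    return -lam * math.exp(-mu * (c_k - c_min))


def update_ref_count(c_k: float, t_k: float, served: bool, time_const: float = T) -> float:
    """Exponentially smoothed frame reference count, Eq. (3.14)."""
    c_next = (1 - 1 / time_const) * c_k
    if served:
        c_next += t_k / time_const
    return c_next


if __name__ == "__main__":
    for c in (45.0, 50.0, 55.0):
        print(c, barrier_utility(c))
    # A user that is never served sees its reference count decay geometrically
    c = 60.0
    for _ in range(10):
        c = update_ref_count(c, t_k=0.0, served=False)
    print(round(c, 3))
```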


3.2 Solution to the proposed Shared Resource Allocation Problem

In this section, we derive the solution to the proposed optimization problem SRAP-FRA (3.15). We need to find the user that maximizes the gradient of the objective function. We constructed our combined utility function as the sum of two separate utility functions (3.12). The gradient of the objective function is therefore the sum of the gradients of $\sum_{k \in S} U_{k,1}(r_k(n))$ and $\sum_{k \in S} U_{k,2}(c_k(n))$, and the two can be derived separately. We already know the maximum gradient direction for the first sum from (3.5), so we focus on the second sum, $\sum_{k \in S} U_{k,2}(c_k(n))$. Let $j$ be the user selected to be served by the network at time-slot $n$. Let $\beta$ be the parameter that describes the movement of the sum of the second utility functions in the direction of serving user $j$. The objective function can then be written as

$$F_{j,2}(\beta) = U_{j,2}\big(c_j(n) + \beta\,(c_j(n+1) - c_j(n))\big) + \sum_{\substack{k \in S \\ k \neq j}} U_{k,2}\big(c_k(n) + \beta\,(c_k(n+1) - c_k(n))\big). \tag{3.18}$$


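User $j$ is served and all other users are not. Given the update equations of the frame reference count (3.14), (3.18) simplifies to

$$F_{j,2}(\beta) = U_{j,2}\left(c_j(n) + \beta\,\frac{t_j(n) - c_j(n)}{T}\right) + \sum_{\substack{k \in S \\ k \neq j}} U_{k,2}\left(c_k(n) - \beta\,\frac{c_k(n)}{T}\right). \tag{3.19}$$

Taking the partial derivative of $F_{j,2}$ with respect to $\beta$ and setting $\beta$ to 0, we get

$$\frac{\partial F_{j,2}}{\partial \beta} = \frac{t_j(n) - c_j(n)}{T}\, U'_{j,2}(c_j(n)) - \sum_{\substack{k \in S \\ k \neq j}} \frac{c_k(n)}{T}\, U'_{k,2}(c_k(n)). \tag{3.20}$$

Moving the term $-\frac{c_j(n)}{T} U'_{j,2}(c_j(n))$ into the sum, Eq. (3.20) can be rewritten as

$$\frac{\partial F_{j,2}}{\partial \beta} = \frac{t_j(n)}{T}\, U'_{j,2}(c_j(n)) - \sum_{k \in S} \frac{c_k(n)}{T}\, U'_{k,2}(c_k(n)). \tag{3.21}$$

Since we are looking for the user $j$ that maximizes $\partial F_{j,2}/\partial \beta$, we can ignore the second term of (3.21): this sum is the same for every choice of served user. From (3.13), $U'_{k,2}(c_k) = \lambda\mu \exp\left(-\mu\,(c_k - c_{\min})\right)$, so the maximum gradient direction is

$$k^* = \arg\max_{k} \left\{ \frac{\lambda\mu}{T}\, t_k(n) \exp\left(-\mu\,(c_k(n) - c_{\min})\right) \right\}. \tag{3.22}$$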
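The derivative (3.21) can be sanity-checked against a central finite difference of $F_{j,2}(\beta)$ from (3.19) at $\beta = 0$. The sketch below uses arbitrary illustrative values for $c_k(n)$ and $t_k(n)$ and hypothetical function names. It also uses the derivative $U'_{k,2}(c) = \lambda\mu \exp(-\mu(c - c_{\min}))$ of the barrier function (3.13).

```python
import math

LAM, MU, T, C_MIN = 25.0, 1.0, 25.0, 50.0


def u2(c):
    """Barrier utility U_{k,2}, Eq. (3.13)."""
    return -LAM * math.exp(-MU * (c - C_MIN))


def u2_prime(c):
    """Derivative of U_{k,2}: lam * mu * exp(-mu * (c - c_min))."""
    return LAM * MU * math.exp(-MU * (c - C_MIN))


def f_j2(beta, j, c, t):
    """F_{j,2}(beta) from Eq. (3.19): user j served, all others not."""
    total = 0.0
    for k, c_k in enumerate(c):
        step = (t[k] - c_k) / T if k == j else -c_k / T
        total += u2(c_k + beta * step)
    return total


def grad_closed_form(j, c, t):
    """Right-hand side of Eq. (3.21)."""
    return t[j] / T * u2_prime(c[j]) - sum(c_k / T * u2_prime(c_k) for c_k in c)


def grad_numeric(j, c, t, h=1e-6):
    """Central finite difference of F_{j,2} at beta = 0."""
    return (f_j2(h, j, c, t) - f_j2(-h, j, c, t)) / (2 * h)


if __name__ == "__main__":
    c = [48.0, 50.0, 53.0]   # smoothed reference counts (illustrative)
    t = [3.0, 0.0, 5.0]      # references carried by each user's next transmission
    for j in range(len(c)):
        print(j, grad_closed_form(j, c, t), grad_numeric(j, c, t))
```

Because the subtracted sum in (3.21) is identical for every $j$, ranking users by `grad_closed_form` gives the same winner as ranking them by $t_k(n) \exp(-\mu(c_k(n) - c_{\min}))$ alone, which is what (3.22) states.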

Essentially, this means that the system maximizes the utility of users by prioritizing the transmission of referenced frames ahead of unreferenced frames. As we saw in Section 2.2.1, there is a clear hierarchy in the way frames depend upon each other in video sequences. Suppose a video user receives frames that the decoder can always decode, or that it can decode without waiting for other frames. The user can then watch video sequences with no perceptible delay, which enhances the Quality of Experience. This procedure also helps counter error propagation within the video decoding process, so the proposed cross-layer scheduling framework can be seen as a form of error resilience. Using (3.5), (3.12) and (3.22), the final expression of the metric for the proposed scheduling framework (3.15) is

$$\frac{d_k(n)}{r_k(n)} + \frac{\lambda\mu}{T}\, t_k(n) \exp\left(-\mu\,(c_k(n) - c_{\min})\right), \tag{3.23}$$

and the user with the largest value of this metric is served.

For the rest of this thesis, we shall refer to our proposed scheduling scheme as Frame Reference Aware Proportional Fair (FRA-PF).
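Putting (3.3), (3.14) and (3.23) together, one FRA-PF scheduling decision can be sketched as follows. This is a simplified single-resource illustration. It serves one user per time-slot, whereas the LTE-A system of Chapter 4 allocates resource block groups. The `User` class, its field names and the example numbers are hypothetical. $\lambda$, $\mu$, $T$ and $c_{\min}$ are the values listed in the simulation assumptions, while $\tau = 100$ is an arbitrary illustrative value.

```python
import math
from dataclasses import dataclass

LAM, MU, T, C_MIN = 25.0, 1.0, 25.0, 50.0
TAU = 100.0  # illustrative throughput smoothing constant (assumption)


@dataclass
class User:
    r: float          # smoothed throughput r_k(n)
    c: float          # smoothed frame reference count c_k(n)
    d: float = 0.0    # achievable rate d_k(n) in this slot
    t: float = 0.0    # frame references t_k(n) in the data that would be sent


def fra_pf_metric(u: User) -> float:
    """Eq. (3.23): PF term plus the barrier-gradient term."""
    return u.d / u.r + LAM * MU / T * u.t * math.exp(-MU * (u.c - C_MIN))


def schedule_slot(users):
    """Serve the user maximising (3.23), then apply updates (3.3) and (3.14)."""
    k_star = max(range(len(users)), key=lambda k: fra_pf_metric(users[k]))
    for k, u in enumerate(users):
        served = k == k_star
        u.r = (1 - 1 / TAU) * u.r + (u.d / TAU if served else 0.0)
        u.c = (1 - 1 / T) * u.c + (u.t / T if served else 0.0)
    return k_star


if __name__ == "__main__":
    be = User(r=1.0, c=0.0, d=10.0)             # best effort user: no frame references
    video = User(r=1.0, c=45.0, d=2.0, t=4.0)   # video user below c_min
    print(schedule_slot([be, video]))
```

In this example the video user wins despite its lower PF term, because its reference count is below $c_{\min}$. With $c = 60$ instead, the barrier term becomes negligible and plain Proportional Fair behaviour returns.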

Chapter 4

System Model

In this chapter, we describe our system model and simulation methodology for evaluating the performance of our proposed scheduling framework. Our evaluation methodology is based on system-level simulations. We cover the components that are most relevant to this thesis. More in-depth and complete descriptions of system-level simulation methodologies can be found in [18], [19] and [20].

4.1 H.265/HEVC Video Content Generation

Analytical traffic models have been proposed for near-real-time video streaming in [18], where the packet sizes and packet inter-arrival times follow truncated Pareto distributions. This analytical model captures the variability in the packet sizes coming from the video source. However, it is agnostic to the specifics of the H.265/HEVC standard and therefore cannot be relied on to generate realistic video traffic. Moreover, our objective is to evaluate the application-level experience of H.265/HEVC video users, so we use HM 14.0 to generate video bitstreams [21]. We use five video test sequences that were used by MPEG for development and testing purposes: FourPeople, Johnny, KristenAndSara, SlideShow and SlideEditing. Their characteristics are given in Table 4.1. For each of these video sequences, we generate the corresponding bitstream and trace files using HM 14.0 [21]. From these files, we extract the Reference Picture Lists, as defined in Section 2.2.2, for all frames in order to determine the frame reference dependency structure.

Table 4.1: H.265/HEVC Table of Video Test Sequences

Sequence | Length (frames) | Frame rate (fps) | Resolution (px x px)
FourPeople | 600 | 60 | 1280x720
Johnny | 600 | 60 | 1280x720
KristenAndSara | 600 | 60 | 1280x720
SlideShow | 200 | 20 | 1280x720
SlideEditing | 300 | 30 | 1280x720

For simplicity, we assume that each frame consists of only one slice segment (see Section 2.1), so that each frame is encoded inside one LDU. The GoP size is set to 8. The Intra-Period is defined as the interval, in frames, between two consecutive I-Frames, and is always set so that an I-Frame occurs approximately every second. Its value therefore depends on the frame rate of the video sequence: for frame rates of {20, 24, 30, 50, 60} fps, the Intra-Period is set to {16, 24, 32, 48, 64} frames, respectively. Aside from I-Frames, we use B-Frames only. From the bitstreams generated for the selected video sequences, we create a custom traffic model for each sequence and use it as input to our LTE-A simulator, described below. The H.265/HEVC parameters used to generate the bitstreams are summarized in Table 4.2. Other parameters needed to run HM 14.0 are left at their default values as in [10].
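The Intra-Period rule above can be captured in a small lookup. The sketch checks that each listed value is a whole number of GoPs and gives one I-Frame roughly every second. The dictionary simply transcribes the values in the text, and the helper name is hypothetical.

```python
GOP_SIZE = 8
# Frame rate (fps) -> Intra-Period (frames), as listed in the text
INTRA_PERIOD = {20: 16, 24: 24, 30: 32, 50: 48, 60: 64}


def i_frame_interval_seconds(fps: int) -> float:
    """Time between consecutive I-Frames for a given frame rate."""
    return INTRA_PERIOD[fps] / fps


if __name__ == "__main__":
    for fps, period in INTRA_PERIOD.items():
        assert period % GOP_SIZE == 0  # an integer number of GoPs per Intra-Period
        print(f"{fps} fps: Intra-Period {period} frames, "
              f"I-Frame every {i_frame_interval_seconds(fps):.2f} s")
```

The resulting spacing ranges from 0.8 s (20 fps) to about 1.07 s (30 and 60 fps), which is what "approximately every second" means here.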


Table 4.2: H.265/HEVC Parameters (High Efficiency Video Coding parameters)

Parameter | Value
Video Sequence Length | 10 seconds
SliceMode | 0
Coding Unit size | 64 pixels x 64 pixels
GoP size | 8
Quantization Parameter | 32
Frame Structure | IBB...BIBB...B
Decoding Refresh Type | Clean Random Access

4.2 LTE-Advanced System Model

To evaluate the performance of our proposed scheduling framework, we use system-level simulations based on openWNS and IMTAphy [22]-[23]. The performance evaluation methodology is based on the simulation methodology described in Annex A of the 3GPP Technical Report 36.814 [19] and in the Evaluation Methodology Document of IEEE 802.16m [18].

In this section, we will describe some of the components and features that we use in our performance evaluation. Evaluation methodologies based on system-level simulations require many components to capture aspects of the and the protocols implemented at the .

4.2.1 Network Model

We consider a downlink LTE Advanced (LTE-A) system using Frequency Division Duplex (FDD) with N = 19 base stations. Each base station has three sectors to provide coverage, giving a total of 57 sectors in the network. An illustration of the hexagonal grid layout is provided in Fig. 4.1. To ensure that all cells experience similar interference and that we accurately model the impact of outer cells, we implement a wrap-around technique. The full system is modelled as a network of 7 clusters, each made of N = 19 base stations. Users are created and all statistics are collected in the central cluster. Fig. 4.2 illustrates the concept of wrap-around. Virtual clusters are depicted in grey, the central cluster is depicted in white, and the central base station of each cluster is depicted in yellow. The surrounding clusters are virtual in the sense that no user is actually dropped there. All the cells in the virtual clusters are copies of the original cells in the central cluster. They have the same antenna configuration, traffic and fast fading; only their location differs. Users are dropped independently at uniformly random locations in the central cluster. For all base stations, we assume that each sector uses 4 transmit antennas and each user uses 2 receive antennas, which corresponds to a 4x2 Multiple Input Multiple Output (MIMO) system.

Figure 4.1: Hexagonal Network Grid Layout

Figure 4.2: Wrap Around of Hexagonal Network
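The 19-site layout and the wrap-around distance can be sketched as below. The inter-site distance of 500 m is an assumption: it is the value commonly associated with 3GPP Case 1 but is not stated in this section. The cluster-shift construction is one standard way to tile 19-site hexagonal clusters, and all function names are hypothetical. The wrapped distance from a user to a site is the minimum over the site's copies in the central cluster and the six virtual clusters.

```python
import cmath
import math

ISD = 500.0  # inter-site distance in meters (assumed value, not given in the text)
RINGS = 2    # 1 + 6 + 12 = 19 sites


def hex_sites(isd: float = ISD, rings: int = RINGS):
    """Site positions (complex numbers) of a hexagonal layout, via axial coordinates."""
    a1, a2 = complex(isd, 0.0), isd * cmath.exp(1j * math.pi / 3)
    return [q * a1 + r * a2
            for q in range(-rings, rings + 1)
            for r in range(-rings, rings + 1)
            if max(abs(q), abs(r), abs(q + r)) <= rings]


def cluster_shifts(isd: float = ISD, rings: int = RINGS):
    """Offsets of the central cluster (0) and of its six virtual copies."""
    a1, a2 = complex(isd, 0.0), isd * cmath.exp(1j * math.pi / 3)
    base = (rings + 1) * a1 + rings * a2  # length isd * sqrt(19) for rings = 2
    return [0j] + [base * cmath.exp(1j * math.pi / 3 * m) for m in range(6)]


def wrapped_distance(user: complex, site: complex, shifts) -> float:
    """Distance to the closest copy of a site among all seven clusters."""
    return min(abs(user - (site + s)) for s in shifts)


if __name__ == "__main__":
    sites, shifts = hex_sites(), cluster_shifts()
    print(len(sites), 3 * len(sites))  # 19 sites, 57 sectors
    edge_site = sites[0]
    print(round(abs(edge_site - sites[-1]), 1),
          round(wrapped_distance(edge_site, sites[-1], shifts), 1))
```

With wrap-around, every site sees the same set of distances to the other 18 sites. This is what lets an edge cell experience the same interference as the central one.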

The system bandwidth B is assumed to be 10 MHz. Resource Allocation Type 0 is assumed, i.e. we allocate groups of Physical Resource Blocks (PRBs), called Resource Block Groups, to users. For a system bandwidth of 10 MHz, the 3GPP standard specifies that users are allocated groups of 3 contiguous PRBs. Fig. 4.3 depicts a PRB allocation with 4 users in a system with 10 MHz of bandwidth. At 10 MHz, the last group contains only 2 PRBs, because the total number of PRBs at this bandwidth is 50.

Figure 4.3: LTE Downlink PRB allocation illustration
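Grouping the 50 PRBs into Resource Block Groups of 3 takes only a few lines of Python. The function name is hypothetical, and only the 10 MHz RBG size discussed above is encoded.

```python
N_PRB_10MHZ = 50
RBG_SIZE_10MHZ = 3


def resource_block_groups(n_prb: int = N_PRB_10MHZ, rbg_size: int = RBG_SIZE_10MHZ):
    """Split PRB indices 0..n_prb-1 into contiguous groups; the last may be smaller."""
    return [list(range(start, min(start + rbg_size, n_prb)))
            for start in range(0, n_prb, rbg_size)]


if __name__ == "__main__":
    groups = resource_block_groups()
    print(len(groups), groups[-1])  # 17 groups, the last one holding PRBs 48 and 49
```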

Table 4.3: FTP Traffic Model 1

Parameter | Statistical Characterization
File size | 2 Megabytes
User arrival rate $\lambda_{be}$ | Poisson distributed process with rate $\lambda_{be}$
Number of downloads | 1 (each user downloads a single file)


4.2.2 Traffic Model

We model two types of traffic: Best Effort (BE) traffic and video traffic. Each user is assigned BE or video traffic with probability 0.5. Users are usually assumed to be active for the entire duration of the simulation, i.e. they are created at the beginning of the simulation and dropped at the end, as stated in [18]. In this thesis, we use more realistic traffic models. Users are created at random time instants according to a Poisson random process. They remain in the network until they have completed their session or until they are dropped from the network. For the BE traffic model, we use FTP Traffic Model 1 defined in the 3GPP Technical Report [19]; its parameters are summarized in Table 4.3.

For video users, we define our own custom traffic model. Because we need information about frame reference dependencies, we use HM 14.0 to generate realistic video bitstreams for our performance evaluation; Section 4.1 covers the generation of these bitstreams in more detail. Each bitstream carries 10 seconds' worth of video data, so we loop each bitstream six times to generate one minute's worth of video traffic. Video users remain in the network until there are no more packets left for them to receive. The parameters of our video traffic model are summarized in Table 4.4.


Table 4.4: H.265/HEVC Traffic Model

Parameter | Statistical Characterization
Video duration | 1 minute
User arrival rate $\lambda_v$ | Poisson distributed process with rate $\lambda_v$
Number of sessions | 1 (each user watches a single video once)
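A Poisson arrival process can be simulated by drawing exponential inter-arrival times. Each arriving user is then assigned a traffic type with probability 0.5. The sketch below illustrates this under those assumptions; the arrival rate, simulated duration and function name are hypothetical. Splitting one Poisson stream at random with probability 0.5 yields two independent Poisson streams, each with half the rate (Poisson thinning). This corresponds to setting $\lambda_{be} = \lambda_v$ in Tables 4.3 and 4.4.

```python
import random

P_VIDEO = 0.5  # traffic type assignment probability stated in the text


def generate_arrivals(rate_per_s: float, duration_s: float, rng: random.Random):
    """Return (arrival_time, traffic_type) pairs for a Poisson arrival process."""
    arrivals, now = [], 0.0
    while True:
        now += rng.expovariate(rate_per_s)  # exponential inter-arrival times
        if now > duration_s:
            return arrivals
        kind = "video" if rng.random() < P_VIDEO else "best_effort"
        arrivals.append((now, kind))


if __name__ == "__main__":
    rng = random.Random(42)
    users = generate_arrivals(rate_per_s=2.0, duration_s=600.0, rng=rng)
    n_video = sum(kind == "video" for _, kind in users)
    print(len(users), n_video)  # roughly 1200 users, about half of them video
```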

4.2.3 Channel Model

For every user in the network, we need to model the effects of large-scale and small-scale fading. The propagation and fading characteristics of the channel may differ depending on the simulated scenario. In this thesis, we focus on the Urban Macrocell scenario, which the 3GPP also calls Case 1, as defined in Table A.2.1.1-1 of [24]. Urban Macrocell is also a scenario defined by the International Telecommunication Union Radiocommunication Sector (ITU-R) in report M.2135 [25]. The ITU-R scenario defines users traveling at vehicular speeds (30 km/h), whereas the 3GPP Urban Macrocell scenario defines users traveling at pedestrian speeds (3 km/h). We use the 3GPP Urban Macrocell scenario because we consider services that require high data rates, which are more practical when users move at pedestrian speed. System-level simulations typically rely on stochastic channel models such as the Spatial Channel Model [26] to capture these aspects. Such channel models capture the number of clusters³ and their spatial characteristics, such as the delay spread, the angular spread and the power carried by each cluster.

The original implementation of the system-level simulation tool we used, IMTAphy, uses the channel model specified by the ITU-R in report M.2135 [25]. In [25], the channel model for the Urban Macrocell scenario is defined as a 20-tap model. We instead use the Spatial Channel Model [26], which is a 6-tap model, for two reasons. First, although the ITU-R channel model is more accurate, it requires a large memory footprint to store cluster- and ray-specific information. It also requires high computational power, since a large number of clusters must be summed for every link, every subcarrier and every time-slot. Second, we aim to make a fair comparison between two different scheduling schemes. For this purpose, the channel model needs to accurately capture statistical characteristics of the channel, such as the delay spread and the angular spread, rather than provide accurate performance predictions in real environments.

³In this thesis, we use the terms "cluster" and "tap" interchangeably.

The radio channel can typically be described through its large-scale and small-scale characteristics. Large-scale characteristics are captured through the path loss and the shadow fading distribution. The deterministic path-loss formula used for the Urban Macrocell scenario is defined in [24] as follows:

$$PL(d) = 128.1 + 37.6 \log_{10}(d) \tag{4.1}$$

where $PL$ denotes the mean path loss in dB between a given user and a given base station and $d$ denotes the distance between the user and the base station in kilometers. This mean path-loss formula is valid for carrier frequencies around 2 GHz. The distance between a user and a base station is always at least 35 meters.

The short-term statistics are characterized by small-scale parameters. Let us denote the number of clusters in a link by $N$. The generation of the parameters required to compute the channel coefficients is documented in [26] and [20]. The resulting channel impulse responses account for the MIMO nature of the channel. For a pair of antennas $s$ and $u$ (at the base station and the user, respectively) and a given cluster $n$, they are given by

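$$h_{u,s,n}(t) = \begin{cases} \sqrt{\dfrac{1}{K_R+1}}\, h^{\text{NLoS}}_{u,s,n}(t) + \sqrt{\dfrac{K_R}{K_R+1}}\, h^{\text{LoS}}_{u,s,n}(t) & n = 1, \\ \sqrt{\dfrac{1}{K_R+1}}\, h^{\text{NLoS}}_{u,s,n}(t) & 2 \leq n \leq N, \end{cases} \tag{4.2}$$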

where $K_R$ is the Ricean factor, $h^{\text{NLoS}}_{u,s,n}$ is the non-line-of-sight component of the channel and $h^{\text{LoS}}_{u,s,n}$ is the line-of-sight component of the channel, which is applied only to the first cluster. In the Spatial Channel Model, the first cluster is by design the one with the shortest delay.
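The large-scale path loss (4.1), including the 35 m minimum distance, and the Ricean weighting of the first cluster in (4.2) can be sketched as follows. The function names are hypothetical, and $K_R$ is taken in linear scale. The complex inputs stand in for the NLoS and LoS components of (4.3) and (4.4); computing those components is outside this illustration.

```python
import math

MIN_DISTANCE_KM = 0.035  # users are never closer than 35 m to a base station


def path_loss_db(distance_km: float) -> float:
    """Mean path loss of Eq. (4.1), valid for carriers around 2 GHz."""
    d = max(distance_km, MIN_DISTANCE_KM)
    return 128.1 + 37.6 * math.log10(d)


def cluster_coefficient(n: int, h_nlos: complex, h_los: complex, k_r: float) -> complex:
    """Combine NLoS and LoS parts as in Eq. (4.2); only cluster n = 1 has a LoS part."""
    nlos_weight = math.sqrt(1.0 / (k_r + 1.0))
    if n == 1:
        return nlos_weight * h_nlos + math.sqrt(k_r / (k_r + 1.0)) * h_los
    return nlos_weight * h_nlos


if __name__ == "__main__":
    for d_km in (0.01, 0.1, 0.5, 1.0):
        print(d_km, round(path_loss_db(d_km), 1))
```

Setting $K_R = 0$ removes the line-of-sight term entirely, so every cluster then reduces to its NLoS component.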

The non line-of-sight channel component is expressed for a given cluster and for a given pair of transmit-receive antenna elements as follows [26]:

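$$h^{\text{NLoS}}_{u,s,n}(t) = \sqrt{\frac{P_n}{M}} \sum_{m=1}^{M} \begin{bmatrix} F_{rx,u,V}(\theta_{n,m}) \\ F_{rx,u,H}(\theta_{n,m}) \end{bmatrix}^{T} \begin{bmatrix} \exp(j\Phi^{vv}_{n,m}) & \sqrt{\kappa^{-1}}\exp(j\Phi^{vh}_{n,m}) \\ \sqrt{\kappa^{-1}}\exp(j\Phi^{hv}_{n,m}) & \exp(j\Phi^{hh}_{n,m}) \end{bmatrix} \begin{bmatrix} F_{tx,s,V}(\phi_{n,m}) \\ F_{tx,s,H}(\phi_{n,m}) \end{bmatrix} \exp\left(j d_s 2\pi \lambda_0^{-1} \sin(\phi_{n,m})\right) \exp\left(j d_u 2\pi \lambda_0^{-1} \sin(\theta_{n,m})\right) \exp\left(j 2\pi \nu_{n,m} t\right) \tag{4.3}$$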

where $P_n$ is the power of the $n$th cluster and $M$ is the number of rays within the cluster. $F_{rx,u,V}$ and $F_{rx,u,H}$ are the field patterns of the $u$th antenna element at the receiver side in the vertical and horizontal polarizations, respectively, and $F_{tx,s,V}$ and $F_{tx,s,H}$ are the corresponding field patterns of the $s$th antenna element at the transmitter side. $\theta_{n,m}$ and $\phi_{n,m}$ are the arrival and departure angles of the $m$th ray in the $n$th cluster. $d_s$ and $d_u$ are the distances between antenna elements at the transmitter and receiver sides, respectively, $\lambda_0$ is the carrier wavelength and $\kappa$ is the cross-polarization power ratio. $\nu_{n,m}$ is the Doppler frequency component of the $m$th ray of the $n$th cluster, and $t$ is the time instant. $\Phi^{vv}_{n,m}$, $\Phi^{vh}_{n,m}$, $\Phi^{hv}_{n,m}$ and $\Phi^{hh}_{n,m}$ are uniformly generated random phases used for initialization purposes.

Similarly to the non-line-of-sight channel component, the line-of-sight channel component for a given pair of transmit-receive antenna elements is expressed as follows [26]:

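$$h^{\text{LoS}}_{u,s,n}(t) = \begin{bmatrix} F_{rx,u,V}(\theta_{\text{LoS}}) \\ F_{rx,u,H}(\theta_{\text{LoS}}) \end{bmatrix}^{T} \begin{bmatrix} \exp(j\Phi^{vv}_{\text{LoS}}) & 0 \\ 0 & \exp(j\Phi^{hh}_{\text{LoS}}) \end{bmatrix} \begin{bmatrix} F_{tx,s,V}(\phi_{\text{LoS}}) \\ F_{tx,s,H}(\phi_{\text{LoS}}) \end{bmatrix} \exp\left(j d_s 2\pi \lambda_0^{-1} \sin(\phi_{\text{LoS}})\right) \exp\left(j d_u 2\pi \lambda_0^{-1} \sin(\theta_{\text{LoS}})\right) \exp\left(j 2\pi \nu_{\text{LoS}} t\right) \tag{4.4}$$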

where $F_{rx,u,V}$ and $F_{rx,u,H}$ are the field patterns of the $u$th antenna element at the receiver side in the vertical and horizontal polarizations, respectively, and $F_{tx,s,V}$ and $F_{tx,s,H}$ are the corresponding field patterns of the $s$th antenna element at the transmitter side. $\theta_{\text{LoS}}$ and $\phi_{\text{LoS}}$ are the arrival and departure angles of the line-of-sight ray. $d_s$ and $d_u$ are the distances between antenna elements at the transmitter and receiver, respectively, $\nu_{\text{LoS}}$ is the Doppler frequency component of the line-of-sight ray and $t$ is the time instant. $\Phi^{vv}_{\text{LoS}}$ and $\Phi^{hh}_{\text{LoS}}$ are uniformly generated random phases used for initialization purposes.

The channel impulse responses given by (4.2) are expressed in the time domain. Since we consider an LTE-A air interface, which is based on OFDMA, we need frequency-domain channel coefficients. These are obtained by applying a Fast Fourier Transform to the time-domain channel impulse responses. The equivalent frequency-domain channel matrix at the $k$th subcarrier for a 4x2 MIMO system is given by

$$\mathbf{H}(k) = \begin{bmatrix} H_{1,1}(k) & H_{1,2}(k) & H_{1,3}(k) & H_{1,4}(k) \\ H_{2,1}(k) & H_{2,2}(k) & H_{2,3}(k) & H_{2,4}(k) \end{bmatrix}, \quad k \in \{1, 2, \ldots, N_{FFT}\} \tag{4.5}$$

where $N_{FFT}$ is the Fast Fourier Transform size. Let us denote the Fast Fourier Transform by $\mathcal{F}$. At a given time instant $t$, each component of the channel transfer function $\mathbf{H}(k)$ is a function of the channel impulse responses given by (4.2), expressed as follows [20]:

$$H_{u,s}(k) = \mathcal{F}\left[h_{u,s,1}(t), h_{u,s,2}(t), \ldots, h_{u,s,N}(t)\right], \quad k \in \{1, 2, \ldots, N_{FFT}\}. \tag{4.6}$$

In LTE, the subcarrier spacing is 15 kHz. For a system bandwidth of 10 MHz, we need a sampling rate that is higher than 10 MHz and that is a multiple of the 15 kHz subcarrier spacing. Fast Fourier Transforms are most efficient for lengths that are integer powers of 2, so we use a Fast Fourier Transform of size $N_{FFT} = 1024$. This corresponds to a sampling rate of $1024 \times 15\,\text{kHz} = 15.36$ MHz.
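The numerology above, and the mapping of (4.6) from cluster taps to per-subcarrier coefficients, can be sketched as below. Rather than building a length-$N_{FFT}$ time-domain vector and transforming it, the sketch evaluates the DFT directly at each tap delay. This gives the same values when the delays fall on the sample grid, and it also handles delays that do not. The tap delays, gains and function name are illustrative assumptions. Subcarriers are indexed from 0 here, whereas the text indexes them from 1.

```python
import cmath
import math

SUBCARRIER_SPACING_HZ = 15_000
N_FFT = 1024
SAMPLING_RATE_HZ = N_FFT * SUBCARRIER_SPACING_HZ  # 15.36 MHz > 10 MHz bandwidth


def frequency_response(taps, n_fft: int = N_FFT, spacing_hz: float = SUBCARRIER_SPACING_HZ):
    """Per-subcarrier coefficients H(k), k = 0..n_fft-1, from (delay_s, gain) taps."""
    return [
        sum(gain * cmath.exp(-2j * math.pi * k * spacing_hz * delay) for delay, gain in taps)
        for k in range(n_fft)
    ]


if __name__ == "__main__":
    # Two illustrative taps: a direct path and an echo 2 samples later
    taps = [(0.0, 1.0 + 0j), (2 / SAMPLING_RATE_HZ, 0.5 + 0j)]
    H = frequency_response(taps)
    print(SAMPLING_RATE_HZ, abs(H[0]), abs(H[N_FFT // 4]))
```

The echo makes the response frequency-selective: the two taps add at subcarrier 0 and partly cancel at subcarrier 256.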


4.2.4 Feedback Model

Mechanisms for delivering Channel State Information (CSI) to the transmitter are critical to the performance of most wireless communications systems. Chapter 8 of [27] shows that, with CSI knowledge at the transmitter, one can extract the maximum performance available from MIMO systems. The 3GPP standard outlines several control signalling mechanisms for each of the transmission modes it defines. In this thesis, we use Transmission Mode 10 with 4-Tx Release 12 linear precoding matrices [28]. The 3GPP standard defines an implicit feedback mechanism for the uplink control signalling. "Implicit" means that, instead of sending information about the channel matrix itself, the user sends quantized information about different channel statistics that help the network make appropriate scheduling decisions. The 3GPP standard defines the content of the control signalling through 3 indicators [28]:

• Rank Indication (RI),

• Precoding Matrix Indicator (PMI),

• Channel Quality Indicator (CQI).

The RI is the rank of the channel matrix, i.e. the number of degrees of freedom (spatial layers) that it can carry. The PMI is the index of the precoding matrix that maximizes the received power at the receiver, and the CQI indicates the spectral efficiency that the receiver would be able to achieve. The PMI and CQI reports are conditioned upon the value of the RI. In this thesis, we use Aperiodic CSI Reporting Mode 3-1, as defined in Section 7.2.1 of [28]; the 3GPP also defines other reporting modes [28].

Aperiodic CSI Reporting Mode 3-1 consists of a single RI report, a single PMI report and several subband CQI reports. For a system bandwidth of 10 MHz, the 3GPP standard specifies a subband size of 6 PRBs [28]. The 50 PRBs therefore form eight subbands of 6 PRBs and a last subband of 2 PRBs. As a result, a single CSI report from the user contains one RI value, one PMI value and nine CQI values (one per subband). In this thesis, we assume that the periodicity of the CSI reports is set to 5 ms. The RI is typically reported less frequently than the PMI or the CQI, and its periodicity is set to 20 ms. For the subband CQI reports, we assume non-ideal channel estimation. This is obtained by modelling a noisy sample of the interference covariance matrix in the equalizer vector using the complex Wishart distribution [29].
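The subband arithmetic and reporting periodicities above can be checked with a short sketch. The function names and the one-second window are hypothetical. The sketch only counts report contents and report instants; it does not model how the reports are encoded.

```python
N_PRB = 50
SUBBAND_SIZE = 6
CSI_PERIOD_MS = 5   # CSI reporting periodicity assumed in the text
RI_PERIOD_MS = 20   # RI reporting periodicity assumed in the text


def subbands(n_prb: int = N_PRB, size: int = SUBBAND_SIZE):
    """Contiguous CQI subbands; the last one may be smaller."""
    return [list(range(s, min(s + size, n_prb))) for s in range(0, n_prb, size)]


def report_contents(n_prb: int = N_PRB, size: int = SUBBAND_SIZE):
    """Number of RI, PMI and CQI values in one Mode 3-1 report."""
    return {"RI": 1, "PMI": 1, "CQI": len(subbands(n_prb, size))}


def reports_in(duration_ms: int):
    """How many CSI reports and RI updates a user sends over a given duration."""
    return duration_ms // CSI_PERIOD_MS, duration_ms // RI_PERIOD_MS


if __name__ == "__main__":
    print(report_contents(), reports_in(1000))  # {'RI': 1, 'PMI': 1, 'CQI': 9} (200, 50)
```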

Chapter 5

Simulation Results and Analysis

Some of the key targets specified by the NGMN Alliance for 5G networks can be broadly summarized as providing a consistent user experience and enhanced Quality of Experience. These targets are outlined in [5]. For example, one target is for the network to provide a certain user throughput 95% of the time across 95% of the coverage area. This is typically measured by the 5th percentile of the Cumulative Distribution Function (CDF) of the user throughput. We also look at the average user throughput as an indicator of the overall user experience.
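The sketch below illustrates these two metrics. It computes the 5th percentile (the throughput exceeded by 95% of users) and the mean from a list of per-user throughputs. It uses the nearest-rank percentile definition, which is one of several conventions and is an assumption here. The sample data are made up.

```python
import math


def percentile_nearest_rank(values, p: float) -> float:
    """p-th percentile (0 < p <= 100) using the nearest-rank definition."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def throughput_summary(throughputs_mbps):
    """Return (5th percentile, mean) of per-user throughputs."""
    mean = sum(throughputs_mbps) / len(throughputs_mbps)
    return percentile_nearest_rank(throughputs_mbps, 5), mean


if __name__ == "__main__":
    sample = [float(x) for x in range(1, 101)]  # illustrative throughputs 1..100 Mbps
    print(throughput_summary(sample))  # (5.0, 50.5)
```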

In this chapter, we describe our simulation assumptions and results, along with the insights gained from them. To the best of our knowledge, prior work on video transmission over wireless networks uses Full Buffer methodologies to evaluate performance. The main problem with Full Buffer methodologies is that they only capture performance metrics (for instance, user throughput and served cell throughput) in a regime where the network operates at full load. Cellular networks experience different loads depending on the time of day, so carriers benefit from a more complete view of performance across traffic load points. One motivation for modelling user arrivals as a Poisson random process is to capture performance at traffic load points that are meaningful to carriers.

Intuitively, we expect performance to be good at low traffic load points because there are few users in the network, which results in low interference and high user throughputs. Users entering the network are therefore served quickly and leave quickly. This scenario is not attractive to carriers because, although the Quality of Experience is excellent, they earn little revenue from so few users.

Conversely, we expect performance to be poor at high traffic load points because there are many users in the network, which results in high interference and low user throughputs. This scenario is also unattractive to carriers: although revenues are high because many users are accessing their spectrum, the Quality of Experience is mediocre, which leads to customer dissatisfaction. The desirable operating regime for carriers is therefore at intermediate traffic loads, where the number of users yields reasonable revenue and the resulting moderate interference leads to acceptable throughputs, so that users enjoy reasonably good Quality of Experience.

5.1 Simulation Assumptions

In this section, we outline some of the assumptions made in our simulations.

The main components of our system model are described in Chapter 4. Here, we describe some of the other assumptions made. We assume that the base station in our LTE-A network is a Media Aware Network Element (MANE).

A MANE is a network node that can parse an encoded video bitstream and identify specific LDUs. Since our LTE-A base stations can parse video bitstreams, they can look for each user's LDUs and keep track of the RefCount field in the LDUs. Using the information carried by the RefCount field, the LTE-A system can then keep track of the referenced frames being sent to each video user using the exponential smoothing update equations (3.14), and allocate resources accordingly. In the simulation of our proposed scheduling framework, the following parameter values are used: λ = 25, µ = 1, T = 25 and c_min = 50.

Our motivation in this work is to model a realistic 4G/beyond-4G system. Although several research projects on 5G have been initiated, no air interface has been specified yet for a 5G system. We therefore use a 4G air interface with as many up-to-date features as possible and evaluate performance using metrics that have been proposed for 5G systems. For our LTE-A system, we model a 4x2 MIMO system. We also assume the use of Single User Multiple Input Multiple Output (SU-MIMO), as opposed to Multi User Multiple Input Multiple Output (MU-MIMO). It is shown in Chapter 7 of [27] that in MIMO systems, the availability of both multiple transmit antennas and multiple receive antennas can provide additional spatial dimensions for communication. These additional degrees of freedom can be exploited by spatially multiplexing different data streams onto the MIMO channel. The main difference between SU-MIMO and MU-MIMO is that SU-MIMO sends multiple data streams to the same user, whereas MU-MIMO sends data streams to spatially separated users.

We also assume the use of Transmission Mode 10 and of the 4-Tx Release 12 precoding matrices [28]-[30]. Transmission Mode 10 is a mode in which the system allows the use of so-called non-codebook based precoding with up to 8 layers. A description of the physical layer procedures and processing features relevant to the operation of Transmission Mode 10 is beyond the scope of this thesis; more detailed descriptions are provided in [31]-[28].

For system-level simulations, we need link-to-system models that can accurately translate an instantaneous Signal to Noise Ratio (SNR) value into a corresponding instantaneous block error rate value. Several methods exist in the literature, such as Exponential Effective SNR Mapping (EESM) [32] and Mutual Information Effective SNR Metric (MIESM) [33]. In this thesis, we use EESM. The basic idea behind EESM is as follows: assume a user receives a transmission over N_sc subcarriers with instantaneous SNR value γ_k on the k-th subcarrier. The instantaneous effective SNR γ_eff using EESM is obtained as:

$$\gamma_{\mathrm{eff}} = -\beta \ln\left( \frac{1}{N_{sc}} \sum_{k=1}^{N_{sc}} \exp\left( -\frac{\gamma_k}{\beta} \right) \right), \qquad (5.1)$$

where β is a correction parameter tuned for a specific modulation and coding scheme.
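Equation (5.1) translates directly into code. The following Python sketch assumes linear (not dB) per-subcarrier SNRs, as is usual for EESM, and uses β = 1.49 (the QPSK, rate-1/3 value quoted below) only as an example; the SNR values and function names are hypothetical. The result always lies between the worst subcarrier SNR and the arithmetic mean SNR, which is why EESM is more conservative than simple averaging.

```python
import math
from typing import Sequence

def db_to_linear(snr_db: float) -> float:
    return 10.0 ** (snr_db / 10.0)

def linear_to_db(snr_lin: float) -> float:
    return 10.0 * math.log10(snr_lin)

def eesm_effective_snr(snr_linear: Sequence[float], beta: float) -> float:
    """Effective SNR (linear) following Eq. (5.1).

    Evaluated in a numerically stable form: the smallest per-subcarrier SNR is
    factored out so that every exponent is <= 0 and the mean is >= 1/N_sc.
    """
    if not snr_linear:
        raise ValueError("need at least one subcarrier")
    g_min = min(snr_linear)
    mean_exp = sum(math.exp(-(g - g_min) / beta) for g in snr_linear) / len(snr_linear)
    return g_min - beta * math.log(mean_exp)

# Hypothetical per-subcarrier SNRs in dB for a QPSK, rate-1/3 transmission.
snrs_db = [2.0, 5.0, 8.0, 11.0]
g_eff = eesm_effective_snr([db_to_linear(s) for s in snrs_db], beta=1.49)
print(f"effective SNR: {linear_to_db(g_eff):.2f} dB")
```

The effective SNR would then be looked up in an AWGN block error rate curve for the chosen modulation and coding scheme.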

The resulting γ_eff is then mapped to a corresponding block error rate. The values of the β parameter depend on the modulation and the code rate, e.g. β = 1.49 for Quaternary Phase Shift Keying (QPSK) with a code rate of 1/3, or β = 7.68 for 16-point Quadrature Amplitude Modulation (16-QAM) with a code rate of 4/5. These values can be found in Table 19.13, Chapter 19 of [20]. Several sources exist for the values of β that can be applied in an LTE or LTE-A system; for our simulations, we use the β values given in [32].

Table 5.1: LTE-Advanced Parameters

Parameter                  Value
System Bandwidth           10 MHz
Channel Model              Spatial Channel Model [20]
Scenario                   Urban Macro-cell [24]
Carrier Frequency          2 GHz
Link-to-System Interface   Exponential ESM
Traffic Model              Finite Buffer
Receiver Type              Wishart-IRC [29]
MIMO scheme                4x2 SU-MIMO
Transmission Mode          TM 10
Precoding Codebook         4-Tx Release 12 [30]
CSI Reporting Mode         Aperiodic Mode 3-1 [28]

Parameter values for our LTE-A simulations are summarized in Table 5.1 and reflect those used by 3GPP technical groups in Release 12 study items.

As discussed in Section 4.2.2, we use traffic models that generate user arrivals according to Poisson processes. Each arriving user is assigned to one of the two traffic models, i.e. BE or video, with probability 0.5, so the user arrival rates for the two traffic types are equal. This ensures that the average number of users generated for each traffic type is the same. The length of the simulation is chosen such that at least 8000 users are generated for each traffic type. This ensures that all the metrics reported in this thesis are obtained within a 95% confidence interval of ±10% around the mean value.
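A Poisson arrival process with a 50/50 traffic assignment can be generated as below. This is a minimal sketch, not the simulator used in this work: the total arrival rate of 2 users/s and the function name are hypothetical. Assigning each arrival independently to a traffic type "thins" the Poisson process into two independent Poisson processes of equal rate.

```python
import random

def generate_arrivals(total_rate: float, n_users: int, p_video: float = 0.5, seed: int = 0):
    """Return a list of (arrival_time_s, traffic_type) tuples.

    Inter-arrival times of a Poisson process are i.i.d. exponential with mean
    1/total_rate; each arrival is labelled 'video' with probability p_video,
    otherwise 'BE'.
    """
    rng = random.Random(seed)
    t = 0.0
    arrivals = []
    for _ in range(n_users):
        t += rng.expovariate(total_rate)  # exponential gap, mean 1/total_rate
        kind = "video" if rng.random() < p_video else "BE"
        arrivals.append((t, kind))
    return arrivals

# Hypothetical rate; 16000 users corresponds to roughly 8000 of each type.
arrivals = generate_arrivals(total_rate=2.0, n_users=16000)
n_video = sum(1 for _, kind in arrivals if kind == "video")
print(n_video, len(arrivals) - n_video)
```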


We use offered load per sector and Resource Utilization (RU) as our reference points. This is because, for finite buffer traffic models, the 3GPP consortium decided to evaluate performance at the RU values a cellular network goes through, and we aligned our methodology with those assumptions. RU is defined as the ratio of the aggregated number of radio resource blocks allocated for data traffic to the total number of radio resource blocks in the system bandwidth available for data traffic [19].

We first ran simulations using the Proportional Fair (PF) scheme and determined the offered loads corresponding to RU values between 40% and 70%. We then ran simulations using the proposed scheme at those offered loads and compared the resulting performance and QoE for both BE users and video users. These offered loads are listed in Table 5.2. For the PF scheme, the offered load per sector ranges between 5.88 Mbps and 6.94 Mbps. The 95% confidence interval for the reported RU values is within ±3.2% of the reported values.
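The ±10% and ±3.2% criteria compare the half-width of a 95% confidence interval with the mean. A minimal sketch, assuming independent per-user samples and a normal approximation (z = 1.96); the estimator actually used to produce the reported intervals is not specified here, so treat this as one plausible choice.

```python
import math
import statistics

def relative_ci_half_width(samples, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% CI, divided by the sample mean.

    Assumes i.i.d. samples; correlated samples (e.g. consecutive TTIs)
    would need batch means or a similar correction.
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)  # sample standard deviation (n - 1 denominator)
    return z * sd / math.sqrt(n) / mean

# Hypothetical per-user throughputs in Mbps; a simulation would be extended
# until the relative half-width drops below the target (e.g. 0.10).
samples = [12.1, 9.8, 15.3, 11.0, 13.7, 10.4]
print(relative_ci_half_width(samples) <= 0.10)
```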

For video users, we report the Active Download Time (ADT), the satisfied video user percentage and the packet loss ratio of Clean Random Access (CRA) NAL units. A user is considered to be satisfied if its MOS is greater than 4. Conversely, a user is considered to be unsatisfied if its MOS is lower than 3. Nightingale et al. [9] showed that even a slight degradation in radio conditions, i.e. a packet loss ratio of 3%, is enough to make the Quality of Experience mediocre. Clean Random Access NAL units carry the encoded video data of I-Frames and represent the largest share of the bitstream in terms of bit rate. Since the decoding of the whole video sequence relies on the correct decoding of these LDUs, the packet loss ratio of these LDUs provides a good indication of how much video content becomes non-viewable.

For BE users, we report the absolute values of the average user throughput and the 10th percentile of the user throughput CDF. We also report the average user throughput in the outer region of every cell. We report the 10th percentile instead of the 5th percentile because much longer simulations would be required to obtain the latter within a 95% confidence interval. As an example, simulations generating on average 16000 users (8000 video users and 8000 BE users) take between 48 and 72 hours of run time. For the 95% confidence interval of the 5th percentile of the user throughput to be within ±10%, we would possibly need to generate over 30000 users, which could lead to simulation run times of over a week; this is highly impractical. In this thesis, we refer to the 10th percentile of the user throughput CDF as the coverage user throughput. A given BE user's throughput is calculated as the ratio of the total volume of the transferred data to the download time.

For BE users, the download time is defined as the difference between the time instant of the last packet correctly received by the user and the time instant of the first packet transmitted to the user.
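The BE metrics defined above can be sketched as follows. The session values are hypothetical, and the linear-interpolation percentile is one common convention (it matches NumPy's default), not necessarily the convention used to produce the results in this chapter.

```python
import math
from dataclasses import dataclass

@dataclass
class BeSession:
    volume_bits: float   # total volume of transferred data
    t_first_tx_s: float  # first packet transmitted to the user
    t_last_rx_s: float   # last packet correctly received by the user

def user_throughput_bps(s: BeSession) -> float:
    """Volume divided by download time, as defined in the text."""
    return s.volume_bits / (s.t_last_rx_s - s.t_first_tx_s)

def percentile(values, p: float) -> float:
    """p-th percentile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

# Three hypothetical 1 MB (8e6 bit) downloads.
sessions = [BeSession(8e6, 0.0, 1.0), BeSession(8e6, 2.0, 6.0), BeSession(8e6, 5.0, 7.0)]
tputs = [user_throughput_bps(s) for s in sessions]
avg = sum(tputs) / len(tputs)      # average user throughput
coverage = percentile(tputs, 10)   # coverage (10th-percentile) throughput
```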

5.2 Simulation Results and Discussion

In this section, we present our simulation results and discuss the main findings. We present our results for video users first, followed by those for BE users.


Table 5.2: Offered Load and corresponding Resource Utilization

Offered Load (Mbps/sector)   RU, PF scheme (%)   RU, FRA-PF scheme (%)
5.88                         40.0                35.4
6.27                         50.0                41.9
6.58                         60.0                47.8
6.94                         70.0                53.7

5.2.1 Results for video users

For the performance evaluation of video users, we consider two metrics. The first metric is the ADT, which is the time a video user spends actively downloading video content. The second metric is the MOS that users would give to their viewing experience.

The 95% confidence intervals for the active download time are within ±6% of the reported values. Fig. 5.1 shows the active download times video users spend downloading video content while they are in the network. Using the Proportional Fair scheme, video users spend between 3.5 seconds and 8 seconds downloading video content (for offered loads between 5.9 Mbps per sector and 6.9 Mbps per sector, respectively). These numbers can be explained by the fact that the Proportional Fair algorithm tries to be fair to all users, video and BE alike, so resources end up being shared by all users. Using our proposed scheme, video users are given higher importance if their transmission queues carry referenced frames, owing to the barrier functions introduced in our scheduling framework. Therefore, if a base station is serving both video and BE users, video users will be prioritized over BE users as long as they have referenced frames to receive.

Figure 5.1: Video users' active download time [s] versus offered load [Mbps/sector], PF and FRA-PF schemes.

Resource allocation is focused on video users first, which results in them being served more quickly, as Fig. 5.1 shows. For offered loads between 5.9 Mbps per sector and 6.9 Mbps per sector, video users spend between 2.2 seconds and 4.2 seconds downloading video content. This is significant because any time that video users do not spend downloading video content frees resources that can be allocated to BE users.
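For reference, the baseline PF rule can be sketched as follows. This is the textbook single-resource PF metric (instantaneous rate divided by an exponentially smoothed average throughput), not the FRA-PF barrier-function formulation; treating T = 25 as the smoothing window is an assumption, and the rates are hypothetical.

```python
def pf_schedule(inst_rates, avg_tput, T=25.0):
    """One TTI of Proportional Fair scheduling on a single resource.

    inst_rates[u]: achievable rate of user u in this TTI (e.g. from its CQI)
    avg_tput[u]:   exponentially smoothed past throughput of user u (assumed > 0)
    Returns the scheduled user index and the updated averages.
    """
    # PF metric: favour users whose current channel is good relative to their history.
    chosen = max(range(len(inst_rates)), key=lambda u: inst_rates[u] / avg_tput[u])
    # Exponential smoothing with window T; unscheduled users receive zero rate.
    new_avg = [
        (1.0 - 1.0 / T) * avg_tput[u] + (1.0 / T) * (inst_rates[u] if u == chosen else 0.0)
        for u in range(len(inst_rates))
    ]
    return chosen, new_avg

chosen, new_avg = pf_schedule([10.0, 4.0], [10.0, 1.0])
print(chosen, new_avg)
```

A user that is not scheduled sees its average decay, which raises its metric until it is served; this is the fairness property that makes PF share resources among video and BE users alike.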

Possibly the most important aspect in the performance evaluation of video services is the MOS, which reflects the quality of the viewing experience from the users' perspective. We look at the MOS that users would give based on the packet loss they experience, and summarize it as the satisfied video user percentage. The 95% confidence intervals of the satisfied video user percentage results are within ±8% of the reported values.

Figure 5.2: Satisfied video user percentage (MOS > 4) [%] versus offered load [Mbps/sector], PF and FRA-PF schemes.

It was shown in [9] that the MOS is very sensitive to the Packet Loss Ratio (PLR). The findings in [9] were that PLRs below 1.5% correspond to a MOS above 4 (perceptible but not annoying degradation). Assuming that a video user's MOS is only affected by the PLR it experiences, we can state that the QoE of a video user will be high if the PLR is below 1.5% (i.e., its MOS will be greater than 4, and the video user will be satisfied). The QoE will be low if the PLR is higher than 1.5% (i.e., its MOS will be lower than 4, and the video user will experience significant degradation). Fig. 5.2 shows the percentage of video users for which the MOS is greater than 4.
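Under the assumption stated above (MOS depends only on PLR, with the 1.5% threshold taken from [9]), the satisfied video user percentage reduces to a simple count. A minimal sketch with hypothetical per-user packet counts:

```python
PLR_THRESHOLD = 0.015  # PLR below 1.5% is taken to mean MOS > 4, following [9]

def is_satisfied(packets_sent: int, packets_lost: int) -> bool:
    plr = packets_lost / packets_sent
    return plr < PLR_THRESHOLD

def satisfied_percentage(users) -> float:
    """users: iterable of (packets_sent, packets_lost) pairs, one per video user."""
    flags = [is_satisfied(sent, lost) for sent, lost in users]
    return 100.0 * sum(flags) / len(flags)

# Hypothetical users with PLRs of 0.5%, 1.4%, 2.0% and 4.0%.
print(satisfied_percentage([(1000, 5), (1000, 14), (1000, 20), (1000, 40)]))  # 50.0
```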

Our proposed FRA-PF scheme leads to a higher percentage of satisfied video users, which is expected since video users have priority over BE users whenever they have referenced frames to receive. As can be seen from Fig. 5.2, for offered loads around 5.9 Mbps per sector, both PF and FRA-PF schemes are able to satisfy over 90% of video users. However, the performance of the PF scheme degrades more quickly as the load increases: for offered loads around 6.8 Mbps per sector, the FRA-PF scheme can satisfy over 80% of video users, whereas the PF scheme satisfies less than 60% of video users.

Another aspect that we look into is the percentage of Clean Random Access (CRA) LDUs lost. I-Frames are typically carried inside CRA LDUs and represent the most significant portion of the bitstream in terms of bits. Because of the way the video compression process is defined in the H.265/HEVC standard, I-Frames are the most referenced frames throughout a video sequence, and the loss of an I-Frame causes error propagation in the decoding process at the receiver. Following [10], we set the Intra-Period so that consecutive I-Frames are one second apart.

Intuitively, the loss of an I-Frame causes the loss of about one second of video content for the end user, because all subsequent B-Frames reference that I-Frame, directly or indirectly. Strictly speaking, those B-Frames could still be used by the decoder to produce a picture. The problem is that those B-Frames could be incomplete, i.e. some regions could be missing luminance or other sample information. The whole idea behind H.265/HEVC is to use motion compensated prediction in as many frames as possible, so a single lost reference frame affects many subsequent frames. Fig. 5.3 shows the results obtained for the CRA LDU loss ratio. Since the proposed FRA-PF scheme is able to locate referenced frames and transmit them with higher priority, FRA-PF has a lower CRA LDU loss ratio.

Figure 5.3: CRA LDU loss ratio [%] versus offered load [Mbps/sector], PF and FRA-PF schemes.

Let us consider the bitstream of the video sequence FourPeople as an example. The original video bitstream contains 9 CRA LDUs and 600 LDUs in total. Since we wrap the bitstream around 6 times, this results in a total of 54 CRA LDUs for a given user. With the PF scheme, the CRA LDU loss ratio goes from 1.6% to 9.0% of the total 54 CRA LDUs as the offered load changes from 5.9 to 6.9 Mbps per sector. This corresponds to between 1 LDU at best and 5 LDUs at worst. For offered loads near 7 Mbps per sector, this means that as much as 5 seconds of video content becomes non-viewable because of the loss of CRA LDUs. With the proposed FRA-PF scheme, the CRA LDU loss ratio goes from 0.1% to 1.18% of the total 54 CRA LDUs as the offered load changes from 5.9 to 6.9 Mbps per sector. This means that in either case at most 1 LDU is lost. For offered loads near 7 Mbps per sector, this means that at most 1 second of video content becomes non-viewable because of the loss of CRA LDUs. This highlights how the proposed FRA-PF scheme provides the decoder with the reference frames needed to facilitate decoding, and how it locates the packets of greatest importance to the H.265/HEVC decoder.
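The conversion from loss ratio to lost LDUs and seconds of video is simple arithmetic. The sketch below reproduces it, assuming, as in the text, that each lost CRA LDU costs about one second of viewable video; rounding the expected count up to whole LDUs is an interpretive assumption that matches the "1 LDU at best, 5 at worst" figures.

```python
import math

CRA_PER_PASS = 9       # CRA LDUs in one pass of the FourPeople bitstream
PASSES = 6             # the bitstream is wrapped around 6 times
SECONDS_PER_CRA = 1.0  # consecutive I-Frames are one second apart

total_cra = CRA_PER_PASS * PASSES  # 54 CRA LDUs per user

def lost_cra_upper(loss_ratio: float) -> int:
    """Expected number of lost CRA LDUs, rounded up to whole LDUs."""
    return math.ceil(loss_ratio * total_cra)

# Loss ratios at 5.9 and 6.9 Mbps/sector, as reported above.
for scheme, ratios in {"PF": (0.016, 0.090), "FRA-PF": (0.001, 0.0118)}.items():
    lost = [lost_cra_upper(r) for r in ratios]
    print(scheme, lost, [n * SECONDS_PER_CRA for n in lost])
```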

Providing referenced frames with greater priority helps maintain continuous playback at the end user and contributes to enhancing the viewing experience of video users. From the user's perspective, non-continuous video playback will always be a source of dissatisfaction. Our proposed FRA-PF scheme reduces the loss of packets carrying referenced frames, which helps maintain continuous playback.

5.2.2 Results for Best Effort users

For BE users, we report the absolute gains of the average throughput and the coverage throughput (which we defined in Section 5.1). The 95% confidence intervals of the average throughput and coverage throughput are within ±3% and ±9% respectively of the reported values.

The average throughput is plotted as a function of the offered load in Fig. 5.4. The offered load values of interest to us are in the range of 5.9 to 6.9 Mbps per sector. From Fig. 5.4, it can be seen that with the PF scheduling scheme, BE users can expect average throughputs between 15 Mbps and 10 Mbps. With our proposed FRA-PF scheme, users can expect average throughputs between 16 Mbps and 12 Mbps. This is explained by the fact that our proposed FRA-PF scheme serves video traffic more quickly, as shown in Fig. 5.1. As video users are served more quickly, radio resources become available to BE users. The availability of more radio resources helps BE users leave the network more quickly and therefore experience higher throughputs. Put simply, allocating the resources to the right users at the right time benefits all users. This is reflected both in the lower Resource Utilization of the network (Table 5.2) and in the higher average throughputs users obtain.

Figure 5.4: Average throughput for Best Effort users [Mbps] versus offered load [Mbps/sector], PF and FRA-PF schemes.

Fig. 5.5 shows the coverage throughput results for offered load values between 5.9 and 6.9 Mbps per sector. As expected, the coverage throughput is much lower than the average throughput. In an LTE-A system using the PF scheduling scheme, users can expect coverage throughputs between 4.9 Mbps and 1.5 Mbps. In an LTE-A system using our proposed FRA-PF scheme, for the same offered load values, users can expect coverage throughputs between 6.1 Mbps and 3.6 Mbps. This result is considerably more significant than the gain in average user throughput shown earlier. It shows that, at the highest offered load, 90% of BE users can expect a throughput of at least 3.6 Mbps, which is more than double the 1.5 Mbps obtained with the PF scheduling scheme.

Figure 5.5: Coverage throughput for Best Effort users [Mbps] versus offered load [Mbps/sector], PF and FRA-PF schemes.

Because we model a BE type of service, in which the volume of data to download is fixed, this improvement in throughput translates into a reduction in latency. For other services, e.g. Web Browsing, higher throughput can translate into noticeably faster loading times and enhanced Quality of Experience.
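Since the volume is fixed, download time is inversely proportional to throughput, so the relative latency reduction does not depend on the file size. A small worked example using the coverage throughputs at the highest offered load (the 2 MB file size is purely hypothetical):

```python
def download_time_s(volume_bits: float, throughput_bps: float) -> float:
    return volume_bits / throughput_bps

# Coverage (10th-percentile) throughputs at about 6.9 Mbps/sector offered load.
PF_BPS, FRA_PF_BPS = 1.5e6, 3.6e6
volume = 2e6 * 8  # hypothetical 2 MB file, in bits

t_pf = download_time_s(volume, PF_BPS)
t_fra = download_time_s(volume, FRA_PF_BPS)
reduction = 1.0 - t_fra / t_pf  # equals 1 - PF_BPS / FRA_PF_BPS for any volume
print(f"{t_pf:.1f} s -> {t_fra:.1f} s ({100 * reduction:.0f}% shorter)")
```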

We stated that the 95% confidence intervals of the coverage throughput are within ±9% of the reported values; this is because the statistics of users that experience relatively low Signal to Interference and Noise Ratio (SINR) are very sensitive. We model random user arrivals in our simulations, which leads to inter-cell interference that varies with time. For users experiencing low SINR, even slight improvements or degradations can have very significant impacts on the eventual throughput they experience.

Figure 5.6: Illustration of the outer 10% of the coverage area.

Finally, we examine the statistics for users that are geographically located within the outer 10% of the coverage area, as depicted in Fig. 5.6; we call this region the cell-edge region. The area A of a hexagon is calculated as A = 2√3 a², where a is the apothem of the hexagon. Using a hexagonal network deployment as shown in Fig. 4.1, with an inter-site distance of 500 meters, the apothem is half the inter-site distance, i.e. 250 meters. The users in the cell-edge region are those who lie outside the inner hexagon enclosing 90% of the area, i.e. outside the hexagon of apothem a0 = 250·√0.9 ≈ 237 meters. The results are shown in Fig. 5.7. For offered loads ranging from 5.9 to 6.9 Mbps per sector, users in the cell-edge region experience throughputs ranging from 11.4 to 8 Mbps with the baseline PF scheme. With our proposed FRA-PF scheme, users in the cell-edge region experience throughputs ranging from 12.2 to 9.9 Mbps for the same offered load values. The trend is consistent with those for the average throughput and coverage throughput. However, it is interesting to note that the average throughput of users in the cell-edge region is higher than the coverage throughput values reported in Fig. 5.5. This is because we generate users randomly over time, which leads to inter-cell interference varying over time. As a result, a user located in the cell-edge region but not interfered with by neighbouring cells will still experience reasonably high throughputs, as Fig. 5.7 shows.

Figure 5.7: Average BE user throughput in the cell-edge region [Mbps] versus offered load [Mbps/sector], PF and FRA-PF schemes.

Conventional Full Queue (i.e. Full Buffer) simulation methodologies operate in a regime where every cell in the network is transmitting at all times, so every cell is always interfering with users in neighbouring cells. As a result, only an individual user's radio conditions determine whether high throughputs are achievable. Users located closer to their serving cell suffer lower path loss, which translates into higher average SNR and higher throughput. With Finite Buffer simulation methodologies, this is no longer strictly true, because users arrive randomly in the network and are subject to the inter-cell interference present while they are in the network. Path loss still plays a significant role in dictating overall performance, but its effect is now tempered by the random user arrivals, which affect the inter-cell interference.
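The cell-edge geometry described earlier in this section can be checked numerically, and users classified by position. A sketch assuming the cell hexagon is centred at the origin with edge normals at 0°, 60° and 120° (the orientation and function names are assumptions; the orientation does not affect the areas):

```python
import math

ISD_M = 500.0                                 # inter-site distance
APOTHEM_M = ISD_M / 2.0                       # a = 250 m
INNER_APOTHEM_M = APOTHEM_M * math.sqrt(0.9)  # hexagon area scales with a^2

def hexagon_area(apothem: float) -> float:
    return 2.0 * math.sqrt(3.0) * apothem ** 2

def is_cell_edge(x: float, y: float, inner_apothem: float = INNER_APOTHEM_M) -> bool:
    """True if (x, y), relative to the cell centre, lies outside the inner hexagon.

    A point is inside a hexagon of apothem a0 iff its projection onto each of
    the three edge normals has magnitude at most a0.
    """
    for deg in (0.0, 60.0, 120.0):
        th = math.radians(deg)
        if abs(x * math.cos(th) + y * math.sin(th)) > inner_apothem:
            return True
    return False

print(round(INNER_APOTHEM_M, 1))                                    # 237.2
print(1 - hexagon_area(INNER_APOTHEM_M) / hexagon_area(APOTHEM_M))  # ~0.1
```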

In order to provide better QoE to all users, resource allocation schemes should target the users that require the smallest amount of resources in order to be satisfied. This helps the system deliver a better user experience to all users in the network: the QoE of all users improves thanks to the departure of other users, and our proposed scheme achieves this by serving video users faster. This benefits all users in the network and helps provide a more consistent user experience across the whole network, which is in line with the objectives of future 5G networks.

Chapter 6

Conclusions and Future Work

This chapter summarizes the main contributions of the thesis and provides some suggestions for future work.

6.1 Contributions

In this thesis, we addressed the topic of transmitting video content in 4G and beyond-4G networks by exploiting information about the way H.265/HEVC operates. Using knowledge of the coding structures, reference picture lists and the process through which the H.265/HEVC encoder transmits this information to the decoder, we proposed a cross-layer scheduling framework which allocates resources to video users that need to receive referenced frames.

Our performance evaluation of H.265/HEVC video-content delivery was made in a mixed-traffic environment using random user arrivals and finite-buffer traffic models. To the best of our knowledge, no similar work has been reported in the literature. Results showed that both video and BE users benefit from the proposed scheduling framework. Video users benefit from reduced losses of packets carrying referenced frames, while BE users benefit from improved throughput. The improvement for video users is achieved by tracking referenced frames and focusing resource allocation on video users whenever their transmission queues contain packets carrying referenced frames in the video sequence. As long as there are such frames in the transmission queue of a video user, our proposed framework prioritizes that user and allocates resources to it. This allows video users to download video content more quickly and allows BE users to access resources sooner, leave the network more quickly and, as a result, enjoy higher throughputs on average.

As we move towards 5G networks, the expectation is that cellular networks provide a consistent user experience across the coverage area. Results showed that 90% of BE users can expect between 1 Mbps and 2 Mbps higher throughput using FRA-PF, which can potentially make the difference between excellent and mediocre Quality of Experience. In addition, it was found that BE users in the cell-edge region of each cell actually experience much higher throughputs than the 10th percentile of the user throughput CDF. This shows that users experiencing lower throughputs are not necessarily located in the cell-edge region but can in fact be much closer to the base station.

6.2 Future Work

Several future directions can be pursued, depending on which side of the problem one wishes to focus on.


If one were to focus on the communications side, one direction for future work could be to use an air interface that will actually be used in 5G systems. In this work, we considered an LTE-A air interface with some 3GPP Release 12 features, such as the Release 12 4-Tx linear precoding. This is because, at the time the work was undertaken, 3GPP was still working on Release 13 and no air interface had yet been proposed for 5G systems, so we did not have the opportunity to evaluate performance for such systems. Instead, we focused on performance evaluation using realistic traffic models over an up-to-date LTE-A air interface, using the performance metrics proposed for 5G networks.

In our performance evaluation, we did not compare our proposed FRA-PF scheme with a scheduling scheme that strictly prioritizes users requesting video services over best effort users. It would be interesting to see whether such a scheduling scheme achieves improvements for both video users and best effort users. We also did not consider any admission control policies in our traffic models; such policies regulate traffic arrival in high load situations and can have a significant impact on user experience.

Another direction for future work could be to look into traffic offloading schemes. Since 3GPP Release 8, the 3GPP community has been introducing support for heterogeneous networks. Smaller base stations can be deployed in the cell-edge region in order to provide coverage to users with stringent QoS or QoE requirements. For example, macro base stations can offload specific users located in the coverage area of small base stations in order to provide better QoE to their own users, and therefore provide a more consistent user experience across the whole network, something that 5G networks will be required to provide. The more general problem to address is to design scheduling frameworks that provide the best user experience while maximizing revenue for carriers.

If one were to focus on the video encoding or video compression side, one direction for future work could be the actual evaluation of subjective quality. No subjective quality testing was performed in our work. The major stumbling block to overcome is getting the reference implementation of the H.265/HEVC decoder to produce a viewable video sequence from a bitstream with missing LDUs. The reference decoder implementation is not designed to be robust against any form of packet loss and aborts the decoding process at the slightest error or absence of an LDU. If bitstream samples with missing LDUs could be reconstituted and the corresponding video sequences output, it would be possible to do subjective quality testing and gain insights into how the loss of specific packets impacts the viewing experience. This would give much clearer insights into how packet loss and Quality of Experience are related for video services, and more specifically how much the loss of packets carrying I-Frames hurts the Quality of Experience.

Bibliography

[1] Cisco, “Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2014-2019,” February 2015.

[2] ITU-T, Advanced Video Coding for generic audiovisual services - Recommendation ITU-T H.264. February 2014.

[3] ITU-T, High Efficiency Video Coding - Recommendation ITU-T H.265. April 2013.

[4] M. Wien, High Efficiency Video Coding - Coding Tools and Specifications. Springer, May 2014.

[5] NGMN Alliance (Next Generation Mobile Networks), “NGMN 5G White Paper,” February 2015.

[6] M. Rugelj, U. Sedlar, M. Volk, J. Sterle, M. Hajdinjak, and A. Kos, “Novel Cross-Layer QoE-Aware Radio Resource Allocation Algorithms in Multiuser OFDMA Systems,” IEEE Transactions on Communications, September 2014.

[7] S. Singh, O. Oyman, A. Papathanassiou, D. Chatterjee, and J. G. Andrews, “Video Capacity and QoE Enhancements over LTE,” IEEE International Conference on Communications, June 2012.

[8] M. Salem, P. Djukic, J. Ma, and M. Hawryluck, “QoE-Aware Joint Scheduling of Buffered Video on Demand and Best Effort Flows,” IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, September 2013.

[9] J. Nightingale, Q. Wang, C. Grecos, and S. Goma, “The Impact of Network Impairment on Quality of Experience (QoE) in H.265/HEVC Video Streaming,” IEEE Transactions on Consumer Electronics, May 2014.

[10] F. Bossen, “Common HM test conditions and software reference configuration,” April 2012.

[11] G. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,” IEEE Signal Processing Magazine, pp. 74–90, November 1998.

[12] T. Schierl, M. M. Hannuksela, Y.-K. Wang, and S. Wenger, “System Layer Integration of High Efficiency Video Coding,” IEEE Transactions on Circuits and Systems for Video Technology, pp. 1871–1884, December 2012.

[13] Y.-K. Wang, R. Even, T. Kristensen, and R. Jesup, RTP Payload Format for H.264 Video. IETF, May 2011.

[14] Y.-K. Wang, Y. Sanchez, T. Schierl, S. Wenger, and M. Hannuksela, RTP Payload Format for H.265/HEVC Video. IETF, August 2015.

[15] F. Kelly, “Charging and rate control for elastic traffic,” European Transactions on Telecommunications, pp. 33–37, 1997.

[16] D. G. Luenberger and Y. Ye, Linear and Nonlinear Programming. Springer, 3rd ed., 2008.

[17] P. A. Hosein, “QoS Control for WCDMA High Speed Packet Data,” IEEE International Workshop on Mobile and Wireless Communications Network, 2002.

[18] R. Srinivasan, J. Zhuang, L. Jalloul, R. Novak, and J. Park, “IEEE 802.16m Evaluation Methodology Document (EMD),” July 2008.

[19] “3GPP TR 36.814 v9.0.0 - Technical Specification Group Radio Access Network - Evolved Universal Terrestrial Radio Access (E-UTRA) - Further advancements for E-UTRA physical layer aspects,” March 2010.

[20] F. Khan, LTE for 4G Mobile Broadband. Cambridge University Press, 2009.

[21] “HM 14.0, HEVC Test Model Reference Implementation.” ://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/. Accessed: 2014-09-30.

[22] “IMTAphy, LTE/LTE-Advanced system level simulator.” http://www.lkn.ei.tum.de/personen/jan/imtaphy/index.php. Accessed: 2014-05-24.

[23] “openWNS, open Simulator, open source system level simulation platform for performance evaluation and comparison of wireless and multi-cellular mobile communication systems.” https://launchpad.net/openwns. Accessed: 2014-05-24.

[24] “3GPP TR 25.814 v7.1.0 - Technical Specification Group Radio Access Network; Physical layer aspects for evolved Universal Terrestrial Radio Access (UTRA),” December 2006.

[25] ITU-R, “Guidelines for evaluation of radio interface technologies for IMT-Advanced,” December 2009.

[26] “3GPP TR 25.996 v9.0.0 - Spatial channel model for Multiple Input Multiple Output (MIMO) simulations,” December 2009.

[27] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge University Press, March 2010.

[28] “3GPP TS 36.213 v12.2.0 - Technical Specification Group Radio Access Network - Evolved Universal Terrestrial Radio Access (E-UTRA) - Physical Layer Procedures,” June 2014.

[29] “3GPP TR 36.829 v11.1.0 - Technical Specification Group Radio Access Network - Enhanced performance requirement for LTE User Equipment (UE),” December 2012.

[30] A. Roessler, J. Schlienz, S. Merkel, and M. Kottkamp, “LTE-Advanced (3GPP Rel.12) Technology Introduction - White Paper,” June 2014.

[31] “3GPP TS 36.211 v12.2.0 - Technical Specification Group Radio Access Network - Evolved Universal Terrestrial Radio Access (E-UTRA) - Physical channels and modulation,” June 2014.

[32] J. Olmos, A. Serra, S. Ruiz, M. Garcia-Lozano, and D. Gonzalez, “Exponential Effective SIR Metric for LTE Downlink,” IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, September 2009.

[33] L. Wan, S. Tsai, and M. Almgren, “A fading-insensitive performance metric for a unified link quality model,” IEEE Wireless Communications and Networking Conference, April 2006.
