<<

Master of Science Thesis in Computer Science
Department of Electrical Engineering, Linköping University, 2018


Statistical multiplexing of video for fixed bandwidth distribution — A multi-codec implementation and evaluation using a high-level media processing library

Max Halldén

LiTH-ISY-EX--18/5142--SE

Supervisor: Harald Nautsch, ISY, Linköpings universitet
            Patrik Lantto, WISI Norden AB
Examiner: Ingemar Ragnemalm, ISY, Linköpings universitet

Division of Information Coding Department of Electrical Engineering Linköping University SE-581 83 Linköping, Sweden

Copyright © 2018 Max Halldén

Abstract

When distributing multiple TV programs on a fixed bandwidth channel, the bit rate of each video stream is often constant. Since the bit rate of video sent at a constant quality typically varies wildly, this is a very suboptimal solution. By instead sharing the total bit rate among all programs, the video quality can be increased by allocating bit rate where it is needed. This thesis explores the statistical multiplexing problem for a specific hardware platform with the limitations and advantages of that platform. A solution for statistical multiplexing is proposed and evaluated using the major codecs used for TV distribution today. The main advantage of the statistical multiplexing solution is a much more even quality and a higher minimum quality achieved across all streams. While the solution will need a faster method for bit rate approximation to be practical in terms of performance, it is shown to work as intended.


Acknowledgments

I would like to thank everyone at WISI, and especially Patrik Lantto, for giving me the opportunity to do this thesis and helping me along the way, and for being patient when some things were going a little slow. I would also like to thank Harald Nautsch for being my supervisor and Ingemar Ragnemalm for being my examiner. A special thanks to Ingemar for making sure this thesis was actually finished. I would also like to thank my opponent Niklas Norin. And above all a big thanks to my family and friends for all the support along the way, not only during the course of this thesis, but during my entire time studying at the university. Thank you all.

Linköping, June 2018
Max Halldén


Contents

1 Introduction
  1.1 Motivation
  1.2 Aim
  1.3 Research questions
  1.4 Delimitations

2 Background
  2.1 Video coding standards
  2.2 Statistical multiplexing in television broadcasts
  2.3 Intel Media SDK

3 Theory
  3.1 Video coding
    3.1.1 Independently and dependently coded frames
    3.1.2 Decoder models
    3.1.3 Data dwell time
  3.2 Rate control
    3.2.1 Quantization
    3.2.2 Rate-distortion theory
    3.2.3 Rate-quantization models
    3.2.4 Distortion-quantization models
  3.3 Video multiplexing using MPEG-2 Systems
    3.3.1 Elementary Streams and Packetized Elementary Streams
    3.3.2 Transport Streams
    3.3.3 System Target Decoder
    3.3.4 Statistical multiplexing methods
  3.4 Video quality metrics
    3.4.1 Peak Signal-to-Noise Ratio and Mean Squared Error
    3.4.2 Structural similarity index
    3.4.3 Other quality metrics
  3.5 Intel Media SDK
    3.5.1 Bit rate control

4 Method
  4.1 Rate control with Intel Media SDK
  4.2 Evaluating rate and distortion characteristics for Intel encoder
  4.3 Satisfying rate constraints
  4.4 Rate control model
  4.5 Distortion model
  4.6 Distortion and bit rate control
  4.7 Initial video stream offset
  4.8 Evaluation

5 Results
  5.1 Rate and distortion characteristics of Intel encoder
  5.2 Statistical multiplexing algorithm
    5.2.1 Rate-quantization and distortion-quantization models
    5.2.2 Maintaining bitstream conformance
    5.2.3 Distortion control
  5.3 Experimental results
    5.3.1 Off-line tests
    5.3.2 Live test

6 Discussion
  6.1 Conclusion
  6.2 Future work

A Results of SVT multi format tests

B Algorithms

C Results of off-line test using Blender's open movies

D Result of live test

1 Introduction

Television content is often distributed over fixed-bandwidth channels. This fixed-bandwidth requirement is at odds with the characteristics of a single video bitstream. Typically, compressed video is encoded to achieve a constant video quality and this means that the bit rate of properly compressed video will vary with the varying complexity of the video. The video can still be encoded with a fixed bit rate, but the resulting uneven quality will mean a higher bit rate will be needed to achieve a minimum acceptable quality level. When transporting multiple video streams across a single channel with fixed bandwidth, these can be jointly encoded, each with a varying bit rate, to try to achieve the total bit rate of the channel, while maintaining as close to constant quality as possible across all video streams.

Sharing a single channel among several streams, where only one stream is allowed to use the channel at a single point in time, with the purpose of increasing the total utilisation of the channel, is commonly known as statistical time division multiplexing. In this report we will simply refer to it as statistical multiplexing. The increased utilisation that can be gained from statistical multiplexing is known as the statistical multiplexing gain.

The capabilities of a statistical multiplexer are very much dependent on the encoder. The encoder used for this project is based on the Intel Media SDK. The Intel Media SDK is a framework for accelerated media processing using Intel hardware. It provides real-time encoding capabilities for developers. [31]


1.1 Motivation

The bandwidth available in video distribution scenarios is often limited and the utilisation of the bandwidth is therefore of great importance. With statistical multiplexing, less bandwidth is needed to transmit video of the same quality. Statistical multiplexing is also becoming increasingly important for video distribution. With the adoption of the HEVC codec, the gains from statistical multiplexing are increased compared to previous codecs [44]. Being able to send more video streams on the same channel means that the quality of all streams can be increased by sharing bit rate. It becomes more likely that the high bit rate requirement of one stream can be "cancelled out" by the low bit rate requirement of another at the same quality.

We can see the increased interest in statistical multiplexing in recent standards. The DVB-C2 standard for cable transmission states that: "DVB-C2 shall allow achieving the maximum benefit from statistical multiplex method." [11]. The DVB-S2X extension to the DVB-S2 standard for satellite transmission allows bonding of multiple channels to increase total throughput, with the stated goal to reach higher statistical multiplexing gains [13].

The Intel Media SDK is an emerging technology showing great results for efficient hardware-accelerated video decoding and encoding, and with the added support for HEVC it is an important technology for new video distribution solutions. [42, 40]

Statistical multiplexing is an ongoing area of research which has been getting more relevant due to bandwidth becoming more scarce and the increasing adoption of higher resolution video. Most previous research uses the available reference encoder for the relevant codec, without regard to the limited availability of encoder internals when using closed-source libraries and applications such as the Intel Media SDK, and without regard to the applicability across different video codecs.

1.2 Aim

This report will present a solution to the statistical multiplexing problem for video streams using the Intel Media SDK. The capabilities of the Intel Media SDK are explored in the context of statistical multiplexing. The performance and feasibility of the solution are evaluated with a prototype.

1.3 Research questions

The questions to be answered in this report are as follows:

1. How can the resulting video quality and bit rate, in a form that is valid for use in a statistical multiplexer, be inferred from available data when using the Intel Media SDK?

2. How can a codec-agnostic statistical multiplexer be implemented using the Intel Media SDK and the result from question 1? Specifically, how can an even video quality be achieved, while both utilizing available bit rate and satisfying bit rate constraints?

3. What statistical multiplexing gains are achievable using the Intel Media SDK and the statistical multiplexer from question 2?

1.4 Delimitations

The use of the Intel Media SDK results in some obvious limitations as to what can be achieved by the statistical multiplexer. If a technique cannot be achieved using the Intel Media SDK, then it will not be taken into consideration for this report.

Further, we are limited in terms of what codecs the statistical multiplexer supports. While the statistical multiplexer should support all codecs currently supported by the Intel Media SDK, no further assumptions will be made in regards to the codec. Apart from the methods that can be applied to all supported codecs, the statistical multiplexing solution will not use those that are codec-specific. The codecs that are actually used for evaluating the statistical multiplexer are the ones most used in broadcasting today.

2 Background

Television programs were initially broadcast as analogue content on a single analogue channel using a fixed frequency band. With the adoption of digital transmission it became possible to transmit the content compressed a lot more efficiently. The need for a universal standard to transmit compressed media content led to the MPEG-1 and MPEG-2 standards. The MPEG standards contain a number of parts, which each define a specific aspect of the transmission and compression of media content. MPEG-2 expands upon the MPEG-1 standard and is the one most widely used today. Notably, the MPEG-2 Systems part specifies a content-agnostic way of transmitting multimedia content across various channels. It is the reason why this fairly old standard is still a major part of broadcasting technology today, and an important part of this thesis. While the MPEG-2 audio and video compression has arguably been superseded, the Systems part can be (and has been) extended to include the new compression technologies, allowing new technology to be used on old infrastructure. [48]

A central concept of the MPEG-2 Systems part is the transport stream, which is the format used in broadcasting to send multiple programs in a single stream. The statistical multiplexing operation will produce a constant bit rate transport stream, containing all the multiplexed programs. A program is the video and audio, along with any subtitles and other ancillary data. As this thesis exclusively focuses on the video stream, a program will sometimes just be referred to as a video stream.


2.1 Video coding standards

The video coding standards used in this thesis are MPEG-2 Video/H.262 [24, 27], MPEG-4 AVC/H.264 [25, 28] and MPEG-H HEVC/H.265 [26, 29]. MPEG is the Moving Picture Experts Group, a working group of ISO/IEC, with MPEG-2, MPEG-4 and MPEG-H being suites of standards proposed by the organization. H.262, H.264 and H.265 are recommendations from the Telecommunication Standardization Sector of the International Telecommunication Union, abbreviated as ITU-T.

As mentioned, MPEG-2, MPEG-4 and MPEG-H contain a number of standards; H.262 more specifically corresponds to MPEG-2 part 2, or ISO/IEC 13818-2. H.264 corresponds to MPEG-4 part 10, or ISO/IEC 14496-10. Lastly, H.265 corresponds to MPEG-H part 2, or ISO/IEC 23008-2. H.264 is referred to as Advanced Video Coding (AVC) by MPEG and H.265 is similarly referred to as High Efficiency Video Coding (HEVC). For the rest of the report we will just refer to these standards as MPEG-2, AVC, and HEVC respectively.

The three standards are all commonly used in broadcasting and can be seen as three milestones in video compression technology. MPEG-2 is the oldest of the three, but is still used extensively. AVC is chronologically between the other two and gives a 50 % efficiency increase over MPEG-2. It is the most common format seen today. HEVC provides yet another 50 % efficiency increase over AVC. It is currently gaining in relevance, especially with the increasing adoption of Ultra HD resolution content.

2.2 Statistical multiplexing in television broadcasts

Television broadcast is usually done on fixed bandwidth channels, due to the content having to be sent modulated on a frequency band. This means that one or several programs are fitted into a single channel with a fixed bit rate.

When multiplexing a number of streams to create a constant bit rate transport stream, we generally have two alternative ways to do this. The simplest way is to divide the total bit rate among all elementary streams and encode all of these with a constant bit rate. However, maintaining a constant bit rate is in conflict with the goal to maintain a constant video quality. The more optimal way is to encode all elementary streams with a variable bit rate while making sure that the total bit rate doesn't exceed the bit rate of the transport stream. This second alternative is what is called statistical multiplexing.

All video encoders have to solve the problem of rate control to be able to achieve the specified bit rate for the stream. The problem of rate control becomes more complex in the statistical multiplexing scenario due to having to control multiple encoders without exceeding the total bit rate. Maintaining constant quality across multiple video programs adds another dimension to the already complex problem of maintaining quality temporally within a single program.

During the construction (multiplexing) of the transport stream, the bit rate of each video stream has to be constrained to make sure that any supported decoder can decode the stream. When encoding the video streams with a constant bit rate, this constraint is trivially solved by setting the bit rate to be less than the maximum bit rate as specified by the decoder. To be able to benefit from statistical multiplexing, the video needs to be encoded with a variable bit rate and the buffer level of the decoder has to be tracked to avoid buffer over- and underflow. A typical encoder expects that the decoder receives data at a certain rate and has a certain buffer size. When multiplexing a video stream into a transport stream, the same constraints have to be respected, along with some additional constraints that similarly guarantee that the transport stream can be demultiplexed as well.

Statistical multiplexing typically has two different categories of use-cases, with differing requirements. The first category is during the initial encoding of the broadcast content. Typically the recipients are numerous enough that the emphasis is on getting the best possible quality; cost is less of an issue. The second category is when combining content from multiple sources to be rebroadcast, typically to a cheaper medium and a smaller audience. While quality is still important, this scenario is a lot more cost- and performance-constrained, as the cost of a better statistical multiplexing solution is not automatically offset by the decreased bit rate requirements and there is a desire to not add latency to the network.

2.3 Intel Media SDK

The Intel Media SDK is an SDK provided by Intel to access the hardware-accelerated media processing capabilities of Intel processors. The dedicated hardware for media processing has been a part of the integrated graphics hardware of Intel's processors since the Sandy Bridge architecture.

There is some added confusion around the naming of the SDK, since the library actually called Intel Media SDK [21] has a close sibling called the Intel Media Server Studio [22], which is said to contain the Intel Media SDK. Despite this, there are some obvious differences between the two offerings, such as differing platform support. Although the Intel Media Server Studio's version of the Intel Media SDK is used in this thesis, it will just be referred to as the Intel Media SDK.

3 Theory

There are several important areas of knowledge needed for the design and implementation of a statistical multiplexer. First of all we need a more thorough explanation of what is meant by "multiplexing", the meaning of which can be somewhat ambiguous. In this report multiplexing means combining multiple elementary streams with variable bit rate containing video into a transport stream with constant bit rate. The "statistical" part is about how to allocate bit rate to the video streams in such a way that the overall video quality is maximised.

There are several questions that need to be answered for us to fully understand this problem. Firstly, the transport and elementary streams need to be defined. The rate allocation needs to be explored, and related to this are some details about the video encoding itself, mainly how video encoding techniques influence momentary bit rate. Also, what we mean by quality needs to be defined. Although video quality is often approximated using relatively simple metrics, these don't necessarily correlate that well with subjective quality as perceived by users.

The main problem in statistical multiplexing is to jointly control the bit rate of all individual video streams to achieve high and/or consistent video quality over time as well as across all the streams, while maintaining compatibility with the system target decoder described in section 3.3.3. This means that bit rate control in general is most often a part of any proposed solution, and specific details of the encoding used are often, to various degrees, included in the algorithm.

This chapter aims to introduce these necessary concepts as well as what has been done in previous research in this area.


3.1 Video coding

To construct a statistical multiplexer it's not necessary to construct the encoder as well, and often a statistical multiplexing solution depends on an existing encoder. In this thesis, the Intel hardware encoder is used, and as such we do not have to (and to some extent, cannot) reason about the encoder implementation. Still, some high-level knowledge of common video compression techniques is needed, specifically the high-level compression techniques that highly impact the rate characteristics of video streams. Also, the decoder models used to define the bitstream's conformance with the codec standard are presented.

3.1.1 Independently and dependently coded frames

A typical image compression scheme consists of finding similarities within the image and making sure to only save the similar information once. What makes video compression even more efficient is that we can also find similarities between frames, leading to higher compression rates than what can be achieved with single images. Differential coding is when a frame is coded using the difference from the previous frame. Since a decoder should be able to resume playback from anywhere in the stream, we cannot use only differential encoding though. About once or twice a second a key frame, also called I-frame, is sent, which is a full frame that doesn't depend on any other frame. Every frame between I-frames is a differentially encoded frame that depends directly or indirectly on an I-frame. This leads to very large spikes in bit rate every time an I-frame is sent. An I-frame followed by all dependent frames is called a group of pictures (GOP). [43]

Most video codecs use this system of sending a full frame followed by differentially coded frames, meaning that most video streams (and all covered in this thesis) will have predictable bit rate variations within a GOP. This can be exploited by shifting each stream to avoid overlapping I-frames, as is done by, for example, Polec et al. [39]. Since encoders typically don't have fully synchronized clocks, this solution is impractical in most scenarios when streaming for longer periods of time.

3.1.2 Decoder models

To verify the compatibility between encoders and decoders, a decoder model for each video encoding standard is specified. This is a virtual decoder that can be implemented at the encoder to guarantee that all decoders that support that specific codec will be able to decode the generated bitstream. The decoder model emulates the behaviour of a decoder by defining a number of buffers and how the data moves between these buffers according to bit rate, buffer levels, and the frame's timing.

The outline of the decoder models used is presented here, but we refrain from going into detail since it's a bit out of scope for this thesis. We will mainly concern ourselves with the System target decoder (see section 3.3.3), but there is some unavoidable overlap between the system-level decoder models and the decoder models as used by the encoder.

Virtual buffer verifier

The MPEG-2 video standard defines the Virtual buffer verifier (VBV), which consists of the Elementary stream buffer in the System target decoder model, an instantaneous decoder and a Picture re-ordering buffer for decoded frames. At decode time, the picture will be instantaneously removed from the Elementary stream buffer. If the presentation time is not the same as the decode time, then the decoded picture is kept in the Picture re-ordering buffer until its presentation time. [48]

Hypothetical reference decoder

The Hypothetical reference decoder (HRD) for the AVC and HEVC standards is the equivalent of the VBV for the MPEG-2 video standard. It contains a Coded picture buffer (CPB), an instantaneous decoder and a Decoded picture buffer (DPB). At decode time, a frame is instantaneously removed from the CPB and the decoded data is added to the DPB. The decoded frame can, unlike in the VBV model, be kept in the DPB after its presentation time if it is referenced by later frames. The CPB is equivalent to the elementary stream buffer as used by the System target decoder and the VBV model. [48, 10]

3.1.3 Data dwell time

The time between when a frame arrives at the decoder and when it is decoded is called the data dwell time. It is the time the frame spends in the decoder's buffer. This is an important metric since it determines the amount of temporal buffering used. The MPEG-2 systems standard gives an upper limit to the data dwell time of 1 second [23]. A user switching to the stream will have to wait this period before the decoder is able to present the first frame, so this value has to be limited to keep the waiting period reasonably low.

3.2 Rate control

The bit rate variations of the video, aside from variations within a GOP, are dependent on the compression used and the complexity of the current scene. Adjusting the encoding to achieve the wanted bit rate is commonly known as rate control. There are typically two main approaches to rate control: variable bit rate (VBR) and constant bit rate (CBR). There is no strict definition here, but generally VBR tries to maintain constant video quality while varying the bit rate, and CBR maintains a target bit rate, usually with varying quality as a result. The bit rate is usually controlled by the encoder by setting the quantization step-size.

Using VBR also gives rise to the problem of actually allocating the bits to achieve an even quality level along with maintaining the correct bit rate. To maintain a constant bit rate, the encoder parameters have to be inferred from the target bit rate. For VBR, the current bit rate also usually has to be inferred given a target average bit rate or quality level. This makes it a harder problem to solve than for CBR, but with better utilization of bit rate as a result.

3.2.1 Quantization

Quantization means mapping a continuous or discrete signal with more information onto a discrete signal with less information. The quantization used during video compression is commonly controlled by the quantization step-size. Essentially, the bit rate of the compressed video is usually controlled by this value, where a higher quantization means more compression and a lower bit rate. [43]

The AVC codec introduced a Quantization Parameter (QP), which controls the quantization. This is an integer between 0 and 51 where the corresponding quantization step-size is doubled for every increase of 6 in QP. The quantization step-size is close to inversely proportional to the resulting bit rate [52].

3.2.2 Rate-distortion theory

Rate-distortion (R-D) theory is often used for optimal bit allocation by finding the bit rate as a function of the distortion, often represented as a rate-distortion curve. The rate-distortion relationship is commonly found using analytic reasoning or a parametric model. In all cases there is a trade-off between bit rate and distortion, and rate-distortion theory is the tool we use to reason about this trade-off to be able to make informed decisions for rate control during the encoding. Typically, a rate-distortion curve can be found similar to what is presented in figure 3.1.

Figure 3.1: A typical rate-distortion curve (axes: bit rate and distortion).
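To make the QP-to-step-size relationship from section 3.2.1 concrete, a minimal Python sketch is shown below. The anchor point QP = 4 corresponding to a step-size of 1.0 is a commonly cited convention for AVC and an assumption of this sketch, not something stated in the thesis.

def qp_to_step_size(qp: int) -> float:
    """Approximate AVC quantization step-size for a given QP.

    Assumption: the anchor point QP = 4 <-> step-size 1.0; the step-size
    doubles for every increase of 6 in QP, as described in section 3.2.1.
    """
    return 2.0 ** ((qp - 4) / 6.0)

# Example: increasing QP by 6 doubles the step-size.
assert abs(qp_to_step_size(28) / qp_to_step_size(22) - 2.0) < 1e-9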

When using the QP as the control parameter for rate control, two functions are needed. The rate-quantization function models the relationship between the rate and quantization step-size, often with QP as the dependent parameter. The distortion-quantization function instead models the relationship between distortion and quantization step-size.

3.2.3 Rate-quantization models

The relationship between quantization and bit rate has been extensively researched. Some of the previously proposed methods are presented below.

Quadratic rate-quantization model

The quadratic rate-distortion model gives the rate R as a quadratic function of the distortion D as given in equation 3.1. The model is fitted using statistics of previous D and R values to determine the parameters a and b in the model. By approximating the distortion D as the quantization Q used for the frame during encoding we get equation 3.2. [6]

R(D) = a D^{-1} + b D^{-2}    (3.1)

R(Q) = a Q^{-1} + b Q^{-2}    (3.2)
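As an illustration of how such a model can be used in a closed loop, the sketch below fits a and b of equation 3.2 to previously observed (Q, R) pairs with an ordinary least-squares fit and predicts the rate for a candidate quantization. The function names and the sample values are illustrative assumptions, not taken from the thesis.

import numpy as np

def fit_quadratic_rq(qs, rates):
    """Fit R(Q) = a*Q^-1 + b*Q^-2 to observed (Q, R) samples by least squares."""
    qs = np.asarray(qs, dtype=float)
    rates = np.asarray(rates, dtype=float)
    design = np.column_stack((1.0 / qs, 1.0 / qs**2))  # columns: Q^-1, Q^-2
    (a, b), *_ = np.linalg.lstsq(design, rates, rcond=None)
    return a, b

def predict_rate(a, b, q):
    """Predict the rate for quantization q using the fitted model."""
    return a / q + b / q**2

# Hypothetical observations: bits per frame measured at four step-sizes.
a, b = fit_quadratic_rq([10, 16, 22, 30], [400_000, 230_000, 160_000, 110_000])
print(predict_rate(a, b, 26))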

Rate-quantization modeling in the ρ-domain

The ρ-domain rate model is based on the observed linear relationship between the number of zeros among the quantized transform coefficients and the coded bit rate. Proposed by He and Mitra [18], the experimental results from this method have been good, and it is used for rate control in the statistical multiplexing solution by He and Wu [19]. The number of zero transform coefficients is found in an initial encode pass and these statistics are then used during the "real" encoding. Unfortunately, the limitations of the Intel Media SDK mean that we do not have access to the transform coefficients without implementing some custom pre-encoding step, and thus this is not possible for this thesis.

Rate-quantization modeling in the λ-domain

Given that the allocation is done by finding the optimal point on a number of rate-distortion curves, and also given that the gradients of all these curves are monotonically decreasing, any optimal solution will be found where the gradient (the λ) is the same for all curves. The rationale behind this is that at that point, reallocating bits from one curve to any other can't decrease the total distortion, since the decrease in distortion per bit isn't greater anywhere else. This extremum is the global distortion minimum for the resulting total number of bits, since the gradients of the rate-distortion curves are monotonically decreasing. The tricky part here is to find the correct gradient, or λ, for the given number of total bits.

When solving the rate allocation problem using the Lagrangian optimization method, the bit rate is implicitly determined as a function of the Lagrangian multiplier λ. The quantization is then inferred from λ. [35]

Other rate-quantization models

The rate control method used by Changuel et al. [4] is a simple model that works well given that it is only applied on GOP-level. On frame-level, the inter-dependencies between frames make things more complex. The relationship given in equation 3.3 describes the rate R as a function of the Quantization Parameter Q, leaving a and b to be found empirically.

R(Q) = a \exp(-bQ)    (3.3)

3.2.4 Distortion-quantization models

The quantization step-size is intricately linked to distortion, but all we really can say is that increasing the step-size will increase the distortion. When comparing the actual distortion there is a need to model the distortion as dependent on the quantization in some way. There exist some models for finding the distortion given the quantization level.

Guo et al. [15] propose an analytical quantization-distortion model that goes into some depth of the internal workings of the encoder. It approximates the distortion value fairly well, but the dependence on access to encoder data, and the fact that it approximates a fairly simplistic distortion value, make it not really suitable for our use-case.

Many rate control methods just assume the quantization used to be equal to the distortion. It is obvious that a smaller quantization step-size gives a lower distortion for any given frame, but this strategy could be problematic when comparing the impact quantization has on different frames. Equating the quantization to distortion assumes that different quantization levels have the same impact on all frames, which is hardly true. The quantization-distortion relationship also becomes more complex when taking into account not only the distortion within a single image, but also the varying quality levels between adjacent frames, which might adversely affect perceived quality. Section 3.4 discusses the commonly used quality metrics, but none of these take temporal changes in quality into account.

3.3 Video multiplexing using MPEG-2 Systems

This section covers how the multiplexing of multiple media content streams is done as described by the MPEG-2 Systems standard. This isn't a thorough explanation of the MPEG-2 Systems standard, but will present our specific use-case of sending multiple video streams in a single transport stream. As a consequence, only the case where we want to multiplex a number of video streams is covered, although a transport stream can contain all sorts of media.

3.3.1 Elementary Streams and Packetized Elementary Streams

The encoded bitstream as produced by an encoder is referred to as an elementary stream, with the specific format being dependent on the codec used. When multiplexed and sent over a transport stream, the elementary stream is packetized into a packetized elementary stream (PES). The PES header contains general metadata, most importantly timing information about when the contained frame is supposed to be decoded and displayed.

The timing information is kept in the PES header for each frame as two values, the Decode Timestamp (DTS) and the Presentation Timestamp (PTS). Both of these values are sampled from a 90 kHz clock. The DTS indicates the time when the frame should be decoded and the PTS indicates the display time for the frame. The decode time is important for determining when the frame leaves the decoder buffer, as explained in section 3.3.3.

3.3.2 Transport Streams

A transport stream consists solely of 188-byte packets, identified using a unique packet identifier (PID), carried in the header of each packet. Each video stream is associated with a specific PID and the corresponding PES is again packetized and carried in the transport packets that belong to that specific PID. The transport stream packets also carry the Program Clock Reference (PCR) of each of the video streams, which indicates the timestamp of the packet as sampled from a 27 MHz clock.

A transport stream containing multiple programs is sent at a constant bit rate. A constant bit rate transport stream is a different concept than a constant bit rate video stream. Where a video stream is a bit more diffuse about what a constant bit rate means (the main question being over what time period the stream is considered constant), a constant bit rate transport stream means that the transport stream packets are sent at a fixed interval. When there is no real data to send, the stream is padded with null packets.

3.3.3 System Target Decoder

The MPEG systems standard [23] defines a system target decoder, which is a virtual demultiplexer and decoder model that defines the timing and bit rate requirements of a conforming transport stream. It serves the same purpose as the decoder models defined by the video codecs being transported and is to be seen as an extension of those models.

The system target decoder exists in different versions; the one covered here is the transport stream system target decoder (T-STD). It defines the decoding behaviour of audio, video and other data, such as program metadata. Figure 3.2 shows a simplified version of the T-STD model.

Figure 3.2: Overview of the transport system target decoder. The empty boxes represent buffers with a well-defined size, input bit rate and output bit rate.

Figure 3.3: The video pipeline from the transport system target decoder.

Each transport packet is sent to the correct pipeline, which contains a number of buffers. Given the buffer sizes and the bit rates between each buffer, conformance is defined by the under- and overflow requirements of each buffer. The only part that is of interest for this thesis is the video decoding pipeline, which is presented in more detail in figure 3.3. The first buffer is the transport buffer and it contains the transport stream packets. The packets are received by the buffer at the bit rate of the transport stream, and emptied at 120 % of the max bit rate of the video stream. The transport headers are stripped before the data is sent to the multiplexing buffer, which is emptied at 100 % of the max bit rate of the video stream. The PES headers are then stripped before reaching the last buffer, which is the elementary stream buffer, equivalent to the buffers for encoded data in the decoder models for the current codec (see 3.1.2). Each frame is removed from the elementary stream buffer at its decode time (defined by its DTS).

The purpose of the first buffer is mainly to limit the bit rate reaching the decoder, since the bit rate of the transport stream often is a lot higher than that of an individual video stream. The size of this buffer is thus kept intentionally small at 512 bytes. The multiplexing buffer exists to handle the extra overhead from the PES packetization. It can also provide extra buffering when the size of the elementary stream buffer is less than the maximum for the current video stream. Here we assume that the elementary stream buffer is sized as large as possible though. The elementary stream buffer handles most of the actual buffering of the video, and as mentioned is the only one directly connected to the video decoder.

The requirements for the buffers, given the sizes and bit rates, are that the transport and multiplexing buffers should not overflow. The elementary stream buffer just stops accepting bits from the multiplexing buffer when it is full and thus does not overflow. The transport and multiplexing buffers should also empty once every second. The elementary stream buffer should not underflow, meaning that every frame must be in the elementary stream buffer at its decode time.

3.3.4 Statistical multiplexing methods

A statistical multiplexing solution combines a number of video streams while making sure that the bit rate constraints of the individual streams as well as the total bit rate constraint are satisfied. Figure 3.4 shows the typical design of a statistical multiplexer. Often the input frames are put through some kind of pre-processing step (although this is not strictly needed), then a decision is made as to how to allocate the total bit rate across all frames. There are two kinds of solutions used to do the allocation: either using an open or a closed loop. A closed loop uses feedback from previous decisions to try to extrapolate the rate-distortion behaviour of the future frames. The feedback is indicated in figure 3.4 by the dashed lines. An open loop solution lacks this feedback.

Figure 3.4: Architectural overview of a typical statistical multiplexer for video.

A common method to do frame-level bit allocation, which has been used with some success [36, 3], is to use some kind of complexity measure for the raw frame and use this as a weight function for allocating the bits among frames.

He and Wu [19] propose a method for statistical multiplexing of AVC streams. They allocate the total number of bits available to maintain an even video quality by using a simple buffer model that determines the initial buffer level and subsequently the total number of output and input bits, using the current input and output bit rate of the decoder buffer. The allocation problem is set up as a constrained minimization problem aimed at minimizing a global distortion to be achieved across all programs, while fulfilling all buffer constraints using the simple buffer model. Using a ρ-domain rate control method, the expected bit rate for a fixed distortion value is calculated. The bit rate for each program is then given as proportional to the expected bit rate needed. To find the real distortion value to be used for the encoding, different distortion values are tested until one is found that satisfies all constraints. The authors' testing shows that this method achieves a 40-50 % reduction in needed bandwidth to reach the same quality as compared to a constant bit rate scheme.

The method proposed by Changuel et al. [4] is another statistical multiplexing solution for AVC coding, and differs from the others in that it allocates bits at GOP level and that it uses a PID-controller to maintain buffer levels. By setting up constraints for maximum bit rate, minimum distortion, "smoothness", meaning minimal PSNR variations, and "fairness", meaning an even quality across the different programs, they proceed to maximize the PSNR value given these constraints by allocating bit rate per GOP.

Pang et al. [38] categorise statistical multiplexing algorithms as either minimising distortion variance or average distortion. They categorise the method proposed by He and Wu [19] as the former and propose an algorithm in the latter category, also for AVC coding. Their experimental results show the proposed algorithm to be better than previous algorithms that minimise average distortion. The statistical multiplexing solution allocates bits on the frame level and then uses bit rate control to determine the quantisation parameter per macroblock. Using a previously proposed model for the rate-distortion relationship between dependent frames, the statistical multiplexing problem is solved over a number of future frames, just like was done by He and Wu [19]. This problem is relaxed into a convex optimisation problem that can be solved efficiently with well-known methods.

Blestel et al. [3] propose a simpler generic statistical multiplexing solution that isn't dependent on a specific encoder. By assuming the rate control is a given part of the encoder, they propose just a bit allocation scheme, and assume that they can set the bit rate directly. This abstracts away a lot of, most often codec-specific, calculations. Still, the calculated distortion for each program is based upon a very simple relationship between distortion, complexity, and bit rate which assumes the AVC codec is used. The distortion value is provided by a pre-encoding step which partly simulates the real encoding. Given this complexity value and relationship, the problem is solved using Lagrangian optimization such that either maximum or overall distortion can be minimized.

The solution differs from the others in that it makes a point of decoupling the different parts of the statistical multiplexer. It assumes that both the encoder and the rate control are opaque entities that can be assigned a target bit rate. It also uses an open-loop solution to avoid the coupling between the multiplexing stage and earlier stages. The solution itself uses a pre-encoding analysis stage to determine the complexity of the incoming raw frames. It uses this complexity value for a simple rate-distortion model to find the optimal bit allocation, minimizing PSNR.
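To make the complexity-weighted allocation mentioned at the beginning of this subsection concrete, a minimal sketch is shown below. The proportional split is a common simple choice and an assumption of this example, not a description of any one cited method.

def allocate_bits_by_complexity(complexities, total_bits):
    """Split a joint bit budget across streams in proportion to complexity weights."""
    total_complexity = sum(complexities)
    if total_complexity == 0:
        # Degenerate case: no measured complexity, fall back to an equal split.
        return [total_bits / len(complexities)] * len(complexities)
    return [total_bits * c / total_complexity for c in complexities]

# Example: three streams with complexity weights 4.0, 1.0 and 5.0 sharing a
# 20 Mbit budget receive 8, 2 and 10 Mbit respectively.
print(allocate_bits_by_complexity([4.0, 1.0, 5.0], 20_000_000))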

3.4 Video quality metrics

So far, the distortion and quality of the video have only been brought up as abstract concepts. This section aims to bring a bit more clarity to the meaning of these concepts, and how they are measured.

The only final and absolute metric for video quality assessment is subjective user tests. ITU-T [30] defines a standard method to conduct these tests. Since these kinds of tests are expensive to do, objective video quality metrics are often used instead. These metrics are well-defined measurements that usually measure the distortion in the image.

There are three major areas of objective video quality assessment methods: no-reference (NR), reduced reference (RR), and full reference (FR). NR methods use only the compressed video for quality assessment, without access to the uncompressed original. Examples of NR methods are those that try to find common compression artifacts in the image. The NR methods have the advantage of being computationally simpler, in that they don't require any access to reference material, but lack the performance of the FR and RR methods. The RR and FR methods have access to some, respectively all, of the original video when assessing the video quality. FR yields the best results, with some methods coming close to the "real" subjective metric as defined by user tests, but is not usable in real-time applications due to the computational complexity. Simpler FR methods exist and are commonly used, such as peak signal to noise ratio (PSNR). [7, 32]

For the evaluation of rate control algorithms such as those for statistical multiplexing, an FR metric is commonly used because it gives the best accuracy and performance is not critical when the metric is used simply for evaluation.

3.4.1 Peak Signal-to-Noise Ratio and Mean Squared Error

The most commonly used metrics are the Peak Signal-to-Noise Ratio (PSNR) and the Mean Squared Error (MSE). Both of these measure the amount of information loss in the signal while treating all information as equal, and measure information loss as the squared distance between the original and compressed signals. The difference is that PSNR measures the error, or noise, relative to the max amplitude of the signal and commonly uses the decibel scale, while MSE measures the absolute error. PSNR is defined for a discrete signal, using the decibel scale, as [43]

PSNR = 10 \log_{10} \frac{x_{max}^2}{\frac{1}{N} \sum_{n=1}^{N} (x_n - y_n)^2} \ \mathrm{dB}    (3.4)

The definition of MSE is [43]

MSE = \frac{1}{N} \sum_{n=1}^{N} (x_n - y_n)^2    (3.5)

In both equations 3.4 and 3.5 the input frame is a vector of pixel samples (x_1, ..., x_N) of length N and the output signal after compression is a corresponding vector (y_1, ..., y_N) of equal length. x_max is the largest possible value of a single sample.

PSNR has been shown to correlate with quality and is usable as a quality comparison tool, but only as long as the codec and content remain the same [20]. PSNR isn't ideal for assessing quality levels across the different video streams in the statistical multiplexing scenario, since the different programs typically have varying content.
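Equations 3.4 and 3.5 translate directly into code. A minimal sketch in Python, assuming 8-bit samples (x_max = 255) and operating on a single frame or plane:

import numpy as np

def mse(original, compressed):
    """Mean squared error between two equally sized frames (equation 3.5)."""
    x = np.asarray(original, dtype=float)
    y = np.asarray(compressed, dtype=float)
    return np.mean((x - y) ** 2)

def psnr(original, compressed, x_max=255.0):
    """Peak signal-to-noise ratio in dB (equation 3.4), assuming 8-bit samples."""
    err = mse(original, compressed)
    if err == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(x_max ** 2 / err)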

3.4.2 Structural similarity index

The Structural Similarity Index Metric (SSIM) [51] measures image quality by looking at a combination of luminance, contrast, and structure differences between the original and compressed images. This metric has been used extensively in rate optimization research [50, 53, 37, 14, 9] to try to achieve results that correlate better with the quality as perceived by the user. The method defines the luminance l(x, y), contrast c(x, y) and structure s(x, y) for input x and output y as

l(x, y) = \frac{2\mu_x \mu_y}{\mu_x^2 + \mu_y^2}, \quad c(x, y) = \frac{2\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}, \quad s(x, y) = \frac{\sigma_{xy}}{\sigma_x \sigma_y}    (3.6)

where µ_x, µ_y are the means of x and y, σ_x, σ_y are the standard deviations of x and y, and σ_xy is the covariance of x and y.

The metric itself is then defined as the product of these, with added constants C1 and C2 for numerical stability when the denominators are close to zero. Equation 3.7 shows the resulting product of the three metrics, and equation 3.8 shows the definition of the SSIM metric as the same product, modified with the extra constants for numerical stability.

l(x, y) \times c(x, y) \times s(x, y) = \frac{4\mu_x \mu_y \sigma_{xy}}{(\mu_x^2 + \mu_y^2)(\sigma_x^2 + \sigma_y^2)}    (3.7)

\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}    (3.8)
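Equation 3.8 can likewise be computed directly. The sketch below evaluates SSIM over the whole frame as a single window; in practice SSIM is usually computed over local windows and averaged. The constants C1 and C2 follow the commonly used defaults for 8-bit images. Both of these points are assumptions of this example rather than details from the thesis.

import numpy as np

def ssim_global(x, y, data_range=255.0):
    """Single-window SSIM as in equation 3.8 (no local windowing or averaging)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))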

3.4.3 Other quality metrics

There exist a lot of other quality metrics, as well as several variations of the PSNR and SSIM metrics. This is an ongoing area of research, and while there exists a plethora of different metrics to choose from, there is no definitive consensus on what best conforms to subjective user tests. A recent paper by Chen et al. [5] is a good resource for the current state of the art regarding the measuring of video quality. Also, Netflix has recently chimed in with a quality metric of their own [1], which uses a number of different metrics, weighted by fitting them to a large data set of subjective user tests. It should be noted that none of these more advanced metrics are used to the same extent as SSIM, and especially PSNR/MSE.

3.5 Intel Media SDK

The Intel Media SDK handles the encoding in a more-or-less opaque way. The wanted codec can be chosen, along with some configuration of the rate control method used. There are some possibilities for finer control of the encoding process, but these apply only to the AVC codec. The Intel Media SDK doesn't support anything apart from the encoding itself, meaning the multiplexing has to be done by an external component. With an API that is largely the same for different codecs, a statistical multiplexing solution can be found that is generic for the different supported codecs of the Intel Media SDK.

3.5.1 Bit rate control

The Intel Media SDK contains several rate control algorithm implementations for various use-cases, both VBR and CBR, as well as variations thereof. It also enables the developer to set the quantization used directly. Below we briefly introduce the specific bit rate control methods of the Intel Media SDK that were used in this thesis. [45]

Not all rate control algorithms are supported for all codecs, but all of the algorithms covered here support MPEG-2, AVC and HEVC encoding with hardware support.

Constant bit rate (CBR)

The constant bit rate algorithm tries to maintain a constant bit rate, padding the frames with unused bytes when needed. This results in fewer bit rate fluctuations.

Constant Quantization Parameter (CQP)

To also allow for custom rate control algorithms, the Intel Media SDK provides a mode called Constant Quantization Parameter. The Quantization Parameter is a single value that is set to control the quantization, and thereby the "lossyness" of the encoding, further described in section 3.2.1. The name is a bit misleading, since although the default QP is set during initialization, it can also be set dynamically during encoding. This mode is recommended [45] for more advanced use-cases or when something other than the given rate control algorithms is needed.

4 Method

This section describes how the statistical multiplexing solution was designed, as well as how the tests and evaluations of the Intel Media SDK and the statistical multiplexer were conducted. The high-level design choices of the statistical multiplexing solution are presented and motivated, as well as related to solutions from previous research.

4.1 Rate control with Intel Media SDK

Out of the different choices of rate control methods available in the Intel Media SDK, the CQP method was chosen since it was the only one that fully supported dynamically changing the bit rate, and it is also the most flexible. Using any other rate control method with the Intel Media SDK will make it more difficult to dynamically adjust the bit rate. Note that this means we have a control parameter that is closely tied to the resulting quality, but that makes it harder to predict the bit rate. Compare this to the solution by Blestel et al. [3], which uses the bit rate as the control parameter, and thus mainly focuses on approximating the resulting quality.

4.2 Evaluating rate and distortion characteristics for Intel encoder

To illustrate the rate and distortion characteristics of the Intel encoder for different QP, a number of tests were done by encoding a lossless sequence using a range of constant QP values. The frame complexity was approximated by the average gradient of the frame, defined as C_gradient by equation 4.1, for an image of size X × Y with pixel values defined for all image positions (x, y) by the function P(x, y). This is a fairly common and simple complexity metric that has been determined to work fairly well, for example in works by Yao et al. [54] and Wang et al. [49].

C_{gradient} = \frac{1}{X \times Y} \sum_{x=2}^{X} \sum_{y=2}^{Y} \left| 2P(x, y) - P(x-1, y) - P(x, y-1) \right|    (4.1)

The Intel Media SDK was tested by using the CQP rate control method to encode five 10-second clips from the SVT Multi Format [16] collection. The format used was 1920x1080, interlaced, at 25 Hz. The clips were combined into one continuous sequence and then encoded with the FFmpeg version and command shown in listing 4.1. The sequence was then transcoded using all possible values of QP (1-51).

Listing 4.1: FFmpeg version and options used for encoding test sequences.

$ ffmpeg -version
ffmpeg version N-86011-g36cf422 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 4.8.5 (GCC) 20150623 (Red Hat 4.8.5-4)
configuration: --enable-libmfx --enable-nonfree
libavutil      55. 62.100 / 55. 62.100
libavcodec     57. 95.101 / 57. 95.101
libavformat    57. 72.101 / 57. 72.101
libavdevice    57.  7.100 / 57.  7.100
libavfilter     6. 89.100 /  6. 89.100
libswscale      4.  7.101 /  4.  7.101
libswresample   2.  8.100 /  2.  8.100

$ ffmpeg -s 1920x1080 -r 25 -pix_fmt yuv420p -i 5_oldtowncross.yuv \
    -vcodec libx264 -qmin 1 -qmax 1 -b:v 19M -minrate:v 19M \
    -maxrate:v 19M -bufsize:v 3M -f mpegts

The distortion of the resulting compressed video was evaluated using the open source Video Quality Measurement Tool (VQMT) [17].
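For reference, the complexity measure of equation 4.1 can be computed directly from the luma samples of a frame. A minimal sketch follows (zero-based array indexing, whereas equation 4.1 uses one-based pixel coordinates); the function name is an illustrative stand-in, not taken from the thesis implementation.

import numpy as np

def gradient_complexity(frame):
    """Average gradient of a frame as in equation 4.1.

    frame: 2D array of luma samples with shape (Y, X), i.e. rows are y and
    columns are x.
    """
    p = np.asarray(frame, dtype=float)
    height, width = p.shape
    # |2*P(x, y) - P(x-1, y) - P(x, y-1)| for all x >= 2, y >= 2 (one-based).
    diff = 2 * p[1:, 1:] - p[1:, :-1] - p[:-1, 1:]
    return np.abs(diff).sum() / (width * height)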

4.3 Satisfying rate constraints

The bit rate constraints of the multiplexed stream were guaranteed using a packet scheduler that tracks the buffer levels in the T-STD model while constructing the transport stream. The packet scheduler sends the video data of the multiplexed streams with equal priority while making sure data only is sent if the T-STD model allows it. The packet scheduler only limits bit rate, and the other parts of the algorithm have to make sure that the bit rate of each stream is sufficiently small that each frame can be sent on time. The stream-specific parameters of the T-STD model, shown in figure 3.3, were chosen according to the settings and profiles of the codecs that are supported by DVB [12]. Table 4.1 summarizes the stream-specific T-STD settings for the used codecs. While the size of the elementary stream buffer can be chosen by the encoder, the implementation in this thesis uses the maximum size allowed. This has limited effect on the T-STD, since a smaller elementary stream buffer only means that the multiplexing buffer will be larger to compensate.

Codec     Maximum bit rate    Elementary stream buffer size
MPEG-2    80 Mbit/s           1222.656 KB
AVC       24 Mbit/s           3750 KB
HEVC      22 Mbit/s           2750 KB

Table 4.1: The maximum bit rate and elementary stream buffer size used by the T-STD for HD content.
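To illustrate the kind of bookkeeping the packet scheduler has to do, the sketch below tracks a heavily simplified per-stream version of the T-STD buffers from section 3.3.3: a 512-byte transport buffer drained at 1.2 times the stream's maximum bit rate and a multiplexing buffer drained at 1.0 times that rate. All names are illustrative assumptions; this is not the interface of the thesis implementation, and the full T-STD rules (for example the once-per-second emptying requirement) are not modelled.

TS_PACKET_SIZE = 188  # bytes

class StreamBufferModel:
    """Heavily simplified per-stream T-STD bookkeeping (transport + multiplexing buffer)."""

    def __init__(self, max_bit_rate_bps, mb_size_bytes, tb_size_bytes=512):
        self.tb_level = 0.0                            # transport buffer fill, bytes
        self.mb_level = 0.0                            # multiplexing buffer fill, bytes
        self.tb_size = tb_size_bytes
        self.mb_size = mb_size_bytes
        self.tb_drain = 1.2 * max_bit_rate_bps / 8.0   # TB output rate, bytes/s
        self.mb_drain = 1.0 * max_bit_rate_bps / 8.0   # MB output rate, bytes/s
        self.last_time = 0.0

    def _advance(self, now):
        """Drain the buffers (TB -> MB -> elementary stream buffer) for the elapsed time."""
        dt = max(0.0, now - self.last_time)
        moved = min(self.tb_level, self.tb_drain * dt)
        self.tb_level -= moved
        self.mb_level = max(0.0, self.mb_level + moved - self.mb_drain * dt)
        self.last_time = now

    def can_accept_packet(self, now):
        """True if one more 188-byte transport packet fits without overflowing TB or MB."""
        self._advance(now)
        return (self.tb_level + TS_PACKET_SIZE <= self.tb_size and
                self.mb_level + TS_PACKET_SIZE <= self.mb_size)

    def accept_packet(self, now):
        self._advance(now)
        self.tb_level += TS_PACKET_SIZE

A scheduler loop built on such a model would, for every 188-byte packet slot of the constant bit rate transport stream, pick a stream whose model currently reports can_accept_packet and which still has frame data queued, and emit a null packet otherwise.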

4.4 Rate control model

No simple single-pass rate control method was found to work well enough to avoid violating bit rate constraints due to approximation errors. For this reason, a two-pass rate control method was used, where a very simple rate control model was used to approximate the decrease in bit rate that could be had by increasing the QP for the second pass. The two-pass approach is what gives this method any viability, and the model itself is just an obvious step up from a linear model when noting that the rate-quantization relationship is not that linear. The main problem with this approach is the performance penalty of encoding each frame twice. Errors in bit rate estimation are compensated for as soon as the resulting frame is returned by the encoder, limiting the impact of the errors.
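The thesis does not spell out the exact model used between the two passes. Purely as an illustration, a second-pass QP choice could look like the sketch below, which assumes, as a stand-in model only, that the frame size roughly halves for every increase of 6 in QP (mirroring the step-size doubling from section 3.2.1).

import math

def second_pass_qp(first_pass_qp, first_pass_bits, bit_budget, max_qp=51):
    """Pick a QP for the second encoding pass from the first-pass result.

    Stand-in assumption: frame bits halve for every +6 QP. This is an
    illustrative model, not the one used in the thesis.
    """
    if first_pass_bits <= bit_budget:
        return first_pass_qp  # already fits, keep the first-pass QP
    delta = 6.0 * math.log2(first_pass_bits / bit_budget)
    return min(max_qp, int(math.ceil(first_pass_qp + delta)))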

4.5 Distortion model

The distortion model was chosen to be approximated by the quantization used by the encoder, assuming that a larger quantization step-size gives a higher distortion in equal measure across all programs and frames. Modeling the distortion more exactly becomes a lot more complex and it was not deemed feasible in the scope of the project, considering that it should be valid for all used codecs and that the Intel encoder does not give us access to all the information typically needed for more exact distortion models.

The feasibility of this approach was evaluated together with the rest of the algorithm, by evaluating if this resulted in an equal distortion level across the multiplexed programs according to the used video quality metrics.

4.6 Distortion and bit rate control

The core part of the statistical multiplexer is to use the rate-distortion model to set the encoder parameters in such a way that the packet scheduler is able to send each frame on time, while trying to maximize video quality.

This is done by calculating a global distortion setting that will yield a bit rate that can be sent on time, given the rate-distortion model. The algorithm assumes that the bit rate is mainly limited by the total bit rate of the transport stream, not the bit rate of the individual elementary streams. While the bit rate of the individual streams is taken into account during the calculation, a too high bit rate for a single stream will result in a higher distortion setting for all streams, without full bit rate utilization. Since the maximum bit rate of the individual streams was chosen to be as high as possible (see table 4.1), this was deemed unlikely to happen.

The algorithm was designed to change the distortion setting slowly when it can, to avoid quick quality variations, but quicker changes are allowed when needed to avoid a frame not being sent on time.
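In essence, the controller searches for a single distortion (here, QP) setting whose predicted frame sizes the packet scheduler can deliver on time. A hedged sketch of such a search is shown below; predict_bits and per_stream_limit are hypothetical stand-ins for whatever rate model and per-stream constraints the implementation actually uses, and the simple linear scan could of course be replaced by a binary search.

def choose_global_qp(streams, total_bit_budget, qp_min=1, qp_max=51):
    """Find the lowest common QP whose predicted total bits fit the budget.

    streams: iterable of objects with a predict_bits(qp) method and a
    per_stream_limit attribute (both hypothetical names used only for this
    illustration).
    """
    for qp in range(qp_min, qp_max + 1):
        predicted = [s.predict_bits(qp) for s in streams]
        fits_total = sum(predicted) <= total_bit_budget
        fits_each = all(bits <= s.per_stream_limit
                        for bits, s in zip(predicted, streams))
        if fits_total and fits_each:
            return qp
    return qp_max  # fall back to the coarsest setting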

4.7 Initial video stream offset

Some consideration was taken for how to handle the bit rate similarities generated by the GOP structure (see section 3.1.1). To avoid sending the I-frames of different programs at the same time, an offset was added to each video stream in such a way that the position of each program's I-frame was evenly spaced across the GOP of the first program being sent.

The offset was implemented by discarding N_i frames at the start of each program i (i = 0 is the first program), where N_i is found by equation 4.2.

N_i = \frac{G \cdot i}{N}; \quad G = \text{GOP size}, \ N = \text{number of programs}, \ 0 \leq i < N    (4.2)
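As a worked example of equation 4.2, with a GOP size of 15 frames (the -g 15 used in listing 4.2) and four programs, the discarded frame counts become 0, 3, 7 and 11 when the division is truncated to whole frames; the truncation is an assumption here, as the equation itself leaves the rounding unspecified.

def initial_offsets(gop_size, num_programs):
    """Number of frames to discard at the start of each program (equation 4.2).

    Integer truncation is assumed; the equation leaves the rounding unspecified.
    """
    return [gop_size * i // num_programs for i in range(num_programs)]

print(initial_offsets(15, 4))  # [0, 3, 7, 11]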

This was not formally included in the statistical multiplexing algorithm, since the video streams will have some temporal drift with respect to each other due to different encoders using different clocks. The impact of doing this will be negligible as well, since most of the time at least a whole GOP is buffered. The reason this was included in the implementation at all was that it was initially believed to have a larger impact than it had.

4.8 Evaluation

Previous research uses openly available test sequences [38, 4], other video sequences [3], or both [19, 33] for evaluating the solutions. The statistical multiplexing solution proposed by this report was evaluated using both publicly available test sequences and longer tests more similar to a real-world scenario.

The test using the publicly available test sequences was done offline, meaning that input and output were both files and there was no requirement on being able to process the input fast enough for live viewing. The main reason for this was the performance impact of gathering the data needed to monitor the video quality impact of the transcoding. When measuring the quality impact, the baseline used for comparison was the same streams encoded using the Intel Media SDK constant bit rate (CBR) rate control instead, set to use an equal share of the total bit rate. The overhead of the transport stream was taken into account when setting the bit rate of the CBR streams, while the overhead of the PES header was assumed to be negligible. Since the PCR was always included in the transport stream header in the implementation used, but nothing more, the overhead can easily be calculated as 12/188 ≈ 6.4 % (see the transport stream specification [23] for details).

The longer test was done with real input recorded from satellite and terrestrial sources. For this test the statistical multiplexer was receiving and sending data from and to live sources. For performance reasons, the video quality data was not gathered during this test, but the main purpose of the test was to verify proper operation of the statistical multiplexer in a more realistic scenario.

The quality was compared using the PSNR and SSIM metrics, again using the VQMT [17] software. The choice of PSNR was based on it being the current de facto standard for rate control evaluation. To get a more accurate assessment of perceived quality, the SSIM values were presented as well. Newer, more advanced, metrics were rejected because of the lack of consensus among these on what really corresponds to visual quality as perceived by humans. In short, measuring PSNR makes it possible to compare the result with the majority of the other research out there, while SSIM gives a hint of what the perceived quality looks like as opposed to just PSNR.

Hardware and software specifications

The statistical multiplexer was running on an Intel NUC Kit NUC6i7KYK using CentOS 7.2 with Intel Media Server Studio Community Edition 2017 R1.

Test video sequences

The test files used for the offline test were four freely accessible movies:

1. [41]
2. [47]

3. [8] 4. Elephants Dream [34] These were encoded from lossless data using the Intel-based encoder integrated in FFmpeg, which uses the Intel Media SDK. The software version of the Intel Me- dia SDK and the hardware were the same as used with the statistical multiplexer. The output in listing 4.2 shows the version of FFmpeg used as well as the encoder options. To test the statistical multiplexer with up to eight programs, the movies were also encoded in reverse.
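As a small numeric check of the CBR baseline bit rate used in the evaluation (a sketch only; the 4-program case is assumed here, and only the 12-byte header/PCR overhead mentioned above is subtracted):

# Sketch: per-stream CBR baseline derived from the total transport stream bit rate.
total_bitrate = 30e6        # total transport stream bit rate (bit/s)
programs = 4                # assumed: the 4-program test case
ts_overhead = 12 / 188      # header + PCR bytes per 188-byte transport packet, ~6.4 %
cbr_per_stream = total_bitrate / programs * (1 - ts_overhead)
print(f"{cbr_per_stream / 1e6:.3f} Mbit/s")  # ~7.021 Mbit/s, matching tables C.1 and C.2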

Listing 4.2: FFmpeg version and options used for encoding test sequences.

$ ffmpeg -version
ffmpeg version 2.6.8 Copyright (c) 2000-2016 the FFmpeg developers
built with gcc 4.8.5 (GCC) 20150623 (Red Hat 4.8.5-4)
configuration: --prefix=/usr --bindir=/usr/bin --datadir=/usr/share/ffmpeg --incdir=/usr/include/ffmpeg --libdir=/usr/lib64 --mandir=/usr/share/man --arch=x86_64 --optflags='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' --enable-bzlib --disable-crystalhd --enable-gnutls --enable-ladspa --enable-libass --enable-libcdio --enable-libdc1394 --enable-libfaac --enable-nonfree --enable-libfdk-aac --enable-nonfree --disable-indev=jack --enable-libfreetype --enable-libgsm --enable-libmp3lame --enable-openal --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libvorbis --enable-libv4l2 --enable-libx264 --enable-libx265 --enable-libxvid --enable-x11grab --enable-avfilter --enable-avresample --enable-postproc --enable-pthreads --disable-static --enable-shared --enable-gpl --disable-debug --disable-stripping --shlibdir=/usr/lib64 --enable-runtime-cpudetect
libavutil      54. 20.100 / 54. 20.100
libavcodec     56. 26.100 / 56. 26.100
libavformat    56. 25.101 / 56. 25.101
libavdevice    56.  4.100 / 56.  4.100
libavfilter     5. 11.102 /  5. 11.102
libavresample   2.  1.  0 /  2.  1.  0
libswscale      3.  1.101 /  3.  1.101
libswresample   1.  1.100 /  1.  1.100
libpostproc    53.  3.100 / 53.  3.100

$ ffmpeg -i "lossless/%05d.png" -vcodec libx264 -b:v 19M -minrate 19M -maxrate 19M -bufsize 3M -x264-params "nal-hrd=cbr" -g 15 -bf 2 -pix_fmt yuv420p -profile:v high -level 4.2 -f mpegts -muxrate 25M cbr19M_libx264_tscbr25M.ts

Table 4.2 shows which channels were used for the longer live test and which source each belongs to. The video streams were all originally encoded with a variable bit rate, and streams from the same source had previously been processed by a statistical multiplexer.

Program            Source
SVT1 HD Öst        Linköping Vattentornet 618 MHz (Terrestrial) [46]
SVT2 HD Öst        Linköping Vattentornet 618 MHz (Terrestrial) [46]
HR Fernsehen       Astra 19.2°E 10891 MHz Horizontal (Satellite) [2]
RBB Brandenburg    Astra 19.2°E 10891 MHz Horizontal (Satellite) [2]

Table 4.2: Programs used in the live test. All were encoded at 720p and 50 frames per second.

5 Results

The results of the thesis are presented in three parts: a section with the results of the initial testing of the rate and distortion characteristics of the Intel encoder, an explanation of the statistical multiplexing algorithm, and the results of the evaluation of the statistical multiplexer.

5.1 Rate and distortion characteristics of Intel encoder

In this section a number of smaller tests are presented where the Intel encoder is used for encoding the SVT multi format [16] test sequences at different QP values. The 5 test sequences have been concatenated and encoded as one sequence. Every test sequence is 10 s long, and the coding difficulty assumed by SVT is presented in table 5.1. Note that the Intel HEVC hardware encoder did not support "normal" P-frames, only generalized P- and B-frames, meaning all of those were marked as B-frames for these tests.

Figure A.2 shows that the coding complexity as given by SVT in table 5.1 matches the measurements well: the first three sequences have similar bit rates, while the last two have significantly lower bit rates. The complexity as approximated by the gradient in figure A.1 follows this behaviour fairly well, apart from the third sequence, where the gradient is a lot lower compared to the resulting bit rate.

It is notable that the bit rate of the MPEG-2 encoded stream is a lot lower than for AVC and HEVC. The MPEG-2 codec responds differently to the same QP, where MPEG-2 compresses the video more.


Test sequence    Name            Difficulty
1                CrowdRun        Difficult
2                ParkJoy         Difficult
3                DucksTakeOff    Difficult
4                IntoTree        Easy
5                OldTownCross    Easy

Table 5.1: Coding difficulty of the SVT multi format test sequences, as assumed by SVT. [16]

As is shown in figures A.3 and A.4, the distortion for the MPEG-2 stream is also a lot higher. Figures A.5, A.6 and A.7 show this difference even more clearly. The bit rate curve in figure A.2 shows very similar bit rates for AVC and HEVC, and figure A.5 also shows a similar behaviour for different QP values, although HEVC shows somewhat lower bit rates for low QP values.

As for distortion, PSNR and SSIM tell two different stories when looking at the distortion variations in figures A.3 and A.4. The most notable variations in PSNR are an overall higher quality for the less complex scenes for MPEG-2 and some variations between the different frame types. SSIM, on the other hand, shows a higher quality for the more complex scenes instead, and especially AVC and HEVC seem to have a quality that is proportional to the bit rate. SSIM also shows some quality variations between the different frame types.

5.2 Statistical multiplexing algorithm

This section presents the algorithm that, given the rate-quantization and distortion-quantization models, determines the least possible distortion that can be maintained across all programs while satisfying all bit rate constraints as defined by the T-STD. The implementation of the algorithm comes with some caveats as it simplifies some things, mainly regarding the rate-distortion modeling.

The statistical multiplexing algorithm can be divided into three parts: a rate-distortion model, a mechanism for limiting the bit rate to the decoder, and a control algorithm that determines the smallest distortion setting that fits into the total transport stream bit rate.

5.2.1 Rate-quantization and distortion-quantization models

The statistical multiplexing algorithm needs a rate-distortion model for the encoder. Since the control parameter for the encoder's bit rate is the QP in this scenario, this relationship will be defined as a rate-quantization and a distortion-quantization function. This gives both the rate and the distortion as functions of the QP, which transitively defines the R-D relationship.

To find the bit rate for a certain QP the encoding was set up as shown in figure 5.1. The encoding is done two times. The first pass gives an initial point on the rate-quantization curve, and from this point the bit rate for a higher QP is calculated using the simple relationship shown in equation 5.1. By using the initial point of the first pass, the constant α can be disregarded when calculating the second point. Since the quantization step is doubled for every 6th QP (see section 3.2.1), the constant β was set to 1/6.

Figure 5.1: The rate control used to set the target bit rate.

The distortion is simply approximated to be equal to the quantization used, as shown in equation 5.2.

R(Q) = α · 2^(-βQ),    β = 1/6    (5.1)

D(Q) = Q (5.2)
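To make the use of equation 5.1 concrete (a minimal sketch; the function name and the example numbers are illustrative and not taken from the implementation), the first-pass measurement anchors the curve so that the constant α never needs to be computed explicitly:

# Sketch of the two-pass estimate from equation 5.1: R(Q) = alpha * 2**(-beta * Q).
# Dividing two points on the curve cancels alpha, so one measured (QP, size) pair
# from the first pass is enough to predict the size at any other QP.
BETA = 1.0 / 6.0  # the quantization step doubles every 6th QP

def predict_bits(bits_first_pass, qp_first_pass, qp_target):
    return bits_first_pass * 2.0 ** (-BETA * (qp_target - qp_first_pass))

# Example: a frame measured at 120 kbit for QP 24 is predicted at ~60 kbit for QP 30.
print(predict_bits(120e3, 24, 30))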

5.2.2 Maintaining bitstream conformance

The multiplexing is controlled by a packet scheduler that prioritizes the stream with the least number of frames sent and verifies conformance for each packet using the T-STD model. It does this by keeping track of the state of the buffers in the model for each stream and updating the state each time a packet is sent. When no packet from any program can be sent, a null packet is sent instead. Figure 5.2 shows a high-level overview of the logic flow for the packet scheduler.

As explained further in section 5.2.3, the quantization used is determined by simulating the transmission of the frames waiting to be sent, to determine what kind of bit rate is allowed for the next frames to be encoded, and this simulation starts at the time of the latest sent frame. Therefore, the packet scheduler also updates the time of the last frame sent, together with the T-STD buffer state at that time, for each frame sent.

Algorithm B.1 details the packet scheduling algorithm for the transport stream multiplexer and algorithm B.2 shows how T-STD conformance is checked when adding a packet.

Figure 5.2: Overview of the packet scheduler algorithm.

Note that the T-STD is simplified by assuming instantaneous arrival of each transport packet and by not allowing the Elementary stream buffer to overflow into the Multiplexing buffer.

The performance of the packet scheduler algorithm scales well overall, but it is still a fairly heavy operation. It does not scale that well with the number of programs, because of the sorting and updating done for every packet, but these are typically no more than 10. Otherwise it is linear in the number of packets sent, which is as good as can be expected. That being said, it is a fairly large amount of computation to be done for each packet, and the packet rate is usually very high (almost 20000 packets per second for a 30 Mbit/s stream). A simple way to lower the computational cost is to send a batch of packets between every program update and sort, at the cost of some of the precision in the T-STD model (frames not being sent instantaneously and program priority not updated every packet).

5.2.3 Distortion control

This part of the algorithm controls the distortion setting for the current frame, making sure that each frame can be sent on time given the total bit rate of the transport stream as well as the maximum bit rates of the video streams.

The algorithm keeps a global distortion target for all streams. Initially this distortion target is used to infer the size of each frame currently considered using the rate-distortion model. Starting at the time the last frame was sent, the sending of each frame is approximated using the total and stream bit rates to get the smallest data dwell time. If the smallest data dwell time is larger than some value, the distortion level is used for the encoding of the next frame. If not, the distortion is increased and the same check is done again. Every time the distortion has to be increased for a frame, that frame is allowed a smaller data dwell time. This means that fewer frames are buffered temporarily, to lower the distortion increase needed for bit rate spikes.

If no distortion increase is needed and the smallest data dwell time is larger than some value, the global distortion target is decreased by a small value. If the distortion instead had to be increased, the global distortion target is increased. Otherwise it is left unchanged. This check can be done with a "look-ahead", meaning that the algorithm considers a number of future frames when setting the distortion for one frame. The solution in this thesis uses a look-ahead of 15 frames (1 GOP).

Figure 5.3: Overview of the algorithm for controlling the distortion setting.

Figure 5.3 shows a high-level overview of the logic for setting the distortion, while algorithm B.3 describes this in a more concise form with the hard-coded parameters used by the implementation.
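The core of the search loop can be sketched as follows (a simplified sketch only; predict_frame_size and simulate_smallest_dwell_time are placeholders for the rate-quantization model and the send-time simulation, and the adjustment of the global distortion target afterwards is omitted, see algorithm B.3 for the full procedure):

# Simplified sketch of the per-window distortion search described in section 5.2.3.
def choose_distortion(frames, target_qp, predict_frame_size, simulate_smallest_dwell_time,
                      initial_dwell=0.5, min_dwell=0.125, dwell_step=(0.75 - 0.125) / 5,
                      qp_step=1):
    qp = target_qp
    target_dwell = initial_dwell
    while True:
        sizes = [predict_frame_size(f, qp) for f in frames]  # rate-quantization model
        dwell = simulate_smallest_dwell_time(sizes)          # bounded by stream and total bit rates
        if dwell >= target_dwell:
            return qp                                        # this QP fits; use it for the next frame
        if target_dwell > min_dwell:
            target_dwell -= dwell_step                       # accept fewer buffered frames
        qp += qp_step                                        # raise the distortion and try again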

5.3 Experimental results

The experimental results from the tests evaluating the statistical multiplexer are presented here. The tests are separated into a more thorough off-line test with Blender's open movie projects and a live test closer to real usage, with input from broadcast channels.

There are a number of aspects to a good statistical multiplexer, which the tests in this section try to evaluate. Does it maintain all bit rate constraints of the individual video streams without underflows? Is the available bit rate of the transport stream being utilized fully? How well does it maintain an even and high quality across the different programs?

The packet scheduler used in the algorithm guarantees that a packet is never sent if it would violate any of the constraints of the T-STD model. It can still underflow, meaning that a frame is not sent to the decoder in time. The distortion control uses the rate-distortion model when it calculates whether underflow will occur. An inaccurate rate-distortion model results in having to set the minimum data dwell time higher to avoid skipped frames due to bit rate estimation errors. The simplistic rate-distortion model used meant that the minimum data dwell time had to be set to a relatively high value of 125 ms, but the testing never showed any underflows with this setting.

5.3.1 Off-line tests

The off-line tests are presented with a closer look at the case where 4 programs were multiplexed, comparing this case to the constant and variable bit rate control algorithms in the Intel Media SDK. The statistical multiplexing gain to be had is then presented using data from the statistical multiplexer with 1, 2, 4, 6, and 8 programs. The figures reference the movies using the numbering from section 4.8, namely in the order: Big Buck Bunny (1), Tears of Steel (2), Sintel (3) and Elephants Dream (4).

Figure C.4 shows the QP used by the statistical multiplexer during the test where 4 programs were multiplexed.

Figures C.1-C.3 show the bit rate of 4 multiplexed programs. They show that the algorithm manages to utilize the bit rate fully, although it has regular dips. These dips occur mostly because the total coding difficulty of the video sequences has become lower, and along with it the bit rate. The algorithm takes some time before it has decreased the initial distortion target enough to fully utilize the total bit rate. The MPEG-2 result is a bit different though, in that it has larger bit rate dips that are not a result of temporarily lower coding difficulty. Here we see the result of the assumption in the algorithm that the total bit rate will be the constraining factor. Since the distortion setting is global for all streams, when one single stream's bit rate has to be lowered due to that individual stream's constraints, all streams' bit rates are lowered instead. Due to the smaller buffer used for MPEG-2 streams, individual streams constrain the total bit rate to a much larger extent. Figure C.5 shows the Elementary stream buffer fullness for two of the multiplexed programs, and this shows clearly how the multiplexer regularly reaches the limit of the buffer size in the MPEG-2 case, but not in the AVC and HEVC case.

Tables C.1 and C.2 show the average PSNR and SSIM of streams coded with a constant bit rate and of streams coded using the statistical multiplexer. While the statistical multiplexer hardly shows better values than the constant bit rate case here, the quality differences between programs are decidedly higher when using a constant bit rate. Indeed, observing the quality changes over time in figures C.6-C.11 shows both a higher spread in quality values between programs and a higher spread over time.
In particular, the big quality loss at the start of program 4 is mitigated by also lowering the quality of the other programs.

To further show that the gain to be had is a more even quality, rather than an overall increase in quality, the plots in figures C.12 and C.13 show how the quality variance of the first program (Big Buck Bunny) varies with an increasing number of multiplexed programs, with the constant bit rate control algorithm from the Intel Media SDK as reference point. The plots show that a large decrease in quality variance can be had with statistical multiplexing, but do not show any definitive gains after the first 4 multiplexed programs. After the first 4 programs, the variance increases again, and the tests seem to indicate that the coding difficulty of the added programs matters more than the number of multiplexed programs.

Figures C.14 and C.15 show the same plot, but for average PSNR and SSIM. The plots show that although the average quality increases with more multiplexed programs, it is mostly worse than in the constant bit rate case. MPEG-2 shows some results that are better for the statistical multiplexer, but only for the SSIM value. Otherwise, the curve looks similar to the one for the variance, with a large increase up until 4 programs and a smaller dip afterwards.

5.3.2 Live test

Figure D.1 shows the data from the live test with the real broadcast streams from table 4.2. It shows that the statistical multiplexer manages to constrain a combined bit rate of 28-40 Mbit/s for 4 programs from 2 different statistically multiplexed sources to a target bit rate of 30 Mbit/s in a live scenario, while fully utilizing the bit rate. There were minor dips resulting from a quick decrease in coding difficulty, as could also be seen in the tests from section 5.3.1. Note that the dips in bit rate are not dips in video quality. During the hour the test went on, the result was similar to what is shown in figure D.1 and there were no missed frames.

6 Discussion

The initial tests in section 5.1 were based on the somewhat naive notion that a working, simple rate-distortion model for the Intel Media SDK should be developed in this thesis for use with the statistical multiplexer. While the results from those tests show that there may be some way to use the gradient or some other metric for a rate-distortion model, the author was not able to use it for the statistical multiplexing solution. The simpler two-pass rate control was used instead, to be able to evaluate the algorithm as a whole given a decent rate-distortion model.

The results of the evaluation of the statistical multiplexer show that the algorithm manages to fully utilize the total bit rate with an even distortion across all channels. Above all, this gives less severe compression artifacts, since a single stream can use a lower compression level when it is sharing the bit rate with the other streams.

While the statistical multiplexer works as intended, it has some shortcomings. The most obvious one is the large dips in bit rate utilization that can be seen in figure C.1. The smaller buffer of the MPEG-2 streams means that the target data dwell time which the algorithm tries to achieve cannot fit into the buffer at high bit rates, and a single stream will increase the global distortion setting unnecessarily. This is a result of the assumption that the transport stream bit rate will be the constraining factor, although it is also due to the fact that the algorithm tries to maintain a certain number of frames in the decoder buffer, regardless of bit rate. The problem could most easily be resolved by adding a per-stream constraint that does not impact the global distortion setting, but another alternative is to set the data dwell time target lower, although that would require a better rate-distortion model.


This leads us into the other main issue with the statistical multiplexer, namely the rate-distortion model. Due to a combination of time constraints, the limited access to the Intel Media SDK internals and the fact that an effective rate-distortion model probably has to be more codec-specific, the rate-distortion model used was in the end very simple. The problem with this is two-fold: the bit rate used when determining whether all frames can be sent on time is not the real bit rate, and the distortion that is set the same on all programs will not actually be the same.

The distortion part is arguably not that big of a problem. Setting the QP constant is sometimes used as a rate control method for achieving near-constant quality, and the alternative used in most prior research of inferring the PSNR value is not really better (see section 3.4). Improving the distortion model would benefit most from taking into account things such as scene changes and video motion, to try to limit distortion changes to scene transitions and decrease the distortion in scenes with low motion, where it is more noticeable.

The bigger problem is the bit rate estimation, which mainly works because the encoding is done twice, leading to excessive resource usage, and with which a fairly large number of frames still have to be buffered, leading to a higher distortion setting than strictly necessary. The resource usage could be improved by re-implementing the two-pass encode as a single-pass encode with the possibility to re-encode frames that miss the target. This is not currently possible to do generally in the Intel Media SDK, but could very well be possible in the future. Improving the estimation errors from the initial encode is a bigger project, which probably needs more codec-specific knowledge and would benefit from more feedback from the Intel Media SDK encoder after the initial encode.

Since the (assumed) distortion across all programs is always the same, we can argue that as long as the transport stream bit rate is fully utilized, the result is in some sense optimal. And as can be seen from the results, this is the case apart from the dips in the MPEG-2 result, as was previously explained. A problem with this reasoning, though, is that the results also show fairly large distortion variations. Large variations in video quality can be quite distracting to the viewer, and it would be worth exploring setting the distortion to the lowest level that can be maintained over a longer time period instead. The algorithm presented in this thesis can be altered to give a smoother quality level by simply making the distortion regulation a lot slower. The drawback is that temporary quality degradations become less temporary. The results from this thesis use a regulation fast enough to use the full transport stream bit rate most of the time, mainly to avoid the comparison to the constant bit rate case becoming too skewed.

All in all, statistical multiplexing is a great way to reduce quality variations in video, but one should keep in mind that the main gain comes from combining a less complex program with a more complex program, where the more complex program can be allocated a higher bit rate. The multiplexed programs can (and will) have simultaneous complex scenes, which means that statistical multiplexing has limited applicability when it comes to guaranteeing a minimum quality level. Still, there is an obvious value in getting an increased quality level most of the time, and the chance of simultaneous difficult scenes decreases with the number of multiplexed programs.

A surprising result is that the AVC and HEVC codecs were so close in all the results. Theoretically, the HEVC codec should be superior, but this might just indicate a difference in maturity of the two codec implementations in the Intel Media SDK.

6.1 Conclusion

The bit rate of the individual streams varies greatly at the same quality, and exploiting this by jointly encoding all programs has great benefits. The result shows how a statistical multiplexer can be implemented using the Intel Media SDK and shows that it gives a gain mainly in the evenness of the quality among the programs. Dips in video quality are a lot more limited in amplitude compared to the case where each stream is encoded at a constant bit rate. The biggest problems with the statistical multiplexer are that it currently needs to encode every frame twice and that the rate-distortion model is overly simplistic.

Statistical multiplexing enables broadcasters to send more programs on the same channel, which can help them keep up with the adoption of higher resolution content.

6.2 Future work

Setting the distortion individually per stream based on per-stream constraints is a not-too-complex improvement of the statistical multiplexer as it is now. For instance, the work done by Changuel et al. [4] brings up a number of different constraints that could also be applied here. Whether to optimize for an even quality or for a higher average quality should be explored as well.

Using the same statistical multiplexing algorithm with a better rate-distortion model, or even a completely different encoder, would be interesting in order to see how much can be gained from a more accurate model and to explore the general applicability of the algorithm.

Appendix

A Results of SVT multi format tests

Figure A.1: The average gradient per frame over time for the SVT multi format sequences.


Figure A.2: The compressed frame size over time for the SVT multi format sequences, encoded with a constant QP of 15.

Figure A.3: The PSNR value variations over time for the SVT multi format sequences, encoded with a constant QP of 15.

Figure A.4: The SSIM value variations over time for the SVT multi format sequences, encoded with a constant QP of 15.

Figure A.5: The average frame size when encoding the SVT multi format sequences with different QP.

Figure A.6: The average PSNR when encoding the SVT multi format sequences with different QP.

Figure A.7: The average SSIM when encoding the SVT multi format sequences with different QP.

B Algorithms

Algorithm B.1: Packet scheduler for transport stream multiplexing
while running do
    sort programs on smallest current clock (measured in frames sent);
    for all programs do
        /* See algorithm B.2 describing the procedure for checking T-STD conformance. */
        if no packet sent and packet can be sent without violating T-STD then
            send packet from program;
            break;
        end
    end
    if no packet sent then
        send null packet;
    end
    increment time according to packet rate of transport stream;
    for all programs do
        update program's T-STD model to get buffer state after sending packet;
        if a whole frame has been sent on this program then
            // Used by rate control algorithm
            update latest sent frame time;
            update latest buffer state;
        end
    end
end


Algorithm B.2: Check T-STD conformance.
/* Input bitrate to the Transport stream buffer is approximated by the instantaneous
   adding of each Transport stream packet at a constant rate. */
TS  <- current number of bytes in Transport stream buffer;
MX  <- current number of bytes in Multiplexing buffer;
ES  <- current number of bytes in Elementary stream buffer;
TB  <- buffer size of Transport stream buffer;
MB  <- buffer size of Multiplexing buffer;
EB  <- buffer size of Elementary stream buffer;
R_x <- output bitrate of Transport stream buffer;
R_bx <- output bitrate of Multiplexing buffer;
t_TS_prev_empty <- time since Transport stream buffer last was empty;
t_MX_prev_empty <- time since Multiplexing buffer last was empty;

// Add packet and check that all constraints hold.
TS <- TS + 188;
if TS > TB then
    return false;
end
// R_x > R_bx  =>  t_MX_empty >= t_TS_empty
t_TS_empty <- TS / R_x;
t_MX_empty <- (MX + t_TS_empty * R_x) / R_bx;
t_ES_full  <- (EB - ES) / R_bx;
// Transport stream buffer has to empty once per second.
if t_TS_empty + t_TS_prev_empty > 1 s then
    return false;
end
// Multiplexing buffer has to empty once per second.
if t_MX_empty + t_MX_prev_empty > 1 s then
    return false;
end
if t_ES_full < t_MX_empty then
    /* Strictly allowed to keep on filling the Multiplexing buffer in this case, but then
       we have to keep track of when the next frame decode time arrives. So instead we
       fail in this case. */
    return false;
end
/* If we have arrived here then t_ES_full >= t_MX_empty, meaning that the Multiplexing
   buffer is empty before the Elementary stream buffer is full and we can safely send
   the packet. */
return true;

Algorithm B.3: Algorithm determining the distortion level for each frame.
// Values determined experimentally.
target distortion level <- initial distortion level;
initial data dwell time <- 0.5 s;
maximum data dwell time <- 0.75 s;
minimum data dwell time <- 0.125 s;
data dwell time decrement <- (0.75 - 0.125) / 5;
distortion increment <- 1 QP;
target distortion increment <- 1 QP / fps;
target distortion decrement <- 1 QP / fps;
while running do
    distortion level <- target distortion level;
    target data dwell time <- initial data dwell time;
    found distortion level <- false;
    while found distortion level is false do
        for all frames in look-ahead do
            determine frame size for current distortion level;
        end
        /* Simulation done by finding at which time frames are sent when sending all
           streams with an equal priority and bounded by each stream's maximum bitrate
           as well as the total bitrate constraint. */
        simulate sending of all frames in look-ahead to determine current (smallest) data dwell time;
        if current data dwell time < target data dwell time then
            if target data dwell time > minimum data dwell time then
                target data dwell time <- target data dwell time - data dwell time decrement;
            end
            distortion level <- distortion level + distortion increment;
        else
            found distortion level <- true;
        end
    end
    if distortion level > target distortion level then
        // We failed at least the initial test.
        target distortion level <- target distortion level + target distortion increment;
    end
    if current data dwell time > maximum data dwell time then
        target distortion level <- target distortion level - target distortion decrement;
    end
    distortion setting for next frame <- distortion level;
end

C Results of off-line test using Blender’s open movies

Program / Codec              Statistical multiplexer    Constant bitrate
Big Buck Bunny / MPEG-2      42.14434                   42.22109
Tears of Steel / MPEG-2      42.2498                    43.08860
Sintel / MPEG-2              43.76623                   45.42964
Elephants Dream / MPEG-2     43.76084                   43.65863
Big Buck Bunny / AVC         45.13747                   45.89122
Tears of Steel / AVC         44.64001                   44.70832
Sintel / AVC                 46.54626                   47.85670
Elephants Dream / AVC        46.75368                   47.04290
Big Buck Bunny / HEVC        45.12697                   45.72026
Tears of Steel / HEVC        44.59571                   44.78775
Sintel / HEVC                46.45703                   47.50737
Elephants Dream / HEVC       46.64704                   47.18808

Table C.1: Average PSNR values (dB) for the different programs encoded with different codecs using the statistical multiplexer compared to using constant bitrate. The total bitrate for the statistical multiplexer was 30 Mbit/s and the constant bitrate was set to 7.021 Mbit/s for each stream.


Program / Codec              Statistical multiplexer    Constant bitrate
Big Buck Bunny / MPEG-2      0.9722508                  0.9696305
Tears of Steel / MPEG-2      0.9700671                  0.9719962
Sintel / MPEG-2              0.9763353                  0.9781216
Elephants Dream / MPEG-2     0.9733235                  0.9651548
Big Buck Bunny / AVC         0.9851294                  0.9856823
Tears of Steel / AVC         0.9811257                  0.9812160
Sintel / AVC                 0.9870029                  0.9876287
Elephants Dream / AVC        0.9860603                  0.9823546
Big Buck Bunny / HEVC        0.9850769                  0.9854498
Tears of Steel / HEVC        0.9806641                  0.9805977
Sintel / HEVC                0.9867202                  0.9868424
Elephants Dream / HEVC       0.985583                   0.9817002

Table C.2: Average SSIM values for the different programs encoded with different codecs using the statistical multiplexer compared to using constant bitrate. The total bitrate for the statistical multiplexer was 30 Mbit/s and the constant bitrate was set to 7.021 Mbit/s for each stream.

Figure C.1: The combined bitrate of all programs and total transport stream utilization using the MPEG-2 codec.

Figure C.2: The combined bitrate of all programs and total transport stream utilization using the AVC codec.

Figure C.3: The combined bitrate of all programs and total transport stream utilization using the HEVC codec.

Figure C.4: The QP used by the statistical multiplexer for the different codecs. 8 programs are multiplexed in this example. Note that the AVC and HEVC curves are mostly overlapping.

Figure C.5: The ES buffer fullness for two of the programs when multiplexing 4 programs using different codecs.

Figure C.6: PSNR and SSIM values when transcoding 4 programs with the MPEG-2 codec using constant bitrate.

Figure C.7: PSNR and SSIM values when transcoding 4 programs with the MPEG-2 codec using the statistical multiplexer.

Figure C.8: PSNR and SSIM values when transcoding 4 programs with the AVC codec using constant bitrate.

Figure C.9: PSNR and SSIM values when transcoding 4 programs with the AVC codec using the statistical multiplexer.

Figure C.10: PSNR and SSIM values when transcoding 4 programs with the HEVC codec using constant bitrate.

Figure C.11: PSNR and SSIM values when transcoding 4 programs with the HEVC codec using the statistical multiplexer.

Figure C.12: Standard deviation of the PSNR value for the Big Buck Bunny movie multiplexed with an increasing number of other movies at a total bitrate of 7.021 Mbit/s per movie. The dashed lines are the same standard deviation for the movie encoded at a constant bit rate of 7.021 Mbit/s.

Figure C.13: Standard deviation of the SSIM value for the Big Buck Bunny movie multiplexed with an increasing number of other movies at a total bitrate of 7.021 Mbit/s per movie. The dashed lines are the same standard deviation for the movie encoded at a constant bit rate of 7.021 Mbit/s.

Figure C.14: Mean PSNR for the Big Buck Bunny movie multiplexed with an increasing number of other movies at a total bitrate of 7.021 Mbit/s per movie. The dashed lines are the mean PSNR values for the movie encoded at a constant bit rate of 7.021 Mbit/s.

Figure C.15: Mean SSIM for the Big Buck Bunny movie multiplexed with an increasing number of other movies at a total bitrate of 7.021 Mbit/s per movie. The dashed lines are the mean SSIM values for the movie encoded at a constant bit rate of 7.021 Mbit/s.

D Result of live test

Figure D.1: Live test of 4 multiplexed programs, 2 from one source and 2 from another. The programs were already statistically multiplexed coming from the source. The total input bitrate varied between 28-40 Mbit/s. The shown data is a 10-minute extract from 30 minutes into the test.


Bibliography

[1] Anne Aaron. Toward A Practical Perceptual Video Quality Metric. June 6, 2016. url: http://techblog.netflix.com/2016/06/toward-practical-perceptual-video.html (visited on 01/26/2017).
[2] Astra. Astra. 2017. url: http://www.onastra.se (visited on 12/07/2017).
[3] M. Blestel, M. Ropert, and W. Hamidouche. "Generic statistical multiplexer with a parametrized bitrate allocation criteria". In: 2016 IEEE International Conference on Image Processing (ICIP). Sept. 2016, pp. 2127–2131.
[4] Nesrine Changuel, Bessem Sayadi, and Michel Kieffer. "Predictive encoder and buffer control for statistical multiplexing of multimedia contents". In: IEEE Transactions on Broadcasting 58.3 (2012), pp. 401–416.
[5] Yanjiao Chen, Kaishun Wu, and Qian Zhang. "From QoS to QoE: A tutorial on video quality assessment". In: IEEE Communications Surveys & Tutorials 17.2 (2015), pp. 1126–1165.
[6] Tihao Chiang and Ya-Qin Zhang. "A new rate control scheme using quadratic rate distortion model". In: IEEE Transactions on Circuits and Systems for Video Technology 7.1 (Feb. 1997), pp. 246–250.
[7] Shyamprasad Chikkerur et al. "Objective video quality assessment methods: A classification, review, and performance comparison". In: IEEE Transactions on Broadcasting 57.2 (2011), pp. 165–182.
[8] Colin Levy. Sintel. 2010.
[9] Wei Dai et al. "SSIM-based rate-distortion optimization in H.264". In: Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014, pp. 7343–7347.
[10] Sachin Deshpande et al. "An improved hypothetical reference decoder for HEVC". In: ed. by Amir Said, Onur G. Guleryuz, and Robert L. Stevenson. Feb. 21, 2013, p. 866608.
[11] Digital Video Broadcasting (DVB). Digital Video Broadcasting (DVB); Implementation Guidelines for a second generation digital cable transmission system (DVB-C2). Aug. 2015.


[12] Digital Video Broadcasting (DVB). Digital Video Broadcasting (DVB); Specification for the use of Video and Audio Coding in Broadcasting Applications based on the MPEG-2 Transport Stream. Nov. 2016.
[13] Digital Video Broadcasting (DVB). White Paper on the use of DVB-S2X for DTH applications, DSNG & Professional Services, Broadband Interactive Services and VL-SNR applications. Mar. 2015.
[14] W. Gao et al. "SSIM-Based Game Theory Approach for Rate-Distortion Optimized Intra Frame CTU-Level Bit Allocation". In: IEEE Transactions on Multimedia 18.6 (June 2016), pp. 988–999.
[15] L. Guo et al. "A Novel Analytic Quantization-Distortion Model for Hybrid Video Coding". In: IEEE Transactions on Circuits and Systems for Video Technology 19.5 (May 2009), pp. 627–641.
[16] Lars Haglund. "The SVT high definition multi format test set". In: Swedish Television Stockholm (2006).
[17] P. Hanhart and R. Hahling. Video quality measurement tool (VQMT). Sept. 2013.
[18] Zhihai He and Sanjit K. Mitra. "A unified rate-distortion analysis framework for transform coding". In: IEEE Transactions on Circuits and Systems for Video Technology 11.12 (2001), pp. 1221–1236.
[19] Zhihai He and Dapeng Oliver Wu. "Linear rate control and optimum statistical multiplexing for H.264 video broadcast". In: IEEE Transactions on Multimedia 10.7 (2008), pp. 1237–1249.
[20] Quan Huynh-Thu and Mohammed Ghanbari. "Scope of validity of PSNR in image/video quality assessment". In: Electronics Letters 44.13 (2008), pp. 800–801.
[21] Intel®. Media SDK. 2017. url: https://software.intel.com/en-us/media-sdk (visited on 06/05/2017).
[22] Intel®. Media Server Studio. 2017. url: https://software.intel.com/en-us/intel-media-server-studio (visited on 06/05/2017).
[23] ISO/IEC JTC 1/SC 29. 13818-1:2015 - Information technology – Generic coding of moving pictures and associated audio information – Part 1: Systems. July 2015.
[24] ISO/IEC JTC 1/SC 29. 13818-2:2013 - Information technology – Generic coding of moving pictures and associated audio information – Part 2: Video. Oct. 2013.
[25] ISO/IEC JTC 1/SC 29. 14496-10:2014 - Information technology – Coding of audio-visual objects – Part 10: Advanced Video Coding. Sept. 2014.
[26] ISO/IEC JTC 1/SC 29. 23008-2:2015 - Information technology – High efficiency coding and media delivery in heterogeneous environments – Part 2: High efficiency video coding. May 2015.
[27] ITU-T. H.262: Information technology - Generic coding of moving pictures and associated audio information: Video. Feb. 2012.
[28] ITU-T. H.264: Advanced video coding for generic audiovisual services. Oct. 2016.
[29] ITU-T. H.265: High efficiency video coding. Dec. 2016.

[30] ITU-T. "Subjective video quality assessment methods for multimedia applications". In: ITU-T Recommendation (1999).
[31] M. Jeffrey. 2017 Release: What's New in Intel® Media Server Studio | Intel® Software. Sept. 1, 2016. url: https://software.intel.com/en-us/blogs/2016/07/27/2017-release-whats-new-in-intel-media-server-studio (visited on 01/05/2017).
[32] Vipin Kamble and K. M. Bhurchandi. "No-reference image quality assessment algorithms: A survey". In: Optik - International Journal for Light and Electron Optics 126.11 (June 2015), pp. 1090–1097.
[33] Rahim Kamran, Mehdi Rezaei, and Davoud Fani. "A frame level fuzzy video rate controller for variable bit rate applications of HEVC". In: Journal of Intelligent & Fuzzy Systems 30.3 (Mar. 2016), pp. 1367–1375.
[34] Bassam Kurdali. Elephants Dream. 2006.
[35] X. Li et al. "Laplace Distribution Based Lagrangian Rate Distortion Optimization for Hybrid Video Coding". In: IEEE Transactions on Circuits and Systems for Video Technology 19.2 (Feb. 2009), pp. 193–205.
[36] Sandro Moiron et al. "Statistical multiplexing of transcoded IPTV streams based on content complexity". In: International Conference on Mobile Multimedia Communications. Springer, 2010, pp. 60–73.
[37] T. S. Ou, Y. H. Huang, and H. H. Chen. "SSIM-Based Perceptual Rate Control for Video Coding". In: IEEE Transactions on Circuits and Systems for Video Technology 21.5 (May 2011), pp. 682–691.
[38] C. Pang et al. "Dependent Joint Bit Allocation for H.264/AVC Statistical Multiplexing Using Convex Relaxation". In: IEEE Transactions on Circuits and Systems for Video Technology 23.8 (Aug. 2013), pp. 1334–1345.
[39] Jaroslav Polec et al. "A Content-Based Optimization of Data Stream Television Multiplex". In: World Academy of Science, Engineering and Technology, International Journal of Electrical, Computer, Energetic, Electronic and Communication Engineering 7.10 (2013), pp. 1275–1280.
[40] Atul Puri. "Deliver High Quality, High Performance HEVC via Intel® Media Server Studio". In: (2008).
[41] Sacha Goedegebure. Big Buck Bunny. 2008.
[42] Sally Sams. Intel® Media Server Studio HEVC Codec Scores Fast Transcoding Title | Intel® Software. Oct. 23, 2015. url: https://software.intel.com/en-us/blogs/2015/10/23/intel-media-server-studio-hevc-codec-wins-transcoding-title (visited on 01/05/2017).
[43] Khalid Sayood. Introduction to Data Compression. Newnes, 2012.
[44] Patrick Seeling and Martin Reisslein. "Video traffic characteristics of modern encoding standards: H.264/AVC with SVC and MVC extensions and H.265/HEVC". In: The Scientific World Journal 2014 (2014).
[45] M. Surbhi. Bitrate Control Methods (BRC) in Intel® Media SDK | Intel® Software. Aug. 19, 2014. url: https://software.intel.com/en-us/articles/common-bitrate-control-methods-in-intel-media-sdk (visited on 12/01/2016).
[46] Teracom. Frekvenstabeller tv. TeracomB2B. 2017. url: http://www.teracom.se/privat/tv/frekvenstabeller-tv/ (visited on 12/07/2017).

[47] Tomás Potocný. Tears of Steel. 2012.
[48] Jan Van der Meer. Fundamentals and Evolution of MPEG-2 Systems: Paving the MPEG Road. John Wiley & Sons, 2014.
[49] Miaohui Wang, King Ngi Ngan, and Hongliang Li. "An Efficient Frame-Content Based Intra Frame Rate Control for High Efficiency Video Coding". In: IEEE Signal Processing Letters 22.7 (July 2015), pp. 896–900.
[50] S. Wang et al. "SSIM-Motivated Rate-Distortion Optimization for Video Coding". In: IEEE Transactions on Circuits and Systems for Video Technology 22.4 (Apr. 2012), pp. 516–529.
[51] Zhou Wang, Ligang Lu, and Alan C. Bovik. "Video quality assessment based on structural distortion measurement". In: Signal Processing: Image Communication 19.2 (2004), pp. 121–132.
[52] Thomas Wiegand et al. "Overview of the H.264/AVC video coding standard". In: IEEE Transactions on Circuits and Systems for Video Technology 13.7 (2003), pp. 560–576.
[53] Chun-Ling Yang et al. "An SSIM-optimal H.264/AVC inter frame encoder". In: Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on. Vol. 4. IEEE, 2009, pp. 291–295.
[54] W. Yao, L. P. Chau, and S. Rahardja. "Joint Rate Allocation for Statistical Multiplexing in Video Broadcast Applications". In: IEEE Transactions on Broadcasting 58.3 (Sept. 2012), pp. 417–427.