
A Live Broadcast 4K Television System Utilizing the MPEG-DASH Adaptive Bit Rate Protocol and Internet Technologies

Brian Patrick Bresnahan

A Thesis in the Field of Software Engineering

for the Degree of Master of Liberal Arts in Extension Studies

Harvard University

May 2018

Copyright © 2018 Brian Patrick Bresnahan

Abstract

This thesis proposes that an alternative to the existing cable television system could be created using the MPEG-DASH protocol (Dynamic Adaptive Streaming over HTTP) and other readily available internet technologies, including web servers and content delivery networks (CDNs).

MPEG-DASH is a relatively new client-server protocol for video streaming that leverages existing web server, file storage, and codec technology. The alternative television system proposed would be capable of streaming 4K resolution high-definition video using MPEG-DASH. MPEG-DASH is capable of interoperating with numerous device types such as smart TVs and streaming devices such as Google Chromecast. Transmitting live TV using MPEG-DASH over a packetized IP network would be more effective at reaching more device types. In addition, the MPEG-DASH protocol is designed to use the highest possible video resolution based on dynamic network analysis. Thus, with MPEG-DASH the user enjoys the best possible picture and sound on more devices.

The current methods of high-definition television over-the-air and on wire-based cable TV networks are efficient for service providers but require an RF tuner or set-top box to allow the user to view a broadcast. The proposal reduces the number of RF tuners required in the television system. Although the alternative television system would require a transition from a broadcast system to a unicast system, and unicast systems require more network bandwidth, current trends in internet bandwidth growth at the edge of the internet should make it feasible. The first goal of the thesis is to explore the feasibility, practicality, and scalability of this alternative MPEG-DASH television system. The second goal is to design and develop a working prototype of the server side of the system.

Dedication

This thesis is dedicated firstly to my wife, Michelle, for helping me in so many ways during my ALM degree endeavor. Your support while we raised children, worked and studied made it all possible. Thank you for listening. You are my rock.

This thesis is dedicated secondly to my children, Ian and Owen. Our discussions about science, the universe, mathematics, chemistry, and physics are a bright spot in my life. You inspire me. I strive to inspire you.

This thesis is dedicated thirdly to my deceased mother, Yvonne. You earned a Psychology degree while raising four children and I know how difficult that was. You took me to a Boston Computer Society show when I was a teenager and fed my computer curiosity. Thank you for letting me roam Geisel Library when I was fifteen, where I found a book on BASIC programming, saw a program for temperature conversion, and the hair on the back of my neck stood up.

Lastly, this thesis is dedicated to my mother’s parents, Alvine and Harry Gahagan, for being there when we needed them and for teaching us the value of hard work. Pepere, your basement workbench was my first exposure to a soldering iron, resistor color bands, an oscilloscope, and the great mystery of a television with its cover removed.


Acknowledgments

I would like to acknowledge the fine assistance of Dr. James L. Frankel throughout my thesis process. Professor, your comments and keen insights helped immeasurably during the exploration, software development, writing, and review stages. You are a true educator. I feel that you have imparted to me better technical judgment, the patience to question observations, and an appreciation for the diligent research of others. Your computer architecture course was both challenging and inspirational. When you walked into class one night with a piece of core memory, I knew you were cool.

I would like to thank Dr. Jeff Parker for being my thesis research advisor. Professor, our lively discussions helped considerably in deciding on the feasibility of my topic.

I would also like to thank my old friend Todd Tiberi, Esq., for assisting with the review of the thesis.


Table of Contents

Abstract
Dedication
Acknowledgments
List of Tables
List of Figures
List of Equations
Chapter 1. Introduction
1.1 Motivation and Goals
1.2 Thesis Outline
Chapter 2. Background
2.1 Terminology: Bandwidth, Bit Rate, and Throughput
2.2 Television Signaling Methods
2.2.1 Over-the-Air Broadcasts
2.2.2 Cable Television System
2.3 Video Streaming Evolution
2.4 MPEG-DASH Protocol Overview
2.5 Bandwidth Requirements and Codecs
2.5.1 MPEG-DASH and Codecs
2.6 DASH Adaptive Bit Rate Algorithms
2.6.1 Adaptive Bit Rate Algorithm Research
2.7 DASH Network Component Details
2.8 DASH Streaming Data Model Details
2.8.1 Media Presentation Timeline
2.8.2 Media Presentation Description File
2.8.2.1 Example of Method 2
2.8.2.2 Example of Method 3
2.9 Trick Mode Play
2.10 Digital Rights Management
Chapter 3. Prior Academic Research
3.1 GPAC – Project on Advanced Content
3.2 Moscow State University
3.3 Theses and Papers
Chapter 4. Industry Development of DASH
4.1 DASH Industry Forum
4.2 Google and Chromecast
4.3
4.4 Embedded in
4.5 Bitmovin
4.6 Akamai
4.7 Periscope and YouTube LiveStream
Chapter 5. A DASH Based Television System
5.1 Applying DASH to the Television System
5.1.1 Home Network Requirements
5.2 Virtualized Television System
5.2.1 Bandwidth Requirements
5.3
5.4 CDN Scale Requirements
5.4.1 Commentary
5.5 Network Latency
Chapter 6. Design and Implementation of Functioning Prototype
6.1 Requirements
6.1.1 Two Resolutions: 3840x2160 (4K) and 854x480
6.1.2 Create a Virtual Set-Top Box
6.2 Project Operating System and Software Development Tools
6.2.1 Operating System
6.3 Compiler and Tools: gcc, emacs, gdb and valgrind
6.4 FFMPEG Codec and Scaler Library
6.5 ffplay and mediainfo Utility Programs
6.6 Segmenter Utility Program
6.7 Hardware Choices
6.7.1 4K Camera
6.7.2 CPU
6.7.3 4K Television
6.8 Software Design
6.8.1 Logic Flow
6.8.2 Data Flow
6.8.3 Instance Diagram
Chapter 7. User’s Guide and Human Interactions
7.1 Command Line Interface
7.2 Statistics and Monitoring
7.3 Camera Image Acquisition and Resolutions in tv_channel
7.4 Web Server and CORS
7.4.1 Directory Structure and Files
7.5 Virtual Set-Top Box JavaScript Application
7.6 Sample Execution
Chapter 8. Verifications and Results
8.1 Evaluation Methodology
8.1.1 XML Verification
8.1.2 Broadcast Clock
8.1.3 Macroblocking Check
8.1.4 4K Chromecast
8.1.5 The DASH Industry Forum Reference Client
8.1.6 Network Conditions Feature in Chrome
8.1.7 CPU Utilization
8.2 DASH Client Verification
8.3 Chromecast Verification
Chapter 9. Summary, Contributions and Future Work
9.1 Summary
9.2 Contributions
9.3 Future Improvements and Next Steps to the tv_channel Program
9.4 Future Work
Appendix 1. Legal Issues with Rebroadcasting
Bibliography and References
Glossary


List of Tables

Table 1. Video Codec Bit Rates
Table 2. MPEG4 Audio Codec Bit Rates
Table 3. Chromecast Supported H.264 Codec Profiles
Table 4. Average Connection Bit Rate over IPv4 by State
Table 5. Average Peak Connection Bit Rate over IPv4 by State
Table 6. Web Server Directories and Files


List of Figures

Figure 1. Example of RF Tuners in Typical Household
Figure 2. Rate Map of Buffer Based Algorithm
Figure 3. DASH Network Elements
Figure 4. Thesis Implementation of DASH Network Elements
Figure 5. Media Presentation Timeline
Figure 6. Segments for one Representation
Figure 7. Example Manifest with Discrete URLs
Figure 8. Example Manifest with Variables
Figure 9. Example Manifest with Content Protection
Figure 10. DRM Interaction Diagram
Figure 11. Chrome Developer Tools
Figure 12. Chromecast Console Window
Figure 13. RF Tuner Reduction in Future Home
Figure 14. Virtualized Broadcast Television Ecosystem
Figure 15. Average Fixed Connection Download Speed
Figure 16. Maximum 4K Streams per Internet Access Type
Figure 17. Maximum 4K Streams per Internet Access Type
Figure 18. Basic Content Delivery Network
Figure 19. Content Delivery Network Abstraction
Figure 20. Content Delivery Network Graph
Figure 21. CDN Tier Sizing
Figure 22. Prototype tv_channel Data Flow
Figure 23. Software Instance Diagram
Figure 24. tv_channel Usage
Figure 25. tv_channel Statistics
Figure 26. Virtual Set-Top Box GUI
Figure 27. Chromecast Device Selection
Figure 28. Sample Execution Output
Figure 29. Test Network
Figure 30. File Download Speed Analysis
Figure 31. Uncompressed Image
Figure 32. Compressed Image with Macroblocking
Figure 33. Forcing Network Congestion Conditions
Figure 34. Network Throttle Points
Figure 35. CPU Utilization
Figure 36. CPU Utilization in Running State
Figure 37. Test Manifest .mpd File
Figure 38. Initial View of the Channel
Figure 39. Initial Low-Resolution Image Files
Figure 40. Transitioning to High Resolution
Figure 41. Player Characteristics by Time
Figure 42. Initiating Network Congestion
Figure 43. Transitioning Back to Low Resolution
Figure 44. Switching Representations
Figure 45. Select Channel on Set-Top Box Application
Figure 46. Visual Verification of Chromecast
Figure 47. Chromecast Starts at Low Resolution
Figure 48. Chromecast Transitions to High Resolution


List of Equations

Equation 1. Uncompressed 1K 24 fps Bandwidth Requirement
Equation 2. Uncompressed 4K 30 fps Bandwidth Requirement
Equation 3. Aggregate MPEG-DASH Data Throughput
Equation 4. CDN Theoretical Throughput
Equation 5. CDN Calculated Throughput
Equation 6. CDN Last Tier Server Count
Equation 7. CDN Aggregate Throughput to Last Tier
Equation 8. Theoretical Number of CDN Tiers
Equation 9. DASH Live Network Latency Formula


Chapter 1. Introduction

In this chapter, I will describe the motivations and goals of the thesis project. Then I will outline the thesis.

1.1 Motivation and Goals

Video streaming over the internet has transformed how we watch movies and has replaced physical media (e.g., VHS tapes and DVDs). It has also changed the learning experience for many people all over the world; the success of Khan Academy is an example of this (Khan, 2018). A next step in the evolution of video streaming is to stream live broadcasts over the internet at very high resolution. Examples of live broadcasts include news programming, on-the-scene weather reports, educational programming, and sports events.

The goal of this thesis is to understand how to use the MPEG-DASH protocol and internet technologies to create such a system. I will explore whether it is technically feasible and scalable, whether it presents a practical alternative to the existing cable system, and, if so, how it would work.

As part of the goal, I designed and developed a working prototype executable program called tv_channel. tv_channel reads raw 4K video frames from a 4K camera, encodes the video at two different resolutions, and places the video files in a directory hosted by a web server. I also wrote a simple virtual set-top box JavaScript application that lets a user select a channel to view and commands a Chromecast device attached to a 4K TV to stream and play the channel’s video files shared by the web server. This prototype is described in detail in the chapter Design and Implementation of Functioning Prototype.

1.2 Thesis Outline

The outline of the thesis is as follows. The Background chapter describes the existing cable network system and explains why this system could be improved. The Background chapter also describes the MPEG-DASH protocol in detail and explains how the protocol could be applied to the construction of a new television system. The Prior Academic Research chapter discusses research and research groups working with the MPEG-DASH protocol or on video streaming infrastructure. The Industry Development of DASH chapter reviews how a few companies are using the MPEG-DASH protocol in their products.

A DASH Based Television System proposes how the DASH protocol could be used to create an alternative to the existing cable television system. Design and Implementation of Functioning Prototype explains the thesis project requirements, design, and implementation. This chapter includes a description of the hardware and software choices for the project.

User’s Guide and Human Interactions explains the human interface to the project software, tv_channel, i.e., command line options and expected output. This chapter further explains the user interactions with the virtual set-top box GUI-based application.

In the Verifications and Results chapter, I analyze the execution of the project by examining the network bandwidth requirements of different resolutions and the server CPU utilization. The results at the 4K display screen are presented.

Finally, Summary, Contributions and Future Work concludes with a summary of the thesis and answers the fundamental thesis question of whether an alternative to the existing cable television system could be built using the MPEG-DASH protocol.


Chapter 2. Background

This chapter provides background on the existing cable television system needed to understand the thesis. It briefly reviews the evolution of video streaming over the last decade. This chapter also explains codec usage and offers in-depth background on the MPEG-DASH protocol, its data formats, and its basic algorithm. I introduce how MPEG-DASH can be used to build a new television system, which is explained in more detail in the subsequent chapter A DASH Based Television System.

2.1 Terminology: Bandwidth, Bit Rate, and Throughput

To enhance the understanding of the thesis, it is important to define the terms bandwidth, bit rate, and throughput. They are related but have subtle differences.

Bandwidth has two main definitions. First, it is the measure of the width of a band of RF spectrum, in which case it is written with units of Hz (cycles per second). For example, a U.S. video channel from 575 MHz to 581 MHz has a bandwidth of 6 MHz.

Second, bandwidth refers to the number of bits per second that can be transmitted on a communications link. In this case, it will appear in units of bits per second (L. Peterson, 2011). For example, a fiber link may have 10 Gbps of bandwidth. The term bit rate is synonymous with the second definition. We can say the same fiber link has a 10 Gbps bit rate (Merriam-Webster, 2018).

Throughput refers to the measured performance of a communications channel (L. Peterson, 2011). For example, a 10 Mbps Ethernet link has 10 Mbps of bandwidth, but after accounting for layer encapsulation and other factors, it may have only 8 Mbps of usable throughput.

2.2 Television Signaling Methods

Traditionally, the options in the United States for television signaling to the home have been: (1) over-the-air broadcasts, (2) the cable television system, and (3) the satellite system (this thesis does not review the satellite system). Alternatives have arisen in the last decade, as demonstrated by the success of Netflix and other over-the-top video streaming companies such as Sling. This trend appears to be following the transition of the last mile of our telephone system from analog signaling on dedicated copper wires to the packetization of voice within the home and the transportation of those packets through the consumer’s internet service provider’s network and voice gateways. The voice transition is partly due to economic reasons, as argued by Fryxell in his thesis Local Access Networks: An Economic and Public Policy Analysis of Cable and ADSL, where he found that cable access has a lower capital cost than internet access over traditional copper (Fryxell, 2002).

2.2.1 Over-the-Air Broadcasts

The over-the-air broadcast system is fundamentally based on modulating a video signal on a specific carrier frequency, transmitting that signal via an RF transmission tower, receiving that signal at an RF tuner in a television set or tuner device, demodulating the signal, and then rendering the original video on a television screen. In 1941, the black and white system was designed to transmit an analog signal within a 6 MHz block of spectrum. Each 6 MHz block of spectrum carried one viewable channel. In 1951, the color system was designed as an overlay on top of the black and white system (Fink, 1976). This design had a remarkable duration until the redesign of the system in the 1990s to support high-definition transmission. The channels are transmitted unencrypted, i.e., in the clear, and are receivable by any television with an RF tuner. Although this is a major benefit to consumers, there are typically no more than a dozen over-the-air broadcasts in a geographic area, usually including the four major broadcasters: ABC, NBC, CBS, and PBS. In contrast, cable television typically offers a better channel selection with one hundred to two hundred channels, thus giving the consumer an incentive to pay a premium for cable television service.

Three observations on the over-the-air television system follow. First, an RF tuner is required to receive the signal, which adds cost to a consumer device. The television system proposed in the thesis seeks to eliminate the tuner. Second, the consumer is limited to the number of local broadcasts because of the geographic limits of transmitting over-the-air. This number may be less than ten channels as compared to the one or two hundred channels available with cable television. Third, the RF transmission has no inherent way to be time-shifted since it is a continuously broadcast signal that is demodulated, decoded, and rendered in real time. Time-shifted video is a term used to describe the way consumers can view a video at a time that is convenient for them. This is typically achieved by persistently recording the video either at the service provider or on a device controlled by the user, such as the set-top box or digital video recorder (DVR). This functionality has been a major convenience for most consumers. As I will explain in the chapter A DASH Based Television System, the file-based nature of MPEG-DASH brings the possibility of time-shifted video to the alternative television system presented in the thesis.

Figure 1, below, shows the number of RF tuners in use in a typical home having three televisions and two hand-held devices. In this example, two televisions use cable set-top boxes and one television uses an over-the-air HD tuner, accounting for three tuners. The fourth tuner is in the cable modem. Thus, this single consumer pays for four tuners.


Figure 1. Example of RF Tuners in Typical Household

Ideally, there would be only one tuner or demodulator providing an internet connection to which a television service could be connected, thus eliminating three of the four tuners. We will revisit this concept in the following subchapter Applying DASH to the Television System.


2.2.2 Cable Television System

In 1948, the cable television system began when twin-lead wires were strung from a common antenna and then from house to house in Astoria, Oregon. In 1950, coaxial cable was used in a similar wiring scheme in Lansford, Pennsylvania. The motivation for community antenna television (CATV) was to get a better signal to the homes. The CATV wired approach helped when the over-the-air signal at the home was weak or suffered from interference from tall buildings (Laubach, Farber, & Dukes, 2001).

The initial RF signaling on the CATV system was simply an amplified form of the signal received at a central antenna. Since the over-the-air broadcasts were 6 MHz per viewer channel, the same 6 MHz signals appeared on the cable in analog form. By 2001, the MPEG group had standardized a way to digitize the analog signal into a digital signal with the benefits of error correction and signal compression. The digital signal is also modulated onto a 6 MHz channel (Laubach, Farber, & Dukes, 2001).

As MPEG evolved, it became possible to encode multiple program streams together into a transport stream and modulate the transport stream onto a 6 MHz channel (Jack, 2007). In effect, MPEG encodes multiple viewer channels (program streams) into the 6 MHz spectrum that could carry only one analog signal in the 1950-era system.

There are benefits and drawbacks to this method. The benefit is that the cable operator can offer more viewing channels to the consumer. The drawback is that to fit more program streams into the transport stream, the cable operator must increase the amount of compression. Since the MPEG2 and MPEG4 compression schemes used are lossy schemes, some signal quality is lost and therefore the consumer may see some artifacts in the display. Watkinson states in a subchapter on compression, “Whilst mild compression will be undetectable, with greater compression factors, artifacts become noticeable” (Watkinson, 2004). Moreover, he cautions, “If compression is to be used, the degree of compression should be as small as possible, i.e., use the highest practical bit rate.” The quandary here for the cable provider is that the more program streams they try to fit into a transport stream, the more compression is needed and thus the consumer will see artifacts such as macroblocking.

We now turn to the functionality of the set-top box. At the set-top box in the consumer’s home, the multiplexed MPEG4 transport stream is first demodulated by an RF tuner. The transport stream for a watched or recorded channel is filtered out from the unwatched channels. The program stream is then extracted and decrypted (Jack, 2007). The program stream is then decompressed by a hardware decoder and the image is displayed on the consumer’s screen as what he perceives as a viewable channel, say, “channel 4.”

One approach to counteract the need to compress program streams is to use more 6 MHz channels. As Laubach explains in his review of the RF spectrum band plans of cable operators, the cable network consists of many electronic parts, each of which has a frequency limit. Operators have historically had maximum frequency limits of 300 MHz, 550 MHz, 750 MHz, and now 1.2 GHz (Laubach, Farber, & Dukes, 2001). Therefore, even though the cable operator has the entire RF spectrum available on the wire, he can only use a portion of the spectrum. For example, if an operator’s equipment can modulate from 390 MHz to 750 MHz, he is limited to sixty 6 MHz-wide channels ((750 − 390) / 6 = 60).

To summarize the issues mentioned, the cable operator must use compression to provide a large number of program streams to a consumer’s home. Since this is a broadcast system, the operator must send hundreds of program streams to the consumer’s set-top box even though the consumer will only watch or record a few programs at any given time. A second issue is that the cable signal requires a specialized set-top box device to demodulate and decode the signal. This device is an expense to the cable operator that is then passed on to the consumer as a fee.

2.3 Video Streaming Evolution

Video streaming methods have evolved in the last decade from proprietary methods to a standard called MPEG-DASH, MPEG Dynamic Adaptive Streaming over HTTP (Sodagar, 2011). As video streaming has evolved, a number of proprietary or “closed” systems have been developed. For example, YouTube for many years used Adobe Flash to stream video. However, in 2015, YouTube dropped Flash in favor of HTML5 for its default web player. YouTube engineering reported on their technology blog that:

“Adaptive Bitrate (ABR) streaming is critical for providing a quality video experience for viewers - allowing us to quickly and seamlessly adjust resolution and bitrate in the face of changing network conditions. ABR has reduced buffering by more than 50 percent globally and as much as 80 percent on heavily-congested networks.” (Leider, 2015)

Until 2013, Netflix teamed with Microsoft and used the Microsoft Silverlight streaming video player in the Netflix application. However, Netflix dropped Silverlight in favor of HTML5. It did so, even though Silverlight supported adaptive bit rate streaming, because Microsoft planned to end-of-life Silverlight in 2021 (Park, 2013).

Apple developed HLS, HTTP Live Streaming. The Apple developer web page lists several benefits of HLS, including adaptive bit rate playback:

“Send live and on‐demand audio and video to iPhone, iPad, Mac, Apple TV, and PC with HTTP Live Streaming (HLS) technology from Apple. Using the same protocol that powers the web, HLS lets you deploy content using ordinary web servers and content delivery networks. HLS is designed for reliability and dynamically adapts to network conditions by optimizing playback for the available speed of wired and wireless connections.” (Apple Inc., 2018)

Apple has made HLS publicly available as RFC 8216 (Pantos, 2017), but Apple does not appear to support MPEG-DASH on the iPad or iPhone (Weil, 2017).

The problem with this type of proprietary evolution is that it requires the client side of video playback to be aware of multiple proprietary protocols, which creates client complexity. This is an unsustainable growth model for an internet-scale system with billions of connected devices. Video streaming over the internet requires standardization.

2.4 MPEG-DASH Protocol Overview

In order to standardize a streaming method, in 2009 the Motion Pictures Expert Group (MPEG) solicited proposals for standardizing streaming (Sodagar, 2011). In 2012, the first draft of a standard titled Dynamic Adaptive Streaming over HTTP was released as ISO specification 23009-1 (ISO, DASH Specification, 2012).

The functional goals of the specification are simple: (1) provide video at maximum resolution, (2) allow the user to view video on as many device types as possible, (3) use whatever network is available (e.g., cable, fiber, wireless, cellular), (4) offer basic video functions (pause, rewind, fast forward), and (5) support two-channel stereo or multichannel surround sound. Supporting this common set of functional goals for video with a royalty-free specification and common web server technology for the packet transport should make MPEG-DASH successful.


The main benefit of DASH is that it adapts to changing network throughput conditions. This is important because any given internet connection can have varying throughput for numerous reasons, including internet core congestion, edge network congestion, and congestion on the home network a device is attached to, such as a wireless home network. The DASH protocol attempts to maintain a consistent quality of experience for the end user even as network throughput changes.

DASH uses multiple stored media encodings on the server side. The encodings are created in successively higher resolutions, which have greater data sizes and therefore need progressively more network throughput. The video content is stored in DASH segment files. The specific information about the video, the bit rates it is encoded to and the codec required, the audio languages supported, and the digital rights management information is stored in an XML file called the manifest file. The manifest file typically has an “mpd” extension, so it is sometimes referred to as “the MPD file.” As the DASH player plays a video, it analyzes the network throughput at the ingress point of the player and uses the maximum resolution video encoding available on the server that will “fit” the available throughput without exceeding it. The TCP/IP packets containing the DASH segments travel through numerous routers from the host HTTP server to the end user’s player (the receiver). The slowest link on this path effectively sets the maximum throughput. The DASH player constantly monitors the download throughput and executes an algorithm to determine if it must change the encoding in use. If the network throughput goes down, the DASH player detects this and switches to an encoding requiring less throughput and consequently lower resolution. If the network throughput goes up, the DASH player will switch to an encoding with higher resolution, up to the maximum available. Thus, the video system adapts to the changing throughput of the network. The types of algorithms that could be used by a client are presented in the following subchapter DASH Adaptive Bit Rate Algorithms.

2.5 Bandwidth Requirements and Codecs

Since the principle of DASH is to adapt to network conditions, it is important to understand the amount of data transmitted for video streams and how that data size is significantly reduced by the use of codecs.

A codec is a two-part system that encodes and decodes data. The encoded data can be stored persistently or transmitted over a network. Some codecs convert data from the analog domain to the digital domain and back (Stallings, 1990), but codecs are commonly used in audio and video communications to significantly reduce the data size of audio or video data prior to storage or transmission, i.e., to compress data. Codecs can be implemented in hardware or software. An encoder is needed to compress the video data by encoding it in MPEG4 format. A complementary decoder is needed to decode the data and thus decompress it for playback. Combining the terms encoder and decoder, we derive the industry term, codec (Watkinson, 2004). In most cases of video, the encode and decode processes are “lossy,” meaning some of the exact original content on a bit-level basis is lost during encoding. The overall desire is to minimize this loss and maximize quality of experience for the end user. A good example is the codec needed for the H.264 MPEG4 compressed video format (International Standards Organization, 2015).

MPEG compression defines three types of video frames: I-frames, P-frames, and B-frames, where “I” means intra-picture, “P” means predicted picture, and “B” means bidirectional predicted picture. The encoder portion of the MPEG codec processes each original video frame and creates either an I, P, or B frame in the compressed version of the original video. An I-frame contains enough compressed video data to render a full visual frame when decompressed. P-frames and B-frames are “predicted” interpolations between I-frames and cannot be rendered on their own without decompressing nearby frames (Peterson & Davie, Computer Networks, 5th Edition, 2012). Therefore, the more I-frames present, the higher the quality the decompressor can achieve, but the tradeoff is that I-frames require more data space.

The MPEG codec has a setting for the maximum distance between I-frames, and this setting is one of a few that can be used to tune quality versus output size. As an example, one second of a 30 frames-per-second (fps) video, where the maximum number of frames between I-frames is ten, may have a frame sequence such as the following, where an I-frame appears at frames 1, 11, and 21:

IBBPBBPBBPIBBPBBPBBPIBBPBBPBBP.
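To make the setting concrete, the fragment below shows how an encoder’s maximum I-frame distance could be configured with the FFmpeg libavcodec C API, the library family the prototype in this thesis builds upon. It is a minimal sketch with illustrative parameter values, not the prototype’s actual configuration; error handling is omitted.

    #include <libavcodec/avcodec.h>

    /* Sketch: configure an H.264 encoder so that no more than ten
     * frames separate consecutive I-frames. Values are illustrative. */
    AVCodecContext *make_h264_encoder(void)
    {
        const AVCodec *codec = avcodec_find_encoder(AV_CODEC_ID_H264);
        AVCodecContext *ctx = avcodec_alloc_context3(codec);

        ctx->width        = 3840;                 /* 4K frame width */
        ctx->height       = 2160;                 /* 4K frame height */
        ctx->time_base    = (AVRational){1, 30};  /* 30 fps */
        ctx->pix_fmt      = AV_PIX_FMT_YUV420P;
        ctx->gop_size     = 10;  /* maximum distance between I-frames */
        ctx->max_b_frames = 2;   /* permits the IBBP... pattern above */

        avcodec_open2(ctx, codec, NULL);          /* returns < 0 on error */
        return ctx;
    }

Raising gop_size reduces output size at the cost of fewer clean recovery points; lowering it improves quality and seek behavior at the cost of bandwidth.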

The following video encoding table presents some common video encodings and the network bit rate/throughput they require. Video codecs used for video streaming typically utilize a variable bit rate, which means the amount of bandwidth needed to transport the media being encoded can vary based on the nature of the media. For example, if while filming a scene the camera pans, the bit rate required will be higher than average. If the scene is stable, i.e., not changing often, then the bit rate should be lower than average. The table shows a range of network throughputs needed to support different resolutions for live streaming (Youtube/Google Inc., 2018). More efficient codecs are being developed that will decrease the required throughput rates without decreasing quality, for example, Google’s VP9 and a royalty-free consortium-based codec called AV1.

Video Codec | Required Bit Rate Range | Approximate Average Bit Rate | Resolution Width (pixels) | Resolution Height (pixels) | Frames Per Second
H.264 | 20 - 51 Mbps | 35.5 Mbps | 3840 (“4K”) | 2160 | 60
H.264 | 13 - 34 Mbps | 23.5 Mbps | 3840 (“4K”) | 2160 | 30
H.264 | 6 - 13 Mbps | 9.5 Mbps | 2560 | 1440 | 30
H.264 | 3 - 6 Mbps | 4.5 Mbps | 1920 | 1080 | 30
H.264 | 1.5 - 4 Mbps | 1.75 Mbps | 1280 | 720 | 30
H.264 | 500 - 2,000 Kbps | 1,250 Kbps | 854 | 480 | 30
H.264 | 400 - 1,000 Kbps | 700 Kbps | 640 | 360 | 30
H.264 | 300 - 700 Kbps | 500 Kbps | 426 | 240 | 30

Table 1. Video Codec Bit Rates

The following table presents some common audio encodings and the bit rates they require.

Codec | Bit Rate | Channels
MP4 audio | 64 - 128 Kbps | 2
MP4 audio | 32 Kbps | 2

Table 2. MPEG4 Audio Codec Bit Rates


The importance of compression by codecs cannot be overstated as demonstrated by the following basic math.

A 1K high-definition video at a resolution of 1920 x 1080 pixels with 24-bit color depth played at 24 frames per second (fps) requires 1.2 Gbps of bandwidth when uncompressed, as shown in the following simple calculation (Peterson & Davie, Computer Networks, 5th Edition, 2012).

1080 pixels x 1920 pixels x 24 bits x 24 frames/sec = 1.2 Gbps

Equation 1. Uncompressed 1K 24 fps Bandwidth Requirement

A 4K high-definition video at a resolution of 3840 x 2160 pixels with 24-bit color depth to be played at 30 frames per second would require approximately 6 Gbps of bandwidth if uncompressed as shown in the following calculation.

3840 pixels x 2160 pixels x 24 bits x 30 frames/sec = 747 MB/sec = 5.972 Gbps

Equation 2. Uncompressed 4K 30 fps Bandwidth Requirement
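The arithmetic of Equations 1 and 2 is easy to check mechanically. The following short C program, a sketch written for this discussion rather than part of the thesis software, reproduces both results.

    #include <stdio.h>

    /* Uncompressed video bandwidth in bits per second:
     * width x height x bits-per-pixel x frames-per-second */
    static double raw_bps(double width, double height, double bpp, double fps)
    {
        return width * height * bpp * fps;
    }

    int main(void)
    {
        /* Equation 1: 1920x1080, 24-bit color, 24 fps */
        printf("1K @ 24 fps: %.3f Gbps\n", raw_bps(1920, 1080, 24, 24) / 1e9);
        /* Equation 2: 3840x2160, 24-bit color, 30 fps */
        printf("4K @ 30 fps: %.3f Gbps\n", raw_bps(3840, 2160, 24, 30) / 1e9);
        return 0;
    }

Running it prints 1.194 Gbps and 5.972 Gbps, matching the equations above.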

Since most last-mile internet data services and cellular data services are typically one Gbps or less, compression is therefore mandatory for video streaming. Last-mile internet services and their bandwidth capabilities will be presented in more detail in the subchapter Last Mile Internet Access Bandwidth Requirements.


2.5.1 MPEG-DASH and Codecs

DASH does not perform the encoding of media, nor does it enforce the use of specific codecs. DASH does not introduce any new codecs; it does, however, specify the codec needed to play the encoded media via the manifest (MPD) file. As long as the DASH player application and operating system support the codec needed, DASH can stream the media and the player application can render it. With this open-ended design, as more codecs are designed and developed, DASH does not need to change; just the manifest MPD file and the device codec would change (and the source media would need to be encoded using the new codec).

2.6 DASH Adaptive Bit Rate Algorithms

If the available network throughput is insufficient for a video’s required bit rate, the end user will experience pauses in the video playback as the player buffers more video data. To solve this problem, the video system player (client) should dynamically adapt to the network conditions and automatically select the highest possible resolution the network can support. The general concept is called adaptive bit rate streaming (ABR). The player selects the rate at which video is streamed. This is fundamentally different from broadcast protocols or protocols like RTP (Real-time Transport Protocol) that transmit at a fixed rate with tight timing constraints and do not guarantee quality of service. ABR improves upon these protocols by following this general algorithm:

1. Encode the media at various bit rates.

2. While playing media, the client monitors the rate at which it can download segments into a playback buffer. If the buffer becomes empty and the playback stalls, the player will pick a lower bit rate encoding based on the stated required bit rate in the manifest file for the encodings.

3. If the playback is not stalling and the player algorithmically concludes there is bandwidth available such that the player can use a higher bit rate encoding, the player will switch to the higher bit rate encoding. (Algorithms are presented in a subsequent subchapter.)

It is not entirely clear who first conceived of adaptive bit rate streaming. A speculative statement on Wikipedia claims it “was created by the DVD Forum at the WG1 Special Streaming group in October 2002” (Wikipedia, 2010). My own search of patents related to the topic of adaptive media streaming yielded one patent from 1998, Network Media Streaming US6389473B1, that describes creating “segments” from a media file and transmitting them to a client over HTTP or FTP (USA Patent No. US6389473B1, 1998). It does not appear to mention an adaptation method.

Another patent, Adaptive Video Streaming System and Method US20100074324A1, filed in 2008 and granted in 2012, is much closer to how MPEG-DASH is described in the current ISO specification (USA Patent No. US8228982B2, 2008). The patent claims multiple methods, including:

... 4. The method of claim 3, wherein encoding the source video into selectable layers includes encoding the source video temporally into selectable temporal layers.

5. The method of claim 3, wherein encoding the source video into selectable layers includes encoding the source video spatially into selectable spatial [sic] layers. ...

The “selectable layers” that are “spatial” are equivalent to the representations described by MPEG-DASH.

2.6.1 Adaptive Bit Rate Algorithm Research

The type of algorithm the player uses to maximize the user quality of experience has received attention from many researchers. In his paper A Comparative Case Study of HTTP Adaptive Streaming Algorithms in Mobile Networks, Karagkioules explores three types of algorithms: throughput-based adaptation, buffer-based adaptation, and time-based adaptation (Karagkioules & Concolato, 2017). He concluded that buffer-based algorithms outperform the others, citing an extensive research paper, A Buffer-Based Approach to Rate Adaptation: Evidence from a Large Video Streaming Service (Huang, 2014). A graph from that paper helps illustrate the finding:


Figure 2. Rate Map of Buffer Based Algorithm

A summary of the buffer occupancy algorithm is as follows. Memory buffer occupancy is expressed on the X-axis. The set of bit rates required per representation is expressed on the Y-axis as set R. As the buffer occupancy increases, the algorithm selects the next higher indexed representation from set R. The aggressiveness of the selection is dictated by the slope of f(B), and the red line indicates a slope beyond which quality of experience is poor.
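A minimal sketch of this idea in C follows. It maps buffer occupancy linearly onto the ordered set of representation bit rates R; the function name, the reservoir and cushion thresholds, and the linear shape of f(B) are illustrative assumptions of mine, not the exact function from the Huang paper.

    #include <stddef.h>

    /* Sketch of buffer-based selection: map playback buffer occupancy
     * onto indexes into the ordered bit rate set R with a linear f(B).
     * occupancy, reservoir, and cushion are in seconds of buffered video. */
    size_t pick_rate_buffer_based(size_t n_rates, double occupancy,
                                  double reservoir, double cushion)
    {
        if (occupancy <= reservoir)
            return 0;               /* nearly empty: protect against stalls */
        if (occupancy >= reservoir + cushion)
            return n_rates - 1;     /* comfortably full: highest rate */

        /* Linear ramp between the lowest and highest rates. */
        double frac = (occupancy - reservoir) / cushion;
        return (size_t)(frac * (double)(n_rates - 1));
    }

A caller would take n_rates from the manifest and tune reservoir and cushion to the segment duration and the player’s buffer capacity.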

Using a different algorithm, Mao, Netravali, and Alizadeh describe in their paper Neural Adaptive Video Streaming with Pensieve a player algorithm that attempts to optimize quality of experience based on a neural network that uses reinforcement learning. In the words of the authors:

“... the decision policy guiding the ABR algorithm is not handcrafted. Instead, it is derived from training a neural network. The ABR agent observes a set of metrics including the client playback buffer occupancy, past bit rate decisions, and several raw network signals (e.g., throughput measurements) and feeds these values to the neural network, which outputs the action, i.e., the bit rate to use for the next chunk. The resulting QoE is then observed and passed back to the ABR agent as a reward. The agent uses the reward information to train and improve its neural network model.”

The conclusion of the authors was that their method outperformed other adaptive bit rate algorithms (Netravali, Mao, & Alizadeh, 2017).

For the project portion of this thesis, I used the Google Chromecast player. It contains a code module called the Simple ABR Manager that uses download rate data to estimate the network bit rate. It then picks the representation that best matches the estimated bit rate (Google, Inc., 2018). Note that each representation’s bandwidth is specified in the manifest file. In my experience with the Chromecast, the drawback of this method is that an incorrect bit rate in the manifest file directly affects the algorithm in the Chromecast. If the bit rate in the manifest is set too high, the Chromecast will attempt to read ahead too far, and live data may not yet be available from the source of the data, such as a camera in my case.
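For contrast with the buffer-based sketch above, the following C fragment shows the shape of a throughput-based selection like the one described here: pick the highest representation whose manifest-declared bandwidth fits the estimated download rate. It is a sketch of the general idea only; the actual Simple ABR Manager is Google’s code inside the Chromecast receiver, and its exact logic is not reproduced here.

    #include <stddef.h>

    /* Sketch of throughput-based selection: choose the highest
     * representation whose manifest-declared bandwidth fits within the
     * estimated download rate. manifest_bps is in ascending order. */
    size_t pick_rate_by_throughput(const double *manifest_bps, size_t n,
                                   double estimated_bps, double safety)
    {
        size_t best = 0;
        for (size_t i = 0; i < n; i++)
            if (manifest_bps[i] <= estimated_bps * safety)
                best = i;   /* e.g., safety = 0.9 leaves 10% headroom */
        return best;
    }

This shape makes the dependence on the manifest explicit: if manifest_bps overstates or understates the true encoding rates, the selection is wrong in exactly the way described above.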

2.7 DASH Network Component Details

The MPEG-DASH group began accepting proposals for standardizing ABR in 2009. The result, in 2012, was the publication of the DASH specification, ISO 23009-1. DASH is backed by industry leaders and promoted by the DASH Industry Forum, an organization backed by large corporations such as Akamai, Adobe, Dolby, Microsoft, Netflix, Qualcomm, and Samsung. It also includes many other smaller companies as supporters (DASH Industry Forum).

The DASH network model appears in the following diagram, copied from the DASH specification (ISO, DASH Specification, 2012).

Figure 3. DASH Network Elements

Operational Steps

The basic operational steps of DASH are as follows. The term “DASH Client” is synonymous with DASH player.

1. A back-end process is used to prepare and encode the media. This is the box on the left titled DASH Media Presentation Preparation. This step needs to be done before any clients can view the media. In this thesis, I created the “media preparation” server software I call tv_channel.

2. The DASH client requests from the DASH server a “manifest” XML file called the MPD (Media Presentation Description) file. Much like a ship’s manifest document describes the cargo of the ship, this file contains information regarding all available encodings and their bit rates for a particular video. This will be described in more detail later, but for now, it is important to know that it contains the URLs of the encoded video data segments.

3. The media playing software application uses the DASH client to select a video and audio bit rate from the manifest and begins downloading both separately with HTTP. The audio and video streams were synchronized in time at the encoding stage by placing time stamp interval MPEG4 boxes in the headers of the segment files.

4. The DASH client is codec agnostic, so it passes the received video frames and audio packets to the media application to render them with the codecs associated with the operating system on which the application is running.

5. The DASH client runs the adaptive bit rate algorithm previously described in DASH Adaptive Bit Rate Algorithms. If the video or audio segments are downloading well, the player may switch to the next higher bit rate. The process of switching to a different media stream using DASH is described in more detail later. Similarly, if the segment download rate is insufficient for the chosen bit rate, the player will select a lower bit rate stream.

6. The DASH client repeatedly does HTTP GETs for segments of video and audio data until the video and audio playback is complete. (A sketch of this fetch loop appears below.)
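The client’s network behavior in steps 2 and 6 is ordinary HTTP, which the following C sketch using libcurl illustrates. The host name, file names, and fixed segment count are invented for illustration; a real player would parse the manifest and loop until playback ends.

    #include <stdio.h>
    #include <curl/curl.h>

    /* Sketch of the DASH client's HTTP behavior: fetch the manifest
     * once, then fetch numbered media segments in sequence. Without a
     * write callback, libcurl sends response bodies to stdout. */
    static void fetch(CURL *h, const char *url)
    {
        curl_easy_setopt(h, CURLOPT_URL, url);
        curl_easy_perform(h);   /* returns a CURLcode; checking omitted */
    }

    int main(void)
    {
        CURL *h = curl_easy_init();
        char url[256];

        /* Step 2: download the manifest (hypothetical address). */
        fetch(h, "http://streamer.example.edu/channel1/manifest.mpd");

        /* Step 6: repeatedly GET segments; a live client would loop
         * until playback stops rather than to a fixed count. */
        for (int n = 0; n < 10; n++) {
            snprintf(url, sizeof url,
                     "http://streamer.example.edu/channel1/video_%05d.mp4", n);
            fetch(h, url);
        }
        curl_easy_cleanup(h);
        return 0;
    }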

In this thesis, I implemented the Media Presentation Preparation box (the tv_channel segment server program) on the left. I also implemented a virtual set-top box as the MPD Delivery Function. They are the two boxes in the following diagram that have hatch marks.


Figure 4. Thesis Implementation of DASH Network Elements

2.8 DASH Streaming Data Model Details

In this subchapter, I will describe how the DASH protocol data model supports multiple encoding resolutions with the concept of the DASH representation and how the protocol associates the representations with time.

2.8.1 Media Presentation Timeline

The DASH data model starts with a common timeline for all encodings called the Media Presentation Timeline. It is divided into Periods. The following diagram depicts the division of a 3-minute video into 3 Periods of 60 seconds each.


Figure 5. Media Presentation Timeline

Each Period is in turn divided into multiple Adaptation Sets, where one such set may describe the available video encodings of the media and another such set may describe the audio encodings. For example, referring to the video encoding bit rate table previously presented, the Adaptation Set may include a 1 Mbps encoding and a 2 Mbps encoding. Each of these encodings is called a representation, defined as a “deliverable encoded version of one or several media content components” (ISO, DASH Specification, 2012). Therefore, Representation 1 may require 1 Mbps of bandwidth, and Representation 2 may be of higher quality and require 2 Mbps of bandwidth. The next layer of the data model is the Segment. This is where the data model ties back to HTTP, since each Segment must have a URL that can be used to either retrieve the entire segment or a portion of the segment using a byte range partial HTTP GET.

Extending the prior diagram helps to visualize the connection between time and segments. Figure 6 below depicts each 60-second Period divided into three 20-second segments per representation. For illustration, I have given sample URLs based on a fictitious server and encoded file set. I specifically show the segments for representation 1, but not 0 or 2. Representation 2 is named “trick mode” and is used for fast-forward and rewind. Trick mode is explained in more detail in the subchapter Trick Mode Play.

Media Master Media Presentation 0 60 120 180 Timeline -> Period 1 Period 2 Period 3 Adaptation set 0 (video) Adaptation set 1 (audio) Representation 0 (1 Mbps) Representation 1 (2 Mbps) Representation 2 (trick mode)

Segment 0 Segment 1 Segment 2 Http:// Http:// Http:// streamer.harvard.edu/ streamer.harvard.edu/ streamer.harvard.edu/ vid1_2mbps_0 vid1_2mbps_1 vid1_2mbps_2

Figure 6. Segments for one Representation

To facilitate a client switching from one representation to another, it is mandatory that the codec-encoded timing of segment N in each video and audio representation is the same. For example, segments encoded with the H.264 codec have a numerical PTS (Presentation Time Stamp) value encoded in the header of each segment. The time base of the PTS values is in units of ticks per second. As a DASH encoder creates segment N for each representation, it must set the PTS value in each segment N header to be the same. The PTS value is used by the player to precisely time the playback of the encoded video and audio data. For example, if segments were 20 seconds long and the PTS time base was 300 ticks per second, segment 0 would begin at PTS 0, segment 1 would begin at PTS 6000, segment 2 would begin at PTS 12000, etc. If the player switched from representation 1 segment 1 to representation 0 segment 2, it must find a PTS value of 12000 in the header of representation 0 segment 2.

2.8.2 Media Presentation Description File

As mentioned in the overview, the MPD file is an XML-formatted file the client downloads to learn the attributes of the media it is being asked to play. It includes the Adaptation Set, the Representations (available bit rates), and most importantly the URL or URLs used to retrieve Segment data. The DASH specification goes into great detail for most of the XML tags (ISO, DASH Specification, 2012). We are mainly interested in how the DASH data model is expressed with XML; therefore, in Figure 7 below I have created an MPD file based on an example in the annex of the DASH specification and highlighted the two video representations. DASH supports three main methods of retrieving video and audio data using URLs:

1. By specifying an exact URL to a file. The file needs to be retrieved with an HTTP GET. In this approach, the DASH client must get the entire file. The files are explicitly named in the MPD file.

2. By specifying an exact URL to a file with the expectation that an HTTP partial GET will be used to retrieve data from a start byte to an end byte. In this approach, the DASH client can read successive sections of the same file as time progresses (Fielding, et al., 1999).

3. By a method called “Template based Segment URL Construction” (ISO, DASH Specification, 2012). In this method, the variables RepresentationId and Number are placed in the MPD file. The DASH client uses them to dynamically create the URL for a data segment. By doing so, the MPD may logically define thousands of URLs without explicitly enumerating them in the manifest file. The example below illustrates this method.

Methods 2 and 3 above are the most useful, so I will provide an example for each of them.

2.8.2.1 Example of Method 2

In the example in Figure 7 below, the MPD has two separate URLs for retrieving 1 Mbps encoded or 2 Mbps encoded media from the fictitious server:

1. streamer.harvard.edu/course_vid_2_1mbps.mp4

2. streamer.harvard.edu/course_vid_2_2mbps.mp4

In this method, the DASH client will use the HTTP partial GET retrieval method, in which an HTTP client can request a specific portion of a file on the web server by specifying the file name, the beginning byte offset, and the ending byte offset. A complete MPD file example appears below in Figure 7.

[The XML of Figure 7 did not survive extraction; surviving fragments include a BaseURL of http://cdn1.example.com/, the file names 7657412348.mp4 and 3463646346.mp4, and the two representation URLs listed above.]

Figure 7. Example Manifest with Discrete URLs
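Because the XML of Figure 7 was lost, the following reconstructed sketch shows the shape such a manifest would take, modeled on the MPD examples in the annex of the DASH specification. The attribute values, codec strings, and representation ids are illustrative assumptions; only the two streamer.harvard.edu URLs come from the original figure.

    <?xml version="1.0"?>
    <MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
         mediaPresentationDuration="PT180S" minBufferTime="PT2S"
         profiles="urn:mpeg:dash:profile:isoff-on-demand:2011">
      <BaseURL>http://cdn1.example.com/</BaseURL>
      <Period>
        <AdaptationSet mimeType="video/mp4" segmentAlignment="true">
          <Representation id="v1" bandwidth="1000000"
                          width="854" height="480" codecs="avc1.42c01e">
            <BaseURL>streamer.harvard.edu/course_vid_2_1mbps.mp4</BaseURL>
          </Representation>
          <Representation id="v2" bandwidth="2000000"
                          width="1280" height="720" codecs="avc1.42c01e">
            <BaseURL>streamer.harvard.edu/course_vid_2_2mbps.mp4</BaseURL>
          </Representation>
        </AdaptationSet>
      </Period>
    </MPD>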

2.8.2.2 Example of Method 3

Figure 8 below demonstrates method 3, where the variables $RepresentationID$ and $Number$ are used. Only the relevant portion of the manifest file is shown.

[The XML of Figure 8 did not survive extraction; surviving fragments include the file name prefix Course_Lecture5_ and the attribute index="$RepresentationID$.sidx".]

Figure 8. Example Manifest with Variables
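As with Figure 7, only fragments of this figure survived, so the following is a reconstructed sketch of the relevant SegmentTemplate section. It is written to be consistent with the synthesized URLs listed below; the timescale, duration, startNumber, and bandwidth values are assumptions.

    <AdaptationSet mimeType="video/mp4" segmentAlignment="true">
      <SegmentTemplate
          media="Course_Lecture5_$RepresentationID$_$Number%05d$.mp4"
          index="$RepresentationID$.sidx"
          startNumber="0" duration="20" timescale="1"/>
      <Representation id="1130kbps" bandwidth="1130000"
                      width="854" height="480"/>
    </AdaptationSet>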

From the BaseURL tag and SegmentTemplate of Figure 8, the DASH player can synthesize the following numerically sequenced URLs over time:

1. Course_Lecture5_1130kbps_00000.mp4

2. Course_Lecture5_1130kbps_00001.mp4

3. etc.

I found method three to be very flexible for live video streaming. Having the DASH player synthesize the video file name by incrementing an index number simplifies the system because the .mpd file does not need to be updated in real time. This was a great design choice by the authors of the DASH specification. I used this method in the project portion of the thesis.

2.9 Trick Mode Play

Trick mode play is a term used in the MPEG-DASH specification to describe fast forward and reverse. An MPEG encoder can be leveraged to create a trick mode video stream. As an example, assume there is an uncompressed 30 fps source video with 1K resolution. The encoder can be instructed to use the source video to create a trick mode representation with attributes of 1 frame every 10 seconds, a much smaller “thumbnail” resolution, and all I-frames. The frame sequence would appear as “III...”

Applying this method to MPEG-DASH, Sodagar has described that the thumbnail video can be used to create a trick mode representation to complement the main video representations (Sodagar, 2011). The trick mode representation would be defined as a specific representation in the manifest file. The MPEG-DASH player needs to have the functionality to display the trick mode images, allow the user to scan through them, and select an image to continue playback at the point in time related to the selected thumbnail image. Following the PTS timing rule previously discussed, the trick mode representation must have the same timing as the other video representations. Therefore, the image selection sequence can be used to create a target segment index in the future or the past, allowing the user to perform fast-forward or reverse.
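To illustrate such an all-I-frame thumbnail encoding, the libavcodec settings below would configure an encoder along these lines. This is a sketch under assumed parameter values (192x108 thumbnails, one frame every ten seconds), not the configuration of any particular system.

    #include <libavcodec/avcodec.h>

    /* Sketch: settings for an all-I-frame thumbnail ("trick mode")
     * representation; resolution and frame interval are assumptions. */
    void configure_trick_mode(AVCodecContext *ctx)
    {
        ctx->width        = 192;                  /* thumbnail width */
        ctx->height       = 108;                  /* thumbnail height */
        ctx->time_base    = (AVRational){10, 1};  /* one frame per 10 s */
        ctx->gop_size     = 0;                    /* 0 = intra only: "III..." */
        ctx->max_b_frames = 0;                    /* no predicted frames */
    }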

2.10 Digital Rights Management

Digital Rights Management (DRM) refers to the methods by which a video can be encrypted and played by someone with the legal right to play the video. This right is typically obtained by purchasing the right to play the video from the host of the video, obtaining an electronic key, and using that key to play the video either for a fixed time or in perpetuity. The management of the key is hidden from the user by the player device or software. For example, one can purchase the right to watch a movie on Amazon Prime Video. Without any form of DRM, video piracy would make it difficult for a movie company or other video production company to earn revenue from their work.

A DRM system typically comprises an account and key management system, an encoder that is capable of encrypting video with keys, and a video player that can access a key and decrypt the encrypted video (Bitmovin, Inc., 2018). Some systems such as Widevine also work in hardware chips, because decryption is much faster in hardware than in software and decryption typically reads every byte of a video stream (Widevine, Inc., 2018). Many companies selling internet-based video or music services have their own DRM system. Google has Widevine (Google purchased the company Widevine), Microsoft has PlayReady, Adobe has Primetime, and Apple has FairPlay.

In U.S. law, the Digital Millennium Copyright Act (DMCA) of 1998 bans the development and distribution of any technology designed to circumvent DRM and allow access to works that are under copyright protection (Encyclopedia Britannica, 2018).

Since MPEG-DASH is a protocol for streaming video and audio over a network, it must have a notion of DRM to be commercially viable. MPEG-DASH does not create a new type of DRM, but rather has provisions in its manifest file to support many types of DRM systems in use today. The MPEG-DASH numerical identifiers web page currently has identifiers for twenty-two different DRM methods (DASH Industry Forum, 2018). The MPEG-DASH specification states that the manifest file may have an XML element called ContentProtection and an associated URI used to identify a content protection scheme. The manifest file should also use the associated value field to specify the DRM system, encryption algorithm, and key distribution scheme(s) employed. These attributes allow a client to determine whether it can play the protected content. A sample section of a manifest with DRM appears in Figure 9.

[The XML of Figure 9 did not survive extraction apart from two URLs: http://MoviesSP.example.com/protect?license=jfjhwlsdkfiowkl and http://MoviesSP.example.com/protect?content=mslkfjsfiowelkfl]

Figure 9. Example Manifest with Content Protection
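For concreteness, a ContentProtection section like the one in Figure 9 could look roughly like the following sketch. The urn:mpeg:dash:mp4protection:2011 scheme identifier is the generic MPEG Common Encryption signal; the DRM-specific UUID and the child element names are placeholders invented for illustration, with the two URLs carried over from Figure 9.

<AdaptationSet contentType="video">
  <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011"
                     value="cenc"/>
  <!-- Placeholder UUID and child elements; a real DRM system defines these. -->
  <ContentProtection schemeIdUri="urn:uuid:00000000-0000-0000-0000-000000000000">
    <LicenseServerURL>http://MoviesSP.example.com/protect?license=jfjhwlsdkfiowkl</LicenseServerURL>
    <ContentURL>http://MoviesSP.example.com/protect?content=mslkfjsfiowelkfl</ContentURL>
  </ContentProtection>
</AdaptationSet>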

The following interaction diagram shows a Player communicating with its local DRM client module and a DRM Service. The diagram is from the MPEG-DASH book Guidelines for Implementation: DASH-IF Interoperability Points (DASH Industry Forum, 2017).


Figure 10. DRM Interaction Diagram

The player starts at step (1) by downloading an MPD (manifest) file that contains the previously described ContentProtection XML entry. At step (2), the player verifies that it supports the specified DRM type. At step (3), the player downloads the initialization video segment, which may contain the MPEG track encryption box, tenc, and license box, pssh. At step (4), the player contacts the DRM service, which returns a license at step (5). At step (6), the player pushes the license to the DRM client, which extracts the key and possibly configures hardware to support decryption. At step (7), the player downloads an encrypted segment and passes it at step (8) to the DRM client. The DRM client passes the decrypted frame back to the player at step (9). Lastly, at step (10), the player renders the decrypted frame.


Chapter 3. Prior Academic Research

This chapter will describe some academic research I found related to MPEG-DASH and relevant to my project.

3.1 GPAC – Project on Advanced Content

GPAC (project on advanced content) began as a startup in the United States in 1999 with a goal of creating MPEG software (GPAC, 2017). The software evolved to be open source and was inherited by the Telecom ParisTech University (Ecole Nationale Supérieure des Télécommunications) as part of the research work of the Multimedia Group. They continued with the work and released a large code library for multimedia software. They call the code base "GPAC." Their website is https://gpac.wp.mines-telecom.fr/home/. GPAC contains many types of multimedia software and includes some software for MPEG-DASH. Specifically, it has a DASH segmenter written in C. I read the code but did not use it in my project. GPAC stores all of their software at https://github.com/gpac/gpac.

What I did find useful at GPAC was a utility called MP4Box (GPAC MP4Box, 2017). I downloaded the code, built it, and ran it a number of times to understand what a DASH segmenter should do for a static video file. It was a useful instructional utility.

3.2 Moscow State University

While reading about video quality measurement methods in Video Encoding by the Numbers by Jan Ozer (Ozer, 2017), I found that Moscow State University has developed a video test tool through their Graphics and Media Lab Video Group. This group has done various types of video research including codec and video quality research (Ozer, 2017). They have a spinoff product called the MSU VQMT (Moscow State University Video Quality Measurement Tool). The tool is expensive and I did not use it, but it intrigued me because it does automated quantitative analysis of video images. Developing my tv_channel system, I found there are many H.264 input parameters that can change the quality of the output images; therefore, to test a system at scale, you would need an automated approach. This observation applies to both non-DASH and DASH-based video systems.

3.3 Theses and Papers

This section will review theses and research papers I found on MPEG-DASH as of the year 2017. I only found documents related to the testing of MPEG-DASH; I did not find any documents related to the development of a 4K MPEG-DASH system other than the papers previously presented in the Adaptive Bit Rate Algorithm Research section.

In the first document, a thesis, Compliance Procedures for Dynamic Adaptive Streaming over HTTP (DASH) (Mazhar, 2011), the author proposes formal methods to test DASH. From it, I extracted specific items to test. The paper drew my attention to the number of permutations of the video encodings that DASH could support, including static and live encodings. The thesis outlined some fundamental system test points including:


1) Testing the XML format of the MPD file as described in subchapter 3.3.2, XML Schema Validation. The MPD contains a hierarchy of elements that must have proper XML format. The paper contains a pointer to an online XML test page at http://www.xmlvalidation.com/ and an XML test tool from Oracle, Inc.

2) Testing for the minimal presence of mandatory MPD file elements including the "BaseURL," at least one "Representation ID," and at least one "AdaptationSet."

3) Testing for valid, DNS-resolvable, and HTTP-accessible segment URLs specified in the MPD files.

One takeaway from this document was to use the XML tool to validate my .mpd file, which I describe in the following testing section. A second takeaway was that the content of the MP4 "boxes" was important to MPEG-DASH playback. I realized this before starting development and had to learn the MP4 file format in order to create the correct file header boxes, "chop" the segment files on I-frame boundaries, and create the proper file termination boxes. I found these MPEG details in MPEG Specification 14496-12, Information Technology – Coding of Audio Visual Objects, Part 12: ISO Base Media File Format (International Standards Organization, 2015).

The next paper I thought was useful was another testing research paper related to MPEG-DASH being used on a home wireless network, Evaluations of 4K/2K Video Streaming Using MPEG-DASH with Buffering Behavior Analysis (Harada, Kanai, & Katto, 2015). This paper focuses on the client/player side congestion issues. The authors express the number of players versus the maximum network throughput mathematically.


It is one of the few papers I found discussing DASH and 4K resolution video as opposed to lower resolutions.

The authors proposed three main testing scenarios with a mathematical definition as follows.

“To understand the relationship between available bandwidth and representations, which correspond to multiple bitrates used for video encoding in MPEG-DASH, we can classify congestion degrees into the next [sic] three cases:

R_max < B / n

R_min < B / n < R_max

B / n < R_min

where R_max is the highest bitrate of the representation, R_min is the lowest bitrate of the representation, B is available bandwidth, and n is the number of users.”

This is an interesting model because the whole goal of DASH is to make playback smooth for the number of users in case two. Case two occurs when the available bandwidth is distributed among the n users and the representations they are using are between the max and the min. For example, with B = 100 Mbps and n = 7, each user averages roughly 14 Mbps, which must fall between R_min and R_max for case two to hold. Their conclusion after running cases with n equal to 7 on 2.4 GHz and 5.0 GHz networks was that a DASH client successfully adapts to the network congestion and uses representations within the R_min and R_max bounds.

The final document I found, a master's thesis, Implementation and Analysis of User Adaptive Mobile Video Streaming Using MPEG-DASH (Jagannath, 2014), covers the client-side DASH player, not the server side, but it was still useful. It did not have a great deal of impact on my project but reinforced the relationship between encoding parameters, the bandwidth they require, and the usability perceived by the end user.


Chapter 4. Industry Development of DASH

This chapter will review some relevant industry work on the MPEG-DASH protocol and devices that are using it. It was particularly useful to see if there was a successful hardware DASH client and software DASH client, since I used a client as a "black box" in my thesis project. I had control over the server side, as I wrote that software as the thesis project. I found the software DASH client at the DASH Industry Forum website.

4.1 DASH Industry Forum

To promote DASH, an organization called the DASH Industry Forum was created (http://dashif.org/). The DASH Forum hosts a website that promotes meetings, promotes the specifications, hosts the test client player, and hosts test media. The website also hosts an engineering-oriented test DASH player with extensive statistics and instrumentation that can be used to test DASH media. The client is hosted at http://reference.dashif.org/dash.js/. I used this player extensively in my project test phase.

The DASH forum hosted IEEE and ACM sponsored meetings in 2017.

4.2 Google and Chromecast

The Google Chromecast media player supports streaming a few types of media, including DASH. There are three generations of the device; I used the latest generation, the 4K Chromecast Ultra, in my project. Google has been steadily investing in the development of the Chromecast since the first version was released. The evidence of this is that when I started this thesis project in mid-2017, there was no "YouTube TV," but by the end of 2017 YouTube TV was announced (Google owns YouTube). This indicates Google is pushing to be on the cutting edge of television evolution.

In addition, Google has invested in the DASH method of streaming by developing a test player, test media material, and code in the Chromecast device that can play DASH media files. The APIs to control the device are publicly available at this Google site: https://developers.google.com/cast/docs/developers (Google Inc., 2017). The project name for the JavaScript code in the Chromecast is “Shaka.” I did not change any Shaka code for this project since my work is on the server side, but for reference, the code design is hosted here: https://shaka-player-demo.appspot.com/docs/api/tutorial- architecture.html (Google, 2018) and the code repository is here: https://github.com/google/shaka-player.

The fundamental idea of the Chromecast is that it is very inexpensive and simple to operate. Its price is either $30 or $60 depending on the 1K or 4K model. It does not have a GUI; therefore, it must be commanded from some other application. Google releases what they call "Cast Enabled" Java code that can be built into a Java application, and that is exactly what I used in my virtual set-top box application. By placing the "Cast Enabled" code in a web app, the app can then control a Chromecast and instruct it to download and play specific media.

Chromecast is designed to allow developers to write software that interacts with it, and to help this process, a developer mode is provided. In order to use a Chromecast in developer mode, you must register your device with Google and they grant you a key. That key must be placed in your code, and when you run your Cast Enabled code, the key is passed to a Google server. Google may use this registration mechanism to monitor the usage rate of the keys to see what types of applications are being developed for Chromecast while it is in developer mode. The good part of registering was that the Chrome browser developer tools are Chromecast aware, and you can access the JavaScript console of the Chromecast after enabling developer mode. I found the console to be extremely useful to see what video files it was loading and what errors it was experiencing. The following screenshot shows the Chrome browser open to my Chromecast Enabled set-top box application with Developer Tools displayed.

Figure 11. Chrome Developer Tools


By selecting the Chromecast Ultra (4K) device in the lower right and clicking on "Inspect," one can see the console of the Chromecast device and the video files it is downloading and playing, as shown in Figure 12.

Figure 12. Chromecast Console Window


The Chromecast supports the following H.264 codec profiles (Google, Inc., 2018). In my work, I had to determine which of them did or did not work at 4K resolution. I concluded I had to use high profile level 4.1, avc1.640029, for my final test results.

Level    Profile Type    Codec          Supported (Yes/No)

3.0      baseline        avc1.42E01E    Yes

3.1      baseline        avc1.42E01F    Yes

3.1      main            avc1.4D401F    Yes

4.0      main            avc1.4D4028    Yes

4.0      high            avc1.640028    Yes

4.1      high            avc1.640029    Yes

5.1      high            avc1.640033    No

Table 3. Chromecast Supported H.264 Codec Profiles

4.3 Roku

I did not use a Roku player for this project, but my research indicates it supports the DASH protocol (Roku, 2018). Roku is a successful streaming video device that has more functionality than Chromecast, as it has its own GUI and can record and play back video. Notably, there is a public SDK for the Roku hosted here: https://sdkdocs.roku.com/display/sdkdoc/Audio+and+Video+Support (Roku, 2018).

I found it interesting that for 4K resolution video, the Roku only supports the HEVC (H.265) and VP9 codecs, while the Chromecast supports only the H.264 codec. This may indicate a limitation of the hardware decoder in the Chromecast. It also indicates that to build a publicly available television system supporting all DASH devices, the server side must encode for both H.264 and H.265 at a minimum. Since the DASH protocol is codec "agnostic," it is unable to enforce that all player types use the same codecs. The .mpd manifest file indicates the codec used for every representation, and that is how the same .mpd file could be used for both device types. However, the DASH encoder must encode the source media for both codec types, H.264 and H.265.

4.4 Embedded in Televisions

A positive sign of the adoption of the DASH protocol is that some television manufacturers have embedded a DASH player in their TVs. This includes Sony, Samsung, Philips, LG and Panasonic (Unified Streaming, 2018).

4.5 Bitmovin

There are a few companies already in the video encoding, storage and playback industry that are working on DASH related products. The most progressive company I found was Bitmovin in Austria, https://bitmovin.com/. They have developed a multi-faceted approach to video streaming technology including an HTML 5 DASH player, cloud-hosted video encoding and public APIs to access their cloud encoder. Their technology is not specific to DASH as they also support Apple HLS and Smooth Streaming (https://developer.bitmovin.com/hc/en-us/articles/115001080013-Supported-Input-Output-Formats).


4.6 Akamai

Akamai is a Content Delivery Network infrastructure company spun out of MIT in the late 1990s. Akamai was formerly one of Netflix's CDN partners until Netflix built their own CDN (Forbes, Inc., 2012). In the product area of segmented video distribution, Akamai supports the distribution of multiple video encoding types including HTTP Live Streaming, HTTP Dynamic Streaming, Microsoft Smooth Streaming and Dynamic Adaptive Streaming over HTTP (Akamai, Inc., 2017). Their specific product offering that supports the above protocols is Adaptive Media Delivery (Akamai, Inc., 2017). Akamai will be presented in more detail in the subsequent subchapter Content Delivery Network.

4.7 Periscope and YouTube LiveStream

There is a category of video streaming that I call transient streaming because it is not like a that has an expectation of being available nearly 24 hours per day. Two examples of transient streaming are Periscope and YouTube Livestream. Both allow a user to broadcast the video from his cell phone or onto the internet. A user can make a “channel” with either, but the channel is intended to be used for a short- term broadcast. I mention them for differentiation purposes only. The DASH-based system I am proposing can support multiple channels simultaneously and would run 24 hours per day.


Chapter 5. A DASH Based Television System

With an understanding of the DASH protocol, prior research and industry video streaming options behind us, we can move on to the design of a DASH-based television system.

5.1 Applying DASH to the Television System

Let us apply DASH functionality to the television system. As presented in the thesis abstract, an alternative method for building a television system compared to the existing cable television network and over-the-air broadcasts would be to use the MPEG-DASH protocol and stream content over the internet to playback devices. MPEG-DASH could stream content encoded with the H.264 codec. Most tablets and PCs are capable of decoding H.264 in hardware since it is a well-defined codec. Apple iPads support H.264 (Apple Inc., 2018). Intel supports H.264 decoding in hardware in numerous x86 CPU types with graphics support (Software, Primo, 2017).

In the case of an HD TV that is not "internet ready," it requires a device such as the Google Chromecast, Roku, or Amazon Firestick to render the DASH file content onto an HDMI interface that is connected to the TV. Therefore, DASH can expand the reach of "broadcast" television to more device types and more people. This would also extend the reach of emergency notification systems such as the Emergency Alert System (EAS), used to alert people of life-threatening severe weather conditions, and the Amber Alert for missing children. This is a positive social aspect of a television system with further reach than the existing system. Monroe Electronics, a manufacturer of EAS equipment, has announced it now supports the MPEG-DASH protocol (Monroe Electronics, 2017).

In the television system proposed by the thesis, the specialized set-top box RF tuner discussed in the Over-the-Air Broadcasts subchapter can be removed from the system. (It is recognized that the proposed system depends on a high-speed internet connection.) The HD RF tuner in a television can be removed also. Both tuners would be replaced with a more general-purpose cable modem tuner or fiber-optic demodulator, which would be the basis of the IP "pipe" that can be used for many purposes, not just video transmission. The cable modem tuner or fiber optic demodulator becomes layer 1 in the OSI network model (Peterson & Davie, Computer Networks, 5th Edition, 2012). Layer 2 would be the MAC layer for addressing the physical device, layer 3 is the IP network layer, and layer 4 is the TCP transport layer, with HTTP running above it. This is a much more flexible model and allows for many simultaneous digitized services to flow through a single IP connection including email, web functions via HTTP, voice over IP, the proposed television service using MPEG-DASH, etc.

The model is depicted in Figure 13, RF Tuner Reduction in Future Home, below. By eliminating the set-top-boxes and their RF tuners and the RF tuners in televisions, the cost of the television system decreases. The decrease per household may be small, but across millions of households, it would be large. The laws of economics suggest such a transition may occur.

(Diagram: the over-the-air broadcast input to the home is no longer needed; a cable modem and wireless router feed DASH player software on the televisions and screen devices.)

Figure 13. RF Tuner Reduction in Future Home

5.1.1 Wireless Home Network Requirements

Although the home network could be wired, it could also be wireless. To be effective, the wireless home network will require the 802.11n wireless networking protocol for sufficient network throughput. In Figure 13, there are five viewing devices. Assuming each device requires 23 Mbps for a 4K HD video, 115 Mbps of throughput is required. 802.11n supports bandwidth from 6 Mbps to 600 Mbps depending on the channel count and configuration; using 64-QAM modulation, 600 Mbps can be achieved (IEEE Computer Society, 2009). Since 802.11ac can establish a 1 Gbps connection, it would also be sufficient. 802.11b and 802.11g would be insufficient since 802.11b has a maximum bandwidth of 11 Mbps (Jangeun, 2003) and 802.11g has a maximum bandwidth of 54 Mbps.


5.2 Virtualized Television System

Now that we have discussed the home connectivity network, let us now look at the bigger picture of virtualizing the television system. The following diagram depicts the network components in the virtualized television system.

(Diagram: a 4K digital camera and 4K MPEG4 content feed a DASH segmenter server and DASH segment file storage; a CDN and web server deliver the segments over the internet to the end user's DASH client player device, screen, and virtual set-top box.)

Figure 14. Virtualized Broadcast Television Ecosystem

Using DASH to create a television broadcast system will require an injection point for the source video and a multi-layered content delivery network (CDN) to distribute the video. On the left of the diagram, the input to the system can be from 4K cameras or from static video content. One or more DASH Segmenter Servers process the input video to create the desired number of representations (resolutions and quality of encoding within those representations). The segment files would be stored in a file storage file system, possibly one with redundancy to prevent system outages. In the next step, the DASH segment files must be transmitted to the CDN. The CDN tree may have multiple layers optimized by the CDN network configuration and topology. The video segments are pushed to the edge of the CDN network where they are accessible by the End User's Player Device. The CDN is described in detail in the chapter Content Delivery Network below.

The End User accesses a virtual set-top box application hosted in the cloud using a web or phone app. The End User selects a channel or movie to watch. The DASH client device is instructed by the virtual set-top box to stream the selected channel by downloading and processing the manifest file for the channel from the DASH Segment web server. (The manifest file was described in chapter 2.8.2 above.) The client parses the manifest file and extracts or synthesizes the URLs for the related segment files that contain the actual video data. The player then starts sequentially downloading and displaying the segment files. The End User then sees the selected channel on the Screen.

The DASH protocol will monitor the download rate and use the highest possible encoded resolution based on the internet throughput available.
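As an aside, the URL synthesis step is simple string templating. The following minimal C++ sketch assumes a SegmentTemplate-style media pattern with $RepresentationID$ and $Number$ placeholders; the base URL, template, and representation id are hypothetical values of the kind a player would read from the manifest.

#include <iostream>
#include <string>

// Replace every occurrence of 'key' in 'tmpl' with 'value'.
static std::string substitute(std::string tmpl, const std::string &key,
                              const std::string &value) {
    for (std::string::size_type pos = tmpl.find(key); pos != std::string::npos;
         pos = tmpl.find(key, pos + value.size())) {
        tmpl.replace(pos, key.size(), value);
    }
    return tmpl;
}

int main() {
    // Hypothetical manifest values.
    const std::string base  = "http://example.com/channel1/";
    const std::string media = "seg_$RepresentationID$_$Number$.mp4";
    const std::string repId = "4k_30mbps";

    // Synthesize the URLs of the first three one-second segments.
    for (int number = 1; number <= 3; number++) {
        std::string url = substitute(media, "$RepresentationID$", repId);
        url = substitute(url, "$Number$", std::to_string(number));
        std::cout << base + url << "\n";
    }
    return 0;
}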

Assuming the manifest contains a representation for a 4K resolution encoding and the network bandwidth is available, the result is a 4K display on the user's screen. To the end user, the multi-tiered method by which a statically stored DASH video or a live DASH broadcast reaches him would be indistinguishable from a live broadcast over the existing cable television network. That would be the ultimate proof of the validity of using DASH for this alternative television system.

Concerning the network throughput needed to achieve the goal of 4K resolution, on average 23 Mbps of throughput is required as compared to approximately 6 Mbps for a present-day 1080p HD resolution video. The proposed MPEG-DASH television system is a unicast system and therefore has a high throughput requirement to handle the HTTP traffic load for many end users. Content Delivery Network providers such as Akamai have been solving this type of networking problem, as demonstrated by the success of the Netflix video streaming service in the last ten years. The logic behind using a CDN and a computation of the throughput requirements of the CDN is presented in the Content Delivery Network subchapter below. Netflix has used three CDNs in its lifetime. It started by using Limelight Networks as its CDN but then switched to Akamai's CDN (Chicago Tribune, 2012). Shortly thereafter, Netflix started migrating to their own CDN (Forbes, Inc., 2012) called Netflix Open Connect (Netflix, Inc., 2018).

5.2.1 Last Mile Internet Access Bandwidth Requirements

An important consideration in the network is the last mile connection provided by an internet service provider from its network to the End User. The connection bit rates offered vary from less than 1 Mbps to 1 Gbps depending on the access technology used. Even though a service provider may advertise a specific maximum bit rate, we are more interested for practical reasons in the actual bit rates resulting from analysis or tests on the live internet. While researching average achievable bit rates, I found two comprehensive sources of rates from Akamai and www.speedtest.net, but in comparison, the bit rates published vary considerably.

In their quarterly report, State of the Internet Q1 2017 (Akamai, Inc., 2017), Akamai reported that the average connection bit rate on IPv4 networks for the top ten states was 23 Mbps. They reported the average peak connection bit rate on IPv4 networks for the top ten states was 104 Mbps. Speedtest.net reported the average bit rate to be 64 Mbps, much higher than 23 Mbps.

Theoretically, the large discrepancy may be attributed to the methods used to collect the data. Akamai collects data continuously as part of their normal CDN operations. Since a main function of a CDN server is to accept and fulfill file download requests, a CDN server is a good place to acquire download rate metrics. They can track the amount of time it took a host machine to download a specific file with a specific byte size. From the time and the byte size of the file, a download rate can be computed.

Speedtest.net, on the other hand, attracts people interested in knowing exactly what their download bit rate is and it is possible these types of internet users are more sophisticated than average users and may have paid for high-speed internet service. Someone who purchases the most basic internet service may not care exactly what his rate is and may never run such a test. Therefore, the speedtest.net results may be skewed and the Akamai results are more realistic.

In Akamai’s published report from Q1 2017, the average connection bit rate over IPv4 for the top ten states is provided, as shown in Table 4.


Table 4. Average Connection Bit Rate over IPv4 by State

From this data, I computed the average to be 23 Mbps. Unfortunately, the report does not reveal the average for all states.

In the same report, Akamai provides the average peak connection bit rate over IPv4 for the top ten states, as shown in Table 5.


Table 5. Average Peak Connection Bit Rate over IPv4 by State

From this data, I computed the average peak bit rate to be 104 Mbps.

Unfortunately, the report does not reveal the average for all states.

www.speedtest.net and Ookla Inc. collated a good source of information for average internet speeds using actual speed test results from 14 million test executions using cellular connections and 111 million test executions using fixed connections (speedtest.net, 2017). The average fixed connection speed at the end of 2017 was 64.17 Mbps. Moreover, Figure 15 below shows the average test result increasing over 2017.


Figure 15. Average Fixed Connection Download Speed

64 Mbps is enough throughput to support two 23 Mbps 4K video streams and nearly enough to support three. The maximum rate available is much higher, since 1 Gbps service is offered by both Comcast and Verizon FIOS in some locales. As reported in the Fiercetelecom trade publication, a number of service providers are testing and offering 1 Gbps home service (Buckley, 2018).

Some service providers offer multiple bandwidth options at different price points, as shown in Figure 16 below. The DSL rates are from documented product offerings by Verizon (Verizon, Inc., 2018). The Fiber rates are from documented product offerings by Verizon FIOS (Fiber Optic Service) (Verizon, Inc., 2018). The cable rates are from documented product offerings from Comcast, Inc. (Comcast, Inc., 2018). Using the average throughput requirement of 23 Mbps per 4K video stream, we can predict how many streams would be available in a household.

Access Type    Maximum Bandwidth    Number of 23 Mbps 4K Video Streams

DSL            1 Mbps               0

DSL            15 Mbps              0

Fiber          100 Mbps             4

Fiber          940 Mbps             40

Cable          10 Mbps              0

Cable          70 Mbps              2

Cable          250 Mbps             10

Cable          1 Gbps               43

Figure 16. Maximum 4K Streams per Internet Access Type

An observation may be drawn that 4K video will not work over the typical DSL bandwidth offerings. However, fiber and cable would support multiple 4K video instances.

A more practical analysis is in the following table. It shows the average number of 23 Mbps video streams based on the Akamai and speedtest.net data.

Data Source      Average Bandwidth    Number of 23 Mbps 4K Video Streams

Akamai           23 Mbps              1

speedtest.net    64 Mbps              2.8

Figure 17. Average 4K Streams per Measured Bandwidth Source


A final observation is that both the speedtest.net and Akamai reports state that the internet download rates are continuously improving over time. This is partly due to the demand by consumers. However, it is also due to government funding in some states to increase broadband data rates, especially to rural areas. Akamai cites New York, Minnesota and New Mexico as funding the growth of broadband using government financial resources (Akamai, Inc., 2017). The flexibility of the proposed DASH television system works well with the growth of the internet. This is because DASH lets you create representations for different bit rates. The representations created and used could be tuned to match the measured bit rate of a geographic area. Moreover, as the average bit rate increases year by year, the representation resolution and quality can be increased to match. In this manner, the proposed television system quality improves along with the improvement of internet access over time.

5.3 Content Delivery Network

Consider a theoretical service provider with 25 million subscribers. This number is slightly greater than the number of video subscribers at one of the largest U.S. video and internet service providers, Comcast, Inc. In the fourth quarter of 2017, Comcast had 22.36 million video subscribers (Statista, Inc., 2017). Let us assume each subscriber has five 4K video streams in use simultaneously, matching the previously presented diagram Example of RF Tuners in Typical Household. Using the approximate average 4K bandwidth from the table Video Codec Bit Rates of 23 Mbps per 4K stream, the total required bandwidth is computed as:


25 M subscribers x 5 streams x 23 Mbps / stream

= (25 x 10^6) x 5 x (23 x 10^6) bps

= 2875 x 10^12 bps

= 2.9 x 10^15 bps

= 2.9 petabits/second

Equation 3. Aggregate MPEG-DASH Data Throughput

If the traffic type was TCP/IP packets, it could be distributed as IP multicast by routers, but the traffic type is file-based data and therefore is persistent in some type of file system. Routers do not keep persistent packet data. The requirements of the MPEG-DASH data distribution system are: 1) it must handle a very high data rate based on the above calculation, 2) the system must handle data that is file-based and is, therefore, persistent, and 3) it must cover a large geographic area. The most suitable type of network for distributing this scale of data is a Content Delivery Network (CDN). (Note that another term for the same type of network is Content Distribution Network (Peterson & Davie, Computer Networks, 5th Edition, 2012).)

Defining a CDN first requires some background on the general problems it solves. Leighton describes in his paper Improving Performance on the Internet that the greater the distance between nodes across the internet, the higher the amount of packet loss. Packet loss in turn negatively affects TCP performance and thus negatively affects overall throughput (Leighton, 2008). Moreover, Peterson describes the system throughput of a web server as the average number of requests that can be satisfied in a given time period; this throughput can quickly be exceeded during a flash-crowd type of network event where a large number of users access a small number of web pages. In this situation, the system throughput is exceeded and user requests cannot be satisfied (Peterson & Davie, Computer Networks, 5th Edition, 2012). Considering the nature of the MPEG-DASH television system, where one or more video files will be accessed by a very large number of users across a geographically dispersed area, the system will have both of the problems described above. Fortunately, a CDN solves both of these problems.

A CDN is a type of network consisting of a backend server and surrogate servers. The surrogate servers cache web pages and files that would normally only reside on backend servers (Peterson & Davie, Computer Networks, 5th Edition, 2012). The surrogate servers are geographically dispersed across the internet and often within Internet Service Provider (ISP) networks. Depending on the CDN functionality and policies, the surrogate servers pull web pages or files from the backend servers on-demand, or web pages and files are proactively pushed from the backend servers to the surrogates. There can be multiple levels or tiers of surrogates. The following diagram from U.S. patent Distributed Content Delivery Network Architecture shows the nature of connectivity of nodes in the CDN (US Patent No. 9232240B2, 2012).


Figure 18. Basic Content Delivery Network

In Figure 18, 101 is a “master streaming video server”; 103, 105, 107, and 109 are “sub-master streaming video servers,” i.e., the first tier surrogate servers. The hierarchy continues with 111, 113, 115, etc. being “slave streaming video servers,” i.e., the second tier surrogate servers. 135, 137, 139, etc. are “user station groups,” i.e., in the MPEG-DASH model, devices with the DASH player.

Given this distributed network architecture, an HTTP client requesting a page needs to know from which surrogate server to retrieve a web page. Another entity in the network called a redirector handles this task. The redirector is depicted on the right in Figure 19 below.

(Diagram: an HTTP client consults a redirector, which directs it to one of several tiers of surrogate servers backed by a backend server.)

Figure 19. Content Delivery Network Abstraction

The redirector can use multiple factors and multiple methods to associate a page request from a specific HTTP client to the ideal surrogate server. Locality is a main factor, based on the statement above from the Leighton paper regarding packet loss increasing as a function of distance across the internet. The rule is that the redirector should pick a surrogate in close proximity to the requester. Another key factor is surrogate server load. If a particular surrogate server is overloaded, another page request sent to it is likely to fail. Therefore, the redirector must account for load balancing between surrogate servers (Peterson & Davie, Computer Networks, 5th Edition, 2012).


This means the page request load should be balanced among the available surrogate servers. Load balancing could be done by using one of multiple algorithms. Some basic load-balancing methods are: 1) a simple round-robin approach, 2) locality or minimal router-hop distance, much like basic routing algorithms, and 3) hashing the page request to available servers. All three of these methods suffer from the possibility that one or more surrogate servers could become overloaded and others underloaded. A better algorithm would be to use an estimate of server load and proximity together. As Peterson states (Peterson & Davie, Computer Networks, 5th Edition, 2012), the Cache Array Routing Protocol (CARP) does this. The redirector has knowledge of the current surrogate server load based on how many requests have been recently sent to it. The algorithm works as follows:

“Upon receiving a URL, the redirector hashes the URL plus each of the available servers and sorts the resulting values. This sorted list effectively defines the order in which the redirector will consider the available servers. The redirector then walks down this list until it finds a server whose load is below some threshold. The benefit of this approach compared to plain consistent hashing is that server order is different for each URL, so if one server fails its load is evenly distributed among the other machines.”

The above algorithm is designed to work in the typical random type of HTTP request environment. However, in the case of the proposed MPEG-DASH television system, the filename for a particular MPEG-DASH period (for example, one second of real-time video) for a particular television channel will be the same filename for all requestors. There is a high probability that the same URL will be requested by many clients in a small time window. (End users pausing their video or end users with time-shifted viewing habits will cause some skew in the time the bulk of requests occur for a particular filename.) Therefore, the above algorithm should be augmented to include periodic real-time load status in the redirector's hash list. The algorithm could also include the requesting IP address in the hash and increase the probability that the same requester would be associated with the same surrogate for subsequent file requests. This will keep any network switch ARP caches populated and router route caches populated with the path from the surrogate server to the requesting HTTP client. These enhancements could be added to the CARP described above for the benefit of the MPEG-DASH protocol.
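A minimal C++ sketch of this augmented CARP-style selection follows. The FNV-1a hash and the integer load counter are simplified stand-ins for whatever hash function and load metric a production redirector would use, and the server names and threshold are invented for the example.

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

struct Surrogate {
    std::string name;
    int load;  // count of requests recently routed to this server
};

// FNV-1a hash, a simple stand-in for a production hash function.
static uint64_t fnv1a(const std::string &s) {
    uint64_t h = 1469598103934665603ULL;
    for (unsigned char c : s) { h ^= c; h *= 1099511628211ULL; }
    return h;
}

// Hash URL + requester IP + server name, sort servers by that value, and walk
// the list until a server is found whose recent load is below the threshold.
static const Surrogate *pick(const std::vector<Surrogate> &servers,
                             const std::string &url, const std::string &clientIp,
                             int loadThreshold) {
    std::vector<std::pair<uint64_t, const Surrogate *>> ranked;
    for (const Surrogate &s : servers)
        ranked.push_back({fnv1a(url + clientIp + s.name), &s});
    std::sort(ranked.begin(), ranked.end(),
              [](const auto &a, const auto &b) { return a.first < b.first; });
    for (const auto &r : ranked)
        if (r.second->load < loadThreshold) return r.second;
    return nullptr;  // every surrogate is over the threshold
}

int main() {
    std::vector<Surrogate> servers = {{"edge-a", 40}, {"edge-b", 95}, {"edge-c", 10}};
    const Surrogate *s = pick(servers, "/ch7/seg_4k_42.mp4", "198.51.100.7", 90);
    std::cout << (s ? s->name : "none") << "\n";
    return 0;
}

Because the requester IP participates in the hash, the same client tends to land on the same surrogate for consecutive segment requests, which is exactly the cache-warming behavior described above.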

5.4 CDN Scale Requirements

The surrogate servers and connections between them in the CDN comprise a mathematical acyclic graph. This graph can be drawn as a multi-tiered tree with the root at the source of the video data and leaves as video player devices. In the CDN, each surrogate server is connected to a set of other surrogate servers in the direction of the leaves. For the purposes of calculating an upper bound on the required number of tiers, let us call the size of this set the tier surrogate count (TSC). The tree is depicted in Figure 20 below.


(Diagram: a backend server fans out through tiers of surrogate servers, each labeled with a (depth, group, index) 3-tuple, toward the DASH client; a redirector selects the serving surrogate.)

Figure 20. Content Delivery Network Graph

The diagram’s surrogate instances are labeled with a 3-tuple to uniquely identify a surrogate. The 3-tuple consists of (1) the depth level in the tree from left to right, (2) the group within the level and (3) the 1-based index within the group.

The total throughput of the CDN is a function of a number of factors including: (1) the server I/O throughput, (2) the network connection throughput between servers, (3) the TSC, and (4) the depth of the tree. Applying discrete math, the throughput of the CDN grows exponentially as the tier depth increases. A simple formula that calculates the CDN throughput follows.


CDN Throughput (bps) = Server I/O (bps) x TSC^depth

Where:

TSC = tier surrogate count

depth = depth of CDN tiers

Equation 4. CDN Theoretical Throughput

In the prior Content Delivery Network subchapter, we computed a large-scale system would require 2.9 petabits per second of throughput coming out of the CDN at the leaf surrogate servers. Now we can compute the tier depth of the CDN to see if the number of surrogate servers is practical.

Let us assume that a surrogate may have 90 Gbps of network I/O, as Netflix has recently demonstrated with their CDN appliance (Netflix, Inc., 2017). This includes not just the network interface I/O, but also the file storage system I/O; in the Netflix demonstration, they used fast SSD disk drives. A portion of the I/O must be used by a server to receive MPEG-DASH files from a CDN peer, and part of that I/O must be used to either distribute the files to other surrogates or, in the case of "leaf" servers, to satisfy HTTP requests for the files.

A practical computation would be that of the network throughput needed to support 500 viewable channels. In this case, the output throughput of the leaf surrogate servers at the edge of the CDN will not change because the clients download only the content they need. However, the amount of data that has to be distributed by the CDN increases by a factor of 500 for the full channel set and also increases to support multiple MPEG-DASH representations (resolutions) for a particular channel. Let us assume five representations based on the bit rates in Table 1, Video Codec Bit Rates. We compute the total bandwidth of all channels and their representations as follows.

Total bandwidth

= 500 channels x

(23.5 Mbps + 4.5 Mbps + 1.75 Mbps + 1.25 Mbps + 0.7 Mbps)

= 500 x 31.7 Mbps

= 15.85 Gbps

Equation 5. CDN Calculated Throughput

We compute the number of servers at the rightmost tier to achieve 2.9 petabits/second.

Total surrogate servers at last tier

= (2.9 x 10^15 bits/second) / ((90 – 15.85) x 10^9 bits/second)

= 39,109

Equation 6. CDN Last Tier Server Count


Assuming the 15.85 Gbps needs to be distributed to all last tier surrogate servers, we can compute the aggregate throughput to those servers.

15.85 Gbps x 39,109 servers

= 619.9 Tbps

Equation 7. CDN Aggregate Throughput to Last Tier

We can compute the tier depth in the CDN for the 500 channel system. The first server has 90 Gbps of I/O. Therefore, each server at tier N can distribute to 4 servers in tier N+1 ((90 – 15.85) Gbps / 15.85 Gbps ≈ 4.7 servers, rounded down). Therefore, the TSC is 4.

TSC^(depth-1) = 39,109

4^(depth-1) = 39,109

log( 4^(depth-1) ) = log( 39,109 )

(depth-1) x log( 4 ) = log( 39,109 )

depth-1 = log( 39,109 ) / log( 4 )

depth-1 = 7.63

depth = 8.63

Equation 8. Theoretical Number of CDN Tiers

We need to round up fractional tiers; therefore, the depth of the CDN is 9 tiers.
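The arithmetic of Equations 6 and 8 can be checked with a few lines of C++; the constants below are the ones used above.

#include <cmath>
#include <cstdio>

int main() {
    const double targetBps   = 2.9e15;   // required aggregate edge output
    const double serverIoBps = 90e9;     // per-server network/disk I/O
    const double channelsBps = 15.85e9;  // 500 channels x 31.7 Mbps

    // Each server spends channelsBps receiving the feed; the rest fans out.
    double usableBps = serverIoBps - channelsBps;       // 74.15 Gbps
    int    tsc       = (int)(usableBps / channelsBps);  // ~4.7, rounded to 4

    double lastTier = targetBps / usableBps;            // ~39,109 servers
    double depth    = 1.0 + std::log(lastTier) / std::log((double)tsc);

    std::printf("TSC=%d, last-tier servers=%.0f, depth=%.2f -> %d tiers\n",
                tsc, lastTier, depth, (int)std::ceil(depth));
    return 0;
}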

Figure 21 below shows the input bandwidth per tier index. We can see that to reach 619.9 Tbps input to the last tier we must have slightly more than 8 tiers; therefore, we must have 9 tiers.

(Chart: aggregate input bandwidth in Tbps versus CDN tier count 1 through 9; the server input per tier grows from 0.016 Tbps at tier 1 to 1048.576 Tbps at tier 9, crossing the 620 Tbps last tier target between tiers 8 and 9.)

Figure 21. CDN Tier Sizing

The CDN discussed in this subchapter is theoretical and assumes a perfect distribution to the edge of the CDN. An actual system would be different in that the CDN may be deeper in metropolitan areas to handle more end users. The CDN may also have fewer tiers than computed in rural areas.

5.4.1 Netflix Commentary

It may seem like 39,109 servers at the edge of the CDN is a high number of servers, but it is not impractical. Akamai has over 240,000 servers in its global CDN (Akamai, Inc., 2018). The scale of the Akamai network can be reviewed by looking at the "Facts and Figures" Akamai web page. At the time of this writing, the details were as follows (Akamai, Inc., 2018):

Network Deployment: Akamai has deployed the most pervasive, highly- distributed content delivery network (CDN) with more than 240,000 servers in over 130 countries and within more than 1,700 networks around the world.

As stated earlier, Netflix is creating their own CDN called Open Connect and they are implementing part of it by giving servers to qualifying ISPs at no cost. The servers will be loaded with video content file updates during off-peak hours (Netflix, Inc., 2018).

I was able to glean the server type by reading some technical information from the Netflix web pages that describe how to engage in this CDN program and set up a server. The server type is a Supermicro SUPERSERVER 1027R-N3R, an inexpensive commodity x86 server (Netflix, Inc., 2018). The fact that Netflix is giving away the servers to ISPs indicates the high value to them of having the servers as close to the end user as possible. It also shows they are following the rule presented earlier from the Leighton paper that locality is important and the CDN should be as close to the client device as possible. By getting their CDN into the ISP network, Netflix is following this rule.

5.5 Network Latency

Another aspect of the DASH system worth addressing is how much latency exists in the system from the point of the camera to the end user's screen. Since the DASH-based television network is a type of store-and-forward system, it takes time to store a segment and distribute that segment. For example, if the live broadcast segments are one second long, it takes at least one second to finish writing the initial file. Once the file is complete (either written to hard disk or RAM disk) it must be propagated throughout the content delivery network. The segment will propagate through several nodes in the CDN.

There are numerous network factors involved in the propagation time. If we consider the CDN to be a series of network links and file servers, the factors are: the bandwidth of the network links between the file servers, any network congestion in the switches or routers between the file servers, the speed of the disk system on a server and the processing time of the CDN logic for determining how to propagate a given segment.

Finally, the DASH device will download the file and store it in its local memory, parse the MPEG headers and start playing it. Let us call the sum of these delays the DASH Live Network Latency. The latency in milliseconds could be approximated by the DASH Live Network Latency formula as described in Equation 9:

DASH Live Network Latency =

SegmentDuration + Σ (i = 1 to N) CDNStoreTime_i + DeviceStoreTime + MPEGparseTime + FrameDecodeTime

Where:

N = the number of tiers in the CDN

i = index of a particular CDN tier

SegmentDuration = the millisecond duration of the original segment

CDNStoreTime_i = the number of milliseconds it takes tier i of the CDN to receive and store the segment

DeviceStoreTime = the number of milliseconds it takes for the DASH client device to retrieve the segment from the network and store it in RAM

MPEGparseTime = the number of microseconds or milliseconds it takes for the DASH client device to parse the MPEG file header and read a video frame

FrameDecodeTime = the number of microseconds or milliseconds needed for the video decoder to decode and render a video frame

Equation 9. DASH Live Network Latency Formula

At the moment the MPEG header parsing is complete and the first frame is fully pushed into the decoder, the user starts seeing the beginning of the original segment stored from camera data.
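As a rough worked example with assumed values (one-second segments, a three-tier CDN that takes 50 ms per tier to receive and store a segment, 200 ms for the device to download the segment, 5 ms to parse the MPEG header, and 10 ms to decode the first frame), the formula gives:

DASH Live Network Latency = 1000 ms + (3 x 50 ms) + 200 ms + 5 ms + 10 ms = 1365 ms

Under these assumptions, a live frame reaches the screen roughly 1.4 seconds after the camera captured it.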


Chapter 6. Design and Implementation of Functioning Prototype

With the large-scale system design proposed in the preceding chapter in mind, this chapter will explain the portion of the MPEG-DASH television network I chose to prototype. I will explain the prototype system requirements, design, and implementation.

Since the entire DASH television architecture is large, there are parts I could not implement. I did not implement or use a Content Delivery Network, a complex entity on its own. I did not implement the DASH player since that is built into the DASH client device. I focused on the real-time server that produces DASH video segments.

The execution of the system will be explained in the chapter User's Guide and Human Interactions.

6.1 Requirements

My general requirement was to process 4K raw data from a 4K camera in real- time at 30 frames per second into two separate MPEG-DASH representations, i.e., two resolutions. This would allow me to prove the viability of distributing 4K video to two different device types at the same time. This also allows for one device to alternate between two resolutions using an adaptive bit rate algorithm. Providing multiple resolutions is a significant departure from the current cable television system as it broadcasts one resolution only and televisions must downscale internally if necessary.

Processing true 4K data, which amounts to 747 megabytes per second (3840 x 2160 pixels x 3 bytes per pixel x 30 frames per second), also forced me to prove that a common x86 processor could handle the amount of data that 4K images create. Further requirements follow.


6.1.1 Two Resolutions: 3840x2160 (4K) and 854x480

4K was the high resolution and was mandatory to prove the thesis. For the low resolution, I chose 854x480 since it is a 16:9 aspect ratio image like 3840 x 2160. I chose the same aspect ratio since it is easier for the human eye to compare the screen image quality given the same aspect ratio. Having different aspect ratios would add an additional and unnecessary element to the comparison.

6.1.2 Create a Virtual Set-Top Box

I made it a requirement to create a virtual set-top box application because a goal of the project was to virtualize where possible and eliminate the RF tuner from the system design. It was not a requirement to create a complex set-top box application with a scrolling schedule.

6.2 Project Operating System and Software Development Tools

In this section, I will describe my operating system choice and development tools.

6.2.1 Operating System

The tv_channel computer’s host operating system is Ubuntu Linux. I began prototyping the system on RedHat Enterprise Linux (RHEL) 7.2 on a 2-core Xeon x86, but when I built a faster computer with a 6-core, 12-thread Intel i7-8700K processor, I needed a Linux kernel more current than that in RHEL to get video driver support for the i7-8700K CPU. Ubuntu version 17 uses kernel 4.13, which has the required driver support. It is likely that Fedora, a bleeding-edge version of RedHat, would work also. It is unlikely that CentOS would have the graphics support because CentOS usually trails RedHat in functionality.

I chose Linux over Windows because the Linux operating system has provisions for running as a real-time OS (Love, 2010). In addition, the development tools for Linux including gcc, emacs, gdb, and valgrind are free.

6.3 Compiler and Tools: gcc, emacs, gdb and valgrind

I decided to write the code in a C-like style but use the C++ gcc compiler. This enforced stricter compilation type checking. I also decided to use C++ instead of Python because I needed execution to be as fast as possible to process 747 MB per second.

My editor was emacs. It easily handles indentation rules for the C and C++ languages and it can color code the XML text in .mpd DASH manifest files.

A symbolic debugger was mandatory to debug some complex problems with integrating with the FFMPEG library. The only choice here is gdb. I was able to build the libraries with gdb debug symbols, which allowed me to single step from my code into the ffmpeg library APIs.

Valgrind is a memory leak detection tool. When I first got the system execution loop running, the system would leak memory and cause Linux to fail after a few minutes. Even after fixing obvious memory malloc and free issues, I was still leaking some memory. However, running the system under valgrind pointed out some subtle memory leaks. After making some deallocation fixes, the system no longer leaked memory.


6.4 FFMPEG Codec and Scaler Library

The prototype system requires a codec, specifically the H.264 codec, in order to function with the Chromecast device. Writing a codec is a large task and was not a focus of the project, so I used the open-source FFMPEG libraries to leverage a functioning codec (FFMPEG Organization, 2018). I downloaded and compiled the FFMPEG code so that I could use gdb to walk the code to understand its interfaces. H.265 (HEVC) has a better compression ratio than H.264, but Chromecast does not support it; therefore, H.264 was the only practical codec choice.

To create the 854x480 images from 4K images, I used the FFMPEG scaler. This was a stage of processing prior to encoding with the H.264 encoder. The processing stages are described below in the design subchapter.
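For reference, the downscaling stage maps onto the FFMPEG libswscale API roughly as follows. This is a minimal sketch assuming YUV420P frames; error handling is omitted, and the wrapper function name is my own rather than actual tv_channel code.

extern "C" {
#include <libavutil/frame.h>
#include <libavutil/pixfmt.h>
#include <libswscale/swscale.h>
}

// Downscale one 3840x2160 frame to 854x480 for the low-resolution encoder.
AVFrame *downscale_4k_to_480(SwsContext *&ctx, const AVFrame *src) {
    if (!ctx)  // create the scaler context once and reuse it per frame
        ctx = sws_getContext(3840, 2160, AV_PIX_FMT_YUV420P,
                             854, 480, AV_PIX_FMT_YUV420P,
                             SWS_BILINEAR, nullptr, nullptr, nullptr);

    AVFrame *dst = av_frame_alloc();
    dst->width = 854; dst->height = 480; dst->format = AV_PIX_FMT_YUV420P;
    av_frame_get_buffer(dst, 0);  // allocate the destination pixel buffers

    // Convert the full source slice into the destination frame.
    sws_scale(ctx, src->data, src->linesize, 0, src->height,
              dst->data, dst->linesize);
    return dst;  // caller hands this to the low-resolution H.264 encoder
}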

6.5 ffplay and mediainfo Utility Programs

ffplay is a very useful utility that is available when one builds the FFMPEG libraries. I used it for debugging purposes. Specifically, it has a mode where you can input a raw video data file, tell it the format of the data, the frames per second rate and it will render the raw data as a movie clip. I found this to be immensely useful as I was debugging my system.

A second utility I must give credit to is “mediainfo.” It is downloadable to Linux as a package. It simply parses a .mp4 video file and tells you many attributes of that video file including the encoder used, the frames per second, the average bit rate, etc.

The reason this was so useful was that I could check a video file my tv_channel program produced to a first order without actually playing the video through the system.


If the video file passed this first check, then it was worth playing. This saved time, since it is an easier check than copying the file(s) to the web server, modifying the .mpd file, and viewing them through Chromecast.
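Invocations along these lines illustrate the workflow; the file names are placeholders, and the pixel format must match what the camera actually delivers (an assumption here):

ffplay -f rawvideo -pixel_format yuyv422 -video_size 3840x2160 -framerate 30 capture.raw

mediainfo segment_0001.mp4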

6.6 Segmenter Utility Program

At the outset of the development phase of this project, I started by writing a static MPEG4 image file DASH utility I call "segmenter." This utility let me segment a large file into smaller DASH segments. It parses the MPEG4 box structures within the file. It has a mode where it just dumps the box names it finds. This was particularly useful in the subsequent part of the project where I was creating live segment files and debugging them. Running "segmenter" on those files allowed me to validate their MPEG4 box content. The MPEG4 specifications I used for this were MPEG Specification 14496-12, Information Technology – Coding of Audio Visual Objects, Part 12: ISO Base Media File Format (International Standards Organization, 2015) and MPEG Specification 14496-14, Information Technology – Coding of Audio Visual Objects, Part 14: MP4 File Format (International Standards Organization, 2015).
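The core of such a box dumper is small. The following self-contained C++ sketch walks the top-level boxes of an ISO 14496-12 file, printing each size and four-character type; it handles only the common 32-bit size case and is not the actual segmenter code.

#include <cstdint>
#include <cstdio>

int main(int argc, char **argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s file.mp4\n", argv[0]); return 1; }
    FILE *f = std::fopen(argv[1], "rb");
    if (!f) return 1;

    // Each box header is a 32-bit big-endian size plus a 4-character type code.
    unsigned char hdr[8];
    long offset = 0;
    while (std::fread(hdr, 1, 8, f) == 8) {
        uint32_t size = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
                        ((uint32_t)hdr[2] << 8)  | (uint32_t)hdr[3];
        std::printf("offset %ld: box '%c%c%c%c' size %u\n",
                    offset, hdr[4], hdr[5], hdr[6], hdr[7], size);
        if (size < 8) break;               // malformed or 64-bit size; stop
        offset += size;
        std::fseek(f, offset, SEEK_SET);   // skip to the next top-level box
    }
    std::fclose(f);
    return 0;
}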

6.7 Hardware Choices

I made some very specific hardware choices and I will explain them.


6.7.1 4K Camera

The 4K camera had to be compatible with Linux. At the time of the writing, there were very few 4K cameras at a reasonable price point to make them viable for use in the project. In Logitech documentation, I found the Linux operating did not technically support the Logitech Brio 4K camera. It is only supported by Logitech with the

Windows operating system (Logitech, 2018). However, I found a comment on a Reddit bulletin board by someone who stated he had successfully read video frames from the

Brio using a Linux driver. I purchased a Brio camera and wrote code to read raw 4K frames from it using both a Linux driver and an FFMPEG device API. From this low- level access, it worked well.

6.7.2 CPU

I began the project with a 2-core Intel Xeon processor. This worked fairly well in the prototype phase with a 640x480 camera but was far too slow when I tried using the 4K camera. The read rate from the camera was less than one frame per second, far short of the 30 frames per second I wanted to achieve to simulate a television system. I constructed a new computer using an Intel i7-8700K Coffee Lake 3.7 GHz CPU. This CPU has 6 cores and 6 hyperthread cores. Ubuntu Linux is able to utilize the hyperthreads, so this was a great choice, albeit expensive at $300 for just the CPU. I used 16GB of low-latency DDR4-3200 RAM to complement the speed of the CPU.


6.7.3 4K Television

The 4K television I used was a Samsung UN43MU6290. The Chromecast device was connected to the television through an HDMI port.

6.8 Software Design

Since most “backend” server systems are run from within a shell, I did not want to introduce a GUI. All arguments must be passed on the command line. Therefore, I developed a command line executable that could be invoked from a script. This is a common approach for systems infrastructure software.

6.8.1 Logic Flow

The basic logic flow of the tv_channel program is as follows:

1. read arguments from the CLI passed through the shell

2. initialize a number of H.264 encoders (in my case two)

3. establish a timer to read the camera at a 30 frames-per-second rate

4. read a raw video data frame from the camera

5. downscale the image for the low-resolution encoding

6. pass the image to the high-resolution encoder and the downscaled image to the low-resolution encoder

7. check the encoder to see if an encoded frame is available and, if so, write it to the appropriate output .mp4 file

8. check if the output file has exceeded one second of video time and that the current frame from the encoder is an "I-frame." If so, write the tail to the current video file, open a new video file, initialize it, and write the current frame, an I-frame, to the new file (see the sketch after this list)

9. if the user set a max time limit of execution and you have reached it, flush the encoders, write the tails to the output files and free all memory and objects. Display final statistics then quit.

10. if there is no time limit, run forever by returning to step 4
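The segment-rotation check in step 8 is the heart of live DASH segmentation. The following C++ sketch shows the idea; write_mp4_tail, open_new_segment, and append_frame are hypothetical helpers standing in for the MP4 box bookkeeping the real program performs.

// Hypothetical helpers standing in for the real MP4 box bookkeeping.
void write_mp4_tail(int fileIndex);
void open_new_segment(int fileIndex);
void append_frame(int fileIndex);

struct SegmentState {
    double secondsWritten = 0.0;  // video time accumulated in the open segment
    int    fileIndex      = 0;    // monotonically increasing segment number
};

const double kSegmentSeconds = 1.0;        // one-second DASH segments
const double kFrameSeconds   = 1.0 / 30.0; // 30 frames per second

void on_encoded_frame(SegmentState &s, bool isIFrame) {
    // Rotate only on an I-frame so every segment starts independently decodable.
    if (s.secondsWritten >= kSegmentSeconds && isIFrame) {
        write_mp4_tail(s.fileIndex);    // finalize the current segment file
        s.fileIndex++;
        open_new_segment(s.fileIndex);  // write the header boxes of the new file
        s.secondsWritten = 0.0;
    }
    append_frame(s.fileIndex);          // write the encoded frame to the segment
    s.secondsWritten += kFrameSeconds;
}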

6.8.2 Data Flow

The basic data flow within the tv_channel program is shown in the following diagram.


Initialize system -> read 4K frame from camera -> optionally downscale frame -> H264 encode frame -> write frame to segment file -> if segment is 1 sec, finalize file -> delete old segment files.

Figure 22. Prototype tv_channel Data Flow

I chose to modularize the software into three main parts to facilitate future growth. The first module is one that takes the user input, initializes the required number of codec interface instances, and initializes statistics. The second module opens the camera and prepares the FFMPEG downscaler for the low resolution. The third module is the codec interface module. The codec interface module is essentially an object that wraps an H.264 codec instance. It stores all of the information associated with a particular DASH representation including the frame width and height, the status of the segment in terms of its time depth, and the FFMPEG contexts needed to communicate with the FFMPEG H.264 encoder. By creating this codec interface module, the system is able to handle N representations more easily. In my project prototype, N is 2. The limit of N is a function of the computing cycles needed to compress the video frames using the compression factor input by the user.


The other design motivation for creating the codec interface module was to prepare the system for a cloud type of deployment. In my prototype system, the camera and the encoder are on the same system, using the same memory space. In a cloud system, I would put the camera interface module and the N codec interface modules on different compute instances. The API boundary between the main module types is the transfer of a raw video frame, which may also be done over a high-speed network.

6.8.3 Instance Diagram

The instance diagram of the program appears as follows.

[Figure: the Initializer and Statistics module sets up the Camera Interface Module, which feeds N parallel chains of Optional Downscaler → H264 Encoder Instance 1..N → Segment File Set 1..N.]

Figure 23. Software Instance Diagram

Chapter 7. User’s Guide and Human Interactions

This chapter will describe how a user or operator uses the tv_channel program. The arguments to the command line interface will be described, and sample output and expected execution statistics will be shown.

7.1 Command Line Interface

The command line interface “help” appears as follows.

brian@fast-intel-i7:~/thesis/segment_enc$ ./tv_channel
tv_channel - HD Camera to DASH segment encoder

Usage: tv_channel

camera=/dev/video<0|1>

segment=<0|1>

seconds=S (0 is infinite)

h264_bitrate=B (for 4K encoding use 30000)

h264_bitrate2=B2 (for low res encoding use 5000)

debug=<0|1>

Copyright 2018 by Brian P. Bresnahan

Figure 24. tv_channel Usage

The “camera” argument allows the user to select a specific video device on Linux. Usually this is “/dev/video0.” A second camera would appear as “/dev/video1.”

The “segment” argument controls whether or not tv_channel will write segments to disk.

The “seconds” argument allows the user to run tv_channel for a specific amount of time.

The “h264_bitrate” argument allows the user to specify a bit rate to be passed to the encoder instance for the high-resolution encoding.

The “h264_bitrate2” argument allows the user to specify a bit rate to be passed to the encoder instance for the low-resolution encoding.

The “debug” argument allows the user to enable verbose debug output or not.

7.2 Statistics and Monitoring

The tv_channel program stores statistics on the total number of frames encoded per resolution and the number of errors, if any. tv_channel also emits the same frame statistics every 10 seconds while executing. An example of the final statistics is:

tv_channel statistics:
tv_channel encoded 9000 frames at resolution 3840 x 2160 at 30 fps with 0 errors.
tv_channel encoded 9000 frames at resolution 854 x 480 at 30 fps with 0 errors.

Figure 25. tv_channel Statistics


7.3 Camera Image Acquisition and Resolutions

tv_channel reads raw 4K images from the camera and encodes them with H.264 in the following resolutions:

• 3840 x 2160 (i.e., 4K)
• 854 x 480

The logic for the resolutions was explained in the Requirements subchapter.

7.4 Web Server and CORS

The web server the project uses is nginx, developed by the company of the same name. It supports the Cross-Origin Resource Sharing (CORS) functionality required by the Chromecast. Note that I began with the widely used Apache web server but was unable to configure it to support CORS. CORS is the mechanism by which the Chromecast verifies that the video segments it is being told to play are associated with the application that told it to play them. The set-top box application authenticates to the web server, and then the set-top box commands the Chromecast to download a specific manifest file. The Chromecast then requests the manifest file and subsequent video files from the web server, and the Chromecast expects to see a “cross-origin resource sharing” HTTP header. This must be configured in the web server as part of the DASH configuration. If the Chromecast does not see the header, it will neither read the manifest nor play the video files.
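As an illustration, the relevant nginx configuration can be as small as one added response header in the location block that serves the DASH files. The paths below are placeholders, and a production deployment would name an explicit allowed origin instead of the wildcard:

location /live1/ {
    root /var/www;                              # serves the .mpd and .mp4 files
    add_header Access-Control-Allow-Origin *;   # the CORS header the Chromecast checks for
}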


7.4.1 Directory Structure and Files

The following table shows the relevant directories in the tv_channel project.

Directory                     Usage
../video_samples/live1/       .mp4 files created by the tv_channel program;
                              the .mpd manifest file associated with the .mp4 files
../cast1/CastVideos-chrome/   directory containing the CastVideos.js set-top box application
/var/log/nginx                directory containing nginx error.log and access.log,
                              which record all files served by the web server

Table 6. Web Server Directories and Files

7.5 Virtual Set-Top Box JavaScript Application

The virtual set-top box application can be accessed by any web browser. The intent is that a home user would use his cell phone as his remote control. He would open the web page to the set-top box application and see the following screen.


Figure 26. Virtual Set-Top Box GUI

The interface is rudimentary, but it achieves the main purpose of selecting one of a few channels. The user selects which channel he wishes to watch, and the set-top box application sends commands to the Chromecast to load a particular .mpd file stored on the web server, process the .mpd, and then start downloading video segments from the web server.

The small box to the right of the speaker symbol is the Chromecast Cast button. Google provides boilerplate JavaScript code for Cast-enabling web applications.


By clicking on this Cast icon, a Device Selection dialog box pops up, letting the user select a particular Chromecast device on the network. I have two Chromecast devices on my test network. The selection box appears as follows:

Figure 27. Chromecast Device Selection

7.6 Sample Execution

A sample execution for 20 seconds follows. Note that the H.264 encoder is verbose with its output so I have highlighted the input and output directly associated with the tv_channel program.


brian@fast-intel-i7:~/thesis/video_samples/live1$ ./start_channel.sh
Live DASH Segmenter. camera=/dev/video0, segment=1, seconds=300, bitrate1=30000, bitrate2=5000
Process priority is 0.
api_enc_ctx_init: w=3840, h=2160, fps=30 file=3840x2160.mp4 seg_ctrl=1 bit_rate=30000
Output #0, mp4, to '4K.mp4':
  Stream #0:0: Unknown: none (libx264)
[libx264 @ 0x55ce95a54800] frame MB size (240x135) > level limit (8192)
[libx264 @ 0x55ce95a54800] DPB size (4 frames, 129600 mbs) > level limit (1 frames, 32768 mbs)
[libx264 @ 0x55ce95a54800] MB rate (972000) > level limit (245760)
[libx264 @ 0x55ce95a54800] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 AVX2 LZCNT BMI2
[libx264 @ 0x55ce95a54800] profile High, level 4.1
[libx264 @ 0x55ce95a54800] 264 - core 148 r2795 aaa9aa8 - H.264/MPEG-4 AVC codec - Copyleft 2003-2017 - http://www.videolan.org/x264.html - options: cabac=1 ref=1 deblock=1:0:0 analyse=0x3:0x3 me=dia subme=1 psy=1 psy_rd=1.00:0.00 mixed_ref=0 me_range=16 chroma_me=1 trellis=0 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=0 threads=18 lookahead_threads=3 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=1 keyint=25 keyint_min=2 scenecut=40 intra_refresh=0 rc=abr mbtree=0 bitrate=30 ratetol=1.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 pb_ratio=1.30 aq=1:1.00
Output #0, mp4, to '4K.mp4':
  Stream #0:0: Video: h264 (libx264), yuv420p, 3840x2160, q=-1--1, 30 kb/s, 30 tbn, 30 tbc
api_enc_ctx_init: w=854, h=480, fps=30 file=854x480.mp4 seg_ctrl=1 bit_rate=5000
Output #0, mp4, to '854.mp4':
  Stream #0:0: Unknown: none (libx264)
[libx264 @ 0x55ce96171ec0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 AVX2 LZCNT BMI2
[libx264 @ 0x55ce96171ec0] profile High, level 4.1
[libx264 @ 0x55ce96171ec0] 264 - core 148 r2795 aaa9aa8 - H.264/MPEG-4 AVC codec - Copyleft 2003-2017 - http://www.videolan.org/x264.html - options: cabac=1 ref=1 deblock=1:0:0 analyse=0x3:0x3 me=dia subme=1 psy=1 psy_rd=1.00:0.00 mixed_ref=0 me_range=16 chroma_me=1 trellis=0 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=0 threads=15 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=1 keyint=25 keyint_min=2 scenecut=40 intra_refresh=0 rc=abr mbtree=0 bitrate=5 ratetol=1.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 pb_ratio=1.30 aq=1:1.00
Output #0, mp4, to '854.mp4':
  Stream #0:0: Video: h264 (libx264), yuv420p, 854x480, q=-1--1, 5 kb/s, 30 tbn, 30 tbc
Succeeded encoding frame: 23 size: 1639 duration:0 pts:512 dts:-512
Succeeded encoding frame: 26 size: 2244 duration:0 pts:512 dts:-512
pts: 153600 for frame: 300
pts: 153600 for frame: 300
Succeeded encoding frame: 322 size: 2342 duration:0 pts:152576 dts:152576
Succeeded encoding frame: 325 size:46271 duration:0 pts:153600 dts:152576
pts: 307200 for frame: 600
pts: 307200 for frame: 600
Succeeded encoding frame: 8722 size: 3955 duration:0 pts:4454400 dts:4453376
Succeeded encoding frame: 8725 size:16290 duration:0 pts:4454400 dts:4453376
pts: 4608000 for frame: 9000
pts: 4608000 for frame: 9000
Encoder flushed 25 frames. Encoded 9000 total frames.
[libx264 @ 0x55ce95a54800] frame I:394 Avg QP:22.34 size:146098
[libx264 @ 0x55ce95a54800] frame P:3287 Avg QP:24.24 size: 90080
[libx264 @ 0x55ce95a54800] frame B:5319 Avg QP:25.55 size: 41743
[libx264 @ 0x55ce95a54800] consecutive B-frames: 19.3% 3.8% 5.4% 71.5%
[libx264 @ 0x55ce95a54800] mb I I16..4: 31.0% 65.5% 3.5%
[libx264 @ 0x55ce95a54800] mb P I16..4: 23.5% 26.1% 0.2% P16..4: 33.3% 0.0% 0.0% 0.0% 0.0% skip:16.8%
[libx264 @ 0x55ce95a54800] mb B I16..4: 6.7% 4.4% 0.0% B16..8: 24.7% 0.0% 0.0% direct:13.4% skip:50.8% L0:45.5% L1:49.9% BI: 4.5%
[libx264 @ 0x55ce95a54800] final ratefactor: 39.67
[libx264 @ 0x55ce95a54800] transform intra:51.4% inter:60.5%
[libx264 @ 0x55ce95a54800] coded y,uvDC,uvAC intra: 14.4% 47.5% 13.2% inter: 3.1% 32.9% 1.0%
[libx264 @ 0x55ce95a54800] i16 v,h,dc,p: 44% 26% 16% 14%
[libx264 @ 0x55ce95a54800] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 17% 17% 27% 8% 6% 6% 6% 5% 7%
[libx264 @ 0x55ce95a54800] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 26% 22% 11% 11% 6% 6% 6% 6% 6%
[libx264 @ 0x55ce95a54800] i8c dc,h,v,p: 43% 23% 26% 8%
[libx264 @ 0x55ce95a54800] Weighted P-Frames: Y:15.1% UV:8.2%
[libx264 @ 0x55ce95a54800] kb/s:29.98
Encoder flushed 22 frames. Encoded 9000 total frames.
[libx264 @ 0x55ce96171ec0] frame I:372 Avg QP:18.54 size: 36268
[libx264 @ 0x55ce96171ec0] frame P:2556 Avg QP:20.05 size: 17660
[libx264 @ 0x55ce96171ec0] frame B:6072 Avg QP:21.70 size: 6149
[libx264 @ 0x55ce96171ec0] consecutive B-frames: 7.3% 6.1% 6.5% 80.1%
[libx264 @ 0x55ce96171ec0] mb I I16..4: 11.5% 44.4% 44.1%
[libx264 @ 0x55ce96171ec0] mb P I16..4: 5.5% 14.5% 5.2% P16..4: 69.2% 0.0% 0.0% 0.0% 0.0% skip: 5.6%
[libx264 @ 0x55ce96171ec0] mb B I16..4: 1.8% 2.3% 0.3% B16..8: 38.8% 0.0% 0.0% direct:24.3% skip:32.5% L0:35.3% L1:40.8% BI:23.8%
[libx264 @ 0x55ce96171ec0] final ratefactor: 31.54
[libx264 @ 0x55ce96171ec0] 8x8 transform intra:52.7% inter:50.3%
[libx264 @ 0x55ce96171ec0] coded y,uvDC,uvAC intra: 66.8% 77.5% 52.6% inter: 24.3% 41.9% 8.6%
[libx264 @ 0x55ce96171ec0] i16 v,h,dc,p: 38% 22% 14% 26%
[libx264 @ 0x55ce96171ec0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 19% 20% 18% 7% 6% 7% 7% 7% 10%
[libx264 @ 0x55ce96171ec0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 22% 25% 11% 8% 6% 6% 7% 6% 9%
[libx264 @ 0x55ce96171ec0] i8c dc,h,v,p: 39% 25% 23% 13%
[libx264 @ 0x55ce96171ec0] Weighted P-Frames: Y:22.9% UV:11.5%
[libx264 @ 0x55ce96171ec0] kb/s:5.00
tv_channel statistics:
encoded 9000 frames at resolution 3840 x 2160 at 30 fps with 0 errors.
encoded 9000 frames at resolution 854 x 480 at 30 fps with 0 errors.
brian@fast-intel-i7:~/thesis/video_samples/live1$

Figure 28. Sample Execution Output


Chapter 8. Verifications and Results

In this chapter, I will explain the test network, the test methods, and finally the results of the prototype system.

The test network appears in Figure 29.

[Figure: Internet/YouTube → Laptop with HDMI output → 1080p Display → 4K Brio Camera → tv_channel project exe → nginx web server → Chromecast → 4K TV → Human Subject.]

Figure 29. Test Network

As shown on the left, a computer playing HD video from YouTube on a 1080p display gave me a method for consistently getting source video that could be processed through the system repeatedly.

8.1 Evaluation Methodology

Testing of tv_channel is done in multiple ways. The methods include: validating the XML of the manifest file, broadcasting a running clock to check for latency, a visual macroblocking check, observing CPU utilization using the Linux System Monitor, observing the files the Chromecast was playing at the Chromecast console, using a second client type in the DASH Industry Forum test client, and finally human visual acceptance. These methods are explained in more detail in the following subchapters.

8.1.1 XML Verification

I verified the manifest.mpd XML format using the online tool http://www.xmlvalidation.com/. This is a basic check of XML syntax and formatting. This test does not understand the MPEG-DASH specific keywords or content.

8.1.2 Broadcast Clock

At an early stage, I noticed periodic latency of about half a second every 5 or 7 seconds. I used a second Linux machine to show a real-time clock with a resolution of tenths of seconds. As the clock image was displayed on the 4K TV, it would “glitch.” I used the Chromecast tools to monitor the network download time of the video segments and realized it periodically took the Chromecast over 1 second to download a 1-second video segment. Note that the yellow time bars in the display below exceed 1 second, the length of the segments. Obviously, this was a problem.


Figure 30. File Download Speed Analysis

The solution was to relax the H.264 encoding parameters to produce smaller 4K encoded files that could be downloaded in time over the wireless test network. This, in theory, produced images of slightly lower quality, but I was unable to see the difference. This was a reasonable solution for my small-scale system with two resolutions, but a more practical solution for a real-world scenario would be to offer more than one encoding at the 4K resolution.


8.1.3 Macroblocking Check

In this test, I visually checked the resulting video for excessive macroblocking. Macroblocks appear as small squares on the video display, and they are a normal result of compression. As an example, Figure 31 shows a clear image of a white flower (Maylett, 2005).

Figure 31. Uncompressed Image

After compression, the same image shows highly visible squares that result from creating macroblocks from squares of pixels. For example, a 16x16 block of pixels may be compressed to one macroblock. Fewer bytes are needed to store the macroblock compared to the original pixels it represents; thus, the image becomes compressed. Figure 32 demonstrates a high level of compression to accentuate the concept. The macroblocking is quite visible.

Figure 32. Compressed Image with Macroblocking

The more compression, the more noticeable the macroblocking is. The goal is to see as little macroblocking as possible, especially on scenes that have a great deal of motion.

8.1.4 4K Chromecast

This is the most important test since the Chromecast is a real-world device. In conjunction, I also used the display info of the 4K Samsung TV to confirm it was in 4K resolution. I also used the command line shell of the Chromecast to monitor which .mp4 files it was downloading, confirming the resolution via the filenames. By putting the resolution in the filenames, this check was easy.

8.1.5 The DASH Industry Forum Reference Client

I used the DASH Industry Forum reference client at http://reference.dashif.org/dash.js/v2.6.6/samples/dash-if-reference-player/index.html. This is an extremely useful engineering version of a DASH client that provides real-time statistics of the video being played; see the figures below. This client includes essential monitoring statistics for the video download rate in Mbps, the amount of video in its memory buffer in MB, and which DASH Representation is being played. In my case, I had 2 live encoded representations, one at 4K resolution requiring about 20 to 30 Mbps and one at a low 854x480 resolution requiring about 5 Mbps. By using the Network Conditions feature in Chrome, I could force the player to switch from the first resolution to the second, proving the DASH functionality was working with my live streams. The Network Conditions feature usage is described in detail below.

8.1.6 Network Conditions Feature in Chrome

The Chrome browser, set in Developer Mode, allows you to set “Network Conditions,” which includes a “Network Throttling” feature. The Network Conditions box is circled below in red.


Figure 33. Forcing Network Congestion Conditions

I added a number of throttle points at 1 Mbps, 1.5 Mbps, 1.7 Mbps, 2 Mbps, 3 Mbps, and 7 Mbps as seen in the following Chrome dialog box:


Figure 34. Network Throttle Points

By using this feature and playing the tv_channel produced video streams in the DASH reference client, I was able to constrain the network, which in turn caused the DASH client to switch from the high-resolution, high-bandwidth encoding to the low-resolution, low-bandwidth encoding. This validated that the DASH protocol was working.

8.1.7 CPU Utilization

In this test, I checked the CPU utilization of the Linux host machine. CPU utilization at 95% was acceptable, but I found that running at 100% would cause the machine to lock up. Note that the Linux top utility was not very useful since the H.264 encoder is multi-threaded and top combines all thread utilizations together, creating an unhelpful CPU utilization figure greater than 100%. Instead, I used the Linux System Monitor utility’s Resources display to graphically monitor CPU utilization.


Without the tv_channel program running, the 12 system cores ran with utilization between 10% and 20% as seen in Figure 35.

Figure 35. CPU Utilization

When tv_channel was running, the CPU utilization and memory usage climbed as shown below. CPU usage varied from about 15% to 50%.


Figure 36. CPU Utilization in Running State

If I increased the bit rate value passed to the H.264 encoder, the CPU utilization increased. I had to determine a bit rate that produced a good quality of visual experience without running the CPUs at high utilization. I settled on a bit rate value of 30,000 for the 4K encoding.

One other conclusion regarding CPU utilization was that I could encode 2 live streams at 30 Mbps and 5 Mbps while keeping the CPU usage below 50%. These encodings had good visual quality. This implies that the 12-core CPU had more headroom and could possibly encode 2 more resolutions for a total of 4. This, in turn, would benefit the viewing experience since the DASH client would be able to pick from 4 resolutions.

8.2 DASH Client Verification

The dash_out.mpd file used in the verification test contains 2 representations: Representation #1 for the low-resolution 854x480 video stream and Representation #2 for the high-resolution 4K 3840x2160 video stream.


Figure 37. Test Manifest .mpd File

The following screenshot shows the beginning of execution of a verification test.


Figure 38. Initial View of the Channel

As seen on the right side of Figure 38 and the magnified view of Figure 39 below, playback begins by downloading the dash_out.mpd descriptor file, followed by downloading video segments at the lower resolution of 854x480, which is representation 1 of 2 in the mpd file.


Figure 39. Initial Low-Resolution Image Files

After 20 seconds of playback, the player detects that it has enough network headroom to switch to the higher-resolution representation #2, and we see the client switch from 854x480_x.mp4 files to 3840x2160_x.mp4 files in Figure 40. Since each file holds one second’s worth of video, we know that the client switched at 20 seconds because the last 854x480 file played was 854x480_20.mp4. Also note that when the client switches resolutions, it loads the small 3840x2160.mp4 initialization file first, as defined in the dash_out.mpd file. The initialization file does not contain video data; it contains information regarding how the video stream was encoded. Note the initialization file is only 57.2 KB whereas the subsequent data file is 1.5 MB.


Figure 40. Transitioning to High Resolution

We also note some other playback characteristics in the following time diagram at the bottom of the client screen. Time is in seconds on the horizontal axis.

Figure 41. Player Characteristics by Time


The image is not very clear, but from right to left, the vertical axes read “Video Download Rate (Mbps),” “Video Current Quality,” “Video Bitrate,” and “Video Buffer Level.” Note that the “Video Buffer Level,” the blue line, quickly rises to 30 seconds of playback time. The client is “greedy” and reads ahead 30 seconds to fill its playback buffer. Note that the Chromecast device will read ahead only 10 seconds. These read-ahead times are not configurable.

The “Video Bit Rate” in Kbps, the orange line at the top, rises to about 1.8 Mbps. It is the playback rate.

The “Video Current Quality” in green indicates which representation number is being shown. It is 1 which is the 854x480 representation in the mpd file.

Playback is stable at the higher resolution. We force the dynamic adaptation (DASH) capabilities of the system by changing the “Network throttling” setting from no limit to “brian DASH 2 Mbps.”


Figure 42. Initiating Network Congestion

Note that the video file names being downloaded transition from 3840x2160 files to 854x480 files. Video playback continues seamlessly. This transition is possible because I coded the tv_channel program to create segments that begin on I-frame (also known as key frame) boundaries. This allows the player to switch to a different representation and immediately play a full frame of video. When developing the system, I initially was not creating segments on I-frame boundaries, and the player was not able to transition between representations.
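As a minimal illustration of that boundary check using FFmpeg’s API (the AV_PKT_FLAG_KEY flag is FFmpeg’s; the function name and the rest are mine):

#include <libavcodec/avcodec.h>

/* Start a new segment only when the open segment already holds at least
 * one second of video AND the packet just produced is an I-frame. */
static int should_cut_segment(const AVPacket *pkt, double seg_seconds)
{
    return seg_seconds >= 1.0 && (pkt->flags & AV_PKT_FLAG_KEY);
}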

As shown in the following screenshot, playback continues with the 854x480 video files. The file list is on the right side of the player.


Figure 43. Transitioning Back to Low Resolution

In a subsequent test, I remove network throttling and start playback. I then drop the network throttling to 3 Mbps for about 10 seconds, then increase it back to 24 Mbps. Examining the player timeline below, we note that the green line is “Video Current Quality,” which is the representation being played (0-based).


Figure 44. Switching Representations

We see at time 00:30 (first red circle) the representation drops from 1 to 0. This is where the player went from 4K to 854x480. At approximately 00:40 I set the Network Throttling back to 24 Mbps, and by 00:45 the client has downloaded enough 4K files to switch the representation back to 1 (second red circle). The blue line is the Video Buffer Level in seconds; from 00:45 on, the client builds up from approximately 8 seconds of playback time to 30 seconds by the end of the snapshot.


8.3 Chromecast Verification

For the Chromecast verification, I open the Chromecast console in debug mode and monitor the video files being downloaded and played. First, I trigger the playback using the virtual set-top box application. Channel 1, “Ch 1,” is the Olympics channel.

Figure 45. Select Channel on Set-Top Box Application

Next, we view the playback on the 4K TV (Figure 46 is a picture of the TV):


Figure 46. Visual Verification of Chromecast

Next, I observe the files being downloaded by the Chromecast. It begins with the 854x480 representation as shown in Figure 47.


Figure 47. Chromecast Starts at Low Resolution

Eventually, the Chromecast determines it can switch to the higher resolution as seen in Figure 48, where it transitions from video file 854x480_34.mp4 to 3840x2160_35.mp4 (red oval). Thus, we see the Chromecast adapting to the network bandwidth available.


Figure 48. Chromecast Transitions to High Resolution


Chapter 9. Summary, Contributions and Future Work

In this chapter, I will summarize results, draw conclusions, and highlight potential future work.

9.1 Summary

In this thesis, I set out to prove the practicality, feasibility, and scalability of an alternative television system operating at 4K resolution, using the MPEG-DASH protocol and internet technologies.

In terms of practicality, I provided a number of reasons why the system is practical. In the theory portion of the thesis, I argued the benefits of a non-proprietary protocol: it would be adopted by more device makers and thus let video reach more device types. The consumer would not be constrained to sit in front of a TV or set-top box with an RF tuner needed to decode and display the channels. Any device running the packet-based DASH client would suffice.

I provided arguments describing how MPEG-DASH intelligently and automatically determines the best possible resolution the user can view over a given network. I presented a few types of algorithms used for playback logic and identified active research on these algorithms. Moreover, by eliminating RF tuners from the network and streaming over the internet, the cost to the consumer would be reduced from the hardware cost perspective. However, the overall cost to the consumer is still dependent on the cost of internet service.


For the above reasons, I conclude that the proposed MPEG-DASH television system is practical.

In terms of feasibility, the results of the prototype were positive, as explained in the Verifications and Results chapter. For the implementation portion of the thesis, I successfully used a common x86 computer running Linux, a 4K camera, a C compiler, video compression libraries, and a web server to build the prototype tv_channel Linux application. I have proven that a commodity x86 computer can create a live 4K television channel that can be distributed at multiple resolutions. Scaling the system to a live operational system with many channels would be a significant amount of work, but most systems begin with a prototype like the one produced in this thesis. New and greater challenges would need to be met in multiple areas of the system, as described in the Future Work section below. With system and network scaling work, it should be possible to construct an alternative to the existing cable television system using the MPEG-DASH protocol and 4K resolution imaging.

I found the Chromecast device difficult to work with. It was difficult to register for use, difficult to access the console port for debugging, and difficult to get significant information on the errors it was having. Future work includes using the tv_channel system with a Roku box or Amazon Firestick to see if the development experience is better. In spite of these difficulties, I have proven that one can run a continuous TV channel from a single x86 computer. In fact, it is likely one could run two channels, each with two resolutions, on a six-core, six-hyperthread x86 computer.

For the above reasons, I conclude that the proposed MPEG-DASH television system is feasible.


In terms of scalability, I looked at the proposed television system from a few perspectives. I examined current internet access rates in relation to the bandwidth required to support 4K video streaming. The conclusion was that not all internet connection types or service offerings are sufficient to support 4K streaming at this time. However, test results and studies show internet access bandwidth is increasing over time.

I provided a basic mathematical model for scaling the distribution of video for many viewable channels across the internet to millions of consumers using a CDN. I purposely used a high channel count of five hundred to validate the size of the CDN required and to show that it is still smaller than the current Akamai network. In addition, I provided background information on Netflix, how they use CDNs, and their current plan to create their own CDN. It took Netflix years, but they scaled their system to the national level and are now expanding to the global level. We can extrapolate from that functioning system that a set of channels could also be scaled up in similar ways using a CDN. This would be an essential part of the proposed television system.

For the above reasons, I conclude that the proposed MPEG-DASH television system is scalable.

9.2 Contributions

I believe I have made several contributions to the field of 4K video research for television over the internet.


First, I provided logic for building a new television system and enumerated the benefits. I provided a basic mathematical model of the system that perhaps others could leverage, modify, and enhance. It should also be possible to simulate the CDN.

Second, I have proven that you can take a basic x86 computer with six cores and six hyperthreads and have it run one or more TV channels. Since x86 computers have become commodity devices, excluding the CDN, the cost of creating, say, a hundred-channel system is relatively low. I did not need a dedicated or specialized piece of hardware to perform the video encoding and keep up with the real-time 4K video rate. I also leveraged a web server running on an x86 machine; the web server was able to keep up with the real-time 4K video rate as well. This further validates the DASH concept of leveraging HTTP as the video transport. The contribution is that I have proven the barrier to entry to prototype and explore this type of television system is fairly low. The hardware is relatively inexpensive and the software tools are in the public domain. Interested researchers could easily engage in this type of project.

Third, there is a contribution in the framework of the tv_channel application software design. I purposely split the video data processing into two main modules, the camera module and the video encoder module. In theory, the raw video data could be upscaled or downscaled in the camera module and then sent over a network to a set of cloud-based video encoders. A video encoder could run as a container under the Kubernetes (Kubernetes, Inc., 2018) container management system. In this manner, as more video resolutions are required, the video feed from the camera module could be sent to more than one instance of the video encoder module, possibly running on more than one server in a cloud environment. Since a television system typically has hundreds of channels, scalability would be critical to the proposed MPEG-DASH based system.

9.3 Future Improvements and Next Steps to the tv_channel Program

The current implementation could be improved in numerous ways.

First, the server should be connected to the internet instead of being hosted on a private wireless network. Doing so would add realistic latency to the system. Except for the camera portion, it should be possible to host the server on Amazon Web Services (AWS) (Amazon, Inc., 2018). AWS Lightsail could be used to host a web page that stores and serves the virtual set-top box. AWS Elastic Compute Cloud (EC2) could be used to run the tv_channel server application. AWS Simple Storage Service (S3) could be used to store video files. After basic video streaming is operational, AWS CloudFront could be used to prototype the CDN portion of the project.

The implementation could be made more flexible in the number of resolutions it supports by making the count configurable instead of fixed at two. As previously explained, there are two main modules, the camera module and the video encoder module. The video encoder module is already modularized and can support multiple instances. However, the camera module is hard-coded to two resolutions. In the video encoder module, I used a context structure that made it very easy to move from one resolution to two. The same approach could be used in the camera module.
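A hedged sketch of what a configurable camera module might do, creating one libswscale downscaler per requested resolution, follows. It assumes the YUV420p pipeline shown in the libx264 output earlier, and the function name is mine (error handling omitted):

#include <libswscale/swscale.h>
#include <libavutil/pixfmt.h>

/* Build a downscaler from the fixed 4K camera frame to one target
 * resolution; call once per configured representation. */
struct SwsContext *make_downscaler(int dst_w, int dst_h)
{
    return sws_getContext(3840, 2160, AV_PIX_FMT_YUV420P,   /* 4K source */
                          dst_w, dst_h, AV_PIX_FMT_YUV420P, /* target    */
                          SWS_BILINEAR, NULL, NULL, NULL);
}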

The implementation should tally more metrics and improve upon the basic 10-second moving average calculation it periodically shows. An exponential decay should be added to the moving average so the average is weighted toward the more recent time samples. The reason the system should have this is to see whether it is keeping up with the live encoding target frames-per-second rate.
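A standard exponentially weighted moving average would fit here: with x_t the frame rate measured in interval t and a smoothing factor 0 < α ≤ 1, the running average becomes a_t = α·x_t + (1 − α)·a_{t−1}, so recent samples dominate and a sustained drop below the 30 fps target surfaces quickly. The choice of α trades responsiveness against noise.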

The implementation should automatically create the .mpd file based on the user input. The manual creation of the .mpd file for two resolutions is manageable, but would be cumbersome with more resolutions. In a live production system, automatic creation of the .mpd file would be mandatory.
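For reference, a minimal dynamic-profile manifest for the two representations used here might look roughly like the following. The attribute values are illustrative, derived from the file names and one-second segments described above, and are not a copy of the actual dash_out.mpd:

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="dynamic"
     profiles="urn:mpeg:dash:profile:isoff-live:2011" minBufferTime="PT2S">
  <Period start="PT0S">
    <AdaptationSet mimeType="video/mp4" segmentAlignment="true">
      <Representation id="1" width="854" height="480" bandwidth="5000000">
        <SegmentTemplate initialization="854x480.mp4"
                         media="854x480_$Number$.mp4" duration="1" startNumber="1"/>
      </Representation>
      <Representation id="2" width="3840" height="2160" bandwidth="30000000">
        <SegmentTemplate initialization="3840x2160.mp4"
                         media="3840x2160_$Number$.mp4" duration="1" startNumber="1"/>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

A generator would substitute the width, height, bandwidth, and file-name stem for each representation from the CLI arguments.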

The set-top box application is very crude and does not allow for an on-screen display of the channel description. You only see the description after you select the channel. The application should also show a horizontal timeline of future programming.

The application should help the user pick programs to suit his or her taste. It should remember the user and make intelligent suggestions for programming.

The tv_channel application does not take advantage of the H.264 hardware encoder in the Intel i7 CPU. A production system would take advantage of this hardware encoder. Intel has been adding video processing functionality to its x86 dies since 2008. An H.264 decoder was added to the Clarkdale microarchitecture in 2010, and an encoder was added to the Sandy Bridge microarchitecture in 2011. The marketing name for their video processing technology is Intel Quick Sync Video. The most current CPUs with Quick Sync also support the HEVC/H.265, VP8, and VP9 compression standards (Wikipedia, 2018). FFmpeg drivers exist to work with the Quick Sync video functionality, as described in the Intel white paper Intel QuickSync Video and FFmpeg (Intel, Inc., 2016).
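Assuming an FFmpeg build with Quick Sync enabled, switching to the hardware encoder is conceptually a one-line change in encoder selection. A hedged sketch follows; the fallback logic and function name are mine:

#include <libavcodec/avcodec.h>

/* Prefer Intel's Quick Sync H.264 encoder ("h264_qsv" in FFmpeg),
 * falling back to the software libx264 encoder used in this thesis. */
const AVCodec *pick_h264_encoder(void)
{
    const AVCodec *hw = avcodec_find_encoder_by_name("h264_qsv");
    return hw ? hw : avcodec_find_encoder_by_name("libx264");
}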

The i7-8700K processor I used for this thesis includes the UHD Graphics 630 (Ultra High Definition, i.e., 4K) graphics coprocessor. It should be capable of encoding two 4K resolution video streams simultaneously. Since I created a good framework for my software implementation, a next logical step would be to add 4K video processing in hardware. Intel sells and supports a Media Server Studio product that appears to have code samples for using the hardware (Intel, Inc., 2018).

9.4 Future Work

Future work on this topic is varied. First, the system could be written with a more modern network-enabled language like Go (Kernighan & Donovan, 2016), especially if it is modularized for the cloud as presented in the Contributions section.

Second, the DASH system can use a few segment file naming conventions, per the DASH specification. I used a method where the first file always starts at an index of “1.” DASH supports another time base where the video file names are encoded with an integer representing the number of seconds since a specific UTC time. This would improve the ability to debug the system, especially across time zones.

Third, the system can only scale with a Content Delivery Network underneath it. I did not explore some aspects of the CDN in detail. Those are: (1) the actual latency of distributing a video segment across the network, (2) the duration the segments must be stored, (3) the cost of the CDN infrastructure in terms of viability, and (4) error handling.

Fourth, an exciting development to follow is that the ATSC organization has adopted the MPEG-DASH protocol for use in hybrid set-top boxes. This decision is associated with the ATSC 3.0 specification for next-generation over-the-air high-definition broadcasts. The proposal will allow television service providers to broadcast the same media files over-the-air and over the internet. That is an interesting area of convergence and further proof that the concept presented in this thesis is valid. The ATSC taking this step and moving the MPEG-DASH protocol to over-the-air broadcast aligns the over-the-air and internet delivery methods (ATSC, 2018). Examining the specification Guidelines for Implementation: DASH-IF Interoperability Point for ATSC 3.0 (DASH Industry Forum, Inc., 2016), I found that a hybrid set-top box can receive media via its over-the-air ATSC 3.0 interface or over an internet connection. It is conceivable the over-the-air media could be recorded in the set-top box to support time-shifted viewing. Further research could be a deeper inspection of the practicality of this specification.

Fifth, a production system should have a subsystem dedicated to warning and failure alarms for system operators. For example, if a live broadcast was underway and the actual frame rate dropped below the target frame rate, an alarm could be sent. If the network connection to the CDN provider failed, that would be another important alarm.

Sixth, another logical area of work would be designing and implementing a prototype for an 8K resolution television system. Consumers have barely begun to adopt 4K, and 8K video media is already becoming available on YouTube. The network throughput requirements for 8K are four times those of 4K. The basic mathematical model in the CDN chapter could be scaled up to 8K rates. H.264 supports 8K resolution, but more practical codecs with better compression ratios would be H.265 and VP9. I used a resolution downscaler API in my implementation; in theory, an upscaler API could be used to upscale 4K video to 8K and then process it through tv_channel like 4K video.


The problem in proving full functionality is that the Chromecast device does not support 8K, so a different device would have to be used to render the video.

Lastly, since I was able to read raw video frames from a camera and “inject” them into a TV channel, I am inspired to create a synthetic news anchor and a synthetic weatherman. I can imagine a basic “talking head” on a channel verbalizing the latest news or weather. The news could come from a Reuters or Bloomberg news feed and be updated every thirty minutes. The weather could come from an NOAA feed and a periodic five-minute weather report could be synthesized and broadcast on the DASH network. Since DASH reaches cell phone and computer screens, end users could get a weather update or news update on their cell phones with the look and feel of a real news and weather desk.

The research opportunities on a high-resolution MPEG-DASH based television system using internet technologies seem broad, challenging, and exciting!


Appendix 1. Legal Issues with Rebroadcasting

While considering what to do with the technology I prototyped, I thought of using the same method I used with the camera to instead capture video frames from an antenna and tuner and then process them into DASH. This could be the basis of a consumer service that lets people get minimal television service by having the service company capture station broadcasts off the air and deliver them over the internet into their homes. The protocol used could be DASH and everything discussed about DASH in the thesis would apply to this network topology. The benefit to the consumer is a lower cost service as compared to a traditional cable TV subscription or FIOS subscription.

After researching this approach, I found that a company named Aereo tried this very same approach and was brought to court by cable TV companies. The courts ruled against Aereo, finding that it was effectively a broadcast company and did not have the rights to rebroadcast the content it captured over-the-air. Even though each end user had his own antenna in an Aereo warehouse, that fact did not matter to the court. Aereo eventually had to file for bankruptcy. I was intrigued by how this application of television re-broadcasting was not legally acceptable.


Bibliography and References

Akamai, Inc. (2017). Akamai Documentation. Retrieved from Learn Akamai: https://learn.akamai.com/en-us/products/media/adaptive_media_delivery.html

Akamai, Inc. (2017, August). Akamai Media Delivery Solutions: Product Brief. Retrieved from Akamai: https://www.akamai.com/us/en/multimedia/documents/product-brief/adaptive-media-delivery-product-brief.pdf

Akamai, Inc. (2017, May). Akamai’s State of the Internet Q1 2017, Volume 10, Number 1. Boston, MA, USA: Akamai.

Akamai, Inc. (2018, March 22). About Akamai, Facts and Figures. Retrieved from Akamai: https://www.akamai.com/us/en/about/facts-figures.jsp

Amazon, Inc. (2018, April 4). Amazon Web Services. Retrieved from Amazon Web Services: https://aws.amazon.com/

Apple Inc. (2018, March 1). HTTP Live Streaming Developer Page. Retrieved from developer.apple.com: https://developer.apple.com/streaming/

Apple Inc. (2018, March). iPad Specifications. Retrieved from Apple: https://www.apple.com/ipad-9.7/specs/

ATSC. (2018). Newsletter. Retrieved from ATSC: https://www.atsc.org/newsletter/dashing-finish-line-transport-layer-key-broadband-broadcast-convergence/

Bitmovin, Inc. (2018, March 22). What is DRM and How Does it Work? Retrieved from Bitmovin.com: https://bitmovin.com/what-is-drm/

Buckley, S. (2018, March). Google Fiber, AT&T, CenturyLink drive the 1 Gbps game. Retrieved from Fierce Telecom: https://www.fiercetelecom.com/special-report/google-fiber-at-t-centurylink-drive-1-gbps-game

Chicago Tribune. (2012, June 6). Reuters. Retrieved from Reuters: http://articles.chicagotribune.com/2012-06-06/business/sns-rt-us-limelightbre85518w-20120606_1_netflix-limelight-networks-cloud-storage

Comcast, Inc. (2018, March 22). Xfinity Gigspeed Internet. Retrieved from Xfinity: https://www.xfinity.com/gig

DASH Industry Forum. (n.d.). Retrieved from http://dashif.org/members/

DASH Industry Forum. (2017, September 7). Guidelines for Implementation: DASH-IF Interoperability Points.

DASH Industry Forum. (2018, March 22). Protection. Retrieved from DASH Industry Forum: www.dashif.org/identifiers/protection

DASH Industry Forum, Inc. (2016, January 31). Guidelines for Implementation: DASH-IF Interoperability Point for ATSC 3.0. Retrieved from DASH Industry Forum: http://dashif.org/wp-content/uploads/2017/02/DASH-IF-IOP-for-ATSC3-0-v1.0.pdf

Encyclopedia Britannica. (2018, March 22). Digital Rights Management. Retrieved from Encyclopedia Britannica: https://www.britannica.com/topic/digital-rights-management

FFMPEG Organization. (2018, February 1). FFMPEG Documentation. Retrieved from FFMPEG: https://ffmpeg.org/doxygen/3.4/files.html

Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., & Berners-Lee, T. (1999, June 1). IETF RFC 2616. Retrieved from IETF RFC Documents: https://tools.ietf.org/html/rfc2616

Fink, D. G. (1976, September). Perspectives on Television: The Role Played by the Two NTSC’s in Preparing Television Service for the American Public. Proceedings of the IEEE, vol. 64, no. 9, pp. 1322-1331.

Forbes, Inc. (2012, June 5). CIO Next. Retrieved from forbes.com: https://www.forbes.com/sites/ericsavitz/2012/06/05/netflix-shifts-traffic-to-its-own-cdn-akamai-limelight-shrs-hit/#77a33000294d

Fryxell, D. (2002, April). Broadband Local Access Networks: An Economic and Public Policy Analysis of Cable Modems and ADSL. Pittsburgh, PA, USA: Carnegie-Mellon University.

Google. (2018, February). Shaka Player Architecture Diagrams. Retrieved from Shaka Player: https://shaka-player-demo.appspot.com/docs/api/tutorial-architecture.html

Google Inc. (2017, August). Google Cast API Reference. Retrieved from developers.google.com: https://developers.google.com/cast/docs/reference/

Google, Inc. (2018, March 22). Shaka Player Github. Retrieved from Google Shaka Player: https://github.com/google/shaka-player/blob/master/lib/abr/simple_abr_manager.js

Google, Inc. (2018, February 1). Supported Media for Google Cast. Retrieved from Google Cast: https://developers.google.com/cast/docs/media

GPAC. (2017, July). Multimedia Open Source Project. Retrieved from GPAC: https://gpac.wp.imt.fr/

GPAC MP4Box. (2017, July). Multimedia Open Source Project. Retrieved from GPAC MP4Box: https://gpac.wp.imt.fr/mp4box/mp4box-documentation/

Harada, R., Kanai, K., & Katto, J. (2015). Evaluations of 4K/2K Video Streaming Using MPEG-DASH with Buffering Behavior Analysis. Tokyo, Japan: Dept. of Computer Science and Communication Engineering, Waseda University.

Huang, T.-Y. (2014). A Buffer-Based Approach to Rate Adaptation: Evidence from a Large Video Streaming Service. Chicago, Illinois: ACM; SIGCOMM.

IEEE Computer Society. (2009, October 29). IEEE Std 802.11n-2009. Part 11: Wireless LAN MAC and PHY Specification. New York, NY, USA: IEEE.

Intel, Inc. (2016). Intel Quick Sync Video and FFMpeg. Retrieved from Intel Cloud Computing: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/cloud-computing-quicksync-video-ffmpeg-white-paper.pdf

Intel, Inc. (2018, March 30). Intel Media Server Studio. Retrieved from Intel Developer Zone: https://software.intel.com/en-us/intel-media-server-studio/code-samples

International Standards Organization. (2015, December 15). MPEG Specification 14496-14. Information Technology – Coding of Audio Visual Objects. Part 14: MP4 File Format. Geneva, Switzerland: ISO.

International Standards Organization. (2015, December 15). MPEG Specification 14496-12. Information Technology – Coding of Audio Visual Objects. Part 12: ISO Base Media File Format. Geneva, Switzerland: ISO.

ISO, DASH Specification. (2012, April 1). Dynamic Adaptive Streaming over HTTP (DASH) ISO 23009-1. Geneva, Switzerland.

Jack, K. (2007). Video Demystified, 5th Edition. Burlington, MA: Newnes/Elsevier.

Jagannath, A. (2014). Implementation and Analysis of User Adaptive Mobile Video Streaming Using MPEG-DASH. Arlington, Texas, USA: The University of Texas at Arlington.

Jangeun, J. (2003). Theoretical Maximum Throughput of IEEE 802.11 and its Applications. Proceedings of the Second IEEE International Symposium.

Karagkioules, T., & Concolato, C. (2017). A Comparative Case Study of HTTP Adaptive Streaming Algorithms in Mobile Networks. Taipei, Taiwan: ACM; NOSSDAV 2017.

Kernighan, B. W., & Donovan, A. A. (2016). The Go Programming Language. Crawfordsville, Indiana: Addison Wesley.

Khan, S. (2018, March 1). KHANACADEMY. Retrieved from KHANACADEMY: https://www.khanacademy.org/

Kubernetes, Inc. (2018, April 4). Kubernetes Concepts. Retrieved from Kubernetes: https://kubernetes.io/docs/concepts/

Laubach, M. E., Farber, D. J., & Dukes, S. D. (2001). Delivering Internet Connections over Cable. New York, NY: John Wiley & Sons, Inc.

Leider, R. (2015, January 27). Youtube Engineering and Developers Blog. Retrieved from Youtube: https://youtube-eng.googleblog.com/2015/01/youtube-now-defaults-to-html5_27.html

Leighton, T. (2008, October). Improving Performance on the Internet. ACM Queue, pp. 22-29.

Lin, T.-S., & Krzanowski, R. (2012). US Patent No. 9232240B2.

Logitech. (2018, March 1). Logitech Brio. Retrieved from Logitech: https://www.logitech.com/en-us/product/brio

Love, R. (2010). Linux Kernel Development. Crawfordsville, Indiana: Addison Wesley/Pearson.

Maylett, C. (2005, July 24). https://en.wikipedia.org/wiki/Compression_artifact. Retrieved from Wikipedia: https://commons.wikimedia.org/wiki/File:Sego_lily_cm.jpg

Mazhar, K. (2011, July 30). Compliance Procedures for Dynamic Adaptive Streaming over HTTP – (DASH). Munich, Germany: The Royal Institute of Technology, School of Electrical Engineering.

Merriam-Webster. (2018, March 22). bit rate. Retrieved from Merriam-Webster Dictionary: https://www.merriam-webster.com/dictionary/bit%20rate

Monroe Electronics. (2017, May 23). Product Information. Retrieved from Monroe: http://www.digitalalertsystems.com/pdf/pr_170523.pdf

Netflix, Inc. (2017, September 29). Serving 100 Gbps from an Open Connect Appliance. Retrieved from Netflix Tech Blog: https://medium.com/netflix-techblog/serving-100-gbps-from-an-open-connect-appliance-cdb51dda3b99

Netflix, Inc. (2018). Fill, Updates and Maintenance. Retrieved from Netflix Open Connect CDN: https://openconnect.netflix.com/en/fill/

Netflix, Inc. (2018, March). Netflix Open Connect Network Configuration. Retrieved from Netflix Open Connect: https://openconnect.netflix.com/en/network-configuration/#flash

Netflix, Inc. (2018, February). Netflix Openconnect. Retrieved from Netflix Openconnect: https://openconnect.netflix.com/en/

Netravali, R., Mao, H., & Alizadeh, M. (2017). Neural Adaptive Video Streaming with Pensieve. Proceedings of the 2017 ACM SIGCOMM Conference.

Ozer, J. (2017). Video Encoding by the Numbers. Galax, VA: Doceo.

Pantos, R. (2017, August). IETF. Retrieved from IETF: https://tools.ietf.org/html/rfc8216

Park, A. (2013, April 14). The Netflix Tech Blog. Retrieved from Netflix: https://medium.com/netflix-techblog/html5-video-at-netflix-721d1f143979

Patterson, D. A., & Hennessy, J. L. (2014). Computer Organization and Design. Waltham, MA, USA: Morgan Kaufmann.

Peterson, L. L., & Davie, B. S. (2012, March). Computer Networks, 5th Edition. Burlington, MA: Morgan Kaufmann.

Roku. (2018, February). Roku Developer Documentation. Retrieved from Roku: https://sdkdocs.roku.com/display/sdkdoc/Audio+and+Video+Support

Sharon, C., et al. (1998). US Patent No. US6389473B1.

Silva, R. (2018, January 15). Macroblocking and Pixelation - Video Artifacts. Retrieved from Lifewire: https://www.lifewire.com/macroblocking-and-pixelation-1847333

Sodagar, I. (2011, October). The MPEG-DASH Standard for Multimedia Streaming over the Internet. Geneva, Switzerland. Retrieved from https://cs.uwaterloo.ca/~brecht/courses/854/readings/mpeg-dash-standard-ieee-mm-2011.pdf

Software, Primo. (2017). About Hardware Acceleration. Retrieved from AVBlocks: http://wiki.avblocks.com/about-avblocks/about-hardware-acceleration#avc-h-264-encoding

speedtest.net. (2017, September 27). United States Reports. Retrieved from speedtest.net: http://www.speedtest.net/reports/united-states/#fixed

Stallings, W. (1990). Local Networks. New York, NY: MacMillan.

Statista, Inc. (2017, December). Media, Advertising, TV, Film. Retrieved from The Statistics Portal: https://www.statista.com/statistics/497279/comcast-number-video-subscribers-usa/

Unified Streaming. (2018, February). Which Devices Support DASH Playback? Retrieved from Unified Streaming: http://docs.unified-streaming.com/faqs/players/dash-players.html

Verizon, Inc. (2018, March 22). Verizon FIOS. Retrieved from Verizon Inc: https://www.verizon.com/home/fios-fastest-internet/

Verizon, Inc. (2018, March 22). Verizon High Speed Internet. Retrieved from Verizon: https://www.verizon.com/home/highspeedinternet/

Watkinson, J. (2004). The MPEG Handbook, 2nd Edition. Burlington, MA.

Weil, N. (2017, Spring). The State of MPEG-DASH 2017. Retrieved from Streaming Media Europe: http://www.streamingmediaglobal.com/Articles/Editorial/Featured-Articles/The-State-of-MPEG-DASH-2017-116505.aspx

Widevine, Inc. (2018, March 22). Widevine Supported Platforms. Retrieved from Widevine: http://www.widevine.com/supported_platforms.html

Wikipedia. (2010, July). Adaptive Bitrate Streaming. Retrieved from Wikipedia: http://en.wikipedia.org/wiki/Adaptive_bit_rate

Wikipedia. (2018, February 4). Intel Quick Sync Video. Retrieved from Wikipedia: https://en.wikipedia.org/wiki/Intel_Quick_Sync_Video#cite_note-21

Youtube/Google Inc. (2018, February 1). Live encoder settings, bitrates, and resolutions. Retrieved from Youtube: https://support.google.com/youtube/answer/2853702?hl=en

Zhebin Qian, G. E. (2008). US Patent No. US8228982B2.


Glossary

4K - a video resolution of 3840 horizontal pixels by 2160 vertical pixels.

4K camera - a video camera capable of creating 4K resolution video frames at a specific frame rate such as 30 frames per second. The camera typically communicates on a high-speed connection such as a USB bus and is accessed through a device driver.

720p - a video resolution of 1280 horizontal pixels by 720 vertical pixels. The “p” stands for progressive scan, which means each image is complete, as compared to interlaced mode, 720i.

8K - a video resolution of 7680 horizontal pixels by 4320 vertical pixels.

Adaptive Bit Rate - a method of transferring video and/or audio media over a network in which the transfer rate is monitored and the version of the content being transferred can be changed during the transfer.

ATSC - Advanced Television Systems Committee (ATSC) standards are a set of standards for digital television transmission over terrestrial, cable, and satellite networks. They are largely a replacement for the analog NTSC standard and are used mostly in the United States, Mexico, and Canada.

AVC - see H.264.

bandwidth - Bandwidth has two main definitions. First, it is the measure of the width of a band of RF spectrum, in which case it is written with units of Hz (cycles per second). For example, a U.S. video channel from 575 MHz to 581 MHz has a bandwidth of 6 MHz. Second, bandwidth refers to the number of bits per second that can be transmitted on a communications link, in which case it appears in units of bits per second (Peterson & Davie, Computer Networks, 5th Edition, 2012). For example, a fiber link may have 10 Gbps of bandwidth. See also throughput.

bit rate - a measure of the speed of data processing usually calculated as the number of bits per second (Merriam-Webster, 2018). Less common spellings are bit-rate and bitrate.

CATV - community antenna television.

CDN - content delivery network, a specialized network that replicates and disperses files to multiple nodes in a larger network such that other nodes that require access to the data can do so with minimal latency.

Codec - a two-part system consisting of an encoder that encodes a data set, often for compression purposes, into an intermediary data set, and a decoder that decodes the intermediary data set into a lossy or lossless version of the original data set.

DASH - Dynamic Adaptive Streaming over HTTP, a method by which video and audio media is reduced to segment files and transmitted over an adaptive bit rate network efficiently by utilizing multiple encodings of the media that require different levels of network throughput.

die - the individual rectangular section of a silicon wafer that contains one CPU complex. Informally known as a chip (Patterson & Hennessy, 2014).

DRM - Digital Rights Management.

DSL - Digital Subscriber Line, a high-speed internet access method that uses copper telephone lines to homes and businesses to transmit data. This type of high-speed internet requires a DSL modem.

DVR - digital video recorder.

EAS - Emergency Alert System.

fps - frames per second.

Gbps - gigabits per second; 1 x 10^9 bits per second.

GUI - Graphical User Interface.

H.264 - a block-oriented video compression standard created by the Motion Pictures Expert Group. It is specified as MPEG-4 Part 10 and is also known as Advanced Video Coding (AVC).

H.265 - also known as High Efficiency Video Coding (HEVC), a video compression standard created by the Motion Pictures Expert Group. Compared to H.264, H.265 offers about double the compression ratio at the same level of video quality.

HEVC - see H.265.

HTTP - Hypertext Transfer Protocol.

hyperthreading - an Intel CPU concept in which one physical core appears to the operating system as two logical cores, or hyperthreads, and hence the operating system may run two threads simultaneously on the same core. Internally, the hyperthreads share a number of resources such as the execution unit, cache memory, and bus interfaces, and therefore can only run simultaneously when there is no resource conflict.

ISP - Internet Service Provider.

Kubernetes - an open-source, highly scalable software container management system created by Google and placed into the public domain.

macroblock - a processing unit in image and video compression that typically consists of 16x16 samples.

macroblocking - a video artifact in which objects or areas of a video image appear to be made up of small squares, rather than proper detail and smooth edges (Silva, 2018).

Mbps - megabits per second; 1 x 10^6 bits per second.

MPD - Media Presentation Description, a metafile in the DASH system that describes the details of the media segments including their timing, duration, video resolution, and required bandwidth.

MPEG - Motion Pictures Expert Group, a working group formed by ISO in 1988 to set standards for audio and video compression and transmission. MPEG was responsible, for example, for the specification of MPEG-2 video, mp3 audio, and MPEG-4 video.

MPEG-4 - a method of defining compression of audio and visual (AV) digital data that was introduced in late 1998 by MPEG. Its advances over MPEG-2 include improved encoding rates, 3D rendering, and DRM, Digital Rights Management.

MPEG-DASH - Dynamic Adaptive Streaming over HTTP (DASH) is a protocol intended to support a media-streaming model for delivery of media content in which control lies exclusively with the client. Clients may request data using the HTTP protocol from standard web servers that have no DASH-specific capabilities (ISO, DASH Specification, 2012).

over-the-top - a method of streaming video over the internet as opposed to using individual broadcast channels in the cable television system.

OSI (Open Systems Interconnection) Network Model - the seven-layer network reference model developed by the ISO that guides the design of ISO and ITU-T protocol standards (Peterson & Davie, Computer Networks, 5th Edition, 2012).

Pbps - petabits per second; 1 x 10^15 bits per second.

RAM Disk - a section of random-access memory in a computer configured to function like a disk drive.

RF - radio frequency, electromagnetic wave frequencies used for radio communications, television broadcasts, or network transmission modulation. A portion of the RF spectrum is typically dedicated to a specific purpose. For example, the 6 MHz portion of the RF spectrum centered at 597 MHz may be used to transmit a composite MPEG channel on a cable television network.

RTP - real-time protocol, specified by IETF RFC 3550. RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video, or simulation data, over multicast or unicast network services. RTP does not address resource reservation and does not guarantee quality-of-service for real-time services.

STB - Set-top Box, a device collocated with a television that combines RF tuning, channel selection, and channel decryption. They are often rented to the consumer by a cable television service provider.

Tbps - terabits per second; 1 x 10^12 bits per second.

TCP/IP - Transmission Control Protocol (TCP) and Internet Protocol (IP).

throughput - a term often used interchangeably with bandwidth, but referring more to the measured performance of a communications channel (Peterson & Davie, Computer Networks, 5th Edition, 2012). For example, a 10 Mbps Ethernet link has 10 Mbps of bandwidth, but after accounting for layer encapsulation and other factors, it may only have 8 Mbps of throughput. See also bandwidth.

URL - Uniform Resource Locator, a reference to an internet resource that specifies its location on a computer network and a protocol for retrieving it. For example, a file on the internet hosted on computer abc.com could be referenced by the HTTP protocol with the URL http://www.abc.com/file.html.