DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2018
Towards network based media processing using cloud technologies
ROBERTO RAMOS CHAVEZ
KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
Abstract
Media delivery based on HTTP adaptive bitrate streaming has become one of the most popular methods for streaming video and audio over the Internet. These services are commonly built on centralized infrastructures whose performance depends on network conditions and available computing resources, which may degrade application performance on the client side. The rapid growth of cloud technologies, such as infrastructure as a service (IaaS), container as a service (CaaS), and new deployment paradigms like function as a service (FaaS), makes it easy to deploy virtual functions in multiple locations, which can relieve various streaming performance constraints. However, there is no clear understanding of whether all cloud deployments are suitable for media delivery. This thesis develops a proof of concept for the deployment of media processing functions in the cloud. Three media processing use cases and a media streaming prototype for video on-demand (VOD) ad insertion are designed, implemented, and evaluated. Each of the three media processing use cases is evaluated in a specific deployment architecture to obtain key performance indicators (KPIs) for use in future networks, such as 5G and network-based media processing (NBMP).
Acknowledgment
First, I would like to express my enormous gratitude to my supervisor, Dr. Rufael Mekuria at Unified Streaming, for his guidance and great patience during this work. I would also like to thank Dirk Griffioen and Mark Ogle for our endless discussions and their support during the past six months. I have learned a lot from them, and my perception of research, technology, and teamwork has changed dramatically. Their extensive knowledge in the field of media delivery inspired me every day that I was present at Unified Streaming. I would also like to extend my thanks to Prof. Markus Flierl for his continuous advice during his academic courses and the thesis. But mostly I would like to thank my parents and family for supporting me through the years.
Amsterdam, October 10th, 2018
Sammanfattning (Summary)
Media delivery based on the HTTP adaptive bitrate streaming protocol has become one of the most popular methods for streaming video and audio over the Internet. These services are usually built on centralized infrastructure that depends on network conditions or computing resources, which can reduce application performance on the client side. Massive growth in cloud technologies, such as infrastructure as a service (IaaS), container as a service (CaaS), and new deployment paradigms such as function as a service (FaaS), provides easily accessible deployment of virtual functions in multiple locations, which reduces several streaming performance constraints. However, there is no clear understanding of whether cloud deployments are suitable for media delivery. This thesis develops a proof of concept for the deployment of media processing functions in the cloud. Three media processing use cases and a prototype for media streaming video on-demand (VOD) ad insertion are designed, implemented, and evaluated. Each of the three media processing use cases is evaluated in a specific deployment architecture to obtain key performance indicators (KPIs) for use in future networks, such as 5G and network-based media processing (NBMP).
Contents
1 Introduction
  1.1 Background Information
  1.2 Research Purpose
  1.3 Motivation
  1.4 Challenges
  1.5 Research questions
  1.6 Thesis outline

2 Technical Background and Related Work
  2.1 Media Delivery
    2.1.1 HTTP adaptive bitrate streaming
    2.1.2 Live and on-demand streaming models
    2.1.3 Multimedia container formats and media descriptor languages
    2.1.4 Media processing software
  2.2 Cloud Computing
    2.2.1 Computing Virtualization
    2.2.2 Network Virtualization
    2.2.3 Container orchestration engine
    2.2.4 Function as a service
    2.2.5 Edge computing
  2.3 Related Work
    2.3.1 Media processing operations in the network
    2.3.2 Cloud deployment architectures
    2.3.3 Media processing functions
  2.4 Our Contribution

3 Media processing Function Use cases
  3.1 Content packaging
  3.2 Video Transcoding
  3.3 Content Ad Insertion

4 Media Processing Deployment Design
  4.1 Software Components
  4.2 Cloud Technology Components
    4.2.1 Hypervisor-based virtualization deployment technology
    4.2.2 Container orchestration deployment technology
    4.2.3 Serverless deployment technology
  4.3 Design Decisions
    4.3.1 Cloud deployment technology decisions
    4.3.2 Ad insertion prototype design using Container Orchestration

5 Media Processing Deployment Implementation
  5.1 Introduction
  5.2 VM Deployment Implementation
    5.2.1 Media processing implementation procedure
  5.3 Container Orchestration Deployment Implementation
    5.3.1 Media processing Implementation procedure
  5.4 Serverless Deployment Implementation
    5.4.1 Serverless cloud provider implementation overview
    5.4.2 Media processing Implementation procedure
  5.5 Ad Insertion Prototype Deployment Implementation
    5.5.1 Media processing implementation procedure

6 Media Processing Deployment Evaluation
  6.1 Load Generator Tools and Measurement Specifications
    6.1.1 Media processing with CGI execution
  6.2 Use case (A): Content packaging results
    6.2.1 VMs in content packaging
    6.2.2 Serverless in content packaging
  6.3 Use case (B): Video Transcoding results
    6.3.1 VMs in Video Transcoding
    6.3.2 Serverless in Video Transcoding
  6.4 Use case (C): Content stitching results
    6.4.1 VM in content stitching
    6.4.2 Serverless in content stitching
  6.5 Use case (D): Content Ad Insertion results
    6.5.1 Container orchestration setups
    6.5.2 Measurement procedure for Ad insertion prototype
    6.5.3 Container orchestration for Ad insertion results

7 Conclusion and Future work
  7.1 Conclusions
  7.2 Limitations and Future work
    7.2.1 Implementation constraints
    7.2.2 Future work

Appendices

A Serverless Cloud Provider Evaluation
    A.0.1 Serverless provider performance evaluation
    A.0.2 Serverless provider performance results
Chapter 1
Introduction
1.1 Background Information
This project was carried out at Unified Streaming, a leading streaming technology company located in the Netherlands. Unified Streaming provides software for offline and on-the-fly packaging of video and audio content. Its software is built as C/C++ modules and can also be used as a plugin for the most common web servers.
1.2 Research Purpose
We are in a new era of media entertainment that offers viewers endless choices for media consumption, from watching our favorite TV shows, to live streaming sporting events, to following camera feeds on social media in real time. Consequently, data traffic through the networks has increased dramatically, and network operators and service providers are struggling to serve the fast-growing media traffic over existing infrastructure. By 2020, video traffic is expected to grow to as much as 80% of all Internet traffic [23]. HTTP Adaptive Bitrate Streaming (ABS) has emerged as the primary method for delivering video over the Internet. Cloud services have evolved rapidly, providing easy access to powerful computing resources in multiple locations. Cloud providers have also introduced novel technologies to improve the provisioning of resources, such as Function as a Service (FaaS) and microservice-based technologies. These make it possible to develop, run, and deploy code with specific functionality without managing the back-end infrastructure. This thesis evaluates the performance of the different cloud technologies available for deploying media processing functions, which can help to achieve target KPIs for 5G networks and improve the efficiency of media distribution. Cloud technologies such as hypervisor virtualization, containers, microservices, and cloud functions are used in this research. The chosen media processing functions represent different levels of computational complexity in streaming services: content packaging, video transcoding, and content stitching.
1.3 Motivation
There are several motivations for carrying out this work. First, future networks such as 5G can provide distributed cloud computing at the edge. This is attractive for media delivery because it allows specific media processing functions to run at any point in the network. For instance, late transmuxing, as seen in [46], provides higher performance than a CDN when delivering content, and server-side ad insertion increases the level of user personalization. The latter can be achieved by deploying the function at strategic geo-locations or edges.
1.4 Challenges
The aim of this work is to embed media processing in the cloud-native networks envisioned for 5G. Cloud technologies are not explicitly developed for media processing. Therefore, certain adaptations to the design and implementation must be taken into account to achieve media processing functions that offer low latency, handle high volumes of data, and serve a large number of users. Several deployment options exist, namely virtual machines (VMs), containers, microservices, and cloud functions; hence this study needs to examine deployment performance across different use cases.
1.5 Research questions
1. Q: "Are cloud technologies suitable for the deployment and processing of media functions at the edge of the network?" This thesis investigates novel cloud technologies such as VMs, containers, and serverless by evaluating the KPIs of media processing functions. It presents specific media functions used in streaming architectures, each with a distinct level of computational complexity.

(a) Q: "What is the state of the art to support media processing functions in the cloud?" Our second research question aims to explore related work to find out what kind of media processing technologies and infrastructure systems have been developed in the past, and which technologies are available for advanced streaming services in the next generation of 5G networks.
(b) Q: "What are the difficulties and considerations for the design and deployment of media processing functions using cloud technologies?" Our third question concerns the technologies available and the decisions made for using each of the deployment architectures. It relates to the supported features of each cloud provider and the ease of deployment for the different use cases of content packaging, video transcoding, and content ad insertion.
(c) Q: "Which challenges are presented when implementing different use cases of media processing functions within cloud deployment technologies?" Our fourth question concerns how well our experimental setups of media processing functions can be deployed using cloud technologies. This question is answered by carrying out the different experimental setups. Consequently, it provides a baseline for choosing the most suitable deployment architecture for a video on-demand (VOD) ad insertion prototype.
(d) Q: "What is the performance evaluation of the chosen cloud deployment technologies?" Our final question relies on the results and experimental findings from each deployment architecture and media processing use case. It aims to provide critical KPIs from the different cloud setups and to identify whether cloud technologies are a suitable method for advanced streaming services.
1.6 Thesis outline

Chapter 2 presents related work and technical background on the evolution of cloud computing and media distribution. Chapter 3 describes the use cases of the selected media processing functions. Chapters 4 and 5 introduce the design and implementation, respectively, for each media processing function. Chapter 6 presents the evaluation of the different experimental setups and the evaluation results. Finally, Chapter 7 concludes this thesis and outlines future research directions.
Chapter 2
Technical Background and Related Work
This chapter is organized as follows. First, it introduces key concepts for media delivery. Second, it presents the cloud computing background that we use throughout this work. Third, it introduces the background of future networks such as 5G and useful virtualization technologies. Fourth, we present the work related to this thesis. Finally, we present a summary of the contribution of this thesis.
2.1 Media Delivery
Video streaming enables clients to play back video while the content is still being downloaded, in contrast to an ordinary file transfer, where the user must download the entire file before making use of it.
2.1.1 HTTP adaptive bitrate streaming

Video streaming on the Internet has converged on delivery using HTTP as the primary transport protocol. HTTP provides key benefits, such as the reuse of existing Internet infrastructure, including caching and application servers. However, HTTP is stateless: when an HTTP client requests data, the server replies by sending the data, and the transaction is terminated. To address this problem, ABS over HTTP subdivides a large source video into small media segments. These smaller media containers can be downloaded and decoded independently by the client-side player. Furthermore, this approach allows the client to switch quality/bitrate for each segment depending on the network conditions [63]. The most popular streaming protocols for ABS have been developed in the past decade by different companies and standardization groups, for instance Apple's HTTP Live Streaming (HLS) [15], Microsoft's HTTP Smooth Streaming (HSS) [48], Adobe's HTTP Dynamic Streaming (HDS) [14], and MPEG's Dynamic Adaptive Streaming over HTTP (MPEG-DASH) [51].
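The per-segment quality switching described above can be illustrated with a minimal Python sketch. It is not taken from any of the protocols cited; the bitrate ladder, the `safety_factor` parameter, and the function names are hypothetical and chosen only for illustration. The sketch assumes the player measures throughput before each segment request and picks the highest rendition that fits within a fraction of that throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    """One encoded quality level of the same content (hypothetical values)."""
    name: str
    bitrate_kbps: int


# Illustrative bitrate ladder; real ladders are defined by the content provider.
LADDER = [
    Rendition("240p", 400),
    Rendition("480p", 1200),
    Rendition("720p", 2500),
    Rendition("1080p", 5000),
]


def select_rendition(renditions, throughput_kbps, safety_factor=0.8):
    """Return the highest-bitrate rendition that fits the throughput budget.

    The budget is a fraction (safety_factor) of the measured throughput.
    If no rendition fits, the lowest-bitrate rendition is returned.
    """
    budget = throughput_kbps * safety_factor
    ordered = sorted(renditions, key=lambda r: r.bitrate_kbps)
    best = ordered[0]  # fallback: lowest quality
    for rendition in ordered:
        if rendition.bitrate_kbps <= budget:
            best = rendition
    return best


def plan_session(throughput_samples_kbps):
    """Choose a rendition independently for each segment request."""
    return [select_rendition(LADDER, t).name for t in throughput_samples_kbps]
```

Because each segment is an independent HTTP request, the decision can change from one segment to the next without any session state on the server; production players typically combine throughput estimates with buffer occupancy, which this sketch leaves out.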
2.1.2 Live and on-demand streaming models

Figure 2.1 shows the two most common end-to-end video streaming models: live streaming and VOD. The live streaming model uses a live encoder as a media source, which compresses and encodes the media ingest in real time. The streaming server/compute node packages the media data into different streaming protocols and supplies the videos to the client. The VOD model uses stored multimedia files that are normally already encoded and packaged in multiple format specifications. However, the compute node is still capable of just-in-time configuration, serving the media files based on the requirements of the user's device. Over-the-top media services use CDNs to distribute the content on a large scale. The CDN selects the best possible server based on geographical proximity to the client and reduces latency by caching the content requested by the client device. On the client side, the device renders the video on its display.
Figure 2.1: Live and on-demand streaming models using a compute node.
2.1.3 Multimedia container formats and media descriptor languages

Multimedia data streams are based on container formats. The information inside the container is divided into two types: the physical data and the metadata. For instance, metadata is needed when the client device uses information such as subtitles, compression standards, and bitrates, among others. In video streaming, there are two main multimedia formats: MPEG-2 transport streams (MPEG-2 TS) [24] (extensions: .ts, .tsv, and .tsa) and the International Organization for Standardization (ISO) base media file format (ISOBMFF) [39] (MP4, fragmented MP4, and data reference (dref) MP4). The MPEG-2 TS is based on multiplexing audio tracks, video tracks, and metadata together. Apple adopted this standard in its HLS protocol, making it an important standard for the video streaming industry. The MP4, fragmented MP4 (fMP4), and dref MP4 containers are part of the MPEG-4 Part 12 standard, which covers the ISOBMFF. The MP4 container became a widely adopted format.
ISO base media file format container specifications

An MP4 media container starts with a file type box (ftyp), which provides the compatible brands and specifications of the file. The movie box (moov) is the container box that holds metadata (e.g., title, tags, descriptions, time, etc.) for the media presentation. The movie fragment box (moof) contains the information about sample locations and sample sizes. The sample table box (stbl) contains all the time and data indexing of the media samples in a track. The media data box (mdat) is the container box that holds the actual media data for the presentation. Figure 2.2b shows the fMP4 container, in which moof and mdat boxes are interleaved after the initial moov box, in contrast to the MP4 container shown in Figure 2.2a, which has only one moov box. The dref MP4 container relies on the data reference box specified in the ISOBMFF [39]. It is defined as a data reference object and contains a table of data references, given as URL locations of the actual media data used within the presentation. Figure 2.2c shows an example of five referenced video tracks.
(a) MP4 container (b) Fragmented MP4 container (c) dref MP4 container

Figure 2.2: ISO base media file format container types.
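The box layout described above can be made concrete with a short Python sketch that walks the top-level boxes of an ISOBMFF byte string. It is a minimal illustration, not a full parser: it relies only on the generic box header defined in ISOBMFF (a 32-bit big-endian size followed by a four-character type, with size 1 signalling a 64-bit size and size 0 meaning "until the end of the data"). The helper names and the synthetic sample are hypothetical, and the sample payloads are placeholders rather than valid box contents.

```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (box_type, payload_offset, payload_size) for consecutive boxes."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" field follows the type.
            if offset + 16 > end:
                raise ValueError("truncated largesize at offset %d" % offset)
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing data.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError("malformed box at offset %d" % offset)
        yield box_type.decode("ascii", errors="replace"), offset + header, size - header
        offset += size


def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size header (illustration helper only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def top_level_types(data):
    """Return the four-character types of the top-level boxes, in order."""
    return [box_type for box_type, _, _ in iter_boxes(data)]


# Synthetic fragmented-MP4-like layout: ftyp, moov, then two moof/mdat pairs.
# Payloads are placeholders, not valid box contents.
FMP4_SAMPLE = (
    make_box(b"ftyp", b"isom\x00\x00\x00\x00")
    + make_box(b"moov")
    + make_box(b"moof") + make_box(b"mdat", b"\x00" * 4)
    + make_box(b"moof") + make_box(b"mdat", b"\x00" * 4)
)
```

Calling `top_level_types(FMP4_SAMPLE)` lists the box types in file order, which makes the difference between a plain MP4 (one moov, one mdat) and a fragmented MP4 (repeated moof/mdat pairs) directly visible. Nested boxes such as stbl could be reached by calling `iter_boxes` again on a container box's payload range.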
Synchronized Multimedia Integration Language media files

Synchronized Multimedia Integration Language (SMIL) 2.0¹ is a markup language that describes multimedia presentations. It allows handling parameters such as time, layout, animation, visual transitions, and other multimedia specifications. It provides a simple method to read different video content sources that must be stitched together. Compared to other media formats, the SMIL file was a convenient choice that fit the use-case requirements for content ad insertion. Figure 2.3 shows an example SMIL file for a pre-roll scenario that combines a five-minute clip of Tears of Steel with five seconds of content from an advertisement.
¹ https://www.w3.org/TR/2005/REC-SMIL2-20050107/