
LevelUp: A thin-cloud approach to game livestreaming

Landon P. Cox (Microsoft Research)    Lixiang Ao (UC San Diego)

ABSTRACT

Game livestreaming is hugely popular and growing. Each month, Twitch hosts over two million unique broadcasters with a collective audience of 140 million unique viewers. Despite its success, livestreaming services are costly to run. AWS and Azure both charge hundreds of dollars to encode 100 hours of multi-bitrate video, and potentially thousands each month to transfer the video data of one channel to a relatively small audience.

In this work, we demonstrate that mobile edge devices are ready to play a more central role in multi-bitrate livestreaming. In particular, we explore a new strategy for game livestreaming that we call a thin-cloud approach. Under a thin-cloud approach, livestreaming services rely on commodity web infrastructure to store and distribute video content and leverage hardware acceleration on edge devices to transcode video and boost the video quality of low-bitrate streams. We have built a prototype system called LevelUp that embodies the thin-cloud approach, and using our prototype we demonstrate that mobile hardware acceleration can support realtime video transcoding and significantly boost the quality of low-bitrate video through a machine-learning technique called super resolution. We show that super-resolution can improve the visual quality of low-resolution game streams by up to 88% while requiring approximately half the bandwidth of higher-bitrate streams. Finally, energy experiments show that LevelUp clients consume only 5% of their battery capacity watching 30 minutes of video.

1. INTRODUCTION

Game streaming services are large and growing. As of February 2018, Twitch reports having over 140 million unique monthly viewers.1 And between 2017 and 2019 Twitch increased the average number of concurrent viewers from nearly 750,000 to over 1.2 million. It also doubled the average number of concurrent streams from 25,000 to 50,000.2 Yet despite the popularity of Twitch and other livestreaming platforms like Facebook Live, livestreaming is costly.

Transcoding 100 hours of multi-bitrate live video on AWS and Azure costs between $300 and $500, and transferring video data to viewers can cost thousands of dollars per channel, per month. These costs are prohibitive for small companies looking to grow new platforms, and represent a significant savings opportunity for large companies like Microsoft.

One reason for these costs is the expense of performing realtime, multi-bitrate transcoding on cloud infrastructure. Wowza charges less than $20 to livestream 100 hours of single-bitrate video, which is roughly 15x cheaper than 100 hours of multi-bitrate video. Multi-bitrate services typically accept video streams from a broadcaster at a single bitrate and then transcode the video at multiple bitrates so that viewers can adapt their stream quality to network conditions. When a viewer's network is strong, it will stream high-quality video, and when a viewer's network is weaker, it will shift to lower-quality video. Generating multiple video qualities in realtime is computationally intensive and requires specialized hardware like graphical processing units (GPUs) or dedicated hardware transcoders to keep streaming latency low. Unlike commodity cloud services like web front-ends, blob storage, and content distribution networks (CDNs), realtime video transcoding has not achieved economies of scale to drive down costs. In addition, transfer costs are high due to platforms' large audiences and the amount of time these audiences spend watching streams. For example, the total number of minutes viewed on Twitch grew from 355 billion in 2017 to over 600 billion in 2019.3

In this paper, we argue that edge devices are ready to play a more central role in livestreaming platforms. We propose a new approach to livestreaming based on the notion of a thin cloud, in which the cloud provides commodity storage and content-distribution infrastructure, but is not relied upon to provide expensive resources like GPUs and other hardware accelerators. We have implemented a prototype system called LevelUp that embodies this approach. LevelUp is a game livestreaming service with refactored responsibilities.

1 twitchadvertising.tv
2 twitchtracker.com/statistics
3 twitchtracker.com/statistics

The cloud is responsible for storing and distributing video and other files, and edge devices assume responsibility for multi-bitrate transcoding and boosting the quality of reduced-bitrate streams.

Our approach is enabled by the rapid deployment of hardware acceleration on the edge. Because gaming, photography, and media playback are critical smartphone applications, GPUs and video-transcoding hardware are commonplace on smartphone SoCs. Furthermore, the next wave of accelerators, machine learning (ML) co-processors, have already been deployed on high-end smartphones. These accelerators can handle complex ML workloads and will be pervasive in the near future.

LevelUp uses hardware acceleration to reduce cloud costs in two ways. First, it uses devices' hardware video encoders to generate and upload videos at different resolutions. This obviates the need to transcode in the cloud. Second, LevelUp stream viewers can use a hardware-accelerated convolutional neural network (CNN) to improve the visual quality of lower-bitrate video. This can significantly reduce the amount of data a gaming platform must transfer to clients without sacrificing visual quality.

We do not claim any contributions to the field of deep learning or ML. Work on compressing models to run more effectively on mobile devices and constructing models that achieve better inference and/or performance are orthogonal to LevelUp. Instead, our contributions are as follows:

• We identify multi-bitrate transcoding and data transfer as costly parts of cloud-based livestreaming, and propose a thin-cloud approach to streaming that pairs commodity web infrastructure in the cloud with commodity hardware acceleration on the edge.

• We identify super-resolution as a way to mitigate the high cost of transferring game-stream data. In particular, we observe that super-resolution models trained on specific game content can provide additional quality improvements over networks trained on different games.

• Using our LevelUp prototype, we show that realtime multi-bitrate transcoding is feasible on commodity devices, and that high-end SoCs can perform realtime super-resolution of reduced-resolution videos. We further demonstrate that super-resolution can improve the visual quality of reduced-resolution game streams by up to 88%, and that training a super-resolution model on specific game content can improve visual quality by over 25% compared to a model trained on a different game's content. Finally, energy experiments show that LevelUp clients spend only 5% of their total battery capacity to super-resolve 30 minutes of low-resolution video.

The rest of the paper is organized as follows: Section 2 provides background information on livestreaming, mobile-device hardware, and image-similarity metrics; Section 3 articulates the design principles underlying LevelUp, Section 4 describes the LevelUp design, Section 5 describes our experimental evaluation, Section 6 describes related work, and Section 7 provides our conclusions.

2. BACKGROUND

In this section, we provide background information on video streaming, mobile hardware accelerators, and image-similarity metrics.

2.1 Live streaming

The two most widely used live-streaming protocols are Real-Time Messaging Protocol (RTMP) and HTTP Live Streaming (HLS).

RTMP sits directly above TCP. RTMP broadcasters split their streams into small audio and video chunks (e.g., 4KB videos) and send them over a persistent TCP connection to an RTMP server. Each stream viewer also maintains a persistent TCP connection to the RTMP server, and the RTMP server pushes new chunks to its viewers as they become available. In its simplest form, an RTMP server acts as a pass-through relay for media chunks from a broadcaster to a viewer. RTMP also supports multi-bitrate coding, in which an RTMP server transcodes incoming media into different qualities (e.g., different resolutions) before forwarding the appropriate chunks to viewers. Viewers are responsible for communicating their desired level of quality to the server.

The primary benefit of RTMP is that it can provide low end-to-end latency between a broadcaster and player. There are at least two reasons for this. First, because RTMP pushes media over a persistent TCP connection, servers can forward chunks as soon as they are available. This is much faster than the alternative of forcing viewers to poll for new chunks, or worse, initiating a new connection for each poll. Second, each small chunk in an RTMP stream necessarily covers a brief period of time. For example, if a broadcaster uploads at 250 kbps and the server expects 4KB chunks, then each 4KB chunk will capture 128ms of video. Thus, a broadcaster can send new data to the RTMP server every 128ms.
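As a quick check of the arithmetic above (reading a 4KB chunk as 32 kilobits):

\[
t_{\text{chunk}} = \frac{4\,\text{KB} \times 8\ \text{bits/byte}}{250\ \text{kb/s}} = \frac{32\ \text{kb}}{250\ \text{kb/s}} = 0.128\ \text{s} = 128\ \text{ms}
\]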
Despite its low latency, RTMP has a major drawback. Because RTMP runs directly on top of TCP it requires a dedicated server, and thus RTMP does not easily integrate with existing content distribution networks (CDNs). In contrast, HLS and Dynamic Adaptive Streaming over HTTP (MPEG-DASH) scale more easily but increase end-to-end latency.

Like RTMP, HLS divides video streams into smaller segments. However, because HLS runs on top of HTTP

2 rather than directly on TCP, segments are stored as nor- including data transfer costs).4. Amazon’s Media Live mal files (e.g., an H264 video) and viewers must retrieve product costs over $500 per month, per video channel new segments by issuing HTTP get requests. HTTP/2 at 1920x1080 resolution and at 30 FPS, also not includ- supports server push, but HTTP/2 is not widely de- ing data transfer costs5.A primary reason that multi- ployed or integrated with HLS, and thus viewers must bitrate transcoding in the cloud is so expensive is that pull video segments over HTTP. transcoding in realtime requires hardware acceleration, HLS segments must be relatively large compared to such as expensive GPUs or other special-purpose hard- RTMP chunks in order to amortize the cost of issuing ware. In addition, the cost to deliver high-quality videos HTTP requests. HLS segments typically capture 2-10 to clients at even moderate scale can also be expensive. seconds of content, and these longer segment lengths Azure charges between $7,500 and $5,000 to transfer guarantee additional end-to-end delay. For example, if 150-500 TB/month. AWS Media Connect prices are a stream uses 10-second segments, then a broadcaster nearly identical. must wait 10 seconds before uploading the first segment, We believe that mobile devices can have an impor- and stream viewers will always be more than 10 sec- tant role to play in driving down these costs. LevelUp onds behind the broadcaster. For many applications, explores this idea by fully utilizing commodity smart- seconds of delay is tolerable. For example, celebrities phone hardware to (1) eliminate cloud-based transcod- with large followings who stream on Facebook Live and ing, and (2) significantly improve the visual quality of popular who stream on Twitch will attract a lower bitrate videos. large enough audience (e.g., hundreds if not thousands or millions of viewers) that interactivity on the order 2.2 Mobile hardware accelerators of milliseconds is simply not possible. While lower la- System-on-chip (SoC) and smartphone manufactur- tency is always better, the low delays offered by RTMP ers deploy hardware accelerators to meet the evolving are often overkill for streams that scale beyond tens of computational demands of mobile applications. As it participants. became clear that gaming was an important applica- In order for a viewer to retrieve the next segment over tion for smartphones, devices of all qualities began to HTTP, it must know the segment’s URL. Thus, each ship with powerful GPUs. Similarly, as smartphones HLS stream contains a plaintext playlist that describes became users’ primary device for taking photos and the URLs for all of a stream’s segments. This playlist videos, SoCs integrated dedicated hardware for video also helps with multi-bitrate streaming by describing encoding and decoding, as well as advanced image pro- where viewers can download segments of different qual- cessing. For example, iPhones have included hardware ities. Viewers can then adapt the quality of their stream for encoding and decoding H264 video for many gener- by downloading the segments that best match their net- ations, and they have included hardware for encoding work conditions. and decoding H265 video since the iPhone 6. Importantly, HLS scales well because playlist and video- The next wave of SoC accelerators target machine segment files can be distributed through existing web learning (ML) through on-chip ML co-processors. For CDNs. 
MPEG-DASH is similar to HLS and offers the example, Apple added a two-core ML accelerator called same easy integration with commodity web infrastruc- a Neural Engine to the A11 processor included in the ture. This is a major reason why all major streaming iPhone 8, iPhone 8 Plus, and iPhone X. Apple increased services support MPEG-DASH and/or HLS, including the number of cores in this accelerator to eight in the Facebook Live, Twitch, Netflix, and YouTube. Lev- A12 and A13 processor. The Qualcomm Snapdragon elUp’s aim is to repurpose as much web infrastructure 855 used by ’s Pixel 4 and other high-end An- as possible, and so it runs entirely over HTTP like droid smartphones, also supports hardware-accelerated HLS and MPEG-DASH. However, while HLS can lever- neural-network execution. age some commodity web infrastructure, one feature of This advanced hardware is far more widely deployed livestreaming still requires (relatively) special-purpose on smartphones than it is in public clouds. As of May machinery: multi-bitrate streaming. For both RTMP 2018, there were an estimated 118.7 million active iPhone and HLS, broadcasters upload video at the best bitrate 7 devices, 118.5 iPhone 6s devices, over 60 million iPhone they can, and the server is responsible for generating 8 devices, and over 40 million iPhone X devices.6 Thus, different video qualities so that viewers can adapt their just among active iPhones there are over 300 million stream. processors capable of hardware-accelerated video transcod- Transcoding live video in the cloud is expensive. For example, transcoding 100 hours of live video at resolu- 4As of 12-2019: azure.microsoft.com/en-us/pricing/ tions of 1920x1080 and below at 30 frames-per-second details/media-services/ (FPS) using Azure Media Services costs over $300 (not 5As of 12-2019: aws.amazon.com/medialive/pricing/ 6newzoo.com/insights/articles/ performance-of-samsung-and-apple-flagship-smartphones/

3 ing. While individual computational units in the cloud ies. In doing so, VMAF tries to avoid the weaknesses of may be more capable than their mobile counterparts individual metrics and approximate how a human would due to power constraints (e.g., GPUs), the cloud will holistically evaluate a video’s quality. We primarily rely never reach the same scale of deployment that can be on VMAF to evaluate LevelUp. achieved in aggregate by smartphones. Furthermore, new hardware accelerators, such as ML processors, will 3. DESIGN PRINCIPLES be deployed at scale far more rapidly on mobile devices through natural user upgrade cycles than is practical To reduce the cost of game livestreaming, we designed for public-cloud providers such as AWS and Azure. LevelUp using the following principles. LevelUp uses these trends to drive down the cost of Transcode videos on broadcast devices. As dis- livestreaming. In particular, as long as smartphones are cussed in Section 2.1, existing cloud services charge an capable of transcoding video and performing ML-based order of magnitude more for multi-bitrate livestreaming image processing in realtime, then then it will be more than for single-bitrate livestreaming. And as discussed cost effective to perform this work on commodity mobile in Section 2.2, nearly all mobile devices possess dedi- devices than on dedicated cloud infrastructure. cated hardware for encoding and decoding video. Thus, LevelUp broadcasters directly transcode their game out- put on their smartphones in realtime and upload these 2.3 Image-quality metrics video files to the cloud for distribution. Our LevelUp To properly evaluate whether LevelUp can signifi- prototype encodes gameplay at three resolutions: high cantly boost the quality of low-bitrate videos, one must (1920x1080), medium (854x480), and low (480x270). have a way to measure image quality. In particular, Even with the impressive capabilities of commodity one must have a way to characterize how well a trans- devices, shifting work from the cloud to smartphones formed image (e.g., by video encoding and decoding raises the question of whether these additional respon- or by ML processing) approximates its original repre- sibilities will be too demanding. In particular, Lev- sentation. Image quality is a subjective matter, and elUp’s approach asks clients to expend additional en- image-quality metrics are a way to predict this sub- ergy and network bandwidth to create and upload mul- jectivity. For example, image-quality metrics typically tiple videos. only consider differences of luminance because studies For energy, we strongly suspect that the marginal have found that humans react more negatively to loss energy impact of multi-bitrate coding while playing a of luminance information than to loss of chrominance game is small. Gaming is already very energy intensive. information. Games require a device’s screen, CPU cores, and GPU Peak-signal-to-noise-ratio (PSNR) is a simple and well to all be in high-power states. On top of that, game known image-similarity metric. PSNR is calculated by streaming via services like Twitch or Mobcrush keeps a averaging the pixel-by-pixel mean-squared-error (MSE) device’s network radio in a high-power state and uses of the luminance (Y) and chrominance (Cr and Cb) the device’s hardware video-encoder to generate single- channels of two images. PSNR has well known limita- bitrate videos. 
LevelUp clients’ energy usage above a tions, and researchers proposed the Structural Similar- service like Twitch or Mobcrush would be due to en- ity Index (SSIM) as an alternative to PSNR nearly 15 coding and uploading multiple videos instead of one. years ago [17]. SSIM is more complex than PSNR and Fortunately, compared to other tasks involved in game tries to capture the perceived change in structural in- streaming, hardware video-encoding is one of the most formation caused by an image transformation. Though energy efficient. Dedicated hardware is far more effi- SSIM is generally seen as an improvement over PSNR, cient at transcoding than performing the same task on both fail to consistently reflect human perception [1]. a CPU or GPU. Furthermore, the additional effort of Of particular concern for LevelUp, measuring the qual- generating and uploading medium- and low-resolution ity of individual frames may not capture the quality of videos is small compared to high-resolution videos. For an entire video. example, our results in Section 5.1.3 show that medium- Netflix’s Video Multi-method Assessment Fusion (VMAF) resolution game streams are an order of magnitude smaller is a relatively new quality metric designed specifically than high-resolution streams. for video [1]. VMAF uses ML to model how humans Nonetheless, some broadcasters may find that encod- grade the quality of a sequence of images. VMAF com- ing and uploading high-, medium-, and low-resolution bines several metrics, including Visual Information Fi- videos is too resource intensive. This could be due to delity (VIF) [15], Detail Loss Metric (DLM) [12], and transient periods of depleted battery or poor connectiv- luminance differences between adjacent frames (to mea- ity, or it may be due to more persistent issues like a net- sure motion fidelity). VMAF uses a support vector ma- work data cap. Similarly, platforms may wish to limit chine (SVM) regressor to assign weights to these ele- their egress bandwidth by serving lower-bitrate streams. mentary metrics by training it with data from user stud- LevelUp mitigates the effect of such constraints through

4 hardware-accelerated ML on viewers’ devices. that games’ visual content presents a major opportunity Mitigate lower bitrates with ML. Transcoding to use super-resolution for livestreaming. videos on a broadcaster’s device instead of the cloud will Train a different model for each game. While reduce the cost of multi-bitrate streaming, but it asks many livestreams may change from broadcast to broad- broadcasters to create and upload more videos. In some cast, a potential advantage of super-resolving gaming cases broadcasters may throttle their load by encoding content is that there is significant visual similarity across and uploading only reduced-resolution videos. Generat- game sessions. How a game’s characters and objects in- ing reduced-resolution videos uses less energy than gen- teract with a game setting will change with each session, erating high-resolution videos, and based on our results but the visual elements of those characters, objects, and in Section 5.1.3, uploading medium- and low-resolution settings will be consistent. We use to train game CNNs videos would save significant bandwidth compared to with streams from prior sessions. uploading all three qualities. Transferring lower bitrate These CNNs learn general rules for upscaling arbi- video to viewers can also be a large source of saving for trary images, and they learn how specific visual ele- platforms. Of course, reduced-resolution videos look ments within a game should be upscaled. This approach worse than high-resolution videos. But like LevelUp places LevelUp between neural nets that are trained to broadcasters, LevelUp viewers also have sophisticated upscale arbitrary content and those that are trained to hardware accelerators, and they can use these accelera- upscale very specific content (e.g., NAS). tors to mitigate the reduced visual quality of low-bitrate LevelUp can offer CNNs to viewers that are tuned to video. the specific game content that they are streaming. For Single-image super-resolution is an ML approach to example, for games that have different levels or stages, improving the quality of reduced-resolution images [9, a viewer might apply a different CNN to each level or 18]. Unlike a simple interpolation, super-resolution meth- stage of the game. Alternatively, for combat games like ods aim to model the complex, non-linear mapping of Super Smash Brothers or FortNite, a viewer might re- low-resolution image representations to high-resolution ceive a CNN that has been trained to super-resolve the representations. That is, a super-resolution algorithm specific set of characters. There are far more possibili- takes a low-resolution image and outputs an estimate ties to explore than can be covered in this paper. Could of the image’s high-resolution version. Initial work on a CNN be trained for a region within the geography of a super resolution used sparse-coding techniques [6, 19], game? Could a client use object-detection ML to iden- and more recent work has investigated training neural tify which objects or characters are present in a game networks to perform super resolution [5, 11, 13, 16]. and adaptively apply the appropriate CNN? How often LevelUp uses the convolutional neural network (CNN) should a client switch CNNs? In this paper we seek only for super resolution described in [16]. 
Even though this to validate our hypothesis that super-resolution can be CNN is relatively lightweight (it has just four layers), applied to game content and that there is additional running it on a CPU would be too slow for livestream- benefit to training for a specific game. We leave the ing. Instead, LevelUp viewers run the CNN on their many remaining questions for future work. devices’ ML co-processors. At the moment, these ac- celerators are available on only higher-end smartphones 4. DESIGN like the latest iPhones and Google Pixels, but this hard- The three main components of LevelUp’s architec- ware will trickle down to all device levels within a few ture are the broadcaster, viewer, and server. Figure 1 years. shows each of these components. Our LevelUp imple- LevelUp is not the first system to use super-resolution mentation is written in Objective-C for iOS. It takes ad- for video streaming. NAS [20] also trains a super-resolution vantage of several iOS frameworks, including CoreML deep neural network (DNN) to learn the mappings from and ReplayKit, and overall contains roughly 2,500 non- low-quality to high-quality videos. NAS clients down- comment lines of code. load and use a large DNN to transform lower-quality frames into higher quality. NAS uses a heavier-weight 4.1 Broadcaster neural net than LevelUp, and as a result NAS models are over 100MB (LevelUp’s are hundreds of KBs) and The broadcaster is responsible for capturing video NAS must run on a desktop-class GPU. frames from the display as a user plays a mobile game, More fundamentally, NAS targets pre-recorded con- enqueuing those frames for the hardware encoder, and tent, and trains its DNNs on the same videos that clients uploading video segments to the server as they com- stream. This approach will not work for livestreaming, plete. because streamed content is not known in advance. De- To capture video frames from the display while a user javu [10] leverages similar insights and techniques to plays a game, LevelUp uses the ReplayKit framework enhance video conferencing. However, we hypothesize for iOS. ReplayKit allows third parties such as LevelUp to record audio and video in the background. To regis-

ter with the ReplayKit system, an app must implement a user interface for handling remote account setup or registration, and a live-broadcast extension for processing video frames and audio clips. After registration and setup, a user can start streaming by selecting LevelUp as the destination screen recorder.

Figure 1: High-level LevelUp architecture. A broadcaster device uses its local hardware video encoders to generate high-, medium-, and low-resolution videos. It uploads these files to cloud storage, where they can be distributed to viewers via commodity web CDN. Viewers use a stream playlist to identify the locations of each resolution stream, as well as the appropriate super-resolution model to apply to the stream it is viewing.

iOS imposes strict resource restrictions on live-broadcast extensions so that they do not compete for resources with the foreground application. In particular, it is critical that LevelUp stay under the 50MB memory limit and not compete with the streamed game for compute resources like the CPU and GPU. Fortunately, the ReplayKit framework is designed to make this straightforward, and LevelUp does not require significant compute resources besides the hardware video encoder.

The LevelUp extension is responsible for converting frames into video segments and then uploading those segments. Our current prototype encodes two-second segments at 30FPS. Two-second segments are an optimization for latency at the expense of bandwidth. Streaming delay grows with segment length, but the encoder can often compress more effectively when segments are larger.

Nonetheless, whenever ReplayKit gives LevelUp a new video frame, the handler enqueues it with the hardware encoder at three resolutions (1920x1080, 960x540, and 480x270). The extension then increments a frame counter (modulo 60), and checks if the counter is equal to 60. If the counter is equal to 60, then the current frame must be the last frame of the segment.

When a segment is complete, the extension directs the hardware encoder to finalize all videos it has queued. Once those requests return, the extension must upload the new videos as well as a new playlist to reflect that new segments are available. Uploading the new playlist and segments would be too slow to perform synchronously, so those tasks are performed on background threads that have been pre-initialized with HTTP connections to the server. Critically, uploading an updated playlist cannot begin until all of the video segments it references have been successfully uploaded.

Our playlist format is a simple JSON object with fields describing the locations of high-, medium-, and low-quality segments, the most recent segment number, and the location of any super-resolution CNNs that could be applied to reduced-resolution segments. It is important to note that broadcasters are not responsible for training a game's CNNs. We assume that LevelUp administrators or game developers will provide these models.
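As a concrete illustration, a playlist in the spirit of the object described above might look like the following. This is a hypothetical sketch: the field names and URLs are ours, not the prototype's actual schema.

```python
import json

# Hypothetical LevelUp playlist; field names and URLs are illustrative only.
playlist = {
    "latest_segment": 42,
    "segments": {
        "high":   "https://cdn.example.com/stream123/high/{n}.mp4",
        "medium": "https://cdn.example.com/stream123/medium/{n}.mp4",
        "low":    "https://cdn.example.com/stream123/low/{n}.mp4",
    },
    "superres_models": {
        "medium": "https://cdn.example.com/stream123/models/game_x2.mlmodel",
        "low":    "https://cdn.example.com/stream123/models/game_x4.mlmodel",
    },
}
print(json.dumps(playlist, indent=2))
```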
4.2 Viewer

The viewer is responsible for downloading the highest-resolution video segment that its bandwidth will allow, queuing video segments as they download, and potentially passing reduced-resolution segment frames through a super-resolution CNN. To start a stream, the LevelUp viewer downloads the playlist for a stream, constructs the URL for the first segment at a particular quality, and schedules the download on a background thread. Once the download has completed, the background thread notifies another thread that the video is ready to display, and the segment is enqueued with the video player. Then the whole process repeats for the next segment.

The process is slightly more complicated when a frame must be super-resolved. In this case, each frame from the downloaded and decoded video segment must first be converted to a single-channel grayscale image. LevelUp uses iOS's CoreML framework for handling the CNN's input and output images, as well as scheduling tasks on the hardware accelerator.

Just as inputs to the CNN must be single-channel grayscale images, so too are the outputs of the CNN. Thus, before displaying a super-resolved frame, LevelUp must combine the grayscale output from the CNN with the original reduced-resolution image. This is done by using the output of the CNN (which is a full 1920x1080-resolution image) as the luminance channel (i.e., Y) for a new image. Then LevelUp uses bicubic interpolation to upscale the reduced-resolution frame to high resolution, and uses the chrominance channels of this upscaled image (i.e., Cr and Cb) as the chrominance of the new, merged image. Once the new image has been displayed, LevelUp looks to construct the next frame in the same manner.
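The following sketch illustrates that merge step in Python with Pillow and NumPy rather than the prototype's Objective-C/CoreML code; the function and variable names are ours, and the CNN itself is elided.

```python
import numpy as np
from PIL import Image

def merge_super_resolved(lowres_rgb: Image.Image, sr_luma: np.ndarray) -> Image.Image:
    """Combine the CNN's full-resolution luminance with upscaled chrominance.

    lowres_rgb: the decoded reduced-resolution frame (e.g., 480x270).
    sr_luma:    the CNN's 1920x1080 output as a 2-D uint8 array (Y channel only).
    """
    height, width = sr_luma.shape
    # Bicubically upscale the low-resolution frame and keep only its Cb/Cr channels.
    upscaled = lowres_rgb.resize((width, height), Image.Resampling.BICUBIC)
    _, cb, cr = upscaled.convert("YCbCr").split()
    # Use the super-resolved output as the luminance channel of the merged frame.
    y = Image.fromarray(sr_luma, mode="L")
    return Image.merge("YCbCr", (y, cb, cr)).convert("RGB")
```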

We should note that our LevelUp prototype does not include logic for adapting stream quality to changing network conditions. This is an area of active research [14] and any reasonable implementation would be suitable as long as it does not compete for resources with the super-resolution CNN. Our prototype allows users to manually change their stream's quality, but this is obviously not ideal.

4.3 Server

The final component of the LevelUp architecture is the server. By design, the LevelUp server is extremely simple. Our current implementation is a simple Apache webserver with a PHP endpoint for accepting uploaded files. Of course, in a real deployment the server could be integrated with a CDN and other services for scaling HTTP workloads.

5. EVALUATION

To evaluate LevelUp, we sought answers to the following questions:

• Can super resolution improve the visual quality of reduced-resolution game streams?
• Can mobile clients perform multi-bitrate coding in realtime?
• Can mobile clients super-resolve reduced-resolution video streams in realtime?
• What is LevelUp's energy overhead?

To answer the first question, we performed experiments with seven representative gaming videos from xiph.org7: Counter-Strike: Global Offensive (CSGO); Dota 2; Fallout 4; Grand Theft Auto V (GTAV); Rust; Starcraft 2; and The Witcher 3. Each 1920x1080-resolution video was 60 seconds long, and captured from Twitch at 60 FPS. The Rust video contained a single, continuous scene, and the other videos contained clips from different parts of a game. The xiph games included realtime-strategy games (DOTA2 and STARCRAFT2) and first-person action games (CSGO, Fallout4, GTAV, RUST, and WITCHER3). We used these videos and PyTorch to train, test, and validate a lightweight, convolutional neural network (CNN) with a well-known sub-pixel super-resolution architecture [16].

To answer the next two questions, we performed experiments with our prototype LevelUp implementation on five generations of Apple iPhones: an iPhone 11 Pro with an A13 processor and 4GB of RAM, an iPhone Xs with an A12 processor and 4GB of RAM, an iPhone 8 with an A11 processor and 2GB of RAM, an iPhone 7 with an A10 processor and 2GB of RAM, and an iPhone 6s with an A9 processor and 2GB of RAM. Performing experiments with these devices allowed us to characterize how well several generations of mobile processors ran LevelUp.

To answer the final question, we performed experiments with an iPhone Xs and 11 Pro using the popular iOS game Monument Valley. We either viewed a stream of or played Monument Valley, beginning with a full battery, and recorded the remaining battery capacity after 30 minutes.

All videos in our experiments were encoded using H264. We did this for two reasons. First, unlike newer coding algorithms, such as AV1 [2], nearly all devices provide hardware-accelerated H264 decoding. Many mobile devices support hardware-accelerated H265 decoding (e.g., all iPhones since the iPhone 6), but H264 support remains much more common. Second, algorithms like H265 provide better compression than H264 but are often slower, which makes them less appropriate for game streaming. Though we limited ourselves to H264, we believe that LevelUp would behave similarly for any video encoding algorithm with hardware-accelerated decoding.

5.1 Super resolution

To test if a super-resolution CNN can provide better video quality than H264 for the same bitrate, we trained a CNN using gaming videos from xiph.org. Each video consisted of frames captured at a resolution of 1920x1080. For each video, we selected a four-second segment of 240 continuous frames, and randomly assigned frames to a testing and training set. When validating a model, we never included input frames from the model's testing or training sets. When validating a game model with frames from another game, we used all 3600 frames from the 60-second video. However, when validating a game model with frames from the same video, we excluded the testing and training frames, i.e., we used 3360 frames from the 56 seconds not used for testing and training.

We used ffmpeg to downscale and re-encode all testing, training, and validation frames at either medium resolution (i.e., 854x480) or low resolution (i.e., 480x270). We re-encoded videos using ffmpeg's default H264 encoder (libx264) with a Constant Rate Factor (CRF) of 23 and tuned coding for low latency. In all cases, the target frame for an input frame was the corresponding, full-resolution frame from the original xiph video.
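A minimal sketch of this downscale/re-encode step, assuming standard ffmpeg/libx264 options (the scale filter, CRF 23, and zero-latency tuning); the file names are illustrative, and the authors' exact command line is not given beyond these settings.

```python
import subprocess

def downscale_and_encode(src: str, dst: str, width: int, height: int) -> None:
    """Downscale a 1920x1080 source and re-encode it with libx264 at CRF 23."""
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale={width}:{height}",   # 854:480 (medium) or 480:270 (low)
        "-c:v", "libx264", "-crf", "23",    # constant rate factor 23
        "-tune", "zerolatency",             # low-latency coding
        dst,
    ], check=True)

downscale_and_encode("witcher3_1080p.mp4", "witcher3_480p.mp4", 854, 480)
```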
Testing, training, and validation input frames had less visual information than the original frames due to their reduced resolution and lossy H264 encoding. We used these decoded frames to train a CNN for each game with upscale factors of two and four. For an upscale factor of two, our input frames were 854x480, and each model generated a 1708x960 output image. To match the original resolution, we used bicubic interpolation to upscale the 1708x960 images to 1920x1080. For an upscale factor of four, our models took a 480x270 image as input and directly output a full-resolution, 1920x1080 image. We used PyTorch to train each model with a batch size of 1 and a learning rate of 0.0001 for 1000 epochs.

7 https://media.xiph.org/video/derf/
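The following is a sketch of a four-layer sub-pixel CNN in the style of Shi et al. [16], with a training loop matching the reported settings (batch size 1, learning rate 0.0001, 1000 epochs, and a mean-squared-error target of the full-resolution frame). The layer widths follow the widely used public PyTorch sub-pixel example, and the optimizer choice (Adam) is our assumption; the authors' actual training code is not reproduced here.

```python
import torch
import torch.nn as nn

class SubPixelCNN(nn.Module):
    """Four-layer sub-pixel super-resolution network in the style of Shi et al. [16]."""
    def __init__(self, upscale_factor: int):
        super().__init__()
        # Operates on the single-channel luminance (Y) plane.
        self.conv1 = nn.Conv2d(1, 64, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 32, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(32, upscale_factor ** 2, kernel_size=3, padding=1)
        self.pixel_shuffle = nn.PixelShuffle(upscale_factor)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        x = self.relu(self.conv3(x))
        return self.pixel_shuffle(self.conv4(x))

def train(model, loader, epochs=1000, lr=1e-4, device="cpu"):
    """Train on (low-resolution Y, full-resolution Y) pairs with MSE loss."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # optimizer is our assumption
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for lowres_y, target_y in loader:   # loader yields batches of size 1
            optimizer.zero_grad()
            loss = loss_fn(model(lowres_y.to(device)), target_y.to(device))
            loss.backward()
            optimizer.step()
    return model
```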

Figure 2: Percent change in PSNR, SSIM, and VMAF after super resolution for 854x480 frames.

Figure 3: Percent change in PSNR, SSIM, and VMAF after super resolution for 480x270 frames.

Figure 4: Example 200x200 detail from Dota 2 at high resolution (1920x1080), medium resolution (854x480), super-resolved medium resolution, low resolution (480x270), and super-resolved low resolution. Lower-resolution images were upscaled to high resolution using bicubic interpolation.

Figure 5: Example 200x200 detail from Witcher 3 at high resolution (1920x1080), medium resolution (854x480), super-resolved medium resolution, low resolution (480x270), and super-resolved low resolution. Lower-resolution images were upscaled to high resolution using bicubic interpolation.

5.1.1 PSNR, SSIM, and VMAF image quality

We fed each game's CNN decoded frames from a validation set. For each medium- or low-resolution input frame, the CNNs output a high-resolution frame that we compared to the corresponding original high-resolution frame using PSNR, SSIM, and VMAF. Figure 2 shows how super resolution changed the quality of medium-resolution H264 videos, and Figure 3 shows how super resolution changed the quality of low-resolution H264 videos, using a model trained and validated on the same game (though never the same frames).

The first result from these graphs is that super resolution affected VMAF far more than PSNR and SSIM. LevelUp barely changed the PSNR of any game stream at any resolution. LevelUp improved SSIM between 1 and 13% on medium-resolution inputs, and between 4 and 20% on low-resolution inputs. On the other hand, LevelUp improved the VMAF of medium-resolution frames between 15.1% (Starcraft 2) and 20.5% (Witcher 3). Even more impressive, LevelUp improved the VMAF of low-resolution frames between 61.2% (Fallout 4) and 88.4% (Witcher 3).

In Section 2.3 we discussed differences among PSNR, SSIM, and VMAF, but to gain a better sense of why LevelUp improves VMAF scores more than PSNR and SSIM, consider the examples in Figure 4 and Figure 5. Figure 4 shows a 200x200 detail from the game Dota 2, and Figure 5 shows a 200x200 detail from the game Witcher 3. Moving left to right, the figures show a game detail at full resolution, medium resolution, super-resolved medium resolution, low resolution, and super-resolved low resolution. For both games, super resolution qualitatively improves edge sharpness. The medium- and low-resolution frames are blurry because of interpolation, whereas the super-resolved frames preserve more of the original's details, such as teeth, wisps of hair, and folds of clothing.

5.1.2 Game-specific CNNs

We also wanted to characterize how much training models for specific games impacted our super-resolution results, so we ran a full cross comparison of game models and validation sets for both medium and low resolutions. Our medium-resolution results are in Figure 6, and our low-resolution results are in Figure 7.
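For reference, the PSNR comparisons above can be reproduced with the standard formulation below (shown here on 8-bit image planes); SSIM and VMAF require external tools such as libvmaf and are not reproduced. This is a generic sketch, not code from the LevelUp prototype.

```python
import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray) -> float:
    """Peak signal-to-noise ratio between two 8-bit images (higher is better)."""
    mse = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((255.0 ** 2) / mse)
```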

Training \ Validation    CSGO    Dota 2   Fallout 4   GTAV    Rust    Starcraft 2   Witcher 3
CSGO                     19.1%   19.1%    16.6%       14.1%   17.8%   15.1%         20.4%
Dota 2                   17.0%   19.5%    16.4%       15.7%   17.1%   13.1%         18.6%
Fallout 4                14.6%   16.5%    16.2%       15.6%   16.2%    5.5%         13.6%
GTAV                     17.4%   18.7%    16.7%       18.2%   18.2%   13.9%         19.6%
Rust                     16.7%   18.1%    16.5%       16.2%   15.8%   11.8%         17.8%
Starcraft 2              15.9%   16.6%    15.2%       14.4%   15.8%   15.1%         17.8%
Witcher 3                18.0%   19.5%    17.3%       17.3%   18.6%   15.2%         20.5%
Max-Min                   4.5%    3.0%     2.1%        4.0%    2.8%    9.8%          6.9%

Figure 6: VMAF results from cross validation of training/testing and validation sets for 854x480 frames.

Training \ Validation    CSGO    Dota 2   Fallout 4   GTAV    Rust    Starcraft 2   Witcher 3
CSGO                     81.2%   72.0%    59.1%       57.3%   73.8%   65.3%         84.1%
Dota 2                   74.9%   75.9%    57.9%       75.4%   73.7%   64.5%         81.1%
Fallout 4                77.9%   74.0%    61.2%       79.7%   76.1%   59.5%         79.3%
GTAV                     79.9%   73.3%    61.6%       82.4%   76.8%   66.0%         84.5%
Rust                     75.5%   71.8%    61.9%       79.0%   78.3%   56.3%         76.0%
Starcraft 2              75.9%   72.3%    58.8%       74.7%   74.0%   72.8%         82.9%
Witcher 3                78.8%   74.8%    59.0%       79.7%   78.0%   69.4%         88.4%
Max-Min                   6.3%    4.0%     4.0%       25.1%    4.6%   16.5%         12.4%

Figure 7: VMAF results from cross validation of training/testing and validation sets for 480x270 frames.

The tables show the percent improvement in VMAF score for either a medium- or low-resolution validation set. Each colored entry shows the percent change in VMAF from applying the row model to the column validation set. Each column is conditionally formatted, such that the greenest entry corresponds to the model that performed the best on the validation set, whereas the reddest entry corresponds to the model that performed the worst on that validation set. The bottom row of each table (i.e., Max-Min) shows the difference between the best and worst VMAF improvement.

For both tables, in nearly all cases the models that performed the best on a particular validation set were trained on different frames from the same game. The exceptions to this pattern are in the medium-resolution results, in which the Witcher 3 model outperformed the Fallout 4 model on the Fallout 4 validation set (17.3% improvement vs 16.2%), the Rust model on the Rust validation set (18.6% to 15.8%), and the Starcraft 2 model on the Starcraft 2 validation set (15.2% to 15.1%).

The results also show that, like super-resolution in general, game-specific training is more effective for low-resolution streams. As noted above, training a model on different frames from the same game as the validation frames always created the best visual quality for low-resolution frames. And while mismatched models often performed well, in several cases they did not. For example, the difference between the matched model and the worst model for GTAV was 25.1%. For Starcraft 2 the difference was 16.5%, and for Witcher 3 the difference was 12.4%. The conclusion we draw from these results is that applying a game-specific model is almost always effective and provides insurance against using a poor-performing model.

These results are consistent with our hypothesis that LevelUp can benefit from training CNNs for the visuals of a particular game. At the same time, these results also suggest that a super-resolution CNN will improve visual quality even when given reduced-resolution images that are dissimilar from its training data.

5.1.3 Bandwidth-quality tradeoffs

In Section 5.1.1 we demonstrated that super-resolution CNNs can significantly improve the visual quality of reduced-resolution H264 videos. We would also like to understand the bandwidth and quality tradeoffs that LevelUp could offer. In particular, how do the visual quality and bitrate of a super-resolved video compare to a higher-resolution video?

Figure 8 shows the bitrate and VMAF for each game's H264 videos (the dots) and super-resolved videos (the stars). Note that since all super-resolution models are less than 250KB and could be downloaded in advance of streaming, we have not factored their size into the bitrates for super-resolved videos. Rather, super-resolution data points reflect the bitrates of the input H264 videos.

Our bitrate-VMAF results show that super-resolution cannot achieve the highest visual quality, and that the highest qualities require more than 10Mbs. Reducing videos' resolution leads to approximately 10x bandwidth savings for all streams. As expected, this bandwidth savings comes at the cost of reduced visual quality. In general, medium-resolution videos provide 10x bandwidth savings and 20-30% lower VMAF than high-resolution videos, whereas low-resolution videos provide 3x bandwidth savings and 50% lower VMAF compared to medium-resolution videos.

Though our results show that super-resolution CNNs cannot recover all of the visual quality lost from reducing a video's resolution, they show that these CNNs recover a great deal of it. In particular, for medium-resolution videos, rather than the 20-30% VMAF penalty of H264, super-resolved videos' quality is within 10-20% of their high-resolution counterpart (at one-tenth the bitrate). The effect is more dramatic for low-resolution videos. Using low-resolution frames, super-resolution generates videos with VMAF values that are approximately 3-20% worse than medium-resolution H264, whereas raw low-resolution H264 provides visual quality that is 50% worse than medium-resolution H264.

Super resolution appears to be particularly effective for finely detailed game content, such as Witcher 3. As Figure 5 shows, super-resolution provides a much sharper image than simply interpolating upscaled low-resolution images. This sharpening effect has less impact on games such as the Rust clip, which was dominated by a cloudless sky.

5.1.4 Discussion

Our results demonstrate that super resolution can improve the quality of reduced-resolution game streams. However, answers to the question of which video resolutions a LevelUp streamer should upload to the cloud depend on the streamer's bandwidth constraints. If the streamer lacks the bandwidth to upload at all resolutions or she needs to conserve bandwidth (e.g., due to data caps), then LevelUp could allow her to upload medium- and low-resolution streams without significantly compromising her audience's viewing experience. The same calculation applies to a LevelUp viewer. LevelUp's CNNs could allow viewers to use significantly less bandwidth to view a stream of slightly diminished quality.

Figure 8: Bitrate versus VMAF for game streams under H264 and super-resolved H264. Each dot represents an H264 stream, and each star represents a super-resolved stream. Dots and stars from the same game share the same color. The lowest-bitrate data points represent low-resolution videos (i.e., 480x270), the middle-bitrate data points represent medium-resolution videos (i.e., 854x480), and the highest-bitrate data points represent high-resolution videos (i.e., 1920x1080). Note the logarithmic x-axis.

Figure 9: Time to process frames while super-resolving low resolution (480x270) H264 screen recordings. The top edge of each dark-gray box represents the 75th percentile time, the bottom edge of each dark-gray box represents the median time, and the bottom edge of the light-gray box represents the 25th percentile time. The top whisker extends to the maximum time, and the bottom whisker extends to the minimum time.

5.2 Broadcaster performance

A key goal for LevelUp is to perform multi-bitrate video coding on a broadcaster's device instead of doing it in the cloud. To verify that multi-bitrate coding on a commodity device is feasible, we used our LevelUp prototype to livestream ten seconds of Fortnite gameplay. For each test, we encoded and uploaded H264 videos at high resolution (1920x1080), medium resolution (854x480), and low resolution (480x270). LevelUp divided the ten seconds of gameplay into five two-second video segments, each encoded at 30FPS.

Apple's ReplayKit2 subsystem enforces strict resource limits on all broadcast extensions, including LevelUp's. In particular, LevelUp can use only 50MB of memory, so LevelUp synchronously processed each incoming video buffer to simplify memory management. After receiving a buffer, LevelUp placed it on a high-resolution encoding queue, a medium-resolution queue, and a low-resolution queue before returning. Once these queues filled to 60 buffers (i.e., two seconds at 30 FPS), LevelUp directed the encoder to save the three videos to disk. When each video was fully encoded, a completion handler for each video ran on a background thread and uploaded its video over a pre-initialized HTTP connection.
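The per-buffer flow described above can be summarized with the following sketch. It is written in Python for clarity (the prototype itself is an Objective-C ReplayKit extension), and the encoder and uploader objects are hypothetical stand-ins for the hardware video encoder and the pre-initialized HTTP connections.

```python
import threading

RESOLUTIONS = [(1920, 1080), (854, 480), (480, 270)]
FRAMES_PER_SEGMENT = 60  # two seconds at 30 FPS

class Broadcaster:
    def __init__(self, encoder, uploader):
        self.encoder = encoder      # stand-in for the hardware video encoder
        self.uploader = uploader    # stand-in for pre-initialized HTTP connections
        self.frame_count = 0

    def handle_buffer(self, frame):
        """Invoked synchronously for every captured frame; must return quickly."""
        for resolution in RESOLUTIONS:
            self.encoder.enqueue(frame, resolution)
        self.frame_count += 1
        if self.frame_count % FRAMES_PER_SEGMENT == 0:
            # Finalize the three per-resolution videos and upload them off the
            # critical path on a background thread.
            videos = self.encoder.finalize_segment()
            threading.Thread(target=self.uploader.upload, args=(videos,)).start()
```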
For our experiments, we measured the time that LevelUp spent processing each screen buffer, starting from the time that the ReplayKit2 subsystem invoked LevelUp's buffer handler until the time that the handler returned. We performed our experiments on five generations of iPhones: the iPhone 11 Pro, the iPhone Xs, iPhone 8, iPhone 7, and iPhone 6s. All of these devices included an H264 hardware encoder.

As expected, all of the devices processed livestream buffers in realtime; the median processing time for every device was less than 1ms. Several devices had maximum processing times of close to 20ms due to the time to initialize the hardware encoder. Each of these outliers appeared only once and at the beginning of a run. Overall, our results demonstrate that modern mobile devices are capable of performing multi-bitrate video encoding in realtime.

5.3 Viewer performance

Multi-bitrate coding on mobile devices is not limited by devices' ability to encode videos in realtime, but platforms and broadcasters could be limited by bandwidth constraints. A broadcaster's connection may be too poor to upload high-, medium-, and low-resolution

10 videos, or she may have wish to upload fewer bytes to 100 remain under a data cap. A platform may wish to 90 send less data to reduce the cloud-provider’s transfer 80 costs. Section 5.1 demonstrated that a super-resolution 70 CNN can significantly improve the quality of reduced- 60 resolution videos and could potentially mitigate band- 50 width constraints, but for these CNNs to be practi- 40 cal, LevelUp viewers must be capable of super-resolving 30 % charge remaining% charge videos in realtime. 20 To determine whether LevelUp clients can super-resolve 10 0 livestreams in realtime, we measured the time that our Low Me d Low-SR Low-SR prototype spent processing frames of a Monument Val- (iPhone Xs) (iPhone Xs) (iPhone Xs) (iPhone 11 Pro) ley livestream. For the experiment, we downloaded low-resolution video (i.e., 480x270) and processed each Figure 10: Percent battery remaining after frame using Apple’s CoreML subsystem. Prior to stream- streaming 30 minutes of Monument Valley ing, LevelUp pre-compiled and loaded a Monument Val- gameplay on an iPhone Xs and iPhone 11 Pro. ley CNN for low-resolution images. To apply the model, incoming video frames had to be converted to grayscale. The CNN output a high-resolution, single-channel grayscale from 1 to 10. image (i.e., the Y luminance channel). LevelUp then Despite these limitations, we ran a series of simple merged this luminance channel with the chrominance experiments on an iPhone Xs and iPhone 11 Pro with channels (i.e., Cr and Cb) from the upscaled original the game Monument Valley. First, we used LevelUp frame. As in Section 5.1, we upscaled the reduced- to record five minutes of Monument Valley game play. resolution Cr and Cb channels using bicubic interpo- Then we viewed this stream under several configura- lation. tions for 30 minutes with all radios on and the screen Figure 9 shows our results. The iPhone 11 Pro pro- at 25% intensity. When we reached the end of the cessed frames with a median latency of 23ms and a five-minute stream, we looped to continue streaming 75-percentile of 28ms, The iPhone Xs processed frames from the beginning. Thus, each streaming experiment with a median latency of 32ms and a 75th-percentile played our Monument Valley recording six times. Be- latency of 36ms. Both phones can display videos at ap- fore each streaming session, we charged the phone bat- proximately 30FPS. This demonstrates that a modern tery to 100%. We then unplugged the device, streamed mobile device can super-resolve video in realtime. The video for 30 minutes from a LevelUp server over WiFi, 11 Pro and Xs’s realtime performance are primarily due and finally recorded how much battery charge remained. to their SoCs’ Neural Engines. It is worth noting the For these experiments, we streamed low-resolution video significant performance improvement of the 11’s A13 and applied super-resolution (low-sr), low-resolution video over the Xs’s A12 (nearly 30% better median latency). without super-resolution (low), and medium-resolution As with our results in Section 5.2, our viewer perfor- video without super-resolution (med). Figure 10 shows mance results show that mobile devices are ready to our results. play a more central role in video streaming. Unsurprisingly, applying super resolution required more It is worth noting that phones older than the iPhone energy than simple streaming, leaving 87% of the bat- Xs cannot deliver realtime CNN processing. 
The iPhone tery after 30 minutes on the iPhone Xs and 95% left 8 has a two-core Neural Engine, and its median time to on the iPhone 11 Pro. This was due to the additional process frames was 87ms, with a 75th-percentile latency computational burden of super-resolving low-resolution of 90ms. The iPhone 7 does not have a dedicated neural- frames and merging the results with the original frames. net accelerator and executes the LevelUp CNN on its It is also unsurprising that streaming low-resolution videos six-core GPU. is energy efficient since these videos require less compute to decode and display and less bandwidth to download: streaming low-resolution video left 100% of the phone’s 5.4 Energy overhead battery, and streaming medium-resolution video left 100% Precisely measuring the energy consumption of a mod- of the phone’s battery. The percent of remaining bat- ern device like an iPhone is challenging. Like the vast tery remaining on the 11 Pro compared to the Xs is majority of modern smartphones, iPhones’ batteries are noteworthy. We suspect that this is due to the 11 Pro’s not easily accessible, which prevented us from attach- larger battery as well as a more power efficient ML- ing a power monitor such as a Monsoon. iOS provides accelerator. coarse-grained energy and performance monitoring, but Our results suggest that for the latest devices (e.g., these reports only measure energy intensity on a scale the iPhone 11 Pro), super resolution will have a very

11 small impact on users’ experience. However, for older gaming, but we leave this for future work. devices like the Xs, when energy is scarce, the addi- Choosing the right video streaming bitrate under dif- tional boost in quality offered by super resolution may ferent network conditions is a well known research topic. be offset by noticeable energy loss. Thus, using super This is a hard problem because a chosen bitrate must resolution makes the most sense in settings where bat- balance two seemly conflicting goals: maximizing qual- tery power is plentiful (e.g., a device is fully charged) ity and minimizing re-buffering time. Pensieve [14] uses and network bandwidth is constrained or precious (e.g., reinforcement learning to learn a control policy for adap- over a weak cellular connection). However, assuming tively choosing streaming bitrates. A neural network that a device’s battery drains linearly, our results also expresses different system observations in the control show that someone who begins watching a stream with policy so that it can adapt to networks of different char- a fully charged, modern device could watch five hours of acteristics automatically. LevelUp is orthogonal to this video and have close to half of her battery charge left. and other work on policies for stream adaptation. Other recent work introduces novel codec/network in- 6. RELATED WORK terfaces for video processing platforms. Salsify [7] is a video conferencing system that uses a pure functional There has been an great deal of recent interest in codec so that the encoder state can “fork” into differ- improving video streaming. Much of this work has ent execution branches and delay the choice of frame focused on improving the systems that deliver video quality as late as possible. This allows it to swiftly and streams instead of only focusing on greater coding ef- precisely adapt to network condition changes. However, ficiency. Specifically, architectures are increasingly ex- it is only designed for one-to-one video conferencing, not ploring ways to exploit resources on both the client and one-to-many video streaming like LevelUp. server sides. This approach has proved effective in sev- ExCamera [8] uses a similar functional video codec, eral recent systems, including LevelUp. but instead of exploring multiple execution branches, it For example, NAS [20] uses a DNN to learn the map- exposes the internal state of each thread so that videos pings from low quality to high quality videos, includ- can be encoded with more parallelism using serverless ing super-resolution. The client downloads and uses platforms without losing compressing efficiency. Sprocket [3] the DNN to transform lower quality images into higher extends ExCamera so that it can not only encode videos quality ones. Multiple scalable DNNs are used to meet but also allow users to build more complex video process the resource requirements of heterogeneous client envi- pipelines, including those with machine learning appli- ronments. One of the challenges that NAS faces that cations like facial recognition. These pipelines are then LevelUp does not is how to handle very large models. orchestrated by Sprocket to run on serverless platforms NAS model are close to 100MB, whereas LevelUp mod- with high parallelism and low cost. LevelUp occupies a els are only 250KB. This has two implications. 
6. RELATED WORK

There has been a great deal of recent interest in improving video streaming. Much of this work has focused on improving the systems that deliver video streams instead of only focusing on greater coding efficiency. Specifically, architectures are increasingly exploring ways to exploit resources on both the client and server sides. This approach has proved effective in several recent systems, including LevelUp.

For example, NAS [20] uses a DNN to learn the mappings from low-quality to high-quality videos, including super-resolution. The client downloads and uses the DNN to transform lower-quality images into higher-quality ones. Multiple scalable DNNs are used to meet the resource requirements of heterogeneous client environments. One challenge that NAS faces and LevelUp does not is how to handle very large models. NAS models are close to 100MB, whereas LevelUp models are only 250KB. This has two implications. First, NAS models are too large for the ML co-processor of a smartphone and can only run on a machine with a desktop-class GPU. Second, NAS includes complex logic for balancing the bandwidth spent downloading video content against the bandwidth spent downloading models. LevelUp models are so small that they easily execute on a smartphone and do not require complex download logic.

Dejavu [10] leverages similar insights and techniques as LevelUp to enhance video conferencing. The primary difference between LevelUp and Dejavu is the target application. Whereas a video-conferencing system focuses on the visual features of video chat (e.g., participants' faces), LevelUp focuses on the visual features of videogames (e.g., game characters and settings).

Like LevelUp, Kahawai [4] focuses on gaming and saving bandwidth. However, Kahawai is a cloud gaming system that offloads part of the mobile GPU's workload to the server. The server creates and sends a "patch" video to the client. The client then applies the patch, either "delta" frames or I-frames, to the locally rendered images to improve the game's visual quality. It is possible that the technique used in LevelUp could also be applied to cloud gaming, but we leave this for future work.

Choosing the right video streaming bitrate under different network conditions is a well-known research topic. It is a hard problem because a chosen bitrate must balance two seemingly conflicting goals: maximizing quality and minimizing re-buffering time. Pensieve [14] uses reinforcement learning to learn a control policy for adaptively choosing streaming bitrates. A neural network incorporates different system observations into the control policy so that it can automatically adapt to networks with different characteristics. LevelUp is orthogonal to this and other work on policies for stream adaptation.

Other recent work introduces novel codec/network interfaces for video processing platforms. Salsify [7] is a video conferencing system that uses a purely functional codec so that the encoder state can "fork" into different execution branches and delay the choice of frame quality as late as possible. This allows it to adapt swiftly and precisely to changes in network conditions. However, it is designed only for one-to-one video conferencing, not one-to-many video streaming like LevelUp.

ExCamera [8] uses a similar functional video codec, but instead of exploring multiple execution branches, it exposes the internal state of each thread so that videos can be encoded with more parallelism on serverless platforms without losing compression efficiency. Sprocket [3] extends ExCamera so that users can not only encode videos but also build more complex video processing pipelines, including those with machine-learning applications like facial recognition. Sprocket orchestrates these pipelines on serverless platforms with high parallelism and low cost. LevelUp occupies a decidedly different point in the design space than either ExCamera or Sprocket since it relies on mobile devices' hardware encoders to perform multi-bitrate transcoding.

7. CONCLUSION

We have presented the design and implementation of LevelUp. LevelUp embodies a thin-cloud approach to game livestreaming. LevelUp broadcasters use hardware accelerators to generate videos at multiple levels of quality and upscale reduced-resolution streams. Experiments with a prototype implementation demonstrate that mobile hardware acceleration can transcode and super-resolve video in realtime, and that super resolution can improve the visual quality of low-resolution game streams by up to 88%.

8. REFERENCES

[1] Toward a practical perceptual video quality metric. http://techblog.netflix.com/2016/06/toward-practical-perceptual-video.html, 2016.

[2] Alliance for Open Media. https://aomedia.org, 2018.
[3] L. Ao, L. Izhikevich, G. M. Voelker, and G. Porter. Sprocket: A serverless video processing framework. In Proceedings of the ACM Symposium on Cloud Computing, pages 263-274. ACM, 2018.
[4] E. Cuervo, A. Wolman, L. P. Cox, K. Lebeck, A. Razeen, S. Saroiu, and M. Musuvathi. Kahawai: High-quality mobile gaming using GPU offload. In Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services, pages 121-135. ACM, 2015.
[5] C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2):295-307, Feb 2016.
[6] W. Dong, L. Zhang, G. Shi, and X. Wu. Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization. IEEE Transactions on Image Processing, 20(7):1838-1857, July 2011.
[7] S. Fouladi, J. Emmons, E. Orbay, C. Wu, R. S. Wahby, and K. Winstein. Salsify: Low-latency network video through tighter integration between a video codec and a transport protocol. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). USENIX Association, 2018.
[8] S. Fouladi, R. S. Wahby, B. Shacklett, K. Balasubramaniam, W. Zeng, R. Bhalerao, A. Sivaraman, G. Porter, and K. Winstein. Encoding, fast and slow: Low-latency video processing using thousands of tiny threads. In NSDI, pages 363-376, 2017.
[9] D. Glasner, S. Bagon, and M. Irani. Super-resolution from a single image. In ICCV, 2009.
[10] P. Hu, R. Misra, and S. Katti. Dejavu: Enhancing videoconferencing with prior knowledge. In Proceedings of the 20th International Workshop on Mobile Computing Systems and Applications, HotMobile '19, pages 63-68, New York, NY, USA, 2019. ACM.
[11] J. Kim, J. K. Lee, and K. M. Lee. Accurate image super-resolution using very deep convolutional networks. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1646-1654, 2016.
[12] S. Li, F. Zhang, L. Ma, and K. N. Ngan. Image quality assessment by separately evaluating detail losses and additive impairments. IEEE Transactions on Multimedia, 13(5):935-949, 2011.
[13] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee. Enhanced deep residual networks for single image super-resolution. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2017, Honolulu, HI, USA, July 21-26, 2017, pages 1132-1140, 2017.
[14] H. Mao, R. Netravali, and M. Alizadeh. Neural adaptive video streaming with Pensieve. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication, pages 197-210. ACM, 2017.
[15] H. R. Sheikh and A. C. Bovik. Image information and visual quality. In Acoustics, Speech, and Signal Processing, 2004. Proceedings (ICASSP '04). IEEE International Conference on, volume 3, pages iii-709. IEEE, 2004.
[16] W. Shi, J. Caballero, F. Huszar, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 1874-1883, 2016.
[17] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600-612, 2004.
[18] C.-Y. Yang, C. Ma, and M.-H. Yang. Single-image super-resolution: A benchmark. In Proceedings of the European Conference on Computer Vision, 2014.
[19] J. Yang, J. Wright, T. S. Huang, and Y. Ma. Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11):2861-2873, Nov 2010.
[20] H. Yeo, Y. Jung, J. Kim, J. Shin, and D. Han. Neural adaptive content-aware internet video delivery. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 645-661, 2018.
