A Cross-Layer Design for Scalable Mobile Video
Total Page:16
File Type:pdf, Size:1020Kb
A Cross-Layer Design for Scalable Mobile Video The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation Jakubczak, Szymon, and Dina Katabi. “A Cross-layer Design for Scalable Mobile Video.” ACM Press, 2011. 289. As Published http://dx.doi.org/10.1145/2030613.2030646 Publisher Association for Computing Machinery (ACM) Version Author's final manuscript Citable link http://hdl.handle.net/1721.1/72994 Terms of Use Creative Commons Attribution-Noncommercial-Share Alike 3.0 Detailed Terms http://creativecommons.org/licenses/by-nc-sa/3.0/ A Cross-Layer Design for Scalable Mobile Video Szymon Jakubczak Dina Katabi CSAIL MIT CSAIL MIT 32 Vassar St. 32 Vassar St. Cambridge, Mass. 02139 Cambridge, Mass. 02139 [email protected] [email protected] ABSTRACT Cisco visual index, mobile video traffic will grow 66 fold over Today’s mobile video suffers from two limitations: 1) it can- a period of five years [1]. Such predictions lead to a natu- not reduce bandwidth consumption by leveraging wireless ral question: can existing wireless technologies, e.g., WiFi, broadcast to multicast popular content to interested re- WiMax, or LTE, support this impending demand and pro- ceivers, and 2) it lacks robustness to wireless interference vide scalable and robust mobile video? and errors. This paper presents SoftCast, a cross-layer de- (a) Scalability. As demands for mobile video increase sign for mobile video that addresses both limitations. To congestion will also increase. The problem becomes particu- do so, SoftCast changes the network stack to act like a lin- larly severe when many users try to watch a popular realtime ear transform. As a result, the transmitted video signal be- event, e.g., the Super Bowl game. In such case, one would comes linearly related to the pixels’ luminance. Thus, when like to save bandwidth by multicasting the event as a sin- noise perturbs the transmitted signal samples, the pertur- gle video stream. Different receivers, however, have different bation naturally translates into approximation in the orig- channel qualities (i.e., SNRs). Multicasting a single video inal video pixels. This enables a video source to multicast stream to multiple receivers requires the source to transmit a single stream that each receiver decodes to a video qual- at the lowest bit rate supported by their channels. This re- ity commensurate with its channel quality. It also increases duces all receivers to the video quality of the receiver with robustness to interference and errors which now reduce the the worst channel. Since such a design is undesirable from sharpness of the received pixels but do not cause the video a user perspective, the typical approach today transmits an to glitch or stall. We have implemented SoftCast and evalu- individual video stream to each receiver, even when all of ated it in a testbed of software radios. Our results show that these streams share the same content, which is unscalable. it improves the average video quality for multicast users by (b) Robustness. The wireless medium suffers high bit er- 5.5 dB, eliminates video glitches caused by mobility, and in- ror and packet loss rates due to both interference and channel creases robustness to packet loss by an order of magnitude. noise. Video codecs however are very sensitive to errors and losses [2,29]. Fig. 1 plots the impact of interference-caused Categories and Subject Descriptors packet loss on MPEG4 (i.e., H.264/AVC) and SVC layered- video.1 The figure is generated using the reference implemen- C.2 [Computer-Communication Networks]: Miscella- tations of the two codecs [13,34], and by having an interferer neous transmit at regular intervals. (Other details are in 8.4.) The figure confirms past results [29], showing that both§ MPEG4 General Terms video and SVC layered video are highly sensitive to interfer- Algorithms, Design, Performance, Theory ence and become unviewable (i.e., PSNR < 20 dB) when the packet loss rate is higher than 1%. The lack of scalability and robustness in today’s mobile Keywords video stems from the existing design of the network stack. wireless networks, scalable video communications, joint Specifically, mobile video is impacted by two layers in the source-channel coding stack: the application video codec, which compresses the video, and the physical layer, which protects the video from channel errors and losses. Today, video codecs do an excel- 1. INTRODUCTION lent job in compressing the video and removing redundancy. Mobile video is predicted to be the next killer applica- However, they also make the video highly vulnerable to bit tion for wireless networks [1]. In particular, according to the errors and packet losses. In particular, all common video codecs use entropy coding (e.g., Huffman), in which a single bit flip can cause the receiver to confuse symbol boundaries, Permission to make digital or hard copies of all or part of this work for producing arbitrary errors in the video. This compressed personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies video has to be transmitted over an erroneous wireless chan- bear this notice and the full citation on the first page. To copy otherwise, to nel. Thus, the PHY layer has to add back redundancy in the republish, to post on servers or to redistribute to lists, requires prior specific 1 permission and/or a fee. SVC produces a base layer necessary for decoding and a MobiCom'11, September 19–23, 2011, Las Vegas, Nevada, USA. refinement layer that adds details for receivers with better Copyright 2011 ACM 978-1-4503-0492-4/11/09 ...$10.00. channels. 40 MPEG4/H.264 DCT thereby destroying linearity and making the video frag- 35 SVC ile to bit errors and packet losses [19]. This forces the PHY to 30 compensate for the lack of robustness by adding back the re- 25 dundancy in the form of error protection codes. In contrast, SoftCast does not use traditional entropy coding; instead, it 20 Video PSNR [dB] weighs the DCT components according to their entropy, i.e., 15 the amount of information they contribute to the decoded 10 0 0.001 0.002 0.005 0.01 0.02 0.05 0.1 video. This allows SoftCast to leverage the basic idea under- Fraction of Lost Packets lying entropy coding but without destroying the linearity of Figure 1: Impact of interference-related packet loss its design. As a result, the physical layer does not need to add on video quality. PSNR below 20 dB corresponds excessive redundancy to protect the video, which produces to unacceptable video quality [23]. The figure shows an efficient end-to-end design. that both H.264/MPEG4 and SVC (i.e., layered) (b) Resilience to Packet Loss: Current video codecs em- videos suffer dramatically at a packet loss rate as ploy differential encoding and motion compensation. These low as 1%. techniques create dependence between transmitted packets. As a result, the loss of one packet may cause subsequent cor- form of error protection codes. Since the compressed video is rectly received packets to become undecodable. In contrast, highly fragile, video streaming requires the PHY to add ex- SoftCast employs a linear Hadamard transform [3] to dis- cessive redundancy to eliminate the possibility of bit flips or tribute the video information across packets such that each packet loss. This approach is particularly inefficient in mo- packet has approximately the same amount of information. bile video because the PHY needs to add excessive coding As a result, all packets contribute equally to the decoded to deal with channel variations across time due to mobility video, and the loss of a few packets does not cause sharp or interference, and channel variations across space due to degradation in the video quality. receiver diversity. We note that despite its cross-layer design, SoftCast is Theoretical results show that the existing layer separa- relatively easy to incorporate within the existing network tion – i.e., separating source coding (i.e., video compres- stack. Specifically, SoftCast is built atop an OFDM physical sion) from channel coding (i.e., error protection) – is ac- layer similar to that used in today’s WiFi, WiMax and LTE, ceptable only in the case of unicast channels and when the and hence can be realized in such systems by having the statistics of the channel are known a priori to the trans- OFDM PHY layer send the values at SoftCast’s output as mitter [27,30]. Such separation however becomes inefficient the I and Q components of the transmitted digital signal. for multicast/broadcast channels, or when the channel’s We have implemented SoftCast and evaluated it in a statistics are hard to predict due to mobility or interfer- testbed of 20 GNURadio USRP2 nodes. We compare it with ence [27,30]. two baselines: 1) MPEG4 (i.e., H.264/AVC) over 802.11, and This paper aims to improve the robustness and scalability 2) layered video where the layers are encoded using the scal- of mobile video. The paper presents SoftCast, a cross-layer able video extension to H.264 (SVC) and transmitted using design of mobile video that both compresses the video and hierarchical modulation as in [15]. We evaluate these schemes protect it from errors and losses. SoftCast starts with a video using the Peak Signal-to-Noise Ratio (PSNR), a standard that is represented as a sequence of numbers, with each num- metric of video quality [23].2 We have the following findings: ber representing a pixel luminance. It then performs a se- quence of transformations to obtain the final signal samples SoftCast can multicast a single video stream that delivers that are transmitted on the channel.