
Video Telephony for End-consumers: Measurement Study of Google+, iChat, and Skype

Yang Xu, Chenguang Yu, Jingjiang Li and Yong Liu
Department of Electrical and Computer Engineering
Polytechnic Institute of New York University
Brooklyn, NY, USA 11201
{yxu10, cyu03, jli12}@students.poly.edu, [email protected]

ABSTRACT

Video telephony requires high-bandwidth and low-delay voice and video transmissions between geographically distributed users. It is challenging to deliver high-quality video telephony to end-consumers through the best-effort Internet. In this paper, we present our measurement study on three popular video telephony systems on the Internet: Google+, iChat, and Skype. Through a series of carefully designed active and passive measurements, we are able to unveil important information about their key design choices and performance, including application architecture, video generation and adaptation schemes, loss recovery strategies, end-to-end voice and video delays, resilience against random and bursty losses, etc. Obtained insights can be used to guide the design of applications that call for high-bandwidth and low-delay data transmissions under a wide range of "best-effort" network conditions.

1. INTRODUCTION

The Internet has fundamentally changed the way people communicate, ranging from emails, text-messages, blogs, tweets, to Voice-over-IP (VoIP) calls, etc. We are now experiencing the next big change: Video Telephony. Video telephony was originally conceived in the 1920s. Due to its stringent bandwidth and delay requirements, for years, customers have been paying high prices to utilize specialized hardware and software for video encoding, mixing and decoding, and dedicated network pipes for video distribution. Video telephony had little success in the end-consumer market, until very recently. The proliferation of video-capable consumer electronic devices and the penetration of increasingly faster residential network accesses paved the way for the wide adoption of video telephony. Two-party video chat and multi-party video conferencing services are now being offered for free or at low prices to end-consumers on various platforms. Notably, Apple iChat [15], Google+ Hangout [11], and Skype Video Calls [26] are among the most popular ones on the Internet.

Video conferencing requires high-bandwidth and low-delay voice and video transmission. While Skype encodes high-quality voice at a data rate of 40kbps, a Skype video call with good quality can easily use up bandwidth of 900 kbps [33]. While seconds of buffering delay is often tolerable even in live video streaming, in video conferencing, user Quality-of-Experience (QoE) degrades significantly if the one-way end-to-end video delay goes over 350 milliseconds [17]. To deliver good conferencing experience to end-consumers over the best-effort Internet, video conferencing solutions have to cope with user device and network access heterogeneities, dynamic bandwidth variations, and random network impairments, such as packet losses and delays. All these have to be done through video generation and distribution in realtime, which makes the design space extremely tight. This motivates us to conduct a measurement study of three existing solutions: iChat, Google+, and Skype, to investigate how they address video conferencing challenges and how well they do it on the Internet. Specifically, our study is focused on the following issues about their key design choices and their delivered user conferencing experiences.

• System Architecture: A natural conferencing architecture is Peer-to-Peer (P2P), where users send their voice and video to each other directly. Skype employs P2P for VoIP and two-party video chat. While P2P achieves low-delay, it comes short at achieving high-bandwidth: a residential user normally cannot upload multiple high-quality video streams simultaneously. While video conferencing servers can be employed to relay users' voice and video, data relay often incurs longer delay than direct transfer. Conferencing servers have to be strategically located and selected to balance the bandwidth and delay performance of voice and video relay between geographically distributed users.

• Video Generation and Adaptation: To cope with receiver heterogeneity, a source can generate a single video version at a rate downloadable by the weakest receiver. One-version design unnecessarily limits the received video quality on other stronger receivers. Alternatively, multiple video versions can be generated, either directly by the source or by relay servers through transcoding. Different video versions will be sent to different receivers, matching their download capacities. In a simple multi-version design, each video version is encoded and transmitted separately. It incurs high encoding and bandwidth overhead. Scalable Video Coding (SVC) encodes video into multiple layers. It is appealing to adopt SVC in video conferencing to realize multi-version design with greatly reduced overhead.

• Loss Recovery: To achieve reliability in realtime streaming, the conventional wisdom is to use Forward Error Correction (FEC) coding, instead of retransmission. However, in video conferencing, video has to be encoded and decoded in realtime. To avoid long FEC encoding and decoding delays, FEC blocks have to be short. This largely reduces FEC's coding efficiency and robustness against bursty losses. Meanwhile, retransmission is viable if the network delay between sender and receiver is small. Unlike FEC, retransmission adds redundancy only as needed, and hence is more bandwidth-efficient. Redundant retransmissions can also be used to protect important packets against bursty losses. The choice between FEC and retransmission is tightly coupled with system architecture and video generation.

• User Quality-of-Experience: Ultimately, the performance of a video conferencing system is evaluated by the delivered user conferencing experiences, which are highly sensitive to various voice and video quality metrics, such as end-to-end voice and video delay, synchronization between voice and video, video resolution, frame-rate and quantization, etc. To provide stable conferencing services to end consumers over the Internet, it is extremely important for a conferencing system to be adaptive to varying network conditions and robust against random network impairments. We systematically study the delivered user conferencing experiences of each system in a wide range of real and emulated network scenarios. We further investigate the implications of the design choices made by each system on their delivered user experiences.

It is admittedly challenging and ambitious to come up with conclusive answers for these questions. All three systems use proprietary protocols and encrypt data and signaling messages. There is very limited public information about their architectures and protocols. To address these challenges, we undertake an extensive measurement campaign and systematically measure the three systems as black-boxes. We observe and analyze their behaviors and performances through a set of carefully designed active and passive measurement experiments. Extrapolating from the measurement results, we are able to unveil important information about their system architecture, video encoding and adaptation, loss recovery and delivered user QoE. Our major findings are summarized as follows.

1. While P2P is promising for voice conferencing and two-party video calls, P2P alone is not sufficient to sustain high-quality multi-party video conferencing. The design choices and performance of multi-party video conferencing systems are largely affected by the availability of bandwidth-rich server infrastructures.

2. Conferencing server locations not only impact the delivered user delay performance, but also affect the design of loss recovery mechanisms and the achieved loss resilience. When servers are located close to end users, retransmission is more preferable than FEC to recover from random and bursty losses. Data relays through well-provisioned server networks can deliver low-delay and high-bandwidth voice and video transfers between geographically distributed users.

3. To deliver high-quality video telephony over the Internet, realtime voice and video generation, protection, adaptation, and distribution have to be jointly designed. Various voice and video processing delays, incurred in capturing, encoding, decoding and rendering, account for a significant portion of the end-to-end delays perceived by users.

4. Compared with multi-version video coding, layered video coding can efficiently address user access heterogeneity with low bandwidth overhead. With layered coding, content-aware prioritized selective retransmissions can further enhance the robustness of conferencing quality against random and bursty losses.

The rest of the paper is organized as follows. We briefly discuss the related work in Section 2. Our measurement platform is introduced in Section 3. Then we study the application architectures of the three systems in Section 4. Video generation and adaptation strategies are investigated in Section 5. Voice and video delay performances are measured in Section 6. In Section 7, we investigate their loss recovery strategies, and measure their resilience against bursty losses and long delays. The paper is concluded in Section 8.

2. RELATED WORK

Most of the previous measurement studies of realtime communications over the Internet were focused on Skype's VoIP service. Baset et al. [1] first analyzed

Skype's P2P topology, call establishment protocol, and NAT traversal mechanism. Since then, a lot of papers have been published on Skype's overlay architecture, P2P protocol, and VoIP traffic [2, 12]. Some other studies [3, 14, 32] focused on the quality of Skype's voice-over-IP (VoIP) calls. Huang et al. investigated Skype's FEC mechanism and its efficiency for voice streaming [14, 32]. In [3], the authors proposed a user satisfaction index model to quantify VoIP user satisfaction. Cicco et al. [4] proposed a congestion control model for Skype VoIP traffic.

More recently, there are some measurement works on video telephony. Cicco et al. [5] measured the responsiveness of Skype video calls to bandwidth variations. They conclude that Skype's response time to bandwidth increase is too long. In [33], we conducted an extensive measurement study of Skype two-party video calls under different network conditions. Based on the measurement results, we propose models for Skype video calls' rate control, FEC redundancy, and video quality. According to a year 2010 survey [18], most of the eighteen surveyed multi-party video conferencing systems before Skype and Google+ are server-based. The maximum number of conferencing participants supported by each system ranges from 4 to 24.

[Figure 1(a): Overall Testbed — local machines behind network emulators, remote machines, video conferencing servers, and PlanetLab/Glass Server nodes connected through the Internet. Figure 1(b): Local Testbed — end-user computer behind a network emulator and an access network.]
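The FEC efficiency argument raised in the introduction (and quantified for Skype voice in [14, 32]) can be illustrated with a back-of-the-envelope sketch. This is our own illustration, not code from any measured system; it assumes an (n, k) erasure code (e.g., Reed-Solomon) that recovers a block whenever at most n - k of the n transmitted packets are lost, under independent loss:

```python
from math import comb

def fec_recovery_prob(n: int, k: int, p: float) -> float:
    """Probability that an (n, k) erasure-coded block is recoverable,
    i.e. at most n - k of its n packets are lost, assuming each packet
    is lost independently with probability p."""
    return sum(comb(n, i) * (p ** i) * ((1 - p) ** (n - i))
               for i in range(n - k + 1))

# Short blocks keep FEC delay low but weaken protection: with 10% random
# loss, a (5, 4) code recovers fewer blocks than a (20, 16) code, even
# though both carry the same 25% redundancy.
short_block = fec_recovery_prob(5, 4, 0.10)    # ~0.919
long_block = fec_recovery_prob(20, 16, 0.10)   # ~0.957
```

Bursty losses hurt short blocks even more, since a single burst can wipe out more parity than a short block carries, which is why the later sections find retransmission preferable when servers sit close to users.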

Figure 1: Testbed to Measure the Three Video Conferencing Systems

3. MEASUREMENT PLATFORM

Our video telephony measurement effort was initiated in December 2010, right after Skype introduced its beta service of multi-party video conferencing. Since then, we have been continuously taking measurement of Skype. In June 2011, Google+ Hangout introduced its video conferencing service for users in friend circles. To obtain a broader view, we extended our study to Google+, as well as Apple's iChat.

As illustrated in Fig. 1(a), our measurement platform consists of three major components: a local testbed within the NYU-Poly campus network, remote machines of friends distributed all over the world, and Planetlab [22] and Glass Server [27] nodes at selected locations. Experimental video calls are established either only between our local machines, or between local machines and remote friends' machines. Planetlab and Glass Server nodes are employed for active probing to geolocate video conferencing servers. To emulate a video call, we choose a standard TV news video sequence "Akiyo" from the JVT (Joint Video Team) test sequence pool. The sequence has mostly head and shoulder movements. It is very similar to a video-call scenario. We inject the video sequence into video conferencing systems using a virtual video camera tool [7]. This ensures the transmitted video contents are consistent and repeatable.

As illustrated in Fig. 1(b), to emulate a wide range of network conditions, we install a software-based network emulator, NEWT [20], on our local machines. It emulates a variety of network attributes, such as propagation delay, random or bursty packet loss, and available network bandwidth. We use Wireshark [31] to capture detailed packet-level information before and after the network emulators. iChat, Google+ and Skype all report technical information about video quality through their application windows, such as video rates, frame rates, RTT, etc. [21] We use a screen text capture tool [23] to capture this information periodically. The sampling interval is 1 second. Most of the measurement experiments are automated. All the reported statistics in the rest of the paper are drawn from a large number of samples.

4. SYSTEM ARCHITECTURE

There are three main architectures for networked applications: client-server, peer-to-peer, and hybrid of the two. Skype delivers a scalable VoIP service using P2P, where users connect to each other directly as much as possible. In a video conference, a source encodes his video at a rate much higher than that of voice, and might have to send the encoded video to multiple receivers. From the bandwidth point of view, a pure P2P design might not be sustainable. We now investigate the application architectures adopted by the three video telephony systems.

[Figure 2 diagrams: (a) iChat, (b) Google+ Hangout, (c) Skype multi-party; the panels show the Initiator, Players 1 and 2, Servers 1-3, and the voice and video flows between them.]

Figure 2: System Architectures of the iChat, Google+, and Skype

4.1 Methodology

For each system, we set up video calls and use Wireshark to capture packets sent out and received by each computer. Each user in the conference generates voice and video packets constantly at significant rates. In our Wireshark trace analysis, we capture those TCP and UDP sessions, whose durations are long enough (at least half of the conference session time) and data rates are significant (larger than 5kbps), as voice or video flows. To get the right topology and differentiate between signaling packets, voice packets and video packets, we conduct three stages of experiments. In the first stage, each user sets his voice and video on to form a video conference. In the second stage, users only set voice on to form a voice conference. In the third stage, all users shut down their videos and mute their voices. We compare the recorded flow sessions in the three stages. Consequently, we identify voice flows, video flows, and signaling flows.

4.2 iChat is P2P

iChat employs a P2P architecture and users in a conference form a star topology at the application layer as shown in Fig. 2(a). The central hub is the conference initiator, i.e., the user who initiated the conference. Only the conference initiator has the right to add people into the conference or close the entire conference. A normal user uploads his voice and video data to the initiator through one UDP flow, and also downloads other participants' voice and video data from the initiator through another UDP flow. Participants use UDP port 16402. Voice and video are transported using the RTP protocol [13]. Normal users only connect to the initiator. There is no direct connection between two normal users. iChat cannot work if UDP traffic is blocked.

4.3 Google+ is Server-centric

Google+ video calls, both two-party and multi-party, always use a server-centric topology, as illustrated in Fig. 2(b). Each user sends his voice and video to a dedicated proxy server, and also receives other users' voice and video from that server. There are no direct transmissions between users. Generally, different users choose different proxy servers. Thus, the proxy servers need to communicate with each other to exchange users' voice and video. Each user opens four connections with his proxy server on the same server port (Port 19305). Most of the time, these four connections all use UDP. TCP is used only when we deliberately block UDP traffic. There is a trick [21] to access various statistics of Google+. The statistics show that two of the four flows carry voice and video respectively. Google+ also uses the RTP protocol to transmit voice and video. The other two flows' payloads conform to the format of the RTCP protocol. We infer that those two flows carry signaling information.

4.4 Skype is Hybrid

For two-party video calls, Skype uses direct P2P transmission for voice and video if the two users can establish a direct connection [33]. When three or more users are involved, the topology is shown in Fig. 2(c). Voice is still transmitted using P2P. Similar to iChat, the conference initiator acts as the central hub. A normal user uploads his voice to the initiator and downloads other users' voice from the initiator. In a conference with three users, the upload flow rate from a normal user to the initiator is around 40kbps. And the download rate from the initiator to that user is around 50kbps, less than the total rate of two separate voice flows. This indicates that the initiator might use a sound mixing technique to combine voices of multiple users into one voice stream. For video transmission, each user uploads his video to a Skype server, which relays the video flow directly to all other users in the conference, without going through another relay server. Different users normally choose different relay servers.

Similar to Google+, Skype mostly uses UDP to transmit voice and video, and only switches to TCP if UDP is blocked. Different from Google+ and iChat, Skype does not use RTP and RTCP. In addition to voice and video flows, we also observe some low-rate flows between servers and users. We conjecture that these small flows are for signaling.

4.5 Conferencing Server Placement

Google+ and Skype employ relay servers. If servers are not properly located and selected for users, voice and video relay will incur long delays that significantly degrade users' experience of realtime interaction in a conference. To determine the server locations of Skype and Google+, we set up video conferences with our friends all over the world. Table 1 shows the Google+ server IP addresses used by our friends from different locations. When using a popular geo-location tool Maxmind [19] to geo-locate those IP addresses, we were surprised that all Google+ server IP addresses are geo-located to Mountain View, CA, USA. However, this seems incorrect. In Table 1, we also report the measured RTT between our friends' machines and their corresponding proxy Google+ servers. The RTT results suggest that most of our friends are assigned to Google+ servers within short network distances, except for the two friends in Australia. Thus, we infer that Google+ Hangout deploys conferencing servers all over the world.

Table 1: RTT to Google+ Hangout Servers
Friend Location        Server IP         RTT (ms)
Hong Kong, CHN         74.125.71.127     3.49
Armagh, UK             173.194.78.127    8.88
Rio de Janeiro, BRA    64.233.163.127    9.02
New York, USA          173.194.76.127    14.2
Aachen, DE             173.194.70.127    20.00
Toronto, CA            209.85.145.127    26.76
San Jose, USA          173.194.79.127    28.89
Brisbane, AU           72.14.203.127     147
Canberra, AU           74.125.31.127     147

We did similar experiments for Skype. We found that the server IP addresses for our friend machines are all located in the subnet of 208.88.186.00/24. Maxmind [19] reports that those servers are in Estonia. To verify this, we select Planetlab [22] and Glass Server [27] nodes to ping Skype servers randomly picked from the identified subnet and report the RTT results in Table 2. From nodes located in New York and Hong Kong, we also pinged all possible IP addresses in the subnet. The RTT results are consistent with the corresponding values from these two locations listed in Table 2. This suggests that those Skype servers are likely in the same physical location. Even if the bit propagation speed is exactly the speed of light (3 × 10^8 meters/sec), the RTT between New York and Estonia is about 44.4ms (11,314km) [16], which is even larger than the measured RTT in Table 2. Thus, Skype servers can't be located in Estonia. To geolocate Skype servers, we did additional traceroute experiments from different locations. We found that the last-hop IP addresses are all located near the New Jersey area. This is consistent with Table 2, where the location with the smallest RTT is New Haven. Thus, we infer that Skype servers are located near the New Jersey/New York area. As will be shown in the following sections, server locations not only directly impact users' delay performance, but also have implications on the design and performance of loss recovery mechanisms of Google+ and Skype.

Table 2: RTT to Skype Video Relay Servers
PlanetLab/Glass Server Locations    RTT (ms)
New Haven, USA          17.4
New York, USA           25.9
Washington, USA         31.3
Vancouver, CA           61.41
San Jose, USA           67.5
London, UK              99.2
Guayaquil, EC           111
Saarbrucken, DE         138
Oulu, FIN               151
Rio de Janeiro, BRA     176
Tel Aviv, IL            207
Canberra, AU            222
Hong Kong, CHN          226
Brisbane, AU            266
West Bengal, IN         324

5. VIDEO GENERATION AND ADAPTATION

In a video conference, each source encodes and uploads his video in realtime. To cope with network bandwidth variations, the encoded video rate on a source first has to be adapted to his available uplink bandwidth. Additionally, in multi-party video conferencing, each source has multiple receivers, potentially with different download capacities. The source video encoding rate also has to be adapted to receivers' download capacities. In a one-version design, a source generates a single video version that can be downloaded by the weakest receiver, and sends that version to all receivers. This unnecessarily limits the received video quality on stronger receivers. It is more desirable for a source to send different receivers different video qualities, maximally matching their download capacities. In a simple multi-version design, a source uses different video encoding parameters to generate multiple versions of the same video, and simultaneously uploads those versions to the server/receivers. One obvious drawback is that, as an end-host, a source might not have enough bandwidth to upload multiple versions. Alternatively, a source can send one copy of his video to a server, which then transcodes it into multiple versions at lower qualities. The third option is the multi-layer design, where a source encodes a video into multiple layers, using recent scalable video coding techniques, such as SVC [25] or MDC [29]. A receiver can decode a basic quality video after receiving the base layer. Higher video quality can be obtained as more video layers are received. Recent advances in SVC coding have brought down the layered video encoding overhead to 10% [30]. With multi-layer coding, a source only needs to send out all layers using upload bandwidth slightly higher than the one-version design, and realizes the effect of multi-version design by allowing different receivers to download different numbers of layers, matching their download capacities. It does not require server transcoding. It is robust against bandwidth variations and packet losses: a basic quality video can still be decoded as long as the base layer is received reliably.

5.1 Methodology

Since there is no public information about their video generation and adaptation strategies, we design experiments to trigger video adaptation by manipulating source upload bandwidth and receiver download bandwidth. In our local testbed, for all users involved in the conference, we set up one user as the sender, turn on his video and use the other users purely as receivers. We denote such a setting as a sub-conference, which is a basic component of the whole conference. As illustrated in Fig. 1, the bandwidth of the sender and receivers can all be set by the network emulator NEWT. We collect video information from the application window. We also record the packets using Wireshark [31] for offline video payload analysis.

5.2 Video Encoding Parameters

To cope with bandwidth variations, all three systems encode video using a wide range of parameters, as shown in Table 3. From the perspective of a viewer, the perceived video quality is mostly determined by three video encoding parameters: resolution, frame rate, and quantization. The frame rates can vary from 1 FPS (Frame-Per-Second) to 30 FPS for each system. In our experiments, the ranges of the observed resolution values are also listed in Table 3. Skype and Google+ adapt video resolution to network bandwidth. iChat's video resolution is determined by the number of users in the conference. For example, the resolution is always set to be 640×480 in the case of a two-party call. When three or more users are involved in the conference, the resolution becomes 320×240 or 160×120. And once the resolution is set at the beginning, it will not be changed afterward, no matter how we change the bandwidth setting. We cannot directly measure the quantization adaptation for any of the three systems.

Table 3: Video Rate and Resolution Ranges
System    Rate (kbps)    Resolutions
iChat     49 ∼ 753       640×480, 320×240, 160×120
Google+   28 ∼ 890       640×360, 480×270, 320×180, 240×135, 160×90, 80×44
Skype     5 ∼ 1200       640×480, 320×240, 160×120

5.3 Video Adaptation

When the upload link bandwidth of a sender varies, the video rate out of the sender changes correspondingly. Generally, for these three systems, the higher the upload link bandwidth, the larger the sending rate. When the upload bandwidth is too small, they will automatically shut down the video. This shows that the three systems have their own available network bandwidth probing algorithms to determine the video quality to be encoded. To study video adaptation to receiver bandwidth heterogeneity, we set receivers' download capacities to different levels.

5.3.1 iChat uses One-version Encoding

We found that iChat uses one-version encoding: heterogenous receivers always receive the same video version, and the receiver with the lowest download capacity determines the video quality sent out by the sender. No video trans-coding, nor layered coding, is employed.

5.3.2 Skype uses Multi-version Encoding

When there is a large variability in receivers' download capacities, Skype employs source-side multi-version encoding. For example, in a video conference of one sender and three receivers, receiver 1, receiver 2 and receiver 3's download capacities are set to 150 kbps, 400 kbps and 2000 kbps respectively. The sender generates three video versions and uploads them to the relay server, which distributes one of the three video versions to the appropriate receiver. In this case, the received video rates on receiver 1, 2, and 3 are 70 kbps, 200 kbps, and 500 kbps, respectively. Server-side transcoding was not observed. In our experiments, the largest number of video versions that a sender can generate is three. When receiver capacity variability is small, or the sender upload capacity is low, Skype uses one-version encoding and sends to all receivers the same video quality.

5.3.3 Google+ uses Multi-layer Encoding

Google+ always gives different video qualities to heterogenous receivers. For example, we only limit the download capacity of one receiver, say receiver 2, to 500 kbps.
On the sender side, the encoded video rate is 835.7 kbps, with a resolution of 640×360 and frame rate of 30 FPS. On receiver 1, the received video rate is 386.9 kbps, with a resolution of 640×360, and frame rate of 14 FPS. On receiver 2, the received video rate is 168.6 kbps, with a resolution of 320×180, and frame rate of 14 FPS. The perceptual video quality on both receiver 1 and receiver 2 is consistent and acceptable. The experiment environment is not fully controlled. Video flows between our experiment machines and Google+ servers traverse the Internet, and have to compete with cross traffic. This explains why the received quality on receiver 1 is lower than the original video sent out by the sender, although we didn't add any bandwidth constraint on receiver 1.

To gain more insight about how the two video qualities are generated, we examine the packets captured on the sender and receivers. Both Google+ and iChat use the RTP [13] protocol for voice and video transmission. Even though video payload cannot be directly decoded to reveal how video is generated, several fields in RTP headers enable us to infer the video coding strategy employed by Google+. According to the RTP header format specification [24], the "Sequence Number" field increments by one for each RTP data packet sent, and can be used by the receiver to detect packet loss and restore packet sequence. In Google+, video packets generated by one source form a unique sequence. The "Timestamp" field records the sampling instant of the first octet in the RTP data packet. In Google+ RTP packet traces, we observe that a flight of packets with consecutive sequence numbers carry the same timestamp. We infer that those packets belong to the same video frame. The common timestamp is indeed the generation time of the frame. If a packet is the last one in a frame, then its "Marker" field is set to 1; otherwise the value is 0. Thus, we infer that the "Marker" field can be used to identify the boundary between two adjacent video frames. Since different machines use different initial sequence numbers, to match video frames between the sender and receivers, we instead use the "Timestamp", "Marker" and "Length" fields.

Table 4: RTP Packet Headers in Google+
M  Timestamp   Length  Sequence Number (Sender / Receiver 1 / Receiver 2)
0  2063696701  1269    61603  44445  52498
0  2063696701  1113    61604  44446  52499
0  2063696701  1278    61605  44447
0  2063696701  1234    61606  44448
0  2063696701  1283    61607  44449
0  2063696701  1277    61608  44450
0  2063696701  1077    61609  44451
1  2063696701  989     61610  44452
0  2063699269  621     61611
1  2063699269  560     61612
0  2063703362  1086    61613
0  2063703362  485     61614
0  2063703362  1167    61615
1  2063703362  1048    61616
0  2063706604  543     61617
1  2063706604  914     61618
0  2063709620  1276    61619  44453  52500
0  2063709620  1067    61620  44454  52501
0  2063709620  1272    61621  44455
0  2063709620  1267    61622  44456
0  2063709620  1279    61623  44457
0  2063709620  1276    61624  44458
1  2063709620  736     61625  44459

In Table 4, we match the RTP packets sent out by the sender and received by the two receivers based on their Timestamps, Marker (labeled 'M') and Length. (Packets in the same row have the same values in these three fields.) In the aligned table, the sender sent out five video frames; both receiver 1 and receiver 2 only received two frames. While receiver 1 received all packets of those two frames, receiver 2 only received the first two packets of those two frames. It should be noted that sequence numbers in receiver 1 and receiver 2 are consecutive. The lost packets/frames are not due to packet losses. Instead, it is the video relay server who decides not to send those frames/packets to receivers based on their bandwidth conditions.

Both receivers can decode the video with decent quality. This suggests that Google+ employs multi-layer coding. Since receiver 1 can decode the video even though some frames are missing, the layered coding used by Google+ achieves temporal scalability: video frames are temporally grouped into layers, such that a high frame-rate video generated by the sender can still be decoded at a lower frame-rate by a receiver even if a subset of frames/layers are dropped. Since receiver 2 can decode video even though it lost some packets within each frame, the layered coding used by Google+ also achieves spatial scalability: a video frame is encoded into spatial layers at the sender, such that even if some spatial layers are lost, the receiver can still decode the frame at lower resolution and larger quantization. In the experiment, we observe that no matter how low the download bandwidth of a receiver is, the received video resolution is at least one quarter of the resolution of the original video. This shows that the encoded video only has two spatial layers. This is reasonable, because spatial layers incur much higher encoding overhead than temporal layers. Thus, the number of spatial layers could not be too large.

We also developed a script and analyzed other RTP packet traces of Google+. It is verified that Google+ employs layered video coding with temporal and spatial scalability. Different layers have different importance for video decoding. Consequently, one should use different realtime transmission and protection strategies for different layers. As will be shown in the later sections, a prioritized loss recovery scheme can be designed for layered video coding to achieve high robustness against packet losses.

6. VOICE AND VIDEO DELAY

To facilitate realtime interactions between users, video conferencing systems have to deliver voice and video with short delays. User conferencing experience degrades significantly if the one-way end-to-end video delay goes over 350 milliseconds [17]. In this section, we study the delay performance of the three systems.

6.1 Methodology

The end-to-end voice/video delay perceived by a user is the sum of delays incurred by realtime voice/video capturing, encoding, transmission, decoding, and rendering. We can divide the end-to-end delay into four portions. Let Te be the voice/video capturing and encoding delay at the sender, Tn be the one-way transmission and propagation delay on the network path between the sender and the receiver, Ts be the server or super-node processing time (Ts = 0, if there is no server or super-node involved), and Td be the video or voice decoding and playback delay at the receiver. Thus, the one-way voice (video) delay is:

T = Te + Tn + Ts + Td.    (1)

Obviously, it is not sufficient to just measure the network transmission delay Tn. We therefore measure end-to-end delay by emulating a real user's experience. A computer emulates a user and "hears" (records) the sound injected to the voice sender and the sound coming out of the receiver using a sound recording software [10]. We can visually analyze the captured sound wave in that software. Fig. 3(b) shows a recorded sound wave sample. In the upper subfigure, we observe two sequences of impulses at different amplitudes. Since we set the volume of the sender's speaker significantly lower than the receiver's speaker, the impulses with smaller amplitudes correspond to the repeated "Ticks" sent by the sender, and the impulses with larger amplitudes are the received "Ticks" on the receiver. On the sender, we set the time interval between two "Tick"s to be 1,000 ms, larger than the expected voice delay, so that we can match a large impulse with the preceding small impulse. The lower subfigure is the zoom-in view of two adjacent sound impulses. Each impulse is indeed a waveform segment with rich frequency components. We use the time shift between the first peaks of two adjacent segments as the one-way voice delay.

6.1.1 One-way Voice Delay 6.1.2 One-way Video Delay

(a) One-way Delay (a) One-way Voice Delay

(b-1) Recorded Voice Wave

(b-2) Enlarged Two Adjacent (b) Round-trip Delay Voice Waves (b) Voice Waves Detail Figure 4: Testbed to Measure Video Delay

Figure 3: Testbed to Measure Vioce Delay To measure video delay, we similarly emulate a user’s video experience by simultaneously “viewing” (recod- To record one-way voice delay, we employ a repeat- ing) the original video on the sender and the received able “Tick” sound as the voice source and inject it to video on the receiver using a video capturing program. the sender using a virtual [28]. In our local As illustrated in Fig. 4(a), we set up two computer- testbed, we set up three side by side, as illus- 1 trated in Fig. 3(a). One computer is the voice sender, The sender uses a virtual microphone to inject the “Tick” another one is the voice receiver. The sender repeatedly sound to conferencing system. Consequently, the received sends the “Tick” sound to the receiver. The third com- sound on the receiver will NOT be looped back to system. 9 s side-by-side in our local testbed, one as the sender, 6.2 One-way Voice and Video Delay the other as the receiver. We run a stopwatch pro- We first test the one-way delay between users in our gram [8] on the receiver side, and focus the sender’s local testbed. All user computers are connected to the on the stopwatch window on the receiver’s same router. The network delay along the direct con- screen. The sender transmits the stopwatch video cap- nection between them is almost negligible. tured from the receiver screen back to the receiver using one of the three conferencing systems. There are two stopwatch videos on the receiver’s screen: the original Table 5: One-way Delay Performance (ms) stopwatch video, and the copy captured by the sender Systems Video Voice then transmitted back through the video conferencing Google+ 180 100 system. At any given time, the difference between the Skype Two-Party 156 110 two stopwatches is the one-way end-to-end video de- Skype initiator to normal 230 130 lay perceived by the receiver. For the example in Fig. 
Multi-party normal to normal 230 190 4(a), the original stopwatch shows time “00:00:02.72” iChat Two-Party 220 220 second and the received copy of the stopwatch shows iChat initiator to normal 220 220 time “00:00:01.46” second. Thus, the video delay in this Multi-party non-initi. to non-initi. 270 270 case is 1, 260 ms. We use a software program running on the receiver to capture the snapshots of the receiver The voice delay and video delay performances for screen 10 times every second. We then use a software [6] three systems are listed in Table 5. For Google+, the to convert the captured pictures into mono-color, and average video delay is 180ms, larger than the voice de- extract stopwatch images from the mono-color pictures. lay of 100ms. As studied in Section 4.3, Google+ uses Finally, the stopwatch readings will be decoded from server-centric architecture. The round trip network de- stopwatch images using an Optical Character Recogni- lay between our local testbed and the allocated Google+ tion (OCR) software [9]. It should be noted that if the server is 14ms. The addition delays are due to voice and received video quality is bad, the OCR software cannot video processing on both ends. It takes longer to process decode the stopwatch readings correctly. We skip those video than voice. Skype two-party calls use direct P2P undecodable samples in our delay calculation. The un- transmissions for voice and video. Since network delays decodable picture ratio can also be used to infer the between machines in local testbed are almost zero, the received video quality. measured one-way delays are mostly due to voice/video processing. This suggests that voice and video process- 6.1.3 Round-trip Video Delay ing can take a significant portion of the delay allowance The previous method does not work if the sender and of good-quality video conferencing. the receiver are not in the same location. 
We develop As studied in Section 4.4, Skype multi-party call em- another method to measure the round-trip video delay ploys different overlay topologies for voice and video between two geographically distributed computers. As transmissions. The voice flow from a non-initiator to illustrated in Fig. 4(b), we run a stopwatch program another non-initiator has to be transmitted to the ini- on user A. A uses his camera to capture the stopwatch tiator first. The initiator will do some processing, like as its source video, and transmits it to user B using voice mixing and recoding. Thus, the voice delay be- one of the three conferencing systems. The received tween an initiator and a non-initiator is shorter than the stopwatch video is now displayed on B’s screen. Then voice delay between two non-initiators. Since video has B focuses his camera on the received video from A, to be first sent to Skype servers then relayed to receiver- and sends the captured video back to A by using the s, Skype video delay is larger than voice delay. Video same conferencing system. Then on A’s screen, we can and voice are unsynchronized. The voice and video de- observe two stopwatch videos: the original one, and lay gap is as large as 100ms when we consider the case the one that is first sent to B, then recaptured and from an initiator to a normal user. sent back to A by B. The clock difference between the iChat transmits video and voice in one RTP flow. For two stopwatches on A reflects the summation of one- two-party calls, video and voice are synchronized. For way video delay from A to B, and the one-way video multi-party conferences, when voice and video flows are delay from B back to A, in other words, the round- transmitted from the initiator to non-initiators, delay trip video delay. If we assume the two directions are performance is the same as the two-party case. 
The symmetric, then the one-way video delay is roughly half voice and video delays between two non-initiators are of the round-trip delay. After the on A and B longer than those between the initiator and non-initiators. are properly positioned, we only need to take screen Since the network delays are negligible, the additional shots on A. We can further use the same approach as delays are mostly for the initiator to combine packets in the one-way video delay case to process the data and generated by different sources into one RTP flow before get a large number of delay samples. sending it to a receiver. 10
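The stopwatch-based measurements above reduce to parsing two OCR-decoded clock readings and differencing them (plus halving the RTT in the round-trip variant). A minimal sketch; the helper names are ours, not from the authors' measurement scripts:

```python
def parse_stopwatch(reading: str) -> float:
    """Convert an OCR-decoded stopwatch reading 'HH:MM:SS.cc' to seconds."""
    hh, mm, rest = reading.split(":")
    return int(hh) * 3600 + int(mm) * 60 + float(rest)

def one_way_delay_ms(original: str, received: str) -> int:
    """One-way video delay: original stopwatch minus its received copy."""
    return round((parse_stopwatch(original) - parse_stopwatch(received)) * 1000)

def one_way_from_rtt(rtt_ms: float) -> float:
    """Round-trip method: assuming symmetric paths, halve the video RTT."""
    return rtt_ms / 2

# Example from Section 6.1.2: "00:00:02.72" vs. "00:00:01.46" -> 1260 ms.
print(one_way_delay_ms("00:00:02.72", "00:00:01.46"))  # -> 1260
```

Applied to every screen snapshot whose reading the OCR can decode, this yields one delay sample per decodable snapshot.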

6.3 Round-trip Video Delay

We also measure the round-trip video delays (video RTTs) between New York City and Hong Kong, and between New York City and Singapore, using the Google+ Hangout and Skype multi-party conferencing systems, respectively. From Figure 5, in both cases, the video RTTs of Google+ are much smaller than those of Skype. For the case between New York and Singapore, Google+'s mean video RTT is only about 579 ms, with a standard deviation of 44 ms. We can approximate the one-way video delay by half of the video RTT; Google+'s one-way video delay is then about 290 ms, within the 350 ms delay allowance. To the contrary, Skype's mean video RTT is about 1,048 ms, with a standard deviation of 239 ms; the corresponding one-way delay is over the delay allowance. Between New York City and Hong Kong, the average one-way video delay of Google+ is 374 ms, again much shorter than Skype's average delay of 788 ms.

From the topology study in Section 4, in Skype, video generated by a source is directly relayed by a server to all receivers. Video flows from a source to its assigned Skype relay server, and from the relay server to the receivers, all have to traverse the public Internet and compete with background network traffic. To the contrary, Google+ deploys relay servers located all around the world, and a user is mostly connected to a close-by server. The transmissions between Google+ relay servers likely traverse their own private backbone network with good QoS guarantees. This way, Google+'s voice and video relays incur much less random network loss and delay than Skype's voice and video relays over the public Internet. Additionally, as will be shown in the following section, Google+'s loss recovery scheme is more efficient than Skype's, which also leads to shorter end-to-end video delay.

7. ROBUSTNESS AGAINST LOSSES

One main challenge of delivering video telephony over the best-effort Internet is to cope with unpredictable network impairments, such as congestion delays and random or bursty packet losses. To achieve reliability in realtime streaming, the conventional wisdom is to use packet-level forward error correction (FEC) coding instead of retransmissions, which would incur too much delay. Unfortunately, in video conferencing, to avoid long FEC encoding and decoding delays, FEC blocks have to be short. This largely reduces the FEC coding efficiency and its robustness against bursty losses. If the network delay between the sender and the receiver is short, e.g., an RTT of 20 ms between our local testbed and Google+ servers, retransmissions might be affordable within an end-to-end delay allowance of 350 ms. Unlike FEC, retransmission adds redundancy only as needed, and hence is more bandwidth-efficient. Redundant retransmissions can also be used to protect important packets against bursty losses. In this section, we investigate how the three systems recover from packet losses, and how robust they are against random and bursty losses.

7.1 Methodology

We set up multi-party conferences using machines in our local testbed, and conduct two sets of experiments by injecting packet losses with network emulators. In the first set of experiments, we add upload losses on the sender side. In the second set, we add download losses on only one of the receivers. Because we capture packets before and after the loss module, we can figure out which packets are lost. Through packet trace analysis, we can also search for packets that have payloads similar to the lost ones, to check whether retransmission is employed. At the same time, we monitor the application window to collect statistics on video rates and total data rates to calculate redundant transmission ratios.

7.2 Skype uses FEC

In our Skype experiments, we never found any retransmission of lost packets. From Skype's technical window, we can easily observe gaps between the total data rates and the video rates. Previous studies [32, 33] suggest that Skype employs aggressive Forward Error Correction (FEC) coding for VoIP and two-party video calls. We infer that the observed rate gaps are also due to FEC. Let rv be the actual video rate and rs the actual sending rate. The FEC redundancy ratio ρ is calculated as the ratio between the redundant traffic rate and the total sending rate:

ρ = (rs − rv) / rs.    (2)

The experiment results of adding random upload losses are shown in Table 6. The reported video rates are the same on the sender and the two receivers. Surprisingly, the sender always adds significant redundancy, even if we don't introduce any additional packet losses. (Note that, since the upload flow traverses the Internet to reach the Skype server, the sender might still incur some packet losses. Since we don't know the actual loss rate, we can't derive a precise FEC model for Skype's multi-party calls similar to the model for two-party calls in [33].) On the other hand, the redundancy ratios on the two receivers are pretty low, even though the download flows also have to traverse the Internet. One possible explanation is that Skype tries hard to protect video uploaded by a source, because any video lost during the upload has quality implications for all receivers. As we introduce additional losses, the video rate goes down significantly and the FEC ratio increases. However, due to the initially high FEC ratio, the increasing trend is not obvious when we further increase the loss rate.
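Equation (2) can be checked directly against the first data row of Table 6; a small sketch of the computation (the function name is ours):

```python
def fec_redundancy_ratio(video_rate_kbps: float, sending_rate_kbps: float) -> float:
    """Equation (2): rho = (r_s - r_v) / r_s, the redundant share of the sending rate."""
    return (sending_rate_kbps - video_rate_kbps) / sending_rate_kbps

# First data row of Table 6: 513.48 kbps of video carried at 974.00 kbps upload,
# and received at 542.62 kbps by receiver 1.
print(round(fec_redundancy_ratio(513.48, 974.00), 2))  # -> 0.47 (sender side)
print(round(fec_redundancy_ratio(513.48, 542.62), 2))  # -> 0.05 (receiver side)
```

The large gap between the sender-side and receiver-side ratios is the basis for the inference that the relay server strips and re-adds FEC per download flow.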

[Figure 5 plots round-trip delay (ms, 0-3000) versus duration time (s, 0-600) for four cases: (a) Google+: NYC ↔ HK, (b) Skype: NYC ↔ HK, (c) Google+: NYC ↔ SG, (d) Skype: NYC ↔ SG.]

Figure 5: Round-trip Video Delays of Google+ and Skype

Table 6: FEC Adaptation at Skype (Sender Side)

Video Sender Side                                Video Receiver 1           Video Receiver 2
Video Rate  Upload Loss  Upload Rate  FEC        Received Rate  FEC         Received Rate  FEC
(kbps)      Ratio        (kbps)       Ratio ρs   (kbps)         Ratio ρr    (kbps)         Ratio ρr
513.48      0            974.00       0.47       542.62         0.05        542.56         0.05
474.26      0.02         1108.00      0.57       519.61         0.09        522.05         0.09
460.37      0.05         1019.50      0.55       487.84         0.06        488.19         0.06
225.05      0.08         496.57       0.55       241.80         0.07        241.32         0.07

When the loss rate is really high, Skype significantly drops its video rate and the total sending rate. This is consistent with the observation for Skype two-party calls in [33].

The results for download losses are shown in Table 7. Again, the reported video rates are the same on the sender and the two receivers. We only add random download losses to receiver 1. The FEC ratio is much less aggressive than in the upload loss case, but there is still a trend that, as the loss rate increases, more FEC packets are added into the video flow of receiver 1. In Table 6, we can also observe that quite often the download flow rates on the two receivers are lower than the upload rate of the sender. This suggests that the relay server first removes FEC packets from the received data, and then adds new FEC packets to each download flow. The results in Table 7 show that the two receivers, under different conditions, receive the same video rate with different FEC ratios. Thus, we infer that the relay server monitors the network conditions of the receivers and calculates a different FEC ratio for each receiver.

7.3 Google+ uses Selective Persistent Retransmission

As shown in Section 4.3, a Google+ user only interacts with a Google+ server for video upload and download. In our experiments, we only focus on one video sender and one video receiver. We first set up a machine as a video sender and inject random packet losses into its uplink. Then we capture the packets before and after the network emulator so that we can identify the lost packets. In our packet trace analysis, if two packets have the same "Timestamp" and "Sequence Number", we treat them as redundant transmissions of the same packet. (Payload analysis shows that the bit information of those matched packets is almost the same, except for several bits.) Thus, we can identify retransmissions of lost packets. Detailed results are presented in Table 8. As more losses are injected, more retransmissions appear in the flow. However, the sender video FPS doesn't change much when we increase the upload loss from 0% to 20%.
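The trace-matching rule just described, namely that packets sharing a "Timestamp" and "Sequence Number" are redundant transmissions of one packet, can be sketched as follows; the dictionary packet representation is our assumption, not Google+'s wire format:

```python
from collections import Counter

def find_retransmissions(trace):
    """Group captured packets by (Timestamp, Sequence Number); any pair seen
    more than once marks redundant transmissions of the same packet.
    Returns {(ts, seq): number_of_retransmissions}."""
    counts = Counter((pkt["ts"], pkt["seq"]) for pkt in trace)
    return {key: n - 1 for key, n in counts.items() if n > 1}

# Toy trace: packet (ts=2063696701, seq=61603) appears three times,
# i.e., the original transmission plus two retransmissions.
trace = [{"ts": 2063696701, "seq": 61603}, {"ts": 2063696701, "seq": 61604},
         {"ts": 2063696701, "seq": 61603}, {"ts": 2063696701, "seq": 61603}]
print(find_retransmissions(trace))  # -> {(2063696701, 61603): 2}
```

Comparing such matches against the set of packets dropped by the emulator then reveals which lost packets were actually retransmitted.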

We calculate the video rate as the total rate minus the retransmission rate. Google+ strives to maintain a high video upload rate even under high packet losses. This is similar to Skype's high FEC protection for video uploading. The recovery ratio is defined as the fraction of lost packets that are eventually received by the receiver. It shows that Google+ implements selective retransmission: only about half of the lost packets are recovered. The persistent ratio is defined as the fraction of packets retransmitted at least once that are eventually received by the receiver. A persistent ratio of 1 means that if Google+ attempts to retransmit a packet, it will persistently retransmit that packet until it is received successfully. Google+ uses layered video coding, in which packets from the lower video layers are more important for decoding. We conjecture that Google+ applies selective persistent retransmission to packets of the lower video layers first. In all experiments, Google+ consistently offers good video quality, even under 20% uplink loss.

Table 7: FEC Adaptation at Skype (Relay Server Side)

Video Sender Side                   Video Receiver 1                              Video Receiver 2
Video Rate  Upload Rate  FEC        Download    Relay Send   FEC                  Received Rate  FEC
(kbps)      (kbps)       Ratio ρs   Loss Ratio  Rate (kbps)  Ratio ρr             (kbps)         Ratio ρr
513.48      974.00       0.47       0           542.62       0.05                 542.56         0.05
478.52      1163.87      0.59       0.02        653.40       0.27                 505.43         0.05
440.94      955.72       0.54       0.05        949.73       0.54                 465.82         0.05
343.73      821.43       0.58       0.08        824.39       0.58                 363.88         0.06

Table 8: Google+ Retransmission: Uplink

Loss   Video   Total Rate  Retran. Rate  Rcvry.  Persis.
Rate   FPS     (kbps)      (kbps)        Ratio   Ratio
0      29.97   826.0       0.9           -       -
0.05   29.98   836.9       24.2          0.46    1.00
0.10   29.97   885.7       52.1          0.46    1.00
0.20   29.98   857.2       101.4         0.45    1.00

The results of adding download losses on the receiver are shown in Table 9, which shows a different trend: the received video frame rate, the total rate, and the video rate inferred from the retransmission rate all decrease as the download loss rate increases. This suggests that Google+ servers use packet loss as a signal to infer network congestion. As the packet loss rate increases, the server not only retransmits important lost packets, but also proactively reduces the number of video layers sent to a receiver. The recovery ratio and persistent ratio are more or less the same as in the upload loss case. Because we cannot control packet losses outside of our local testbed, we still observe packet retransmissions even when the injected packet loss rate is zero. We also tried a downlink loss rate of 40%; surprisingly, Google+ still offers reasonably good perceptual video quality.

Table 9: Google+ Retransmission: Downlink

Loss   Video   Total Rate  Retran. Rate  Rcvry.  Persis.
Rate   FPS     (kbps)      (kbps)        Ratio   Ratio
0      27.91   810.8       4.4           -       -
0.05   27.07   827.3       35.2          0.51    1.00
0.10   20.158  744.3       67.4          0.60    1.00
0.20   19.231  677.7       116.4         0.64    1.00

To gain more insight into Google+'s loss recovery strategy, we collect more statistics about its packet retransmissions. Table 10 lists how many retransmissions Google+ attempted to reliably transfer a packet at different packet loss rates. Most of the time, retransmission is done within one or two tries. Sometimes a packet is retransmitted many times, with the highest observed being 18 times. We define the k-th retransmission interval as the time between the k-th retransmission and the (k−1)-th retransmission of the same packet (the original transmission is considered the 0-th retransmission). Table 11 presents the mean and standard deviation of the retransmission intervals, and their CDFs are plotted in Figure 6. For reference, the RTT between our machines and the Google+ server is 14 ms. In general, the retransmission intervals on the uplink are longer than on the downlink. On the uplink side, 60% of the first retransmissions happen within 70 ms after the original packet transmission, which is about 3.75 times the RTT. The retransmissions on the video uploader side all need to wait for such a long time. This is to avoid wasting valuable user upload bandwidth on retransmitting delayed, rather than lost, packets. On the downlink side, Google+ servers retransmit lost packets more aggressively, with shorter intervals. Aggressive retransmissions might waste some server bandwidth, but they recover faster from losses. Many of the N-th (N ≥ 5) retransmissions are attempted within 5 ms of the previous retransmission. This batch retransmission strategy is likely used to deal with severe and/or bursty packet losses.

7.4 iChat's Retransmission Strategy

iChat also uses retransmission to recover from packet losses. We did similar measurements for iChat as for Google+.
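The recovery ratio, persistent ratio, and k-th retransmission intervals defined in Section 7.3 can all be computed from per-packet records derived from the traces. A sketch under an assumed record format (the field names `lost`, `retx_times_ms`, and `recovered` are ours):

```python
def recovery_ratio(records):
    """Fraction of lost packets that the receiver eventually gets."""
    lost = [r for r in records if r["lost"]]
    return sum(r["recovered"] for r in lost) / len(lost)

def persistent_ratio(records):
    """Fraction of packets retransmitted at least once that eventually arrive."""
    retx = [r for r in records if r["retx_times_ms"]]
    return sum(r["recovered"] for r in retx) / len(retx)

def retx_intervals(send_ms, retx_times_ms):
    """k-th interval: time between the k-th and (k-1)-th (re)transmission,
    with the original transmission counted as the 0-th."""
    times = [send_ms] + retx_times_ms
    return [b - a for a, b in zip(times, times[1:])]

records = [
    {"lost": True,  "retx_times_ms": [70], "recovered": True},   # recovered on 1st try
    {"lost": True,  "retx_times_ms": [],   "recovered": False},  # never retransmitted
    {"lost": False, "retx_times_ms": [],   "recovered": True},   # not lost at all
]
print(recovery_ratio(records), persistent_ratio(records))  # -> 0.5 1.0
print(retx_intervals(0, [70, 140, 145]))                   # -> [70, 70, 5]
```

In this toy example, the 0.5 recovery ratio mirrors the "only about half of lost packets are recovered" observation, and the final 5 ms interval mirrors the batch retransmission behavior.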

[Figure 6 shows the CDFs of retransmission time intervals (0-300 ms) for the 1st through k-th (k ≥ 7) retransmissions in Google+ Hangout, under six settings: (a) upload loss 0%, (b) upload loss 10%, (c) upload loss 20%, (d) download loss 0%, (e) download loss 10%, (f) download loss 20%.]

Figure 6: CDF of Retransmission Time Intervals for Google+ Hangout
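Per-attempt distributions like those in Table 10 (what fraction of retransmitted packets needed one, two, or more tries) reduce to a frequency count over the per-packet attempt numbers. A minimal sketch with hypothetical counts, not the paper's measured data:

```python
from collections import Counter

def attempt_distribution(tries_per_packet):
    """Fraction of retransmitted packets whose final retransmission was the k-th."""
    counts = Counter(tries_per_packet)
    total = len(tries_per_packet)
    return {k: counts[k] / total for k in sorted(counts)}

# Hypothetical: 8 packets needed one retransmission, 2 packets needed two.
print(attempt_distribution([1] * 8 + [2] * 2))  # -> {1: 0.8, 2: 0.2}
```

Rows of Table 10 (and Table 14 for iChat) are distributions of exactly this form, one per injected loss rate.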

Table 10: Retransmission Probability in Google+

Loss Rate       1st     2nd     3rd     4th     5th     6th     k-th (k ≥ 7)
Uplink: 0       1.00
Uplink: 0.05    0.9492  0.0477  0.0023  0.0008
Uplink: 0.10    0.8963  0.0947  0.0090
Uplink: 0.20    0.7996  0.1620  0.0293  0.0080  0.0009  0       0.0002
Downlink: 0     0.8058  0.1311  0.0146  0.0097  0.0049  0.0146  0.0194
Downlink: 0.05  0.8845  0.0958  0.0101  0.0032  0.0032  0.0005  0.0027
Downlink: 0.10  0.8677  0.1125  0.0140  0.0027  0.0003  0.0003  0.0024
Downlink: 0.20  0.7691  0.1793  0.0376  0.0100  0.0023  0.0008  0.0010

The results for its retransmissions are shown in Table 12 and Table 13. Unlike Skype and Google+, the video FPS decreases when upload losses are induced; however, it doesn't change much under download losses. Different from Google+, iChat does not do selective retransmission: it always retransmits lost packets. But its retransmission is not persistent. Most of the time, it only tries to retransmit a lost packet once, as shown in Table 14. There is no guarantee that a lost packet will be successfully recovered. Even though the recovery ratios reported in Table 12 and Table 13 are higher than those reported in Table 8 and Table 9, since iChat employs neither layered coding nor prioritized retransmission, a lost video packet may prevent the decoding of multiple video frames. Consequently, iChat's video quality under losses is much worse than Google+'s, even though both use retransmissions to recover from packet losses. This suggests that layered video coding not only adapts well to user heterogeneity, but also enhances the resilience of video quality against losses.

Table 12: iChat Retransmission: Uplink

Upload  Video  Sending Rate  Retrans. Rate  Rcvry.
Loss    FPS    (kbps)        (kbps)         Ratio
0       25.1   365.7         0              -
0.02    19.8   363.0         7.0            0.95
0.10    19.6   411.5         34.4           0.81

The CDFs of the retransmission intervals are shown in Fig. 7, and the mean retransmission intervals are reported in Table 15. Since we set the machines up in the same sub-network, the RTTs between them are only about 2-4 ms. Even though iChat waits 30+ ms for the first retransmission, the second retransmission happens only about 3 ms after the first.

7.5 Robustness of Conferencing Quality

We use the end-to-end video delay defined in Section 6 as a measure of the delivered conferencing quality.

Table 11: Mean and Standard Deviation (SD) of Retransmission Intervals in Google+, reported as Mean (SD) in ms

Loss Rate       1st             2nd            3rd            4th            5th            6th            k-th (k ≥ 7)
Uplink: 0       90.14 (68.66)
Uplink: 0.05    64.69 (31.12)   70.47 (14.90)  61.56 (9.13)   56.88 (0)
Uplink: 0.10    60.28 (26.82)   79.62 (30.05)  89.63 (39.44)
Uplink: 0.20    77.49 (38.00)   89.63 (36.00)  90.15 (36.32)  90.89 (34.34)  93.72 (27.15)  79.83 (0)      101.08 (0)
Downlink: 0     39.35 (27.52)   15.97 (21.55)  5.54 (4.47)    4.86 (3.87)    3.41 (1.11)    3.40 (1.08)    3.91 (0.85)
Downlink: 0.05  80.35 (122.62)  40.72 (36.66)  16.33 (34.32)  4.56 (2.70)    4.76 (6.00)    2.69 (1.18)    2.17 (0.90)
Downlink: 0.10  66.03 (34.10)   45.14 (42.45)  32.36 (36.21)  6.70 (6.37)    3.41 (1.39)    3.44 (0.89)    2.64 (0.83)
Downlink: 0.20  77.84 (57.05)   58.59 (57.97)  53.16 (64.31)  35.14 (31.59)  20.86 (22.85)  23.06 (28.58)  6.87 (14.83)

[Figure 7 shows the CDFs of retransmission time intervals (0-300 ms) for the 1st and 2nd retransmissions in iChat, under four settings: (a) upload loss 2%, (b) upload loss 10%, (c) download loss 2%, (d) download loss 10%.]

Figure 7: Retransmission Time Intervals under Loss Variation for iChat

Table 13: iChat Retransmission: Downlink

Loss   Video  Total Rate  Retrans. Rate  Rcvry.
Rate   FPS    (kbps)      (kbps)         Ratio
0      25.1   365.8       0              -
0.02   24.3   401.7       7.8            0.95
0.10   24.2   447.3       37.7           0.81

Table 14: Retransmission Probability in iChat

Case                1st     2nd
Downlink loss 0.02  1.00
Downlink loss 0.10  1.00
Uplink loss 0.02    0.9960  0.0040
Uplink loss 0.10    0.9975  0.0025

If the video can be continuously delivered to the receiver in realtime, the measured video delay from the stopwatch video should be consistently low. Since the original stopwatch continuously advances, losses of video frames on the receiver side will lead to large measured video delays. Additionally, if the received stopwatch video quality is bad, the text recognition tool (OCR) cannot decode the clock reading. The OCR recognition ratio is thus an indirect measure of the received video quality. In this section, we compare the robustness of the three systems against bursty losses and long delays.

Packet losses on the Internet can be bursty. We use the network emulator to generate bursty losses on the download link; in each loss burst, 4-5 packets are dropped in a batch. Table 16 compares the stopwatch recognition probabilities of the three systems under different bursty loss rates. The stopwatch video transmitted in iChat quickly becomes undecodable, which is consistent with our own perception of the video. The recognition probability remains high in Google+ and Skype, indicating that their delivered quality is stable up to 4% bursty losses.
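The bursty-loss pattern described above, with each burst dropping 4-5 packets in a batch, can be emulated with a simple stateful dropper. This is our own sketch of such an emulator, not the actual tool used in the study:

```python
import random

class BurstyDropper:
    """Stateful packet dropper: each burst, started with probability
    `burst_prob` per incoming packet, discards 4-5 consecutive packets."""

    def __init__(self, burst_prob: float, seed: int = 0):
        self.burst_prob = burst_prob
        self.remaining = 0  # packets still to drop in the current burst
        self.rng = random.Random(seed)

    def drop(self) -> bool:
        if self.remaining > 0:          # continue an ongoing burst
            self.remaining -= 1
            return True
        if self.rng.random() < self.burst_prob:
            # Start a burst of total length 4 or 5 (this packet included).
            self.remaining = self.rng.randint(4, 5) - 1
            return True
        return False

dropper = BurstyDropper(burst_prob=0.005)
dropped = sum(dropper.drop() for _ in range(100_000))
print(f"effective loss rate ~ {dropped / 100_000:.3f}")
```

With a per-packet burst-start probability of 0.005 and a mean burst length of 4.5 packets, the effective loss rate works out to roughly 2%, i.e., the burst-start probability, not the nominal loss rate, is the tuning knob.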

[Figure 8 plots one-way delay (ms, 0-3000) versus duration time (s, 0-600) for six settings: (a) Google+ bursty loss 0%, (b) Google+ bursty loss 2%, (c) Google+ bursty loss 4%, (d) Skype bursty loss 0%, (e) Skype bursty loss 2%, (f) Skype bursty loss 4%.]
Figure 8: One-way Video Delay For Google+ and Skype under Different Bursty Loss Probabilities of using retransmission is that it does not work well Table 15: Mean Time and Standard Devia- when the network delay between the sender and the re- tion(SD) for Retransmission in iChat ceiver is large. To investigate the impact of network Case 1st 2nd Mean SD Mean SD delay on the efficiency of retransmission, we set con- (ms) (ms) (ms) (ms) stant random loss to the download link of Skype and Downlink loss 0.02 39.06 26.31 Google+, then we change RTT by introducing propa- Downlink loss 0.10 35.27 26.66 Uplink loss 0.02 40.09 29.79 4.45 0.15 gation delay on the access link of our local machines us- Uplink loss 0.10 39.20 31.69 6.86 9.05 ing the network emulator. Figure 9 compares the mean and variance of one-way video delay of both systems at Table 16: Digital Clock Recognition Probability different RTTs. Skype’s one-way video delay curves are Case iChat Google+ Skype almost the same at no loss and 8% random losses. As RTT increases by ∆, the one-way video delay increases No Loss 0.90 0.86 0.88 ∆ 2% Bursty loss 0.46 0.90 0.90 by roughly 2 . This suggests that Skype’s FEC effi- 4% Bursty loss 0.10 0.76 0.92 ciency is almost independent of RTT. To the contrary, when no loss is induced, Google+’s one-way video delay able to maintain a low and stable video delay, with av- curve is almost the same as Skype’s curve. Almost no erage about 200 ms. As we introduce additional bursty retransmission is needed. When the random loss is set losses, Skype incurs large and highly-variable one-way to be 8%, the video delay curve ramps up quickly, with video delays, as shown in Figure 8(e) and Figure 8(f). increasingly higher variances. This suggests that FEC This suggests that the FEC scheme adopted by Skype is preferable over retransmissions if RTT is large, loss cannot efficiently recover from bursty losses. At frame is random, and loss rate is not too high. 
rate of 30 frames/second, each lost frame leads to an ad- ditional video delay of 33ms. Consecutive lost frames 8. CONCLUSION leads to delay pikes in the figures. Google+’s selective In this paper, we present our measurement study of and persistent retransmissions can recover from bursty three popular video telephony systems for end-consumers. losses well. It can always do batch retransmissions for Through a series of carefully designed active and pas- packets of the base layers upon bursty losses. The re- sive measurements, we were able to unveil importan- ceiver can decode video as long as all packets of the t information about their design choices and perfor- base layer are received. As a result, Google+ is able mances. Our study demonstrated that pure P2P archi- to maintain low video delays up to 4% bursty losses in tecture cannot sustain high-quality multi-party video Figure 8(b) and Figure 8(c). conferencing services on the Internet, and bandwidth- Compared with employing FEC, one main drawback rich server infrastructure can be deployed to signifi- 16

Figure 9: Google+ and Skype Delay Performances under Different Network Delays

8. CONCLUSION
In this paper, we present our measurement study of three popular video telephony systems for end-consumers. Through a series of carefully designed active and passive measurements, we were able to unveil important information about their design choices and performance. Our study demonstrated that a pure P2P architecture cannot sustain high-quality multi-party video conferencing services on the Internet, and that bandwidth-rich server infrastructure can be deployed to significantly improve the user conferencing experience. We also demonstrated that, in the extremely tight design space of video conferencing systems, video generation, protection, adaptation and distribution have to be jointly considered. Various voice/video processing delays, incurred in capturing, encoding, decoding and rendering, account for a significant portion of the end-to-end delays perceived by users. Compared with multi-version video coding, layered video coding can more efficiently address user access heterogeneity. When relay servers are well-provisioned and well-selected, per-hop retransmission is preferable to FEC for recovering from random and bursty losses. With layered video coding, prioritized selective retransmissions can further enhance the robustness of conferencing quality against various network impairments. Our findings can be used to guide the design of new video conferencing solutions, as well as an increasing array of network applications that call for high-bandwidth and low-delay data transmissions under a wide range of “best-effort” network conditions.

9. ACKNOWLEDGMENT
The authors would like to thank Prof. Yao Wang, Dr. Hao Hu and Prof. Xinggong Zhang for discussions on video coding techniques. They would also like to thank the anonymous reviewers and the shepherd, Dr. Ming Zhang, for their valuable feedback and suggestions to improve the quality of the paper. The work is partially supported by USA NSF under contracts CNS-0953682 and CNS-0916734.

10. REFERENCES
[1] S. A. Baset and H. G. Schulzrinne. An analysis of the Skype peer-to-peer Internet telephony protocol. In Proceedings of IEEE INFOCOM, pages 1–11, April 2006.
[2] D. Bonfiglio, M. Mellia, M. Meo, N. Ritacca, and D. Rossi. Tracking down Skype traffic. In Proceedings of IEEE INFOCOM, pages 261–265, April 2008.
[3] K. Chen, T. Huang, P. Huang, and C. Lei. Quantifying Skype user satisfaction. In Proceedings of ACM SIGCOMM, volume 36, October 2006.
[4] L. D. Cicco, S. Mascolo, and V. Palmisano. A mathematical model of the Skype VoIP congestion control algorithm. In Proceedings of the IEEE Conference on Decision and Control, December 2008.
[5] L. D. Cicco, S. Mascolo, and V. Palmisano. Skype video congestion control: an experimental investigation. Computer Networks, 55(3):558–571, February 2011.
[6] Comic Enhancer Pro. Homepage. http://www.comicer.com/stronghorse/.
[7] e2eSoft. VCam emulator. http://www.e2esoft.cn/vcam/.
[8] Free Stopwatch. Homepage. http://free-stopwatch.com/.
[9] FreeOCR. Homepage. http://www.freeocr.net/.
[10] GoldWave. Homepage. http://www.goldwave.com/.
[11] Google+. Homepage. https://plus.google.com/.
[12] S. Guha, N. Daswani, and R. Jain. An experimental study of the Skype peer-to-peer VoIP system. In Proceedings of the 5th International Workshop on Peer-to-Peer Systems, pages 1–6, Santa Barbara, CA, February 2006.
[13] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A transport protocol for real-time applications. http://tools.ietf.org/html/rfc3550.
[14] T. Huang, K. Chen, and P. Huang. Tuning Skype redundancy control algorithm for user satisfaction. In Proceedings of IEEE INFOCOM, pages 1179–1185, April 2009.
[15] iChat. Homepage. http://www.apple.com/macosx/apps/all.html#.
[16] Infoplease. Homepage. http://www.infoplease.com/atlas/calculate-distance.html.
[17] J. Jansen, P. César, D. C. A. Bulterman, T. Stevens, I. Kegel, and J. Issing. Enabling composition-based video-conferencing for the home. IEEE Transactions on Multimedia, 13(5):869–881, 2011.
[18] Y. Lu, Y. Zhao, F. A. Kuipers, and P. V. Mieghem. Measurement study of multi-party video conferencing. In Networking, pages 96–108, 2010.
[19] MaxMind. Homepage. http://www.maxmind.com/.
[20] Microsoft Research Asia. Network Emulator for Windows Toolkit (NEWT). http://blogs.msdn.com/b/lkruger.
[21] Paul Spoerry. Hidden features in Google+ hangouts. http://plusheadlines.com/hidden-features-googleplus-hangouts/1198/.
[22] PlanetLab. Homepage. http://www.planet-lab.org/.
[23] Renovation Software. Text grab for Windows. http://www.renovation-software.com/en/text-grab-sdk/textgrab-sdk.html.
[24] RTP Protocol Fields. Homepage. http://www.networksorcery.com/enp/protocol/rtp.htm.
[25] H. Schwarz, D. Marpe, and T. Wiegand. Overview of the scalable video coding extension of the H.264/AVC standard. IEEE Transactions on Circuits and Systems for Video Technology, 17:1103–1120, September 2007.
[26] Skype. Homepage. http://www.skype.com.
[27] Traceroute.org. Homepage. http://www.traceroute.org/.
[28] Virtual Audio Cable. Homepage. http://software.muzychenko.net/eng/vac.htm/.
[29] Y. Wang, A. Reibman, and S. Lin. Multiple description coding for video delivery. Proceedings of the IEEE, 93, January 2005.
[30] M. Wien, H. Schwarz, and T. Oelbaum. Performance analysis of SVC. IEEE Transactions on Circuits and Systems for Video Technology, 17:1194–1203, September 2007.
[31] Wireshark. Homepage. http://www.wireshark.org/.
[32] T.-Y. Huang, K.-T. Chen, and P. Huang. Could Skype be more satisfying? A QoE-centric study of the FEC mechanism in an Internet-scale VoIP system. IEEE Network, 24(2):42, March 2010.
[33] X. Zhang, Y. Xu, H. Hu, Y. Liu, Z. Guo, and Y. Wang. Profiling Skype video calls: rate control and video quality. In Proceedings of IEEE INFOCOM, March 2012.