Youtube Traffic Content Analysis in the Perspective of Clip Category and Duration Li
Total Page:16
File Type:pdf, Size:1020Kb
YouTube Traffic Content Analysis in the Perspective of Clip Category and Duration Li, Jie; Wang, Hantao; Aurelius, Andreas; Du, Manxing; Arvidsson, Åke; Kihl, Maria Published in: [Host publication title missing] 2013 Link to publication Citation for published version (APA): Li, J., Wang, H., Aurelius, A., Du, M., Arvidsson, Å., & Kihl, M. (2013). YouTube Traffic Content Analysis in the Perspective of Clip Category and Duration. In [Host publication title missing] IEEE - Institute of Electrical and Electronics Engineers Inc.. Total number of authors: 6 General rights Unless other specific re-use rights are stated the following general rights apply: Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal Read more about Creative commons licenses: https://creativecommons.org/licenses/ Take down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. LUND UNIVERSITY PO Box 117 221 00 Lund +46 46-222 00 00 Download date: 27. Sep. 2021 YouTube Traffic Content Analysis in the Perspective of Clip Category and Duration Jie Li 1, Andreas Aurelius 1,3 , Manxing Du 1,3 Hantao Wang 2, Åke Arvidsson 2 1Acreo Swedish ICT AB, Networking and Transmission 2Ericsson AB, Stockholm, Sweden Laboratory, Electrum 236, SE-164 40 Kista, Sweden. 3 Email: [email protected] Maria Kihl 3Lund University, Department of Electrical and Information Technology, Lund, Sweden Abstract —In this work, we study YouTube traffic world’s most popular online streaming video service with over characteristics in a medium-sized Swedish residential municipal one billion monthly users, and nearly one out of every two network that has ∼∼∼ 2600 mainly FTTH broadband-connected people on the Internet visiting YouTube’s website [4]. households. YouTube traffic analyses were carried out in the Moreover , since its launch in 2005, YouTube has evolved perspective of video clip category and duration, in order to from a pure individual user generated content (UGC) video understand their impact on the potential local network caching service to an important business platform for not only gains. To the best of our knowledge, this is the first time advertisers, but also for ‘tens of thousands of partners that have systematic analysis of YouTube traffic content in the perspective created channels and found and built businesses’[4]. Therefore, of video clip category and duration in a residential broadband network. Our results show that the requested YouTube video it is of great interest to investigate the YouTube traffic clips from the end users in the studied network were imbalanced characteristics and understand its potential impact on both the in regarding the video categories and durations. The dominating Internet traffic as well as the end-user experience. video category was Music, both in terms of the total traffic share Studies on YouTube traffic characteristics have been as well as the contribution to the overall potential local network reported previously [5-9]. In an early study carried out in caching gain. In addition, most of the requested video clips were 2007 [5], using video clip metadata retrieved from YouTube, between 2-5 min in duration, despite video clips with durations analyses on the global popularity evolution of YouTube video over 15 min were also popular among certain video categories, clips were carried out. However, only video clips categorized e.g. film videos. as `Entertainment' and `Science & Technology' were analyzed Keywords—Internet traffic monitoring, residential Internet in this study (1 day for ‘Entertainment' and 8 days for `Science traffic pattern, traffic locality, YouTube, video category, video & Technology'). Besides, no individual end user behavior was duration, caching gain available in this study. In another similar study carried out by Google [6], randomly selected 20 million YouTube videos I. INTRODUCTION uploaded between September 2010 and August 2011 were Internet has now become the global information and analyzed. The results showed that online video popularity communication base for every aspect of people’s work and life. appeared strongly constrained by geographic locality, and the Even so, global Internet traffic keeps on growing steadily with impact of social sharing to the spatial popularity was limited. the ever-increasing mainly video-oriented content distributions Again no individual end user behavior was available in this and services, as well as with the ever-increasing new devices study. In [7-8] measurement studies on YouTube traffic of that are connected to the network [1]. This puts challenges for university campus networks were conducted. The results network operators and Internet Service Providers (ISP) to verified that local and global popularity were not or only deliver broadband IP services that meet the quality-of-service slightly correlated. Somewhat differently, in another study [9] (QoS) requirements of multimedia services needed to provide based on one week traffic data in February 2011 in two campus sufficient quality-of-experience (QoE) to the end users [2]. networks and three nation-wide ISPs, YouTube user access One important part in meeting this challenge is to behaviors were studied, which revealed that users accessed understand the Internet traffic characteristics, especially the YouTube in a very similar manner, independent of their Internet traffic patterns of residential end users. These users location, the device they used, and the access network that have generated according to [1] about 80% of the global IP connected them. traffic in the years 2010-2011, and they are expected to have In this work, we study YouTube traffic characteristics in a even larger total traffic share in the next few years. Indeed, medium-sized Swedish residential municipal network that has many research groups have so far published a large number of ∼ 2600 mainly FTTH broadband-connected households. internet traffic measurement and analysis results. In particular, YouTube traffic analyses were carried out in the perspective of it has been pointed out in a long-term study of residential the requested video clip categories and durations and their Internet traffic evolution that streaming-based video and audio impact on the potential local network caching gains. Compared services are the driving forces for the increase of the total to the previously reported studies touching (partly) the video network traffic [3]. Among them, YouTube has become the clip category or durations in [5][7][9], the novelty of our work is that our results are based on the captured (residential) end- residential municipal network, where the service providers are user YouTube watch requests, from which we have two sets of connected to the network (even so, due to the continuous data: the requests (including the repeated requests for the same upgrade and re-configuration of the network, not all the video) and the unique videos watched (while for those households in the network might be covered by PacketLogic published results only the watched video clip statistics were during the studied period of time) [10 ]. After parsing the available). Moreover, to the best of our knowledge, this is also dumped data packets and retrieving the corresponding metadata the first time systematic analysis of YouTube traffic content in from YouTube, the obtained video watch request records and the perspective of video clip category and duration in a video parameters were transferred into a MySQL data base for residential broadband access network. final traffic statistics aggregation analyses. Note that when transferring data into the MySQL data base server, all the IP II. METHODOLOGY and MAC addresses were hashed, ensuring the protection of A. Target network, traffic data collection and processing the end user integrity. The network involved in this study is a medium-sized B. Data trace and terminologies municipal residential broadband access network in In this work, a 3 week packet dump during 2012-06-08 − Sweden [10 ]. There are approximately 2600 households 2012-07-05 was carried out in the Swedish municipal network. connected to the network. The network is an open network, Since IP addresses were dynamically allocated to each meaning that there are several ISPs that the connected household during the measurement period, MAC addresses households can choose from, and each ISP offers a set of were used to identify each household. In addition, in order to subscription types with the maximum symmetric access speed distinguish different users in a household, the combination of at 100 Mbit/s. the MAC address and the extracted HTTP user agent string was The traffic data collection tool was PacketLogic, a used to represent an end user, i.e., we regard each detected commercial IP traffic monitoring and management equipment different end user device as a unique end user. featuring currently more than 1600 application signatures in its As might already be perceived by attentive readers from the traffic identification engine [11 ]. Apart from traffic application descriptions above, in this work, due to the practical limitation identification, PacketLogic can also be used as a traffic data of the data storage server, we used the captured YouTube filter to select and dump IP packets of a selected application, request records to analyze the YouTube traffic statistics in the e.g., YouTube, using a pre-set filter rule based on the relevant studied residential network. Three aggregated statistical properties of the targeted traffic IP packets.