Game of Streams

A Comprehensive Study of Streaming Cyberlockers

Benjamin Braun
Department of Computer Science and Engineering
University of California, San Diego

Rizwan Ahmad
Department of Computer Science and Engineering
University of California, San Diego

ABSTRACT
Streaming cyberlockers, third-party video streaming platforms that primarily host pirated content, have seen incredible growth in recent years, yet they have attracted little scrutiny and have stayed outside the scope of most cyberlocker-centric studies. This project attempts to close this gap in the literature by identifying central characteristics of streaming cyberlockers, such as their traffic and revenue generation models, the physical location of the websites, relationships between websites, and whether there are any feasible forms of external intervention capable of exploiting faults in their operational model. We found that sites did tend to locate themselves in similar locations, and that their revenue is generated primarily through advertising, while an affiliate model drives traffic to the website by attracting content uploaders, who in turn attract viewers. We were unable, however, to identify any feasible methods of interfering with this model, and determined that current anti-piracy efforts such as DMCA takedowns are largely unsuccessful, although further studies that take into account indexing sites and advertising networks could potentially determine some form of intervention.

Keywords
Abuse, streaming cyberlockers

1. INTRODUCTION
Streaming cyberlockers are third-party, web-based platforms that allow viewers to stream videos hosted on their platform. While some video streaming platforms are legitimate sites, such as youtube.com, many of them are used primarily to host and stream copyright-protected or otherwise illegal content. In recent years, such streaming cyberlockers have become particularly popular, with one such website, streamcloud.eu, cracking the top 25 most visited sites in [country name missing] by Alexa ranking and maintaining a global ranking of 611. This popularity has caused an uproar amongst organizations such as the MPAA and other content creators, who have notoriously worked to shut down these websites.

Despite their increasing prominence in contemporary sociopolitical issues, most studies overlook streaming cyberlockers in particular and focus instead on a larger class of cyberlockers: one-click hosters. As such, streaming cyberlockers have, for the most part, avoided widespread public scrutiny, to the point that a great deal of the insight we have about streaming cyberlockers comes from a single data point: a raid on the streaming site kino.to. The analysis of this website by authorities showed a complex, intricate relationship between the streaming portal, indexing sites, and even advertising networks. However, since this raid, there have been very few studies into streaming cyberlockers, and consequently very little new information about their operations.

As such, given the growing popularity of these websites and the absence of studies that look into them, we sought to characterize and identify a few key parts of their infrastructure. Specifically, we wanted to understand the following:

• How they attract and maintain viewers and uploaders.
• What their main source of revenue is.
• Whether there are any prominent similarities or relationships between websites.
• If there are any readily identifiable means of intervening and halting the explosive growth they are currently experiencing.

This was accomplished by performing a study on 131 different streaming cyberlockers, in which advertising networks and video content hosters were identified, and other pertinent information was collected and analyzed. Additionally, we were able to gain some insight regarding the ecosystem of streaming cyberlockers, which is discussed in the next section.

1.1 How Streaming Cyberlockers Operate
There are a number of actors involved in the successful operation of a streaming cyberlocker. A general overview of the ecosystem is shown in Figure 1. The three most important actors that can be seen in the diagram are the streaming sites, indexing sites, and content uploaders.

Streaming Sites: The streaming site itself usually has a relatively simple interface. We found that most sites use a third-party video player such as the JWPlayer. Most sites do not provide a browsable or searchable index on the site itself. Instead, the viewer needs to have a URL to a video when coming to the site. While some streaming sites actually employ web developers to create the sites, there are also a number of website templates available for purchase. A popular example is the XFileSharing template [4], which is used by realvid.net and www.thevideo.me as well as others, and can be purchased for $99. Commonly, the streaming sites are hosted on either standard public hosting providers or hosting providers that provide further privacy guarantees,
in order to increase resilience against DMCA takedown requests. To generate revenue, all observed streaming cyberlockers serve aggressive ads on their sites. Some sites also offer premium membership for benefits, such as being able to download the videos, faster streaming, or longer file availability. Others offered a more intriguing affiliate membership, which allowed uploaders to receive money in response to views generated by their videos.

[Figure 1: The Ecosystem of Streaming Cyberlockers. Diagram; labeled elements: Uploaders, Streaming Sites, Indexing Site, Viewers, Advertising Networks, Website Templates, Video Players, Video Server Hosts, Premium/Affiliate Models, $$.]

Indexing Sites: Because the streaming sites do not provide an index themselves, there are a number of indexing sites available to help viewers locate the videos they want to watch. The indexing sites aggregate and organize links to videos hosted on streaming sites, while not hosting any illegal content on their own site. Indexing sites come in various shapes and forms, with some focusing exclusively on TV shows, while others focus on other types of media.

Uploaders: Uploaders collect content distributed through other channels, such as torrents and one-click hosters, and upload it to the streaming sites. Their main incentive is the affiliate reward systems of the streaming sites, which pay the uploader anywhere between $10 and $40 per 10,000 unique video views. To attract more people to view the uploaded videos, the uploader will typically distribute the unique links to a number of indexing sites and numerous other places, including the information field on YouTube videos.

2. RELATED WORK
The only study we are aware of that looked into streaming cyberlockers was one performed by NetNames, a brand protection agency. This report [11] investigated the cost and revenue structure of direct download and streaming cyberlockers. According to their notion, cyberlockers distinguish themselves from legitimate cloud storage services by not limiting access to files, using affiliate programs that reward uploaders when content is accessed, and by dealing with repeat offenders in different manners. Based on their estimates, popular streaming cyberlockers make millions of dollars in revenue annually from ads and, to a lesser extent, premium models. They also maintain an operating cost that is a small fraction of their total revenue, so that the majority of this revenue translates directly to profit. The report is also cited in a letter from the MPAA to the United States Trade Representative [10], which led to a number of streaming sites being mentioned in the USTR Notorious Markets Report.

The other cyberlocker-based studies we identified focused primarily on one-click hosters (OCHs), such as a study performed by Lauinger et al. [6], who conducted a large-scale measurement study on the effectiveness and impact of current anti-piracy efforts against one-click hosters. In their study, they crawled two indexing sites to collect links to pirated files hosted on cyberlockers and observed their availability over time. From this, they investigated the impact of DMCA takedown notices and court-mandated actions by the OCHs against piracy and hardware seizures. They concluded that current anti-piracy measures are mostly ineffective.

In a related study [7], Lauinger et al. also investigated the prevalence of copyrighted material on one-click hosters, using the available file metadata to heuristically infer whether content is legitimate or infringing, and concluded that the vast majority of content hosted on these sites was indeed infringing material.

Liu et al. [9] studied the structure of URL-sharing sites linking to one-click hosters. They collected pages containing cyberlocker URLs for movies via general search engines and proceeded to analyze characteristics of those indexing sites.

In this study, we strive to perform an analysis similar to that of the spam value chain in [8]. Specifically, we aim to take a closer look at the ecosystem of streaming cyberlockers to uncover the inner workings of these sites and to identify parts of the ecosystem that are more susceptible to intervention.

3. DATA COLLECTION METHODOLOGY
Given the previously mentioned goals of this project, we chose to collect data regarding the most commonly used advertising networks found on the websites, the hosting providers used to serve video content, and the websites' reactions to DMCA takedown notices. In order to collect this data, several different streaming websites were identified, and each site was crawled individually while pulling information regarding advertising networks and video hosters. Videos subsequently found in Google DMCA notices were identified, and the status of each video (live or taken down) was noted. After aggregating this information, we proceeded to identify trends between different cyberlockers. The observed trends are noted in section 4.

3.1 Collecting Links to Streaming Sites
In order to collect links to streaming sites, a Ruby crawler was implemented to go through an indexing site, watchseries.lt, and collect two sets of links to streaming sites. The first set was obtained by simply looking through the index of all television shows on watchseries.lt to gain an initial base of websites. The second set contained links to television show episodes that had aired the day immediately prior to the crawling. By doing this, we were able to obtain a large base set of cyberlockers that contained prominent contemporary cyberlockers in addition to older ones that potentially could have been shut down or otherwise modified over time.

Over a two-day collection period, this method identified a total of 7103 links to pirated content spanning 131 domains. After improvements were made to the crawler, over 65,000 links were collected during a five-day period. However, further analysis was only run on the initial 7103, as this was deemed a large enough sample size for the scope of this project.

3.2 Identifying Video Hosts
As different websites used different video players and stored video information differently, identifying video hosts required manually finding ways to properly parse the pages for the host. As a result, only a subset of 45 domains, out of the 131 identified, were polled for video host information, and two different approaches were used in order to pull the location of the video content.

The first method involved running a Python script that would find and parse variables readily available in the HTML source of a webpage to rebuild the location of the video. For some of the services, we could directly extract the video URL from the HTML source using a simple regular expression. However, most sites had some sort of protection in place against automated extraction, so as to stop other software from streaming or downloading the videos without visiting the website, which would reduce ad revenue for site operators. We were able to deobfuscate the scheme used by a number of popular sites, in particular the one used by sites of the so-called "Movshare Group," which is discussed later in this report. For another group of sites that required the user to click on a button before actually loading the video, we were able to simulate the button with a POST request and could subsequently extract the video URL. This method proved to be reusable across several different websites, producing accurate results relatively quickly. However, some websites used customized schemes on which this crawler failed, or had more complicated redirect pages which required user interaction in order to expose the Flash objects.

In order to parse video host locations for the websites that did not work with the Python script, a Chrome extension was used to execute JavaScript code which would expose the video location. The extension worked by reading in websites that were stored in a Redis queue, loading them in Chrome, executing the appropriate JavaScript code in the scope of the page by injecting it into the DOM, and then sending the data back to a web server running locally on the same machine, which would store it in a SQL database.

The injected code initially attempted to play the video, which would cause the browser to send out several network requests to the video server in order to receive the video content. However, due to the prevalence of Flash-based video players, and protections which prevent JavaScript from interacting with Flash objects, this method proved to be impractical. As a result, this method was replaced by one which would make specific calls to the JavaScript-visible API exposed by the Flash objects and then pull the video URL out of the responses to these calls. Currently, this method only works with the commonly used JWPlayer, which allows the URL of the currently playing video to be obtained by calling jwplayer().getPlaylistItem().file. The extension has been built in such a way, though, that extending it to work with other video players is trivial.

3.3 Host Analysis
When collecting the data regarding video hosts, we were able to extract either the IP address or the fully qualified domain name of the server hosting the video. For each identified host, we looked up the reverse DNS name. Additionally, we used an API provided by Team Cymru [3] to look up the associated autonomous system number (ASN), the AS name, the AS country code from the allocation data of the regional registries, and the registered IP address block. To get somewhat more accurate location data, we also used MaxMind's free GeoLite2 City database [2] to get the country and city of the server. We further verified the automated results by manually checking a small number of them using traceroute.

3.4 Identifying Advertising Networks
Advertising networks present on webpages were found by the Chrome extension described previously, used in conjunction with Adblock. The extension would identify ad networks by logging the destination of network requests that were blocked by Adblock. These destinations were then saved in a database for future analysis. The presence of the net::ERR_BLOCKED_BY_CLIENT flag in the requests' error message made these blocked requests easily distinguishable from other ordinary failed network requests, and dropped the rate of false positives in the data dramatically.

While the Chrome extension's video parsing ability was limited to sites utilizing JWPlayer, its ability to parse advertising data was not subject to the same restrictions, which allowed us to identify ad networks on all domains. The extension as a whole, however, was limited by the fact that it was significantly slower than the scripts used for other data collection. As a result, we could not parse all 7103 links, though we did ensure, for the completeness of the data, that at least 5 pages per domain were parsed for advertising networks.

3.5 Crawling DMCA Takedown Requests
To analyze streaming cyberlockers' responses to DMCA takedown requests from copyright holders, we made use of the Chilling Effects database [1]. Chilling Effects is an independent third-party research project which collects DMCA takedown requests directed to a number of participating online service providers. We used the Chilling Effects API
to search and download all requests that mentioned the fully qualified domain name of any of the services we previously identified. The relevant information contained in these takedowns was the URL to the infringing content and the date the takedown request was received. We would then use the aforementioned crawlers and, when necessary, the Chrome extension to determine whether the videos identified in the takedown notices were live or taken down.

All takedown notices for the streaming sites we identified were directed to Google and aimed to remove the links from Google's search results. Since Chilling Effects only contains takedown notices voluntarily submitted by companies, and none of the streaming sites submitted the DMCA takedowns they received themselves, we cannot prove that the sites received the notices. However, we do not find it to be an illogical jump to believe that the rightholders would also submit the takedown requests to the streaming cyberlocker directly.

4. RESULTS AND ANALYSIS
The data received shows that there are a great many commonalities between streaming cyberlockers, and that there are clusters of websites which appear to be owned and operated by the same group of individuals. These similarities grow clearer when comparing trends in DMCA takedown responses and the hosting providers used by the websites.

4.1 Video Hosting Providers
In total, out of the 131 domains we identified through crawling watchseries.lt, we were able to parse the video's hosting location from 45 of them. As shown in Figure 2, these sites have a number of hosting providers in common. The top hosting provider, M247, is used by seven different sites, having servers located primarily in Bucharest, Romania and England. For other larger streaming sites, we found that they were using multiple hosting providers, while smaller ones with a lower Alexa rank typically relied on one hosting provider.

[Figure 2: Hosting Providers. Bar chart; y-axis "# Streaming Sites" (0 to 7), x-axis "Hosting Provider". Providers labeled on the x-axis: OVH, FR; M247, BE; PLI-AS, CH; INCERO, US; AS12876, FR; NFORCE, NL; VERDINA, BG; VOXILITY, RO; PORTLANE, SE; ABOVENET, US; WEBAZILLA, NL; WESTHOST, US; WZCOM-US, US; ESTROWEB, NL; SOLARCOM, CH; COGENT-174, US; FDCSERVERS, US; SWIFTWAY-AS, GB; LEASEWEB-NL, NL; GLOBALLAYER, NL; LEASEWEB-US, US; SERVERIUS-AS, NL; MAKSTEN-G-AS, BG; CLOUDFLARENET, US; THREE-W-INFRA-AS, NL.]

We found that the hosting providers are primarily located in Europe, with 16 sites having servers in the Netherlands and 8 sites having servers in Romania. Additionally, somewhat surprisingly, 16 sites also had servers located in the United States. The results are summarized in Table 1.

Country           # Sites hosted
Netherlands       16
United States     15
Romania           8
France            7
Switzerland       5
Bulgaria          2
Germany           2
India             2
[name missing]    2

Table 1: Hosting provider locations

4.2 Response to DMCA Takedowns
We focused the analysis of DMCA takedown requests on the most popular sites for which our fast crawler was working, in order to be able to check a significant number of URLs for their availability. For the sites listed in Table 2, we were able to check all URLs mentioned in the DMCA search results for that service, with two notable exceptions: both videoweed.es and novamov.com blocked our IP address after we made a few hundred requests, which required us to use a proxy, which eventually got blacklisted as well.

Service Name      # DMCA reports   # reported URLs   % unavailable videos
realvid.net       1233             14992             59.7
gorillavid.in     3191             93820             92.8
nowvideo.sx       4862             144656            7.6
fastvideo.in      154              4301              33.8
streamcloud.eu    457              18992             27.4
videoweed.es      772              20333             9.0
played.to         6130             111622            97.8
novamov.com       1623             21805             12.3
movshare.net      4862             94695             7.6
vidzi.tv          2216             53159             15.4

Table 2: Breakdown of the crawled DMCA takedown requests

We observed a few types of reactions to DMCA takedown requests from the studied sites. One class of sites did not seem to react to DMCA takedown requests at all. As shown for the example of movshare.net in Figure 4, most videos are still available, even after more than a year. The small fraction of videos that is unavailable (7.6%) is most likely due to videos being deleted by the uploader or for other unknown reasons.

Another class of sites was much more active in reacting to those requests. For example, played.to has taken down over 97% of the videos mentioned in the DMCA requests. These sites also had a much more legitimate-looking website. For example, gorillavid.in has a number of videos on its homepage that can be accessed directly, and it also seems to host a significant amount of legitimate videos.

[Figure 3: Video Availability for DMCA takedowns on played.to]

[Figure 4: Video Availability for DMCA takedowns on movshare.net]

Although the data does seem to pick out significant trends in websites, our methodologies did have some setbacks. For one, as was mentioned before, we do not actually know for which URLs the streaming services received takedown requests. However, based on the complaints mentioned in [10] and the minimal effort required once the infringing URLs are identified, we think it is likely that the takedown requests are sent not only to Google, but also to the streaming site. Another limitation is that we are not able to distinguish between videos taken down due to a DMCA request and videos deleted after inactivity. Most sites have a policy to delete inactive videos if they have not been accessed within the last 30-90 days, as these videos no longer generate enough revenue.

This is reflected in the statistics for realvid.net and a few other sites. For those, recent videos less than 90 days old are mostly still available, while older ones are mostly inaccessible.

4.3 Related Website Clusters
After analyzing the above data sources, we identified three distinct groups of websites which appeared to be closely related:

1. Movshare, CloudTime, VideoWeed, Novamov, NowVideo
2. GorillaVid, MovPod, and DaClips
3. VodLocker, VidBull, FileNuke, and ClickToWatch

[Figure 5: Three identified groups of related websites: a) The "Movshare Group," b) GorillaVid, MovPod, DaClips, c) VodLocker, VidBull, FileNuke, ClickToWatch]

For each of these groups, it was found that the websites had similar user interfaces and language on pages such as the Terms of Service and FAQ (Figure 5). For the first group, DMCA takedown patterns were also consistent among three of the websites, as can be seen in Table 2. Additionally, and perhaps more significantly, it was found that, in addition to having common hosters, the websites in each group often used the same servers to serve video content.

Our characterization of these sites as related was lent further credence by the MPAA's letter to the USTR [10], which references the "Movshare group" consisting of Movshare, Novamov, Videoweed, and other sites not covered by this study. As an interesting side note, the MPAA also makes reference to the indexing site we used, watchseries.lt, as a partner in the Movshare group.

4.4 Upstream Traffic Sources
As a result of the MPAA letter's reference to the relationship between the Movshare group and watchseries.lt, we compiled a list of the top 5 upstream websites for the 20 most popular sites by Alexa ranking in our data set. The results show that watchseries.lt was one of the top 5 upstream sites for 13 of the 20 sites, as can be seen in Table 3. This does seem to point to a relationship between the indexing site and the streaming sites, but we are hesitant to draw any concrete conclusions from this, as our sample size is relatively small and we have not eliminated potential confounding variables, such as the relative popularity of watchseries.lt compared to other indexing sites which commonly link to the streaming sites.

4.5 Advertising Networks & Revenue Models
Upon looking through the advertising data, 116 distinct ad networks were identified, with each website having ads from, on average, three different networks. Overall, the distribution of ad networks over websites can be seen in Figure 6. One of the prominent features that can be seen in this graph is the fact that there are three extremely prominent
advertising networks, PopAds, AdCash, and DirectRev, that were collectively present on approximately 50% of the sites we crawled. However, despite the prominence of these three networks, there was still a very long tail of ad networks, with several networks appearing on only one site out of the 131 crawled. Given this information, we concluded that it would be extremely difficult to cut off revenue to the websites by incentivizing ad networks to not promote on these sites, since there are simply too many networks available for the websites to use.

[Figure 6: Ad Networks]

Upstream Site            # Recipient Sites
watchseries.lt           13
google.com               11
watchseries.ag           10
movie4k.to               5
watchtvseries.se         5
primewire.ag             5
watchseries-online.ch    4
projectfreetv.ch         4

Table 3: Common upstream sources and sites to which they lead

Another interesting trend we found regarding revenue models was the relative absence of premium models, which are extremely popular amongst one-click hosters [5]. Instead, streaming cyberlockers seemed to favor affiliate models, in which uploaders would get a stipend based on the number of views their videos got. Based on this, we infer that the majority of money generated by these websites comes from ads, and that the affiliate model serves as a way to attract uploaders and, consequently, viewer traffic. Given this, as well as the difficulty in cutting off ad revenue, we were unable to identify any apparent methods of curbing the increasing amount of profit generated by the websites.

5. CONCLUSION
This study successfully identified several relationships among streaming cyberlockers and was able to pinpoint potential sources of revenue and traffic. We identified 25 hosting companies which served video content used by the streaming cyberlockers, and 116 ad networks in use by the websites. We were additionally able to characterize the seeming ineffectiveness of DMCA takedown requests across several streaming cyberlockers. This ineffectiveness, together with the extremely broad range of ad networks and hosters used by the websites and the feedback loop with uploaders, forms a seemingly impenetrable operational model which makes intervention by external entities extremely difficult.

A few goals for future work in this area would include increasing our data set to incorporate websites linked to by other indexing sites, analyzing the relationship between indexing sites and the cyberlockers they link to, parsing video hosts from a greater range of websites, and perhaps a more involved study similar to [8] in which we actually interact with the cyberlockers to gain further information regarding financial backing and behind-the-scenes services offered.

6. REFERENCES
[1] Chilling Effects. https://www.chillingeffects.org/.
[2] MaxMind's GeoLite2 City free. http://dev.maxmind.com/geoip/geoip2/geolite2/.
[3] Team Cymru IP to ASN mapping. https://www.team-cymru.org/IP-ASN-mapping.html.
[4] XFileSharing Pro. http://sibsoft.net/xfilesharing.html.
[5] T. Lauinger, E. Kirda, and P. Michiardi. Paying for piracy? An analysis of one-click hosters' controversial reward schemes. In Research in Attacks, Intrusions, and Defenses, pages 169–189. Springer, 2012.
[6] T. Lauinger, K. Onarlioglu, A. Chaabane, E. Kirda, W. Robertson, and M. A. Kaafar. Holiday pictures or blockbuster movies? Insights into copyright infringement in user uploads to one-click file hosters. In Research in Attacks, Intrusions, and Defenses, pages 369–389. Springer, 2013.
[7] T. Lauinger, M. Szydlowski, K. Onarlioglu, G. Wondracek, E. Kirda, and C. Kruegel. Clickonomics: Determining the effect of anti-piracy measures for one-click hosting. In NDSS, 2013.
[8] K. Levchenko, A. Pitsillidis, N. Chachra, B. Enright, M. Félegyházi, C. Grier, T. Halvorson, C. Kanich, C. Kreibich, H. Liu, et al. Click trajectories: End-to-end analysis of the spam value chain. In Security and Privacy (SP), 2011 IEEE Symposium on, pages 431–446. IEEE, 2011.
[9] M. Liu, Z. Zhang, P. Hui, Y. Qin, and S. R. Kulkarni. Measurement and understanding of cyberlocker URL-sharing sites: Focus on movie files. In Advances in Social Networks Analysis and Mining (ASONAM), 2013 IEEE/ACM International Conference on, pages 902–909. IEEE, 2013.
[10] Motion Picture Association of America, Inc. MPAA filing to USTR on worlds most notorious markets. http://www.mpaa.org/wp-content/uploads/2014/10/MPAA-Filing-to-USTR-on-Worlds-Most-Notorious-Markets.pdf, October 2014.
[11] NetNames. Behind the cyberlocker door: A report on how shadowy cyberlocker businesses use credit card companies to make millions. http://www2.itif.org/2014-netnames-profitability.pdf, September 2014.
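
As an illustration of the first extraction method described in Section 3.2 (pulling the video URL out of the HTML source with a simple regular expression), a minimal sketch follows. It is not the script used in the study: the `file: "..."` player-configuration pattern, the sample page source, and the names `FILE_PATTERN` and `extract_video_host` are assumptions made for this example.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern: a player config entry such as  file: "http://host/path.mp4"
FILE_PATTERN = re.compile(r"""file\s*:\s*["'](https?://[^"']+)["']""")


def extract_video_host(html):
    """Return (video_url, host) for the first match, or None if nothing matches."""
    match = FILE_PATTERN.search(html)
    if match is None:
        return None
    url = match.group(1)
    return url, urlparse(url).hostname


# Made-up page source for demonstration only.
SAMPLE_HTML = """
<script>
  jwplayer("player").setup({
    file: "http://198.51.100.7/videos/abc123.mp4",
    width: 640
  });
</script>
"""

if __name__ == "__main__":
    print(extract_video_host(SAMPLE_HTML))
```

The host returned here (an IP address or a domain name) is the kind of value that Section 3.3 then feeds into reverse DNS, ASN, and geolocation lookups. Sites that obfuscate the URL or require a button click would need the additional steps described in Section 3.2, which this sketch does not attempt.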

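
The ad-network identification step of Section 3.4 (keeping only requests whose error message carries the net::ERR_BLOCKED_BY_CLIENT flag, then counting on how many sites each destination appears) can be sketched in the same spirit. The record layout `(site, request_url, error_message)`, the sample entries, and the function names are assumptions for illustration; the study's extension stored its observations in a database whose schema is not described.

```python
from collections import defaultdict
from urllib.parse import urlparse

BLOCKED_FLAG = "net::ERR_BLOCKED_BY_CLIENT"


def ad_hosts_by_site(records):
    """Map each crawled site to the set of hosts whose requests were blocked.

    `records` is an iterable of (site, request_url, error_message) tuples; this
    record layout is an assumption for the example.
    """
    hosts = defaultdict(set)
    for site, request_url, error in records:
        # Keep only requests stopped by the ad blocker, not ordinary failures.
        if BLOCKED_FLAG in error:
            hosts[site].add(urlparse(request_url).hostname)
    return dict(hosts)


def sites_per_host(hosts_by_site):
    """Count on how many sites each blocked host appeared."""
    counts = defaultdict(int)
    for host_set in hosts_by_site.values():
        for host in host_set:
            counts[host] += 1
    return dict(counts)


# Made-up log entries for demonstration only.
SAMPLE_RECORDS = [
    ("site-a.example", "http://ads.example.net/pop.js", "net::ERR_BLOCKED_BY_CLIENT"),
    ("site-a.example", "http://cdn.example.org/logo.png", "net::ERR_CONNECTION_RESET"),
    ("site-b.example", "http://ads.example.net/banner.js", "net::ERR_BLOCKED_BY_CLIENT"),
    ("site-b.example", "http://track.example.com/t.gif", "net::ERR_BLOCKED_BY_CLIENT"),
]

if __name__ == "__main__":
    by_site = ad_hosts_by_site(SAMPLE_RECORDS)
    print(sites_per_host(by_site))
```

Counting sites per host, rather than raw requests, matches the way Section 4.5 reports prevalence (the share of sites on which a network appears), so a single page that fires many requests to one network does not inflate that network's count.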