How much material on BitTorrent networks is infringing content? A validation study

Robert Layton, Paul A. Watters, Richard Dazeley

November, 2010

Abstract

BitTorrent is a widely used protocol for peer-to-peer (P2P) file sharing, including the sharing of material that is often suspected to be infringing content. However, little systematic research has been undertaken to measure the true extent of illegal file sharing. In this paper, we propose a new methodology for measuring the extent of infringing content. Our initial results indicate that at least 89.9% of files shared contain infringing content. We discuss the limitations of the approach and outline proposals to further verify the results.

Keywords: BitTorrent, infringing content, copyright infringement, piracy

1 Introduction

BitTorrent is a peer-to-peer (P2P) file sharing protocol which allows files to be efficiently distributed without reliance on a central server [1]. Files are distributed through clients and peers, with each peer holding different pieces of the file. Peers contact each other to download new pieces while, at the same time, allowing the pieces they already have to be uploaded to other peers. Downloads through BitTorrent can be much faster than traditional downloads, due to the highly distributed nature of the download process. As a protocol, BitTorrent has become extremely popular on the Internet: one global estimate is that BitTorrent traffic accounts for 57.19% of all Internet traffic [2]; one major ISP in Australia estimates that the figure is more than 50% of all traffic [26]. It has been utilised commercially, with companies such as Blizzard Entertainment using BitTorrent to release patches and updates for their popular online game World of Warcraft [3]. Another legitimate use is distributing updates to computers in corporate networks, allowing scarce network resources to be used more efficiently. However, there is significant debate in many communities over the way in which BitTorrent can be, and has been, used to share and distribute movies, software and music over the Internet, usually infringing the copyright held on the material. When shared illegally, this type of content is known as infringing content. It is fair to say that a range of perspectives has been expressed in the popular and online media about the extent to which BitTorrent and other P2P systems are used to distribute infringing content. On the one hand, critics of the copyright system argue that new technologies have opened up new ways of doing business, and that “old economy” companies must adapt to the changes [4]. On the other hand, creative industries rely on the copyright system to protect their intellectual property.
It is important that these matters be publicly debated; our intention in this paper, however, is not to enter into this debate but to introduce a methodology that can be used to provide objective evidence about the true nature of copyright infringement over BitTorrent networks. Evidence must be a key part of any public debate; often, proponents highlight the “positive” aspects of the technology. For example, the very popular “BitTorrent for Dummies” book [5] says that BitTorrent can be used for:

• Distributing “free” computer operating systems (like Linux)
• Distributing a “free” file to create “buzz”
• Distributing a book “for free” (like Free Culture)
• Distributing musical recordings released for “free” (like Phish)
• Distributing beta software

All of these uses are theoretically possible. The critical question for copyright owners and law enforcement is whether these “free” items constitute the vast majority of BitTorrent file sharing or not. File sharing proponents would generally argue for the former position; indeed, during one trial, the defendants argued that 80% of torrents were legal [27]. In contrast, copyright owners often argue that the opposite must be true. Finding an objective answer would help all parties involved in fighting or advocating for file sharing to understand the actual scale and scope of the problem. However, given the distributed nature of P2P protocols, answering this question in a rigorous and reliable manner is non-trivial.

To understand why the question is significant, consider why file sharers use P2P technology rather than a website with a single URL. In simple terms, BitTorrent and similar P2P technologies work in the following way:

1. A source file is created for sharing.
2. A torrent file is created that acts as a table of contents for the fragments of the shared file. It contains the expected filenames of the shared files, the number of fragments in the file, and the hash of each fragment, so that the client can verify that the file has been reconstructed correctly. It also has a list of preferred and alternate trackers and, for the latest version of the protocol, distributed hash and details.
3. A tracker is notified that the source file is ready for sharing.
4. The source is seeded until enough copies are available in fragmentary form on clients that have downloaded the source file.
5. Downloading the source file requires (a) finding the torrent, and (b) ensuring that all of the fragments are available, ideally downloading them from “peers” with the highest bandwidth and lowest packet latency relative to the downloader, using a client that understands the BitTorrent protocol.

Searching is performed at one of several search sites, such as The Pirate Bay or Isohunt. Integrity checks performed by the client ensure that the file is correctly reassembled, using the fragment hashes stored in the torrent file. P2P systems can be considered highly secure: (1) availability is provided through numerous peers rather than a single server representing a single point of failure; (2) access control can be provided through a number of different frameworks [6]; and (3) confidentiality can be provided through encryption of the source file. P2P technology reduces the bandwidth burden and cost for content producers; once a file has been seeded, there is no further burden on the user who shared the source file. There is also a logical separation between the act of hosting data fragments (as a peer) and searching for torrents (which is quite centralized). It is important to note that torrent search sites do not directly store any copyrighted data, and typically (but not always) disclaim any responsibility for copyright infringement.¹
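As a minimal sketch of the client-side integrity check described above, the Python below splits a payload into fixed-size pieces, hashes each piece and reports which pieces do not match an expected hash list. It assumes BitTorrent v1 conventions (one SHA-1 digest per piece, with a possibly shorter final piece); the function names are illustrative, and the code does not parse a real torrent file.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each.

    As in BitTorrent v1, the final piece may be shorter than piece_length.
    """
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def corrupt_pieces(data: bytes, piece_length: int, expected: list[bytes]) -> list[int]:
    """Return the indices of pieces whose SHA-1 digest differs from `expected`.

    `expected` plays the role of the per-piece hash list stored in the torrent.
    """
    actual = piece_hashes(data, piece_length)
    if len(actual) != len(expected):
        raise ValueError("piece count does not match the torrent metadata")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


if __name__ == "__main__":
    original = b"example payload " * 100            # 1,600 bytes -> 4 pieces of up to 512
    expected = piece_hashes(original, 512)
    received = bytearray(original)
    received[700] ^= 0xFF                            # corrupt one byte inside piece 1
    print(corrupt_pieces(bytes(received), 512, expected))  # [1]
```

Because each piece is checked independently, a client only has to fetch the failing piece again rather than the whole file.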

¹ Note that trackers do not store any attributions of copyright either.

The scale of file sharing activity is significant: for every shared file, there may be hundreds or thousands of fragments. In extreme cases, it can be difficult (but not impossible [7]) to identify, track, monitor and notify individuals who are involved in sharing a single file, especially where anonymisation technologies or network address translation are used. The highly distributed nature of BitTorrent makes it very difficult to directly “measure” its attributes. However, some recent research has characterized different aspects of BitTorrent performance, including measures and estimates of popularity, availability, content lifetime and download performance, suggesting that BitTorrent outperforms its peers on the following metrics, as defined in [8]:

• Popularity, defined as the total number of users active during a specific time window
• Download performance, which is the ratio of the file size to the time taken to complete the download
• Content injection time, which is the gap between the creation of (copyrighted) content and its P2P release
• Pollution level, which is the proportion of content that is corrupt

Each of these metrics has received limited study in the academic literature, although one study [9] looked at content injection times for cinematic-release films on P2P networks. Ironically, much of the literature, for example [10-12], focuses on the effect of “free-riding” in BitTorrent and P2P networks, i.e., users who download a lot but do not significantly contribute to uploading data for other users. While the work described in [8] and [13] has been useful in modelling characteristics of P2P file sharing (such as average download speeds), these characteristics are a function of both popularity and available bandwidth. Most research so far does not directly address the status of copyrighted material, even though the computations were made using copyrighted files.
Research papers which do address copyright infringement (e.g., [14]) are often not concerned with the practicalities of measuring the scale of sharing for specific (or all) copyrighted works. Other projects have focused on identifying whether specific countermeasures (such as distributing fakes) are effective, and conclude that they are probably not as effective as an intelligence-based approach [15, 16].

In this paper, we introduce a methodology that attempts to measure the extent of sharing of copyright-infringing material over BitTorrent. Specifically, we set out to answer the following research questions:

• How many files are shared using BitTorrent, and what are the major categories of the files being shared?
• At a given point in time, how much file sharing is actually occurring using BitTorrent?
• For each shared file, how many times has it been shared in total?
• Overall, what is the number and percentage of shared files which are infringing, both by number of files and by total downloads?

Obtaining an exact answer to any of these questions is impossible due to the scope and distributed nature of BitTorrent: there are thousands of BitTorrent trackers available, as well as other technologies such as Distributed Hash Tables and Peer Exchange, which prevent a complete study from being performed. However, our goal was to make the most accurate and precise approximations possible by sampling the most popular trackers, using a number of techniques to extract metadata from torrents, and then matching these to known descriptors. After describing the methodology, we present preliminary results, and use triangulation to verify the relative rates of sharing of different categories of files. In the discussion, we identify future areas for enhancement (especially in fake file detection), and reflect on the limitations of sampling methodologies and the biases arising from undertaking large-scale analyses of this kind.

2 Methods

BitTorrent is still a predominantly server-based system through its use of trackers, as described in the Introduction. To answer the research questions posed in the Introduction, we have developed a methodology based on scraping trackers, and recording and interpreting the results of these scrapes. This provides an objective way of understanding BitTorrent usage through the trackers, rather than relying on a sample of individual torrents. The methodology works in five stages:

1. Trackers are representatively sampled to reduce bias from any one tracker (Section 2.1)
2. The tracker sample is then scraped (Section 2.2)
3. Filenames are determined from the scrapes (Section 2.3)
4. Categorisation of the torrents is performed (Section 2.4)
5. The number of infringing files is determined (Section 2.5)

We describe the execution of each stage for this study below.

2.1 Tracker Sampling

To obtain the most representative results from a sample, it is important to extract data from the most popular trackers in use. To do this, we took the 10 most popular torrents on the website http://www.torrentz.com/ on April 21st, 2010. All trackers listed for each of these torrents were then selected for sampling.
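The sampling step amounts to taking the union of the trackers listed in the selected torrents. The minimal Python sketch below assumes the announce URLs of each torrent have already been extracted (scraping the torrentz.com page itself is not shown, and `unique_trackers` is a hypothetical helper). It reduces each URL to the host[:port] form used in the tracker list of Section 3.1 and removes duplicates while preserving first-seen order.

```python
from typing import Optional
from urllib.parse import urlsplit


def tracker_endpoint(announce_url: str) -> Optional[str]:
    """Reduce an announce URL to 'host' or 'host:port', or None if it has no host."""
    parts = urlsplit(announce_url)
    if parts.hostname is None:
        return None
    return f"{parts.hostname}:{parts.port}" if parts.port else parts.hostname


def unique_trackers(announce_lists: dict) -> list:
    """Union of tracker endpoints across torrents, in order of first appearance.

    `announce_lists` maps a torrent name to the announce URLs listed for it.
    HTTP and UDP URLs for the same host and port collapse into one entry.
    """
    seen = {}
    for urls in announce_lists.values():
        for url in urls:
            endpoint = tracker_endpoint(url)
            if endpoint is not None:
                seen.setdefault(endpoint, None)
    return list(seen)


if __name__ == "__main__":
    sample = {  # illustrative input, not the study's data
        "torrent A": ["http://tracker.openbittorrent.com:80/announce",
                      "http://tracker.prq.to/announce"],
        "torrent B": ["udp://tracker.openbittorrent.com:80/announce",
                      "http://tracker.packy.se:2710/announce"],
    }
    print(unique_trackers(sample))
    # ['tracker.openbittorrent.com:80', 'tracker.prq.to', 'tracker.packy.se:2710']
```

Keeping first-seen order makes the resulting tracker sample reproducible from the same input list.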

2.2 Scraping

Once the tracker sample was determined, each tracker was scraped for its peer information. Each scrape was downloaded in a similar way to a normal HTTP download. If the download was interrupted, the scrape was not attempted again in that iteration. An interrupted download could still be useful, however, as it would contain valid scrape information up to the end of the downloaded portion. For example, if a scrape was interrupted after downloading 80% of the file, 80% of the scrape information would still be available. When parsing the scrape data, the consistency of the file was deliberately not verified, so that information could still be gathered from interrupted downloads. Instead, any valid data for each file was collected and saved into a database. The information collected included:

• The info hash
• The number of times the file had been completely downloaded
• The number of seeders on the network at the given time
• The number of leechers on the network at the given time
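To make the tolerance to interrupted downloads concrete, the following Python sketch parses a full-scrape response entry by entry and keeps every entry received in full, dropping only a truncated final entry. It assumes the common HTTP scrape convention (BEP 48): the scrape URL is derived from the announce URL by replacing `announce` with `scrape`, and the response is a bencoded dictionary whose `files` key maps each 20-byte info hash to `complete`, `downloaded` and `incomplete` counts. It further assumes that `files` is the first key in the response, as bencode's sorted-key rule normally ensures. The function names are illustrative, not those of the study's software.

```python
from typing import Optional


class Truncated(Exception):
    """The buffer ended before a bencoded value was complete."""


def scrape_url(announce: str) -> Optional[str]:
    """Derive a scrape URL from an announce URL (BEP 48 convention)."""
    head, _, last = announce.rpartition("/")
    if not last.startswith("announce"):
        return None  # this tracker does not advertise scrape support
    return f"{head}/scrape{last[len('announce'):]}"


def _decode(buf: bytes, i: int):
    """Decode one bencoded value starting at offset i; return (value, next offset)."""
    if i >= len(buf):
        raise Truncated
    c = buf[i:i + 1]
    if c == b"i":                                  # integer: i<digits>e
        end = buf.find(b"e", i)
        if end == -1:
            raise Truncated
        return int(buf[i + 1:end]), end + 1
    if c in (b"l", b"d"):                          # list or dictionary
        i += 1
        items = []
        while True:
            if i >= len(buf):
                raise Truncated
            if buf[i:i + 1] == b"e":
                if c == b"l":
                    return items, i + 1
                return dict(zip(items[::2], items[1::2])), i + 1
            value, i = _decode(buf, i)
            items.append(value)
    if c.isdigit():                                # byte string: <length>:<bytes>
        colon = buf.find(b":", i)
        if colon == -1:
            raise Truncated
        start = colon + 1
        end = start + int(buf[i:colon])
        if end > len(buf):
            raise Truncated
        return buf[start:end], end
    raise ValueError(f"invalid bencode at offset {i}")


def parse_scrape(buf: bytes) -> dict:
    """Return {hex info hash: counts} for every entry received in full."""
    prefix = b"d5:filesd"
    if not buf.startswith(prefix):
        raise ValueError("expected a full-scrape response starting with 'files'")
    i, entries = len(prefix), {}
    while i < len(buf) and buf[i:i + 1] != b"e":
        try:
            info_hash, j = _decode(buf, i)
            counts, j = _decode(buf, j)
        except Truncated:
            break  # interrupted download: keep what has been parsed so far
        entries[info_hash.hex()] = {
            k.decode(): v for k, v in counts.items() if isinstance(v, int)
        }
        i = j
    return entries


if __name__ == "__main__":
    def entry(h: bytes, complete: int, downloaded: int, incomplete: int) -> bytes:
        return (b"20:" + h + b"d8:completei%de10:downloadedi%de10:incompletei%dee"
                % (complete, downloaded, incomplete))

    h1, h2 = bytes(20), bytes(range(1, 21))        # illustrative 20-byte hashes
    full = b"d5:filesd" + entry(h1, 5, 3, 2) + entry(h2, 7, 10, 1) + b"ee"
    print(len(parse_scrape(full)))                 # 2
    print(len(parse_scrape(full[:-30])))           # 1: the cut-off entry is dropped
```

The key design choice is to treat each (info hash, counts) pair as the unit of validity, so a cut-off response loses at most one entry.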

For some trackers, the download number varied, and may have indicated the true value. However, in many cases, the trackers we retrieved data from indicated that all files had been downloaded 10 times, even when the number of current seeders was in the thousands. This was clearly impossible as, by definition, a seeder is someone who has completely downloaded a file. Thus, the downloaded number was excluded from our results. Instead, we used the ‘complete’ count as our number of downloads, as a file must be fully downloaded by a user for that user to be listed as a seeder. In this paper, therefore, the term ‘downloads’ refers to the number of seeders a file has.
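The choice of seeders as the download measure can be expressed as a small rule. In the hypothetical Python sketch below, `downloads` returns the `complete` count, and `downloaded_counter_plausible` illustrates the sanity check that motivated discarding the tracker's own counter: it flags a tracker when, for most seeded torrents, `downloaded` is smaller than `complete`. The check and its 0.99 threshold are assumptions for illustration; the study simply excluded the `downloaded` figure.

```python
def downloads(counts: dict) -> int:
    """'Downloads' as defined in this paper: the tracker's seeder ('complete') count."""
    return counts.get("complete", 0)


def downloaded_counter_plausible(entries: dict, min_share: float = 0.99) -> bool:
    """Heuristic check of a tracker's own 'downloaded' counter.

    Every seeder holds a complete copy, so 'downloaded' should rarely be below
    'complete'. The 0.99 threshold is an arbitrary illustrative value.
    """
    seeded = [c for c in entries.values() if c.get("complete", 0) > 0]
    if not seeded:
        return True
    consistent = sum(c.get("downloaded", 0) >= c["complete"] for c in seeded)
    return consistent / len(seeded) >= min_share


if __name__ == "__main__":
    suspicious = {  # toy data mimicking the "10 downloads, thousands of seeders" pattern
        "a": {"complete": 3000, "downloaded": 10, "incomplete": 50},
        "b": {"complete": 1200, "downloaded": 10, "incomplete": 4},
    }
    print(downloaded_counter_plausible(suspicious))  # False
    print(downloads(suspicious["a"]))                # 3000
```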

2.3 Filename Determination

As the procedure that calculates the info hash is a one-way function, we could not recover the filename from the scrape data alone. However, by querying external data sources, it is possible to correlate info hashes with file titles. One advantage here is that, as with searching for internet pornography, users need to search for terms of interest, so search engines provide a convenient means to perform reverse lookups [17]. For example, searching for the hash value “9064267d4a83e096e6eb14593762bc18633eda0f” returns “Avatar 2009 720p BluRay x264” as the filename. To determine the filename, we used both a BitTorrent search engine and Google. The procedure started by searching the BitTorrent search engine for the hex-encoded info hash. If the BitTorrent search engine had the torrent that generated this info hash, it would return the torrent, including the names of the files contained in it. We then parsed the search results to extract only the filename, and stored the resulting filename in the database. If this procedure failed, we performed a Google search for the hex-encoded info hash. If results were returned from Google, we ranked them in order of appearance. If the title of a search result (i.e., the title of the corresponding webpage found by Google) included the hex hash, that result was ignored, as many websites repeat this value in their title, which gives no filename information. If the hex hash was not in the title, we used the title as our filename result. In some cases, the filename was “dirty”, as the title of the search result was likely to contain other information, such as the name of the website linked to by Google. Fully parsing such titles remains a significant problem for automatic processing, and was considered out of scope for this methodology. To determine the accuracy of the filename determination procedure, the results were verified by performing a reverse lookup.
To do this, we selected the top 50 seeded torrents with filenames, and a random sample of 50 torrents from the full set of named torrents, as our test set. For each of these 100 torrents, the original torrent file was searched for using the given info hash. The torrent file was then downloaded, and the info hash recalculated to verify that the torrent was correct. This sampling method was chosen to check that accuracy did not differ between the top torrents and a representative sample of the full set of named torrents.
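The lookup cascade and the reverse-lookup check can be sketched in Python as follows. The search back-ends are passed in as functions because the study does not name the BitTorrent search engine it used, and no real search API is assumed here; `determine_filename`, `bencode` and `verify_name` are illustrative names. The verification step relies on the v1 definition of the info hash as the SHA-1 digest of the bencoded `info` dictionary. Re-encoding a decoded dictionary reproduces that digest only if the original encoding was canonical, so a production tool would hash the raw `info` bytes instead.

```python
import hashlib
from typing import Callable, Optional

# Hypothetical search back-ends: both take a 40-character hex info hash.
TorrentSearch = Callable[[str], Optional[str]]  # filename, or None if not found
WebSearch = Callable[[str], list]               # result titles, best first


def determine_filename(info_hash: bytes, torrent_search: TorrentSearch,
                       web_search: WebSearch) -> Optional[str]:
    hex_hash = info_hash.hex()
    name = torrent_search(hex_hash)            # 1. BitTorrent search engine
    if name:
        return name
    for title in web_search(hex_hash):         # 2. web search, in ranked order
        if hex_hash not in title.lower():      # titles repeating the hash are skipped
            return title                       # may still be "dirty"
    return None


def bencode(value) -> bytes:
    """Minimal bencode encoder for int, str/bytes, list and dict values."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        pairs = sorted(((k.encode("utf-8") if isinstance(k, str) else k, v)
                        for k, v in value.items()), key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def verify_name(info: dict, expected_hex_hash: str) -> bool:
    """Reverse-lookup check: does the downloaded torrent's 'info' hash match?"""
    return hashlib.sha1(bencode(info)).hexdigest() == expected_hex_hash.lower()
```

Injecting the search functions keeps the cascade logic testable with stubs, independent of any particular search site.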

2.4 Categorisation

After the filenames were determined, category determination was performed. Category determination was easier for some files than others. Most movies are of the form:

() in which fields can be separated by spaces, periods or other characters. This format varies slightly between release groups, and is sometimes a different format altogether. Another common pattern is:

S<season>E<episode>, which is used for TV shows to indicate which episode is available. An example of this would be:

The.Simpsons.S10E04, which indicates the fourth episode of the tenth season of The Simpsons TV show. To perform automatic categorisation, we used a simple rule-based system: a list of patterns, in the form of regular expressions, each paired with the category it corresponds to. The full list of rules used is given in Appendix A. The rules are ordered (in the author's view) from the most accurate to the least accurate. To categorise a file, each rule was applied in turn to its filename. Once a rule was triggered, which happened when the filename contained the pattern given by the regular expression, the file was assigned the category from that rule, and the matching procedure stopped. To verify the results, the top 500 torrents by seeders and a random sample of 500 torrents were taken, and their categorisations were manually verified. Further to this, the percentage of torrents that were classified (i.e., the coverage) was calculated, which, combined with the verified accuracy, gives an overall estimate of the percentage of all torrents that were categorised correctly.
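A first-match rule system of this kind takes only a few lines of Python. The sketch below is illustrative: it uses just four rules suggested by the text (the season/episode pattern, and the XVid, Crack and Porn keywords discussed with Table 2), whereas the study's full rule list in Appendix A is not reproduced. The season/episode rule is placed above XVid, following the ordering that the error analysis in Section 3.3 recommends, and matching is assumed to be case-insensitive.

```python
import re
from typing import Iterable, Optional

# Illustrative rules only, ordered from most to least reliable; the first
# pattern found anywhere in the filename decides the category.
RULES = [
    (re.compile(r"s\d+\W*e\d+", re.IGNORECASE), "TV Shows"),
    (re.compile(r"xvid", re.IGNORECASE), "Movie"),
    (re.compile(r"crack", re.IGNORECASE), "Game"),
    (re.compile(r"porn", re.IGNORECASE), "Porn"),
]


def categorise(filename: str) -> Optional[str]:
    """Category of the first matching rule, or None if no rule fires."""
    for pattern, category in RULES:
        if pattern.search(filename):
            return category
    return None


def coverage(filenames: Iterable[str]) -> float:
    """Share of filenames assigned a category by at least one rule."""
    names = list(filenames)
    return sum(categorise(n) is not None for n in names) / len(names)


if __name__ == "__main__":
    for name in ["The.Simpsons.S10E04",
                 "High.Stakes.Poker.s06e03.PDTV.XviD-TH.avi",
                 "Zack And Miri Make A Porno . DVDrip(CanusRG-pill)",
                 "Blackadder complete"]:
        print(f"{name!r} -> {categorise(name)}")
```

With this ordering the poker episode is classified as a TV show, the film title containing “Porno” still lands in Porn (the same kind of error listed in Table 2), and “Blackadder complete” stays uncategorised.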

2.5 Infringement Determination

Once the torrents were assigned filenames, we determined which files were infringing content. This determination was primarily based on the title of the file. The procedure had two key limitations: firstly, we took the filename at face value, and secondly, if there was any ambiguity in the filename, we erred on the side of caution and assumed that the file was legal. The rationale for the first decision is twofold: files with very high numbers of seeders are unlikely to be fake, since they are so popular; and, as researchers, we are legally required not to infringe copyright, so we could not download the files to inspect them. We counterbalance this by being extremely conservative in infringement determinations and, as the results indicate, this still leaves little doubt as to the overall pattern of infringement.

3 Results

The results below are provided for our original experimentation, a follow-up study, and a method for validation.

3.1 Trackers and Scrape Collection

To create the list of trackers, the top 10 torrents on the website http://www.torrentz.com were downloaded, and the trackers that were listed for each of the torrents were collected. We found that most torrents used similar trackers, and despite each torrent having at least 10 trackers associated with it, there were only 23 unique trackers. In no specific order, the following trackers were used for this study:

• inferno..com:3395
• tracker.packy.se:2710
• tracker.mightynova.com
• tracker.torrentbay.to:6969
• p2p.lineage2.com.cn:6969
• sombarato.org:6969
• tracker.openbittorrent.com:80
• kubanmedia.org:2710
• bt1.the9.com:6969
• bt.rghost.net
• tracker.bitreactor.to:2710
• tracker.mightynova.com:4315
• free.btr.kz:8888
• tracker.torrent.to:2710
• tracker.irc.su:80
• www.desidhamal.com:6883
• tracker.prq.to
• linuxoid.in:4443
• tracker.ilibr.org:6969
• tracker.ilibr.org:80
• tracker.hkreporter.com:6999
• idowns.org:6969
• tracker.desi6.com:7979

Scrapes were recovered from 19 of these trackers. Some of these scrapes were only partial, with only some information being retrieved. The following trackers did not allow a full server scrape:

• tracker.mightynova.com
• sombarato.org:6969
• tracker.bitreactor.to:2710
• tracker.mightynova.com:4315

Trackers might disallow scrapes because of a lack of bandwidth, or to prevent exhaustive searching of the torrents that they are tracking; a smaller tracker, for example, may wish to minimise its bandwidth usage by disabling this feature. As these trackers did not permit a full scrape, we do not discuss them further in this paper. Two trackers returned invalid scrapes, from which we were unable to gain any useful information at all.

3.2 Filename Determination

Scraping the trackers resulted in a total of 1,046,713 different sets of torrent information being retrieved from the sampled servers. Determining the filename of every torrent would have been prohibitively time-consuming. However, we hypothesised that the ranking of torrent popularity would follow a power law [18], i.e., relatively few torrents would account for the largest proportion of downloads. In engineering terms, this is often known as the “80-20 rule”. Power laws are becoming more widely acknowledged in computer science but have been well known in biology for many years [19]. Based on this hypothesis, we examined the distribution of torrent popularity by ranking all of the torrents by their available seeders, and found that the “80-20” split understated the concentration: just 4.0% of torrents (a total of 15,367) were responsible for 80% of seeders. Furthermore, just 9.9% of torrents (38,365) accounted for 90% of seeders. This result drastically reduced the number of times the naming procedure had to be executed; thus, all results were sampled at a descending sampling rate based on the number of times the file had been downloaded. For filename determination, each torrent was retrieved from our database in descending order of downloads, and filenames were determined in that order. Out of 151,268 attempts to determine the filename (covering torrents that together account for 99.36% of all seeders), 121,684 succeeded and resulted in a filename being assigned, i.e., there was an 80.4% success rate for filename identification. In addition, there were no failed filename determinations in the Top 50 most seeded torrents; the first failure occurred at rank 68, and there were 6 failures in total in the Top 100. In the Top 1,000, there were 119 failed filename determination attempts. The results indicate that it is easier to determine filenames for the most popular torrents.
Validation on the Top 50 torrents and a random set of 50 torrents was performed using the methodology given in Section 2.3, which showed that all torrents were correctly named, where a name was given.
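The concentration figures above (for example, 15,367 torrents accounting for 80% of seeders) come from ranking torrents by seeders and accumulating their counts. A minimal Python sketch of that calculation, with a hypothetical function name and toy input, is:

```python
def torrents_for_share(seeders: list, target: float) -> tuple:
    """Smallest number of torrents, taken most-seeded first, whose seeders make
    up at least `target` (e.g. 0.8) of all seeders, plus that number as a
    fraction of all torrents passed in."""
    ranked = sorted(seeders, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0, 0.0
    running = 0
    for count, s in enumerate(ranked, start=1):
        running += s
        if running / total >= target:
            return count, count / len(ranked)
    return len(ranked), 1.0


if __name__ == "__main__":
    toy = [2, 50, 3, 30, 10, 5]  # toy seeder counts, not the study's data
    print(torrents_for_share(toy, 0.8))   # (2, 0.3333333333333333)
    print(torrents_for_share(toy, 0.9))   # (3, 0.5)
```

The percentage depends on which torrents form the denominator (for instance, whether torrents with no seeders are included); the sketch simply counts every torrent it is given.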

3.3 Categorisation

The categorisation was performed using a set of manually derived rules. Categorisation was performed on the Top 15,367 torrents, thus accounting for 80% of all downloads. Of these torrents, 10,741 were categorised, giving a coverage of 69.9%. After applying the categorisation, the categories were manually verified for two samples: the Top 500 torrents, and a random sample of 500 torrents. The classification accuracy achieved was 98.8%, with only 6 entries being mis-categorised. The percentages of files in each category are given in Table 1.

Table 1 – Number of Downloaded Items in Each Category

Category | Number | %
Anime | 9 | 0.1%
Book | 22 | 0.2%
Child Porn | 4 | <0.1%
Documentary | 7 | 0.1%
Game | 477 | 4.4%
Hentai | 4 | <0.1%
Movie | 4,651 | 43.3%
Music | 1,775 | 16.5%
Pictures | 18 | 0.2%
Porn | 400 | 3.7%
Software | 252 | 2.3%
TV Shows | 3,122 | 29.1%
Total | 10,741 | 100%

The incorrect entries, along with the rule that caused each categorisation, are given in Table 2. Of the six errors listed, four were caused by the “XVid” rule, which normally indicates with high accuracy that the torrent is a movie. The AVS Video Editor is a commercial program which needs a ‘crack’ to run without paying; cracks are normally associated with pirated video games, so the ‘Crack’ rule classified it as a game. The High Stakes Poker file indicates that the Season/Episode rule should be placed above the XVid rule, as this file was marked as a movie despite having season and episode information in the filename. Finally, the last line is for the movie “Zack and Miri Make a Porno”, which was classified as porn because it has the word ‘Porn’ in its title.

Table 2 – Examples of Categorisation Errors

Filename | Given | Actual | Rule
Alicia Keys - As I Am [2007][CD+SkidVid_XviD+Cov] 192Kbps | Movie | Music (Video) | XVid
katharine mcphee-had it all-dvdrip-x264-2009-mv4u-(0001).mkv | Movie | Music (Video) | XVid
Lily Allen- It’s Not Me It’s You [2009][CD+SkidVid_XviD+Cov] | Movie | Music (Video) | XVid
AVS Video Editor v4.2.1.166 + Crack SETUP [ResourceRG Apps by Lop | Game | Software | Crack
High.Stakes.Poker.s06e03.PDTV.XviD-TH.avi | Movie | TV Show | XVid
Zack And Miri Make A Porno . DVDrip(CanusRG-pill) | Porn | Movie | Porn

Categorising the 30.1% of torrents that remained uncategorised would require a more sophisticated automatic categorisation system, since the targets are simple filenames which need more context-aware matching. For example, the torrent file “Blackadder complete” was not automatically categorised. Without knowing that Blackadder is a TV series, and that “complete” refers in this context to the torrent containing all released episodes of the show, it would not be possible to create a pattern for this type of file. Such a context-aware search could potentially be performed using a database or verified list of known movies, TV shows and music artists. For the uncategorised files, a sample of 100 files was manually classified.

Of those files, 55 were movies, 26 were music, 10 were games, 5 were software, 1 was TV and 3 were unknown. This is a slightly different distribution from that of the categorised filenames, possibly indicating that rules are easier to create for some categories than for others. For example, the rule:

s\d+\W*e\d+
is very good at classifying individual episodes of TV shows, since it matches a generally accepted convention: “S03E06”, for instance, would be used to indicate the sixth episode of the third season of a TV show. This regularity is one reason for the low rate of unrecognised TV show torrents compared to movies and other files, such as software, where there are few or no universal conventions. Often, these torrents contain only the title and sometimes the release year. Without context, it is difficult to reliably extract entities that relate to a specific instance; for example, “Indiana Jones” could refer to the film, video game, audio book or an e-book.
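Both ideas in this discussion can be sketched briefly in Python: the season/episode rule above, with capture groups added so the numbers can be read off, and the context-aware fallback suggested for files such as “Blackadder complete”, reduced here to a lookup against a hypothetical list of known titles (`KNOWN_TITLES` stands in for the database of movies, TV shows and artists proposed above). As the “Indiana Jones” example shows, a title lookup alone cannot resolve titles shared by several media.

```python
import re
from typing import Optional, Tuple

# The detection rule from the text, with capture groups added for illustration.
SEASON_EPISODE = re.compile(r"s(\d+)\W*e(\d+)", re.IGNORECASE)

# Hypothetical reference list; a real one would be a curated title database.
KNOWN_TITLES = {"blackadder": "TV Shows", "the simpsons": "TV Shows", "avatar": "Movie"}


def season_episode(filename: str) -> Optional[Tuple[int, int]]:
    """(season, episode) from the first S..E.. marker, or None."""
    match = SEASON_EPISODE.search(filename)
    return (int(match.group(1)), int(match.group(2))) if match else None


def categorise_by_title(filename: str) -> Optional[str]:
    """Category of the longest known title occurring as a run of whole words."""
    words = [w for w in re.split(r"[^a-z0-9]+", filename.lower()) if w]
    for length in range(len(words), 0, -1):
        for start in range(len(words) - length + 1):
            title = " ".join(words[start:start + length])
            if title in KNOWN_TITLES:
                return KNOWN_TITLES[title]
    return None


if __name__ == "__main__":
    print(season_episode("The.Simpsons.S10E04"))        # (10, 4)
    print(categorise_by_title("Blackadder complete"))   # TV Shows
```

Matching whole-word runs, longest first, avoids accidental hits on substrings of unrelated words.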

3.4 Infringement Determination

To determine the relative proportions of infringing and non-infringing files among the torrents listed, a random sample of 1,000 torrents was selected from those that had been assigned filenames (i.e., from among the most seeded files). These filenames were manually checked to determine whether they were infringing or could legally be distributed. Our key finding is that, of the 1,000 torrents in the sample, we could only confirm 3 as being non-infringing (0.3%). We were unable to establish whether a further 16 were infringing or not (1.6%), and there were 91 porn torrents (9.1%). We did not attempt to verify the infringing status of the porn torrents, as there is a high level of ambiguity in the terms that we would generally use to determine infringement. For example, many porn torrents are described as “amateur” (and potentially non-infringing) when in fact they are professionally produced and therefore infringing. Excluding the legal, unknown and porn torrents, there were still 890 copyright-infringing torrents in the sample, 89% of the total.
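The percentages in this paragraph follow directly from the counts. As a quick arithmetic check (an illustrative Python sketch; the labels are ours), tallying a sample with the reported composition of 3 non-infringing, 16 unknown, 91 porn and 890 infringing torrents gives:

```python
from collections import Counter


def percentages(labels) -> dict:
    """Percentage of the sample carrying each label, rounded to 0.1%."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    sample = (["non-infringing"] * 3 + ["unknown"] * 16
              + ["porn (not assessed)"] * 91 + ["infringing"] * 890)
    print(percentages(sample))
    # {'non-infringing': 0.3, 'unknown': 1.6, 'porn (not assessed)': 9.1, 'infringing': 89.0}
```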

3.5 Extent of File Sharing

Across the 17 sampled BitTorrent trackers that returned usable scrape results, we found that at least a million different torrents were being shared. This is the same order of magnitude as reported by popular torrent search engine sites such as Isohunt.com². We expect this number to increase, though at a decreasing rate, as more trackers are included. It would be impossible to determine an overall population value, as there are a large number of BitTorrent trackers and some are private. But, by triangulating our estimates with those reported by torrent search engines, we find that our results are in the right ballpark; indeed, they appear to be conservative.

² As at 15/09/2010, Isohunt.com reports 5,676,995 active torrents, comprising 134M files.

For each shared file, we also investigated how many times it had been shared in total. This is an important question, given the power law relationship hypothesised earlier. As part of our study, we scraped information for more than one million torrents; the Top 100 most seeded torrents are listed in Appendix A, and it is clear from these results that the overwhelming majority of the most popular torrents are infringing. This is not to say that the least popular torrents are also infringing; legal files are often stated to be the most widely shared [5], but our data suggest the opposite. There was only one legal torrent in the Top 100 listed in Appendix A: the open-source VLC media player, which uses BitTorrent as a distribution method. Just 4.0% of torrents (a total of 15,367) accounted for 80% of all downloads, and only 9.9% of torrents (just 38,365) were responsible for 90% of downloads. This result confirms our hypothesis of a power law distribution in torrent downloading, and suggests that downloading is far more concentrated on popular torrents than the “80-20 rule” would suggest. We were able to assign names to more than 120,000 of the top 150,000 most downloaded torrents, which together account for 99.36% of all seeders. This means that our headline 89.9% infringement figure applies both to the overall percentage of infringing files and to the percentage of total seeders. By examining the titles in Appendix A, it is interesting to speculate about why some files are downloaded more than others at any point in time. To some extent, popularity in downloading appears to be related to popularity at the box office: Avatar was very frequently downloaded, and it was also popular at the cinema, opening with US$232m [20]. However, there are also cases where movies were less successful in the cinema but still popular for downloading. For example, The Incredible Hulk opened with more than US$54m [21], but was almost as popular as Avatar. Is there a link between accessibility and popularity?
Or does the ease with which users can download infringing content make popularity a less relevant factor? Or are some torrents actually for fake files, given the high seed counts and out-of-date nature of the material? Further research is required to better understand the decisions users make when searching for and downloading infringing content, and to accurately detect torrents for fake files.

3.6 Replication

To determine the reproducibility of the initial experiment, we replicated the study 3 months after the original data collection. As expected, the results varied in absolute terms (e.g., total number of downloads), but remained proportionally consistent (e.g., a similar proportion of files were movies). The results from the replication study are described below. We used the same initial list of trackers as in the first study; however, not all of the same trackers returned usable scrapes. The 14 trackers that returned usable scrapes this time were:

• bt.rghost.net
• bt1.the9.com:6969
• free.btr.kz:8888
• idowns.org:6969
• p2p.lineage2.com.cn:6969
• tracker.bitreactor.to:2710
• tracker.ilibr.org:6969
• tracker.ilibr.org:80
• tracker.irc.su:80
• tracker..com:80
• tracker.packy.se:2710
• tracker.prq.to
• tracker.torrent.to:2710
• tracker.torrentbay.to:6969

From this outcome, we observe that the “half-life” of trackers may be relatively short: after only 3 months, 9 were no longer usable. Some measurement error is also to be expected, as some trackers may still be functioning but shaping their responses when traffic is slow, and disconnecting at other times. Note that the tracker which gave the highest results in the first study (desi6) did not provide a usable scrape in the replication study. This resulted in lower overall seeder numbers than recorded in the original study. We also pruned one tracker (openbittorrent.com) whose download numbers were unusually high in the original study. The results of the replication study indicate that our data are very reliant on the trackers used; some will be more popular in music circles, some more popular for TV shows and movies, and some will have a very short lifespan. Further longitudinal observation and analysis will be required to establish long-term patterns of activity. From the new sample, 98,978 files were given a filename, out of 2,945,000 torrent files found, from 161,083 filename guessing attempts, giving 61% filename coverage. The sampling method used was random this time (random torrents were chosen to be named), as opposed to using the most downloaded files as in the original study. Despite this, the overall ranking and relative proportions of material in different categories remained consistent, as shown in Table 3.

Table 3 – Number of Downloaded Items in Each Category

Category      Number      %
Anime            337   0.26
Book           5,107   4.00
Child Porn       152   0.12
Documentary      172   0.13
Game          10,489   8.22
Hentai           230   0.18
Movie         50,610  39.66
Music         21,693  17.00
Pictures       1,101   0.86
Porn          16,082  12.60
Software       2,474   1.94
TV            19,153  15.01
TOTAL        127,600 100.00
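Each percentage in Table 3 is a category count divided by the 127,600-item total, so the shares can be recomputed from the counts as a consistency check. A minimal sketch; `category_shares` is an illustrative helper, and the counts are copied from Table 3.

```python
def category_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each category's percentage share of the total count."""
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


# Category counts copied from Table 3 (replication study).
TABLE_3 = {
    "Anime": 337, "Book": 5107, "Child Porn": 152, "Documentary": 172,
    "Game": 10489, "Hentai": 230, "Movie": 50610, "Music": 21693,
    "Pictures": 1101, "Porn": 16082, "Software": 2474, "TV": 19153,
}

if __name__ == "__main__":
    shares = category_shares(TABLE_3)
    print(f"Total items: {sum(TABLE_3.values()):,}")
    # Print categories from largest to smallest share.
    for category, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{category:<12}{TABLE_3[category]:>8,}{pct:>8.2f}%")
```

Run against these counts, the TV row works out to about 15.01% and the twelve shares sum to 100%.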

In the new sample, the sum of the minimum number of seeds for each file was 1,141,360, and 124,147 files had at least one seed. The first figure is a lower bound on the total number of seeders in the new sample. The overall maximum number of seeders currently online was 6,489,016. This is less than in the original study in absolute terms, but given that we had fewer trackers (no longer including the one accounting for the most downloads), this is not surprising.

In terms of infringement, the most downloaded list contained 2 non-infringing files and 1 file of unknown status. The non-infringing files were Windows 7 loaders which, while intended to support illegal activity, are not themselves generally infringing. The file with unknown status was “Amateur Mix-LKRG”, a 1.85 GB archive of short porn clips, some of which appeared to be infringing while others may not have been (e.g., several files included “homemade” or “homeclips” in their filenames). Again, this illustrates the difficulty of automatically categorising porn files as infringing or not. In summary, the results of the replication study support the conclusions of the original study; importantly, we struggled to find any material which was not infringing. Even giving the benefit of the doubt to some terms, we found that 97% of downloads in our sample were for infringing content.
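The 97% figure is a share of downloads rather than of files. A minimal sketch of that weighting, assuming each file in the list has already been labelled by hand as infringing, non-infringing or unknown, and reading "benefit of the doubt" as counting unknown files as not infringing. `infringing_share` is a hypothetical helper and the sample rows are invented, not figures from the study.

```python
from typing import Iterable

STATUSES = {"infringing", "non-infringing", "unknown"}


def infringing_share(rows: Iterable[tuple[int, str]],
                     unknown_as_infringing: bool = False) -> float:
    """Percentage of all downloads that belong to files labelled infringing.

    rows: one (download count, status) pair per file, where status is one of
    "infringing", "non-infringing" or "unknown". By default files of unknown
    status get the benefit of the doubt and count as not infringing.
    """
    total = infringing = 0
    for downloads, status in rows:
        if status not in STATUSES:
            raise ValueError(f"unexpected status: {status!r}")
        total += downloads
        if status == "infringing" or (status == "unknown" and unknown_as_infringing):
            infringing += downloads
    return 100.0 * infringing / total if total else 0.0


if __name__ == "__main__":
    # Made-up rows for illustration only; these are not figures from the study.
    sample = [(900, "infringing"), (60, "non-infringing"), (40, "unknown")]
    print(infringing_share(sample))                              # 90.0
    print(infringing_share(sample, unknown_as_infringing=True))  # 94.0
```

Switching `unknown_as_infringing` shows how sensitive the headline share is to the handful of files, such as mixed porn archives, whose status cannot be settled from the filename.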

3.7 Validation

In order to validate the results from our first study, we used .com's own “zeitgeist” function, which lists the top search phrases on that site. In isohunt.com’s own words, the list “should be representative of what’s popular in the BitTorrent and IRC scenes, if not the P2P world in general”. The goal here was to establish intent: what were people searching for, and was it likely to be infringing content? Appendix C contains a list of the Top 100 search terms and the manual categorisation assigned to each³. The category results are shown below:

 Movies 53%  TV 23%  Software 12%  Music 7%  Porn 4%  Child Porn 1%

Where a generic term was specified (e.g., “pinoy” for Filipino movies), we performed a manual search and checked whether the resulting Top 10 titles were likely to be infringing. We could not identify any content which was not infringing or illegal using this technique. The results indicate that while the relative percentages of material searched for in each category differ somewhat from the download figures, the overall ranks of the categories generally remained consistent. Discrepancies between the terms being searched for and the actual downloads may simply indicate that people sometimes do not find what they are searching for. Hopefully this is the case for child porn, since the term “pthc” (an acronym for “pre-teen hard core”) ranked ahead of “harry potter” in popularity (#86 versus #88); given that the latter is presumably very popular, this is a disturbing result.
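One way to put a number on the statement that the category ranks remained consistent (an illustration, not an analysis reported in the study) is Spearman's rank correlation between the search shares listed above and the Table 3 download counts, restricted to the six categories both lists share. A minimal sketch using the no-ties formula ρ = 1 − 6Σd²/(n(n² − 1)); the helper names `rank_desc` and `spearman_rho` are hypothetical.

```python
def rank_desc(values: dict[str, float]) -> dict[str, int]:
    """Rank keys from largest value (rank 1) to smallest. Assumes no tied values."""
    ordered = sorted(values, key=lambda k: values[k], reverse=True)
    return {key: position for position, key in enumerate(ordered, start=1)}


def spearman_rho(a: dict[str, float], b: dict[str, float]) -> float:
    """Spearman's rank correlation on the keys shared by a and b (no-ties formula)."""
    common = sorted(a.keys() & b.keys())
    n = len(common)
    if n < 2:
        raise ValueError("need at least two shared categories")
    ra = rank_desc({k: a[k] for k in common})
    rb = rank_desc({k: b[k] for k in common})
    d_squared = sum((ra[k] - rb[k]) ** 2 for k in common)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Search shares (%) from the zeitgeist list and download counts from Table 3,
# restricted to the categories that appear in both.
SEARCHED = {"Movie": 53, "TV": 23, "Software": 12, "Music": 7, "Porn": 4, "Child Porn": 1}
DOWNLOADED = {"Movie": 50_610, "TV": 19_153, "Software": 2_474, "Music": 21_693,
              "Porn": 16_082, "Child Porn": 152}

if __name__ == "__main__":
    print(f"Spearman's rho = {spearman_rho(SEARCHED, DOWNLOADED):.3f}")
```

With these figures the sketch returns 5/7 (about 0.71): movies rank first and child porn last in both lists, while the middle four categories shift positions. The comparison is only indicative, since in Appendix C games such as “starcraft 2” and “need for speed” are labelled software, whereas Table 3 has a separate Game category.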

³ As at 05/08/2010.

4 Discussion

The goal of this paper was to present a new methodology for measuring how much infringing content there is on BitTorrent networks. We have also presented the results of an initial and a follow-up study, with broader and narrower sampling respectively, indicating that the overwhelming majority of the most popular content on BitTorrent is infringing. As hypothesised, we found a power law relationship between a torrent's number of downloads and its popularity rank, but the result was more extreme than expected: just 4.0% of torrents (15,367 in total) accounted for 80% of all downloads in a sample of more than one million. In addition, among the 1,000 most popular torrents, we were able to identify only three files which were not infringing content. In our replication study, which excluded trackers reporting high download rates, the relative rankings of the different categories of content remained largely the same. Furthermore, we validated the study by comparing what users are searching for (to establish their intent) with what they are actually downloading, and once again found the same pattern of use: users are searching for infringing content with a view to downloading it, and generally they are finding what they search for.

There are a number of limitations in this study, and it is important to recognise them when interpreting the results. Firstly, any study which relies on sampling has the potential for several different types of bias to influence the results [22]. We sampled from a list of the most popular public trackers for the most popular searches. This did not include private trackers and, given our hypothesis of a power law, did not provide coverage of the least popular public trackers and the least popular torrents. It is likely, though, that the files being shared on those trackers would be the type of content outlined in [5].
This reflects the fact that BitTorrent is a highly effective technology for distributing material, whether it is popular or not. Possibly the greatest limitation of the study is that we did not ourselves download any infringing content, as this would be illegal (in both a civil and a criminal sense, since some of the material was child porn). Instead, we have relied on the observation that if a file is labelled and advertised with a certain title, and many independent users have downloaded that title and are also seeding it, then it is highly likely that the file is what it claims to be. This follows from the law of large numbers [23]; indeed, we would be more concerned about interpreting the data if fewer users were involved. However, we are currently developing other methods to validate the results by text mining the reviews provided by independent users on various torrent sites and verifying the reviewers by grouping them by reputation. In this way, we should be able to determine whether a file is genuine by “taking the word” of only the most experienced and highly regarded torrent reviewers.

From a technical perspective, the most pressing limitation of this research is that large BitTorrent sites, such as The Pirate Bay, are moving away from the public tracker based model. Newer methods, such as Distributed Hash Tables (DHT) and Peer Exchange (PEX), distribute the knowledge about peers among the peers themselves. These technologies can reduce and possibly eliminate the need for trackers, thus removing the main source of data used in this research. The development and implementation of these technologies appears to be a response to the various lawsuits that have been brought against the operators of BitTorrent search sites and trackers. Without a centralised place to monitor BitTorrent sharing, it will become more difficult to examine the extent of infringing, and in some cases criminal, content sharing in the future.

Apart from addressing these limitations, we are focused on improving the automatic labelling of files, both by category and by legality. The video sharing website YouTube, for example, uses a content management system called Content ID [24] to determine whether a newly uploaded video is a recognised infringing copy of a copyrighted file. A system like this could be extended to the automatic categorisation of downloaded files, but it suffers from the problem that the files must be downloaded in order to verify them. Apart from the legal issues involved, this would require a significant amount of bandwidth to download the millions of files being shared. Possibly, a sampling methodology could be used to download only a small portion of each file to determine whether it is infringing, if the files contained embedded traitor tracing codes that were robust against attacks to remove them [25].
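The concentration result at the start of this section (4.0% of torrents, 15,367 in total, accounting for 80% of all downloads) can be reproduced by sorting torrents by download count and accumulating until 80% of the total is reached. A minimal sketch of that calculation, assuming only a list of per-torrent download counts; `concentration` is a hypothetical helper and the Zipf-like sample is synthetic, not study data.

```python
def concentration(downloads: list[int], target: float = 0.8) -> tuple[int, float]:
    """Smallest number of top torrents whose downloads reach `target` of the total.

    Returns (k, k / n), where n is the number of torrents in the sample.
    """
    if not downloads or not 0 < target <= 1:
        raise ValueError("need a non-empty sample and 0 < target <= 1")
    ordered = sorted(downloads, reverse=True)  # most downloaded first
    threshold = target * sum(ordered)
    running = 0
    for k, count in enumerate(ordered, start=1):
        running += count
        if running >= threshold:
            return k, k / len(ordered)
    return len(ordered), 1.0  # not reached for 0 < target <= 1


if __name__ == "__main__":
    # Synthetic Zipf-like sample: the r-th torrent gets about 1/r of the top count.
    sample = [1_000_000 // r for r in range(1, 100_001)]
    k, fraction = concentration(sample)
    print(f"top {k:,} of {len(sample):,} torrents ({fraction:.1%}) give 80% of downloads")
```

The steeper the power law, the smaller the returned fraction; a uniform sample would need 80% of its torrents to reach the same threshold.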

Acknowledgements

The ICSL is funded by the State Government of Victoria, IBM, Westpac Banking Corporation and the Australian Federal Police. This project received financial support from Village Roadshow.

References

[1] Cohen, B. (2003). Incentives build robustness in BitTorrent. Proceedings of the Workshop on Economics of Peer-to-Peer Systems.
[2] Schulze, H. & Mochalski, K. (2007). Internet Study 2007. ipoque.
[3] Blizzard Entertainment (2010). Blizzard Downloader F.A.Q. Retrieved from http://www.worldofwarcraft.com/info/faq/blizzarddownloader.html
[4] Stallman, R. (2001). Copyright and globalization in the age of computer networks. Retrieved from http://www.gnu.org/philosophy/copyright-and-globalization.html
[5] Gardner, S. (2005). BitTorrent for Dummies. New York: Wiley.
[6] Tran, H., Hitchens, M., Watters, P.A., & Varadharajan, V. (2005). Trust based access control framework for P2P file-sharing systems. In Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05).
[7] Mee, J. & Watters, P.A. (2005). Detecting and tracing copyright infringements in P2P networks. In Proceedings of the International Conference on Networking (ICN 2006).
[8] Pouwelse, J. A., Garbacki, P., Epema, D. H. J., & Sips, H. J. (2004). A Measurement Study of the BitTorrent Peer-to-Peer File-Sharing System. Technical Report PDS-2004-003, Delft University of Technology, The Netherlands.
[9] Byers, S., Cranor, L., Cronin, E., Kormann, D., & McDaniel, P. (2003). Analysis of Security Vulnerabilities in the Movie Production and Distribution Process. Proceedings of the 2003 ACM Workshop on DRM, Oct. 2003.
[10] Bharambe, A. R., Herley, C., & Padmanabhan, V. N. (2005). Analyzing and Improving BitTorrent Performance. Technical Report MSR-TR-2005-03, Microsoft Research, Redmond, WA, February 2005.
[11] Jun, S. & Ahamad, M. (2005). Incentives in BitTorrent Induce Free Riding. Proceedings of the ACM SIGCOMM Workshop on Economics of Peer-to-Peer Systems (P2PECON), ACM Press, Aug. 2005.
[12] Feldman, M., Papadimitriou, C., Chuang, J., & Stoica, I. (2004). Free-Riding and Whitewashing in Peer-to-Peer Systems. Proceedings of the ACM SIGCOMM'04 Workshop on Practice and Theory of Incentives in Networked Systems (PINS), August 2004.
[13] Pouwelse, J. A., Garbacki, P., Epema, D. H. J., & Sips, H. J. The BitTorrent P2P file-sharing system: measurements and analysis. Proceedings of Peer-to-Peer Systems IV.
[14] Lemley, M. (2004). Reducing digital copyright infringement without restricting innovation. Boalt Working Papers in Public Law.
[15] Banerjee, A., Faloutsos, M., & Bhuyan, L. (2007). Is someone tracking P2P users? In Proceedings of IFIP NETWORKING, Atlanta, GA.
[16] Liang, J., Naoumov, N., & Ross, K. (2006). The index poisoning attack in P2P file-sharing systems. In Proceedings of IEEE Infocom, Barcelona.
[17] Ho, W. H. & Watters, P. A. (2004). Statistical and structural approaches to filtering Internet pornography. SMC (5), 4792-4798.
[18] Mitzenmacher, M. (2003). A brief history of generative models for power law and lognormal distributions. Internet Mathematics 1: 226–251.
[19] Watters, P. (1998). Fractal structure in the electroencephalogram. Complexity International, 5.
[20] Mashable (2009). Avatar opens with $232 million worldwide. Retrieved from http://mashable.com/2009/12/21/avatar-earnings/
[21] Multipleverses (2008). Hulk smashes box office this weekend. Retrieved from http://multipleverses.com/tag/incredible-hulk-box-office-earnings/
[22] Boslaugh, S. & Watters, P. (2008). Statistics in a Nutshell. O'Reilly.
[23] Mlodinow, L. (2008). The Drunkard's Walk. New York: Random House, p. 50.
[24] YouTube (2010). YouTube Content ID system. Retrieved from http://www.youtube.com/t/content_management
[25] Wu, X., Watters, P. & Yearwood, J. (2008). New traceability codes and algorithms for tracing pirates. Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 719-724.
[26] Testimony from Malone in AFACT vs iiNet 2010.
[27] Anderson, N. (2009). Pirate Bay: survey says that 80% of our torrents are legal. Retrieved from http://arstechnica.com/tech-policy/news/2009/02/pirate-bay-survey-says-that-80-of-our-torrents-are-legal.ars

Appendix A – Top 100 Downloads (Original Study)

Filename | Downloads
The Incredible Hulk[2008]DvDrip-aXXo | 1112628
Indiana Jones And The Kingdom Of The Crystal Skull[2008]-aXXo | 1029695
College[2008]DvDrip-aXXo | 509576
Sherlock Holmes (2009) DVDSCR XviD-MAX | 479655
Avatar (2009) PROPER TS XviD-MAX | 332665
Meet Dave[2008]DvDrip-aXXo | 311894
Lady GaGa -The Fame Monster 2CDRip 2009 [Cov+2CD][Bubanee] | 308117
The Andromeda Strain[2008]DvDrip-aXXo | 284221
Shutter Island (2010) R5 DVDRip XviD-MAX | 282628
2012 (2009) R5 DVDRip XviD-MAX | 277043
Nirvana -Discography | 263315
The Men Who Stare at Goats (2009) R5 DVDRip XviD-MAXSPEED | 254286
LimeWire PRO 4.18.8.1 | 238072
From Paris with Love (2010) R5 DVDRip XviD-MAX | 234916
The Book Of Eli 2010 TELESYNC H264 AAC-SecretMyth (Kingdom-Relea | 217049
Legion (2010) R5 DVDRip XviD-MAX | 212854
Queen-Discography | 212619
Zombieland (2009) R5 DVDRip XviD-MAX | 211317
Next Avengers-Heroes Of Tomorrow[2008]DvDrip-aXXo | 199894
Ninja Assassin (2009) DVDRip XviD-MAX | 195660
District 9 (2009) DVDRip XviD-MAX | 193568
Alicia Keys -The Element Of Freedom (Deluxe) CDRip 2009 | 189797
Daybreakers (2009) DVDSCR XviD-MAX | 185235
Law Abiding Citizen (2009) DVDRip XviD-MAX | 179429
Gorillaz -Plastic Beach [2010-MP3-Cov][Bubanee] | 174760
The.Lovely.Bones.2009.DVDSCR.XviD-Lynks-PrisM | 171729
The.Princess.And.The.Frog.DVDRSCREENER.XviD-MENTiON.avi | 171638
The Blind Side.2009.DvdScr.Xvid -Noir | 162618
The.Twilight.Saga.New.Moon.2009.DVDRip.XviD-NeDiVx | 162040
Boy A[2007]DvDrip | 160925
Michael Jackson -Black Or White (1991)[DVDRip-SyNtEr][Subs][ | 155427
Michael Jackson This Is It (2009) DVDRip XviD-MAX | 153963
The.Book.of.Eli.2010.TS.XviD-IMAGiNE.avi | 152952
The.Pacific.Pt.I.HDTV.XviD-SYS.avi | 151327
Ahead Nero v7 5 9 0 Multilingual Incl Keymaker-EMBRACE | 143525
The Hurt Locker DVD eng 2008 xivid [switch] | 143077
Surrogates (2009) R5 DVDRip XviD-MAX | 141448
House MD Season 3 | 136795
Michael Jackson -Thriller [DVD-Rip][Subs][AVI][POP-USA][AC3 | 134944
Bigfish Games -Mystery Case Files -Ravenhearst + Crack | 134748
Metallica -Discography -Mega Collection | 134621
Michael Jackson -Remember The Time [1992][DVD-Rip-SyNtEr][AC3 | 133459
Old.Dogs.DVDRip.XviD-DiAMOND.avi | 133150
Percy Jackson and the Olympians (2010) R5 DVDRip XviD | 132043
House MD Season 2 | 130852
WALT DISNEYS [ALICE IN WONDERLAND][DVDRIP][ENG]-kidzcorner 27 | 129513
House MD Season 1 | 127860
The.Children.Of.Huang.Shi[2008]DvDrip-aXXo | 127281
Pussycat Dolls -Doll Domination (Deluxe Edition)(2008)and bonus disc s-srg mrsidhq | 126380
Weeds -Season 4 HDTV | 125411
Planet 51 (2009) DVDRip XviD-MAXSPEED | 123636
Michael Jackson -Smooth Criminal [1988][DVD-Rip-SyNtEr][Subs] | 123449
Michael Jackson -Bad [1987][DVD-Rip-SyNtEr][Subs][AVI][POP | 122108
Sherlock Holmes DVDSCR AC3 - IMAGiNE[ExtraTorrent] | 121383
Michael Jackson -Billie Jean [SyNtEr][AC3][DVDRip] | 118125
Eminem Presents -The Re-Up | 117936
Up In The Air (2009) DVDRip XviD-MAX | 117902
Michael Jackson -Earth Song (1995)[DVDRip-SyNtEr][Subs][AC3] | 116190
Michael Jackson -Scream (1995)[DVDRip-SyNtEr][Subs][AC3] | 113245
Worms Armageddon -FULL ISO | 113105
VLC Media Player 0 9 2 NEW RELEASE!!! (September 15th 2008) Legal | 112743
Windows Traktor DJ Studio v3 | 112677
Weezer album discography | 112281
The Red Hot Chilli Peppers - Discography | 111730
Michael Jackson -Heal The World [SyNtEr] | 111628
Couples Retreat.2009.DvdRip.Xvid (1337x)-Noir no rar | 111420
[filename missing] | 110686
Michael Jackson -The Way You Make Me Feel [1987][Subs][AVI][ | 109817
The Twilight Saga New Moon 2009 HORROR TS-Scr DivX nEHAL | 109446
[filename missing] | 109018
The Boondock Saints II All Saints Day (2009) DVDRip XviD-MAX | 108778
Sim City 4 Deluxe Edition [ISO] -By Bobjba | 107129
Funny People (2009) DVDRip XviD-MAX | 104449
[filename missing] | 102440
Rihanna -Rated R CDRip 2009 [Cov+CD][Bubanee] | 102304
The Hurt Locker (2008) DVDRip XviD-MAX | 100945
Music Johnny Cash -Live at Montreux -1994 -TV recording | 99451
Tooth.Fairy.R5.LINE.XviD-MENTiON.avi | 99052
The.Fourth.Kind.2009.DVDSCR.XviD-SilentNinja | 98634
How.I.Met.Your.Mother.S05E12.HDTV.XviD-NoTV.avi | 96122
Elvis Presley Discography by Nogueira neto By Mega Seeders JP | 95932
Avatar 2009 DVDScr H264 AAC-SecretMyth (Kingdom-Release) | 94781
GLADIATOR[2000]DvDrip-GHZ | 94568
Inglourious Basterds (2009) DVDRip XviD-MAX | 93268
Air | 92039
Pink Floyd -Dark Side of the Moon -Live at Earls Court | 90494
BobDylan DVD Compilation 7Letterman Grammyand more -Demonoid com | 90082
Transformers 2 Revenge Of The Fallen DVDRip XviD-MAX | 89935
Young Jeezy -The Inspiration | 89020
No.Direction.Home.Bob.Dylan.2of2.XviD.AC3 | 88061
No Direction Home Bob Dylan 1of2 XviD AC3 | 87972
[filename missing] | 85899
[filename missing] | 85803
Requiem For A Dream [DVD-Rip ENG] | 85186
Breaking Benjamin -Discography -2007 !!! | 83482
Ultimate Avengers[Double Feature][2006]DvDrip-aXXo | 83275
[filename missing] | 83230
Avatar TS XviD-IMAGiNE(No Rars) 28 | 82977
WALT DISNEYS PINNOCHIO[DVDRIP][ENG]-kidzcorner | 82959
The.Fantastic.Mr.Fox.DVDSCR.XviD-DONEDEAL.avi | 82881

Appendix B – Top 100 Downloads (Replication Study)

Filename | Downloads | Category
Kick-Ass (2010) R5 XViD-MAXSPEED | 323353 | Movie
The Wolfman (2010) DVDRip XviD-MAXSPEED | 234913 | Movie
Wanted[2008]DvDrip-aXXo | 50582 | Movie
Hancock[2008]DvDrip-aXXo | 38566 | Movie
Step.Brothers[2008][Unrated.Edition]DvDrip-aXXo | 33541 | Movie
Lost.S06E17.HDTV.XviD-NoTV | 22507 | TV
Juno[2007]DvDrip[Eng]-aXXo | 21860 | Movie
Gladiator[Extended.Edition]DvDrip.AC3[Eng]-aXXo | 21245 | Movie
Stargate.Universe.S01E17.Pain.HDTV.XviD-FQM.avi | 17277 | TV
Supernatural.S05E21.Two.Minutes.to.Midnight.HDTV.XviD-FQM.avi | 16924 | TV
Stargate.Universe.S01E18.Subversion.REPACK.HDTV.XviD-FQM.avi | 15789 | TV
FlashForward.S01E17.HDTV.XviD-2HD.avi | 15219 | TV
Glee.S01E22.HDTV.XviD-LOL.[VTV].avi | 13757 | TV
DISNEY PIXARS TOY STORY DVDRIP][ENG]-KIDZCORNER&J.T.R | 13514 | Movie
[TEAM XMR] Hum Tum Aur Ghost 2010 ~ Pdvd ~ 1CD Rip ~ XVID ~ Mp3 ~ -=[BaDBoY$]=-.avi | 13441 | Movie
Shes Out Of My League {2010} DVDRIP. Jaybob | 12893 | Movie
www.torrent.to...Bruchreif.German.2009.DVDRip.XviD-ViDEOWELT | 11386 | Movie
Children.Of.Men[2006]DvDrip[Eng]-aXXo | 10736 | Movie
Teenage.Mutant.Ninja.Turtles.Pack[1990-2007]DvDrip-aXXo | 10416 | Movie
Grindhouse-Death.Proof[2007][Unrated.Editon]DvDrip[Eng]-aXXo | 10092 | Movie
Brooklyns Finest {2009} DVDRIP. Jaybob | 9689 | Movie
Superman.Returns[2006]DvDrip[Eng]-aXXo | 9332 | Movie
Desperate Housewives Season 5 Ep 01-09 HDTV-soagg | 8934 | TV
V.2009.S01E12.HDTV.XviD-2HD.avi | 8652 | TV
Adobe Audition 3.0+Crack [GR420] | 7809 | Game
Transporter.2[2005]DvDrip[Eng]-aXXo.avi | 7522 | Movie
Quantum.Apocalypse.2010.DVDRiP.XviD-DvF | 7373 | Movie
War[2007]DvDrip[Eng]-aXXo | 7339 | Movie
LOST SEASON 6 Episode 16 by deathmule | 7206 | TV
Charlie.Wilson's.War[2007]DvDrip-aXXo | 6859 | Movie
The.Vampire.Diaries.S01E12.HDTV.XviD-2HD | 6760 | TV
Chloe {2009} DVDRIP. Jaybob | 6094 | Movie
Mr.Brooks[2007]DvDrip[Eng]-aXXo | 6020 | Movie
30.Rock.S04E17.HDTV.XviD-LOL.avi | 5862 | TV
The.Rock[1996]DvDRip-Rdgrnnr | 5814 | Movie
Babel[2006]DvDrip[Eng]-aXXo | 5767 | Movie
Big Ass Brothel (2010) [WMV][HD 1080p][WwW.xXxViCiOsAsZT.CoM] | 5727 | Porn
www.torrent.to...ExTerminators.German.2009.DVDRip.XviD-ViDEOWELT | 5675 | Movie
Kendra Exposed - The Kendra Wilkinson Sex Tape.flv | 5532 | Porn
www.torrent.to...Dr.House.S06E06.Kopfgeburten.German.Dubbed.WS.HDTVRip.XviD-iNSPiRED.rar | 5527 | TV
Desperate.Housewives.S06E21.HDTV.XviD-2HD.avi | 5448 | TV
Ministry Of Sound - Chilled Acoustic (2010) (MP3 320)(split tracks+cover)barney's rg | 5258 | Music
ELVIS PRESLEY - 50 Greatest Hits | 5228 | Music
The.Good.Shepherd[2006]DvDrip[Eng]-aXXo Torrent Download | 5113 | Movie
Iron Man 2 - 2010 - iTALiAN MD TS XviD-FREE [IN] | 5007 | Movie
Spider-Man.3[2007]DvDrip[Eng]-aXXo | 4890 | Movie
Supernatural.S05E22.Swan.Song.HDTV.XviD-FQM.avi | 4736 | TV
Private.Practice.S03E23.HDTV.XviD-2HD.[VTV].avi | 4721 | TV
Paranormal.Activity.2007.iTALiAN.MD.Alternate.Cut.DVDRip.XviD-THEMA.avi | 4661 | Movie
Microsoft Office 2007 Torrent Download | 4637 | Software
Bones.S05E14.HDTV.XviD-NoTV | 4632 | TV
Pro.Evolution.Soccer.6.CRACK.ONLY-RELOADED | 4465 | Game
Private.Practice.S03E20.Second.Choices.HDTV.XviD-FQM.[VTV].avi | 4294 | TV
American.Dad.S05E15.HDTV.XviD-LOL.avi | 4141 | TV
The Phantom of the Opera(2004)DvDrip-[Eng]-CycoPenQuin | 4135 | Movie
Iron Man 2 (DVDRip] 2010 [ENGL)-FANTASTiC Torrent Download | 4104 | Movie
www.torrent.to...Bathory.Die.Blutgraefin.German.2008.AC3.DVDRip.XviD-iMPERiUM | 4077 | Movie
Hunger.2008.LiMiTED.DVDRiP.XViD-HLS | 4064 | Movie
Windows_XP_Professional_SP3_GENUINE Torrent Download | 4060 | Software
Hot.Shots.1991.XviD-BGR | 4035 | Movie
Next.Avengers-Heroes.Of.Tomorrow[2008]DvDrip-aXXo | 3976 | Movie
Penthouse Magazine Super Summer Sex - May 2010 | 3971 | Porn
Big.Love.S04E02.HDTV.XviD-2HD | 3955 | TV
LOST SEASON 6 Episode 1 to 2 by deathmule | 3920 | TV
The.Rocker[2008]DvDrip-aXXo | 3875 | Movie
American.Dad.S05E16.HDTV.XviD-LOL.avi | 3815 | TV
Lost.S06E14.The.Candidate.HDTV.XviD-FQM | 3762 | TV
Alvin And The Chipmunks The Squeakquel {2009} DVDRIP. Jaybob | 3722 | Movie
The Heartbreak Kid[2007]DvDrip[Eng]-FXG | 3671 | Movie
Smallville.S09E20.HDTV.XviD-P0W4.avi | 3644 | TV
Junior College Lesbians 2 XXX [DVDRip][Teen-Over-18-Lesbian].www.lokotorrents.com | 3553 | Porn
www.torrent.to...FlashForward.S01E19.Kurskorrektur.GERMAN.DUBBED.WS.DVDRiP.XviD-SOF | 3505 | Movie
Spartacus.Blood.and.Sand.S01E06.HDTV.XviD-SYS.avi | 3488 | TV
The.Invisible[2007]DvDrip[Eng]-aXXo | 3387 | Movie
24.S08E08.HDTV.XviD-2HD.avi | 3383 | TV
Desperate Housewives Season 6 | 3381 | TV
Kick-Ass DVDRip | 3331 | Movie
PC Tom.Clancy's.Splinter.Cell.Conviction.Full-Rip.-TPTB | 3329 | Game
Fearless[2006][Unrated.Edition]DvDrip[Eng.Dubbed]-aXXo | 3286 | Movie
American.Dad.S05E14.HDTV.XviD-LOL.avi | 3259 | TV
Amateur Mix-LKRG | 3238 | Porn
007 James Bond GoldFinger 1964 DvDrip-soagg | 3233 | Movie
Windows 7 Loader v1.8-DAZ~DiBYA Torrent Download | 3188 | Software
Queen - Greatest Hits 3CD | 3187 | Music
Castle.2009.S02E20.HDTV.XviD-2HD.avi | 3119 | TV
The.Interpreter[2005]DvDrip[Eng]-aXXo.avi | 3052 | Movie
Clash of the Titans (2010) DVDRip XviD-MAXSPEED | 3040 | Movie
My Sassy Girl[2008]DvDrip AC3[Eng]-FXG | 2974 | Movie
www.torrent.to...Lie.to.me.S01E11.Undercover.GERMAN.DUBBED.WS.BLURAYRiP.XviD-SOF.rar | 2941 | Movie
The Last Airbender 2010 Encoded XviD CAM SAFCuk009+Fabreezy | 2897 | Movie
My Big Father.2009.PDVDRip.XviD.AC3.5.1 | 2854 | Movie
30 Rock Season 4 | 2850 | TV
The Karate Kid DVDRiP R6 XViD-KiNGDOM (Kingdom-Release) | 2843 | Movie
Valentines Day 2010[DvdRip] Xvid [Eng] johno70 | 2817 | Movie
The Haunting in Connecticut[2009][Unrated Edition]DvDrip[Eng]-FXG | 2747 | Movie
Headhunterz - Studio Sessions 1CD-2010 | 2735 | Music
Windows 7 Activator [spec] | 2733 | Software
Halo.Legends.2010.DVDRip.XviD-ViSiON.[www.FilmsBT.com] Torrent ... | 2731 | Movie
X-Men.The.Last.Stand[2006]DvDrip[Eng]-aXXo | 2710 | Movie
Usta Usta S01E08 PL.Up.by.Gizior89 | 2696 | TV

Appendix C – Results of Validation Study

Rank | Search Term | Category
1 | jaybob | movie
2 | inception | movie
3 | true blood | tv
4 | axxo | movie
5 | iron man 2 | movie
6 | salt | movie
7 | predators | movie
8 | entourage | movie
9 | toy story 3 | movie
10 | starcraft 2 | software
11 | fxg | movie
12 | robin hood 2010 | movie
13 | prince of persia | movie
14 | knight and day | movie
15 | the sorcerer's apprentice | movie
16 | eclipse | movie
17 | the last airbender 2010 | movie
18 | clash of the titans | movie
19 | the karate kid 2010 | movie
20 | top gear | tv
21 | lie to me | tv
22 | twilight | movie
23 | despicable me | movie
24 | the expendables | movie
25 | windows 7 | software
26 | kick ass | movie
27 | porn 2010 | porn
28 | lost | tv
29 | house | tv
30 | heroes | tv
31 | get him to the greek | movie
32 | sex and the city 2 | movie
33 | hindi | movie
34 | lady gaga | music
35 | psp | software
36 | futurama | tv
37 | avatar | movie
38 | mad men | tv
39 | how to train your dragon | movie
40 | shutter island | movie
41 | wii | software
42 | grown ups | movie
43 | dexter | tv
44 | prison break | tv
45 | letters to juliet | movie
46 | pc games | software
47 | gay | porn
48 | wwe raw | tv
49 | death at a funeral | movie
50 | eminem | music
51 | pinoy | movie
52 | bleach 282 | tv
53 | shrek | movie
54 | arcade fire | tv
55 | the a team | movie
56 | ufc | tv
57 | wingtip | movie
58 | pretty little liars | tv
59 | dvdrip | movie
60 | dinner for schmucks | movie
61 | killers | movie
62 | french | movie
63 | glee | tv
64 | how i met your mother | tv
65 | the beatles | music
66 | windows xp | software
67 | 1080p | movie
68 | nero | software
69 | movies | movie
70 | the hangover | movie
71 | splice | movie
72 | office 2010 | software
73 | eureka | tv
74 | sherlock | movie
75 | bluray | movie
76 | tagalog | movie
77 | once upon a time in mumbai | movie
78 | the bachelorette | tv
79 | photoshop | software
80 | katy perry | music
81 | knight & day | movie
82 | white collar | tv
83 | gossip girl | music
84 | classic porn | porn
85 | burn notice | tv
86 | pthc | child porn
87 | iron maiden | music
88 | harry potter | movie
89 | centurion | software
90 | 720p | movie
91 | linkin park | music
92 | the ghost writer | movie
93 | teen | porn
94 | tere bin laden | movie
95 | family guy | tv
96 | i love you phillip morris | movie
97 | hung | tv
98 | microsoft office | software
99 | date night | movie
100 | need for speed | software