
A NEW YORK LAW JOURNAL SPECIAL SECTION

Litigation

WWW.NYLJ.COM MONDAY, JULY 15, 2013

Internet 'Data Scraping': A Primer for Counseling Clients

BY ANTHONY J. DREYER AND JAMIE STOCKTON

ANTHONY J. DREYER is a partner, and JAMIE STOCKTON is an associate, with Skadden, Arps, Slate, Meagher & Flom. BRITTANY BETTMAN, a summer associate, assisted in the preparation of this article.

The proliferation of Internet access and mobile devices has led to an exponential explosion of content on the Web, creating a vast repository of "publicly available" information. This includes not only news, business, and financial information, but also personal […], movie and restaurant reviews, concert ticket sales, flight information, and a virtually endless array of other categories. This same technological explosion, however, has made it far easier for third parties to extract this data for commercial sale and use, and to do so for free and without […]. This […], commonly referred to as "scraping," "crawling," or "spidering" (collectively "scraping"),1 creates legal issues and concerns for both sides of this issue: those who want to scrape, and those who want to protect against scraping of their websites.

This article provides a primer on the legal framework surrounding scraping, addressing both the grounds for potential claims against scrapers, and ways to avoid liability for scraping. The common theories of liability arising from scraping are copyright infringement, trespass to chattels, breach of contract, and violation of the Computer Fraud and Abuse Act (CFAA). This article discusses the leading cases applying these legal theories to website scraping, and concludes that the most effective way to create potential claims against scrapers is through carefully drafted prohibitions in a website's terms of use. Conversely, the most effective way to defend against a claim of unauthorized scraping is to abide by such terms of use, or to establish that scraping constitutes a fair use and does not overburden the servers of the website being scraped.

Copyright Infringement

Scraping inherently involves copying, and therefore one of the most obvious claims against scrapers is copyright infringement. However, such claims are often open to attack on several grounds. First, in order to have standing to bring a claim for copyright infringement, the owner (or exclusive licensee) of the website being scraped must also be the owner of the copyrightable content that is the subject of the claim.2 This can pose a barrier to bringing a lawsuit if, for example, the content at issue is user-generated (such as videos or reviews), and the rights in the content have not been transferred to the website owner.

Second, copyright law does not protect ideas, but rather only tangible expression.3 Thus, the scraping of general factual data does not give rise to a viable claim for copyright infringement. For example, in Ticketmaster v. Tickets.com, the court rejected an infringement claim because the material being extracted (factual information regarding concerts and URLs) was not copyrightable.4

Third, even if the information copied by the scraper is protectable under copyright law, the defendant may be able to rely upon the "fair use" defense. Under the Copyright Act, courts are to consider the following factors to determine if a use is a fair use: (1) the purpose and character of the use; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work.5 For example, in Kelly v. Arriba Soft, the court held that the use of scraping software by a search engine to reproduce images in thumbnail form was not a sustainable basis for a claim of copyright infringement, because the thumbnail images created from the full-size scraped images were "transformative" and qualified as a fair use of the images.6

Trespass to Chattels

A trespass to chattels is defined as intentionally dispossessing another of a chattel or using or intermeddling with a chattel in the possession of another.7 This legal theory applies to the Internet inasmuch as a website proprietor has a "fundamental property right to exclude others from its computer system[.]"8 Moreover, even if a website is publicly accessible, its servers are private property, and the proprietor may therefore grant conditional access to users, including prohibitions against scraping.9

For example, in Bidder's Edge, the court held that excessive scraping can support a claim for trespass to chattels if it taxes the plaintiff's computer system in such a way that would substantially impair it, and, if so, an injunction may be granted.10 Specifically, the court held that there was a viable trespass cause of action due to the excessive scraping of eBay's website at the rate of 80,000-100,000 times per day.11

Similarly, in Register.com v. Verio, the Court of Appeals for the Second Circuit held that Verio's use of search robots consumed a significant portion of the capacity of Register's computer system, and that Verio was therefore engaged in a trespass.12 The court reasoned that if it were to allow these queries, then it was "highly probable" that other companies would begin to do the same, which would likely result in Register's system being "overtaxed and [it] would crash."13 However, in Ticketmaster, the court held that the use of scrapers to extract data was not a trespass to chattels, because there was no evidence that the scraping caused any tangible interference with the operation of Ticketmaster's system.14

Breach of Contract

Courts have held that a viable method of preventing scraping is to include prohibitions against scraping in the website's terms of use.15 Such restrictions are generally conveyed to website users through a "clickwrap" or "browsewrap" agreement.

A clickwrap agreement is an online agreement that requires the user to consent to terms and conditions by affirmatively clicking a dialogue box agreeing to the terms before the user can proceed to use a website.16 Clickwrap agreements are generally enforceable, due to the user's clear manifestation of assent, so long as the terms do not violate other basic contract principles (e.g., unconscionability).17 For example, in Bidder's Edge, the court took note of the fact that the user agreement at the time, to which users were required to click "I Accept," expressly prohibited "any robot, spider, other automatic device, or manual process to monitor or copy our web pages or the content contained herein without our prior expressed written permission."18 The court stated that these terms of use constituted a limited license, and that actions not permitted by this license were restricted.19

Browsewrap agreements, on the other hand, involve the posting of a link to terms and conditions on a website for users to read, but do not require users to affirmatively manifest assent to the terms and conditions; instead, user consent is implied by continued use of the website.20

The enforceability of such agreements requires a fact-specific inquiry, and turns largely upon the location and accessibility of the terms of use.21 According to the Specht court, "[r]easonably conspicuous notice of the existence of contract terms and unambiguous manifestation of assent to those terms by consumers are essential if electronic bargaining is to have integrity and credibility."22

For example, in Hines the court held that the browsewrap agreement was not enforceable, because in this case the plaintiff had no actual or constructive notice of the terms and conditions of use.23 However, in Southwest Airlines v. BoardFirst, where there was evidence that defendant had actual knowledge of Southwest's terms and conditions, but nevertheless continued to use Southwest's website in violation of those terms, the court held that the browsewrap agreement was an enforceable contract.24

Terms of use may also be binding where the terms are reasonably known to the user, even in circumstances in which the terms are not known to the user before the first use of the website. For example, in Register.com, the user was made aware of the terms of use only after first accessing the information provided on the website.25 The court held that while the terms of use were technically neither a clickwrap nor a browsewrap agreement, because they were only displayed after the user accessed the information on the website, the restrictions therein were nevertheless enforceable, because the user accessed the website repeatedly and therefore was on notice during subsequent visits.26

In sum, while statements of assent such as "I agree," which are often elicited through clickwrap agreements, are preferable and unequivocally reflect a manifestation of assent, the user need not necessarily state the magic words "I agree" (or some similar formulation).27 However, "the website user must have had actual or constructive knowledge of the site's terms and conditions, and have manifested assent to them" in some manner, implicit or explicit.28

Violation of the CFAA

The CFAA is a federal statute that provides liability for anyone who "intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains…information from any protected computer."29 The CFAA also requires that there be a minimum amount of damages of at least $5,000 over a one-year period.30

Similar to the breach of contract cases discussed above, CFAA cases often hinge upon whether a user had actual or constructive knowledge of the restrictive terms of a website's terms of use (i.e., knowledge that the scraping was "unauthorized").

For example, in Southwest Airlines v. Farechase, defendants scraped fare, route, and scheduling information from Southwest.com.31 The court denied a motion to dismiss the CFAA claim because Southwest alleged (i) damages of at least $5,000, and (ii) that it had put defendant on actual notice that scraping was prohibited.32

However, in Cvent, even though the terms of use stated that competitors were prohibited from accessing and utilizing the information on the website, the court held that there was no violation of the CFAA.33 The court concluded that the terms of use were not sufficiently visible because the link was "buried" at the bottom of the first page, in extremely fine print, and users had to scroll down to see it, thereby rendering them insufficient protection for the site.34

Conclusion and Proposed Terms of Use

In conclusion, scraping may be permissible under U.S. law if the content at issue is not subject to copyright protection, if the scraping does not unduly burden the website's servers, and if the website's terms of use do not prohibit scraping or if assent to such terms has not been manifested.

However, if the client's goal is to reduce or protect against scraping, and to establish a potential basis for liability, the website's terms of use should contain language to the following effect, and users should be put on reasonable notice of such terms. This language is, of course, merely provided as an example:

By accessing this website, you accept without limitation or qualification, and agree to be bound and abide by, the following terms and conditions (Terms of Use). [CLIENT] may revise and update these Terms of Use from time to time in its sole discretion. Your continued use of this website following the posting of revised Terms of Use means that you accept and agree to any and all changes to the Terms of Use. You may use this website only for lawful purposes and in accordance with these Terms of Use, and you agree not to: (i) use this website in any manner that could disable, overburden, damage, or impair this website, or interfere with any other use of this website, including, but not limited to, any user's ability to engage in real-time activities through this website; (ii) use any robot, spider or other automatic device, process or means to access this website for any purpose, including to monitor or copy any of the material on this website; (iii) use any manual process to monitor or copy any of the material on this website, or to engage in any other unauthorized purpose without the express prior written consent of [CLIENT]; (iv) otherwise use any device, software or routine that interferes with the proper working of this website; or (v) otherwise attempt to interfere with the proper working of this website.

1. See EF Cultural Travel BV v. Zefer, 318 F.3d 58, 60 (1st Cir. 2003) ("A scraper, also called a 'robot' or 'bot,' is nothing more than a computer program that accesses information contained in a succession of webpages stored on the accessed computer"); eBay v. Bidder's Edge, 100 F. Supp. 2d 1058, 1060 (N.D. Cal. 2000). While it is possible to embed instructions on websites that inform the scraping software whether scraping is permitted (called "robots.txt" files), compliance with such instructions is voluntary. See Bidder's Edge, 100 F. Supp. 2d at 1061.

2. See, e.g., Nautical Solutions Mktg. v. Boats.com, No. 8:02-CV-760, 2004 WL 783121, at *2-3 (M.D. Fla. April 1, 2004) (denying post-trial motion for declaration of copyright infringement, because, inter alia, the website that was being scraped did not own the copyright to the data and images that were being copied).

3. See Feist Publ'ns v. Rural Tel. Serv., 499 U.S. 340 (1991).

4. See Ticketmaster v. Tickets.com, No. 99-CV-7654, 2003 WL 21406289, at *4-6 (C.D. Cal. March 7, 2003); see also Nautical Solutions, 2004 WL 783121, at *2-3 (reaching similar result for scraping of information regarding the sale of yachts).

5. See 17 U.S.C. §107.

6. Kelly v. Arriba Soft, 336 F.3d 811, 819 (9th Cir. 2003). An in-depth discussion of the nuances of the fair use doctrine is outside the scope of this article. For a discussion of fair use, see Melville B. Nimmer, 4 Nimmer on Copyright §13.05 (Lexis 2013).

7. See Restatement (Second) of Torts §218 (Westlaw 2012); see also Bidder's Edge, 100 F. Supp. 2d at 1069.

8. Bidder's Edge, 100 F. Supp. 2d at 1067.

9. Id. at 1070.

10. Id. at 1071-72.

11. Id. at 1071.

12. Register.com v. Verio, 356 F.3d 393, 404-05 (2d Cir. 2004).

13. Id. at 404.

14. Ticketmaster, 2003 WL 21406289, at *3.

15. See, e.g., Bidder's Edge, 100 F. Supp. 2d at 1067; Zefer, 318 F.3d at 62.

16. See Specht v. Netscape Commc'ns, 306 F.3d 17, 22 n.4 (2d Cir. 2002); Hines v. Overstock.com, 668 F. Supp. 2d 362, 366-67 (E.D.N.Y. 2009).

17. See Specht, 306 F.3d at 22 n.4.

18. Bidder's Edge, 100 F. Supp. 2d at 1060.

19. Id. at 1067.

20. See Specht, 306 F.3d at 25.

21. See, e.g., Specht, 306 F.3d at 35; Hines, 668 F. Supp. 2d at 367.

22. See Specht, 306 F.3d at 35 (finding a browsewrap agreement unenforceable).

23. See Hines, 668 F. Supp. 2d at 367.

24. Sw. Airlines v. BoardFirst, No. 3:06-CV-0891, 2007 WL 4823761, at *7 (N.D. Tex. Sept. 12, 2007).

25. Register.com, 356 F.3d at 401-04.

26. Id.

27. See id. at 402-03.

28. Cvent v. Eventbrite, 739 F. Supp. 2d 927, 937 (E.D. Va. 2010); see also Hines, 668 F. Supp. 2d at 367.

29. 18 U.S.C. §1030(a)(4); see also 18 U.S.C. §1030(g) (providing for civil liability and a private right of action).

30. See 18 U.S.C. §1030(a)(4).

31. Sw. Airlines v. Farechase, 318 F. Supp. 2d 435, 440 (N.D. Tex. 2004).

32. Id. at 439-40; see also Zefer, 318 F.3d at 62-63 (upholding a preliminary injunction issued under the CFAA where defendant had knowledge that scraping was unauthorized).

33. Cvent, 739 F. Supp. 2d at 932-34.

34. Id.

Reprinted with permission from the July 15, 2013 edition of the NEW YORK LAW JOURNAL © 2013 ALM Media Properties, LLC. All rights reserved. Further duplication without permission is prohibited. For information, contact 877-257-3382 or [email protected]. # 070-07-13-19