Internet ‘Data Scraping’: A Primer for Counseling Clients


A NEW YORK LAW JOURNAL SPECIAL SECTION: Litigation
WWW.NYLJ.COM | Monday, July 15, 2013

Internet ‘Data Scraping’: A Primer for Counseling Clients

By Anthony J. Dreyer and Jamie Stockton

ANTHONY J. DREYER is a partner, and JAMIE STOCKTON is an associate, with Skadden, Arps, Slate, Meagher & Flom. BRITTANY BETTMAN, a summer associate, assisted in the preparation of this article.

The proliferation of Internet access and mobile devices has led to an exponential explosion of content on the Web, creating a vast repository of “publicly available” information. This includes not only news, business, and financial information, but also personal data, movie and restaurant reviews, concert ticket sales, flight information, and a virtually endless array of other categories. This same technological explosion, however, has made it far easier for third parties to extract this data for commercial sale and use—and to do so for free and without authorization. This data extraction, commonly referred to as “scraping,” “crawling,” or “spidering” (collectively “scraping”),1 creates legal issues and concerns for both sides of this issue—those who want to scrape, and those who want to protect against scraping of their websites.

This article provides a primer on the legal framework surrounding scraping, addressing both the grounds for potential claims against scrapers, and ways to avoid liability for scraping. The common theories of liability arising from scraping are copyright infringement, trespass to chattels, breach of contract, and violation of the Computer Fraud and Abuse Act (CFAA). This article discusses the leading cases applying these legal theories to website scraping, and concludes that the most effective way to create potential claims against scrapers is through carefully drafted prohibitions in a website’s terms of use. Conversely, the most effective way to defend against a claim of unauthorized scraping is to abide by such terms of use, or to establish that scraping constitutes a fair use and does not overburden the servers of the website being scraped.
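Before turning to the legal theories, it is worth seeing how little machinery scraping actually requires. The sketch below, in Python, is a minimal, hypothetical illustration of automated extraction; the target URL, the ".review" selector, and the use of the third-party requests and BeautifulSoup libraries are placeholder choices for illustration, not a reference to any site or case discussed in this article.

```python
# A minimal, hypothetical illustration of "scraping": fetching a public
# page and extracting content from it programmatically. The URL and the
# ".review" CSS class are placeholders, not real targets.
import requests
from bs4 import BeautifulSoup

def scrape_reviews(url):
    """Fetch a page and return the text of each element marked as a review."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # surface HTTP errors instead of parsing them
    soup = BeautifulSoup(response.text, "html.parser")
    return [node.get_text(strip=True) for node in soup.select(".review")]

if __name__ == "__main__":
    for review in scrape_reviews("https://www.example.com/reviews"):
        print(review)
```

A script of this kind can be pointed at virtually any public page and run repeatedly, which is why the theories discussed below focus on copying, server burden, and the terms under which access was granted.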
Copyright Infringement

Scraping inherently involves copying, and therefore one of the most obvious claims against scrapers is copyright infringement. However, such claims are often open to attack on several grounds. First, in order to have standing to bring a claim for copyright infringement, the owner (or exclusive licensee) of the website being scraped must also be the owner of the copyrightable content that is the subject of the claim.2 This can pose a barrier to bringing a lawsuit if, for example, the content at issue is user-generated (such as videos or reviews), and the rights in the content have not been transferred to the website owner.

Second, copyright law does not protect ideas, but rather only tangible expression.3 Thus, the scraping of general factual data does not give rise to a viable claim for copyright infringement. For example, in Ticketmaster v. Tickets.com, the court rejected an infringement claim because the material being extracted—factual information regarding concerts and URLs—was not copyrightable.4

Third, even if the information copied by the scraper is protectable under copyright law, the defendant may be able to rely upon the “fair use” defense. Under the Copyright Act, courts are to consider the following factors to determine if a use is a fair use: (1) the purpose and character of the use; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work.5 For example, in Kelly v. Arriba Soft, the court held that the use of scraping software by a search engine to reproduce images in thumbnail form was not a sustainable basis for a claim of copyright infringement, because the thumbnail images created from the full-size scraped images were “transformative” and qualified as a fair use of the images.6

Trespass to Chattels

A trespass to chattels is defined as intentionally dispossessing another of a chattel or using or intermeddling with a chattel in the possession of another.7 This legal theory applies to the Internet inasmuch as a website proprietor has a “fundamental property right to exclude others from its computer system[.]”8 Moreover, even if a website is publicly accessible, its servers are private property, and the proprietor may therefore grant conditional access to users, including prohibitions against scraping.9

For example, in Bidder’s Edge, the court held that excessive scraping can support a claim for trespass to chattels if it taxes the plaintiff’s computer system in such a way that would substantially impair it, and, if so, an injunction may be granted.10 Specifically, the court held that there was a viable trespass cause of action due to the excessive scraping of eBay’s website at the rate of 80,000-100,000 times per day.11

Similarly, in Register.com v. Verio, the Court of Appeals for the Second Circuit held that Verio’s use of search robots consumed a significant portion of the capacity of Register’s computer system, and that Verio was therefore engaged in a trespass.12 The court reasoned that if it were to allow these queries, then it was “highly probable” that other companies would begin to do the same, which would likely result in Register’s system being “overtaxed and [it] would crash.”13 However, in Ticketmaster, the court held that the use of scrapers to extract data was not a trespass to chattels, because there was no evidence that the scraping caused any tangible interference with the operation of Ticketmaster’s system.14
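Because Bidder’s Edge and Verio turned on the load that scraping imposed on the target’s servers, scrapers seeking to reduce trespass exposure commonly throttle their request rate. The sketch below assumes a hypothetical five-second delay; it is an engineering illustration of the “does not overburden the servers” point, not legal advice.

```python
# A sketch of rate-limited ("polite") fetching. The delay value and URLs are
# hypothetical; the point is that request volume stays far below the
# 80,000-100,000 hits per day at issue in Bidder's Edge.
import time
import requests

REQUEST_DELAY_SECONDS = 5.0  # roughly 12 requests per minute at most

def fetch_politely(urls):
    """Fetch each URL in sequence, pausing between requests to spread load."""
    pages = {}
    for url in urls:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        pages[url] = response.text
        time.sleep(REQUEST_DELAY_SECONDS)  # deliberate pause between hits
    return pages
```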
Breach of Contract

Courts have held that a viable method of preventing scraping is to include prohibitions against scraping in the website’s terms of use.15 Such restrictions are generally conveyed to website users through a “clickwrap” or “browsewrap” agreement.

A clickwrap agreement is an online agreement that requires the user to consent to terms and conditions by affirmatively clicking a dialogue box agreeing to the terms before the user can proceed to use a website.16 Clickwrap agreements are generally enforceable, due to the user’s clear manifestation of assent, so long as the terms do not violate other basic contract principles (e.g., unconscionability).17 For example, in Bidder’s Edge, the court took note of the fact that the user agreement at the time, to which users were required to click “I Accept,” expressly prohibited “any robot, spider, other automatic device, or manual process to monitor or copy our web pages or the content contained herein without our prior expressed written permission.”18 The court stated that these terms of use constituted a limited license, and that actions not permitted by this license were restricted.19

Browsewrap agreements, on the other hand, involve the posting of a link to terms and conditions on a website for users to read, but do not require users to affirmatively manifest assent to the terms and conditions—instead, user consent is implied by continued use of the website.20

The enforceability of such agreements requires a fact-specific inquiry, and turns largely upon the location and accessibility of the terms of use.21 According to the Specht court, “[r]easonably conspicuous notice of the existence of contract terms and unambiguous manifestation of assent to those terms by consumers are essential if electronic bargaining is to have integrity and credibility.”22

For example, in Hines the court held that the browsewrap agreement was not enforceable, because in this case the plaintiff had no actual or constructive notice of the terms and conditions of use.23 However, in Southwest Airlines v. BoardFirst, where there was evidence that defendant had actual knowledge of Southwest’s terms and conditions, but nevertheless continued to use Southwest’s website in violation of those terms, the court held that the browsewrap agreement was an enforceable contract.24

Terms of use may also be binding where the terms are reasonably known to the user—even in circumstances in which the terms are not known to the user before the first use of the website. For example, in Register.com, the user was made aware of the terms of use only after first accessing the information provided on the website.25 The court held that while the terms of use were technically neither a clickwrap nor a browsewrap agreement, because they were only displayed after the user accessed the information on the website, the restrictions therein were nevertheless enforceable, because the user accessed the website repeatedly and therefore was on notice during subsequent […]
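The cases above concern contractual terms rather than technical signals, but a scraper that wants to “abide by such terms of use” will in practice also honor a site’s robots.txt exclusion file, the standard machine-readable statement of what a site permits bots to do. A sketch using only Python’s standard library follows; the URLs and user-agent name are hypothetical.

```python
# A sketch of checking robots.txt before fetching. Honoring robots.txt is a
# technical convention rather than a contract, but it complements the
# "abide by the terms of use" approach described above. All URLs and the
# user-agent name are hypothetical.
from urllib import robotparser

def may_fetch(url, user_agent="example-scraper"):
    """Return True if the site's robots.txt permits this agent to fetch url."""
    parser = robotparser.RobotFileParser()
    parser.set_url("https://www.example.com/robots.txt")
    parser.read()  # download and parse the exclusion rules
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    print(may_fetch("https://www.example.com/reviews"))
```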
Computer Fraud and Abuse Act

[…] were prohibited from accessing and utilizing the information on the website, the court held that there was no violation of the CFAA.33 The court concluded that the terms of use were not sufficiently visible because the link was “buried” at the bottom of the first page, in extremely fine print, and users had to scroll down to see it, thereby rendering them insufficient protection for the site.34

Conclusion and Proposed Terms of Use

[…] or copy any of the material on this website; (iii) use any manual process to monitor or copy any of the material on this website, or to engage in any other unauthorized purpose without the express prior written consent of [CLIENT]; (iv) otherwise use any device, software or routine that interferes with the proper working of this website; or (v) otherwise attempt to interfere with the proper working of this website.
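For a client implementing terms like those proposed above, the cases suggest that enforceability is strongest when content is gated behind clickwrap-style assent, as in Bidder’s Edge, rather than a buried link. The sketch below is a purely illustrative gate written with the Flask web framework (an assumed dependency); the routes, wording, and session handling are all hypothetical.

```python
# A purely illustrative clickwrap gate: the site records the user's
# affirmative "I Accept" before serving content, mirroring the assent
# mechanism credited in Bidder's Edge. Routes and messages are hypothetical.
from flask import Flask, abort, request, session

app = Flask(__name__)
app.secret_key = "replace-with-a-real-secret"  # required for session storage

@app.route("/accept-terms", methods=["POST"])
def accept_terms():
    # The clickwrap moment: record affirmative assent, nothing implied.
    if request.form.get("agree") == "yes":
        session["accepted_terms"] = True
        return "Terms accepted."
    return "You must accept the terms of use to proceed.", 400

@app.route("/data")
def data():
    # Refuse to serve content absent recorded assent to the terms of use.
    if not session.get("accepted_terms"):
        abort(403)
    return "Listing data available only under the terms of use..."
```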