Deconstructing Large-Scale Distributed Scraping Attacks

September 2018
Radware Research

A Stepwise Analysis of Real-time Sophisticated Attacks On E-commerce Businesses

Table of Contents

Why Read This E-book
Key Findings
Real-world Case of A Large-Scale Scraping Attack On An E-tailer
    Snapshot Of The Scraping Attack
    Attack Overview
    Stages of Attack
        Stage 1: Fake Account Creation
        Stage 2: Scraping of Product Categories
        Stage 3: Price and Product Info. Scraping
    Topology of the Attack: How Three Stages Work in Unison
Recommendations: Action Plan for E-commerce Businesses to Combat Scraping
About Radware

Why Read This E-book

Hypercompetitive online retail is a crucible of technical innovations to win today’s business wars. Tracking competitors’ prices, deals, content and product listings is a well-known strategy, but the sophistication of such scraping attacks is growing so rapidly that it is difficult to keep up with them. This e-book offers you an insider’s view of scrapers’ techniques and methodologies, with a few takeaways that will help you fortify your web security strategy.

If you would like to learn more, email us at [email protected].

Business of Bots: Companies like Amazon and Walmart have internal teams dedicated to scraping.

Key Findings

Scraping: A Tool To Gain Competitive Advantage

Today, many online businesses either employ an in-house team or leverage the expertise of professional web scrapers to gain an advantage over their competitors.
Scrapers plan attacks in multiple stages to exploit the weaknesses of existing defenses such as WAFs, Intrusion Detection Systems/Intrusion Prevention Systems (IDS/IPS), and other in-house measures that lack historical look-back, deep learning capabilities, and the ability to detect automated behavior in syntactically correct HTTP requests.

Usage Of Custom-built Exploit Kits

Attackers build an exploit kit that comprises a combination of tools (such as proxy IPs, multiple UAs, and programmatic/sequential requests) to evade detection and perform large-scale, sophisticated scraping attacks. Websites are then hit by bots from tens of thousands of new IPs that are used once and never again. For instance, in the case that we examined, attackers scraped product information and pricing details of 651,999 products from 11,795 categories using a combination of exploit tools and fake user accounts.

Systematic Attacks To Continuously Gather Market Intelligence

Our research shows that such organized and sophisticated attacks are fueled by the growing demand for data, price, and market intelligence. All large e-commerce firms track their competitors, hence large firms are more likely to be targeted by scrapers than small and mid-size e-tailers.

Real-world Case of A Large-Scale Scraping Attack On An E-tailer

Snapshot Of The Scraping Attack

Industry: E-commerce
Total Hits: 690,015
Duration of Study: 15 days
Fake Accounts Created: 2,345
Categories Scraped: 11,791
Products Scraped: 651,999
Scale Of The Attack: Large-scale and distributed from thousands of locations using various evasion techniques

Attack Overview

A popular e-commerce portal was inundated with scraping attacks and faced 690,015 hits on its category and product pages during our 15-day analysis. Attackers created 2,345 fake user accounts, scraped 11,791 category results, and managed to get away with details of 651,999 products, including pricing information.

Business of Bots: The founders of Diapers.com, which Amazon acquired in 2010, have accused Amazon of using bots to automatically adjust its prices. (Brad Stone's book ‘The Everything Store’)

Stages of the Attack

Stage 1: Fake Account Creation
Stage 2: Scraping of Product Categories
Stage 3: Price and Product Data Scraping

Attackers deployed a purpose-built scraper engine to execute attacks. They deployed an ‘exploit kit’ with different ready-to-use combinations of hardware and software to bypass web defense systems.

At first, they created fake accounts to register their bots as genuine users. Then they used those fake accounts to scrape category pages in the second phase. Once category pages were crawled, attackers regularly followed product pages to keep up with the latest pricing information and product updates.

[Figure: Exploit Kit To Evade Detection. Components shown: proxy IPs, targeting from different ISPs across the globe, programmatic/sequential requests, multiple UAs, cookie-maintaining capabilities, purpose-built scraper engine.]

Stage 1: Fake Account Creation

Target: Sign in/Sign up Pages
Fake Unique IDs Created: 2,345

Attackers targeted the sign-up page using different attack vectors.
They created 2,345 fake UIDs (User IDs) to register bots as legitimate users on the website. They used these fake accounts in combination with different device IDs, cookies, and UAs to masquerade as genuine users and generate perfectly valid HTTP requests to easily circumvent conventional rule-based security measures.

[Figure: Stage 1. Labels: attack vectors (proxy IPs, multiple UAs, targeting from different ISPs across the globe, programmatic/sequential requests, cookie-maintaining capabilities), exploit kit, attack crawler engine, impacted vector (sign in/sign up page), 2,345 UIDs.]

Stage 2: Scraping of Product Categories

Target: Category Pages, On-site Search
Category Pages Targeted: 11,795
Hits on Search Results: 374
Scraped Category Results: 11,791

Using fake UIDs, attackers logged into the website and made 11,795 hits on category pages. They managed to scrape 11,791 category results. Scrapers also performed 374 searches.

[Figure: Stage 2. Labels: attack vectors, exploit kit, attack crawler engine, 2,345 UIDs, impacted vectors (search results: 374 hits; category pages: 11,795 hits using multiple UIDs), 11,791 scraped category results.]

Business of Bots: Thousands of scrapers are listed on sites such as Upwork, guru.com, and freelancers.com.

Stage 3: Price and Product Data Scraping

Target: Product Pages
Product Pages Targeted: 652,567
Details of Products Scraped: 651,999

After scraping the category pages, attackers carried out 652,567 hits on specific product pages and managed to store the prices and product details of 651,999 products in their own database.
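The figures reported for the three stages can be put side by side with some simple arithmetic. The sketch below uses only numbers quoted in this case study; the function name `summarize_attack` and the derived ratios (yield per stage, share of hits, average hits per day) are illustrative choices, not metrics taken from the original analysis.

```python
# Figures quoted in the case study above.
TOTAL_HITS = 690_015
STUDY_DAYS = 15
CATEGORY_HITS = 11_795
CATEGORY_RESULTS_SCRAPED = 11_791
PRODUCT_HITS = 652_567
PRODUCTS_SCRAPED = 651_999


def summarize_attack() -> dict[str, float]:
    """Derive a few illustrative ratios from the reported figures."""
    return {
        # Scraped category results per category-page hit.
        "category_yield": CATEGORY_RESULTS_SCRAPED / CATEGORY_HITS,
        # Products captured per product-page hit.
        "product_yield": PRODUCTS_SCRAPED / PRODUCT_HITS,
        # Share of all observed hits that landed on product pages.
        "product_hit_share": PRODUCT_HITS / TOTAL_HITS,
        # Average hits per day over the study period.
        "hits_per_day": TOTAL_HITS / STUDY_DAYS,
    }


if __name__ == "__main__":
    for name, value in summarize_attack().items():
        print(f"{name}: {value:,.4f}")
```

Under these figures both yields come out above 99.9%, and product pages account for roughly 95% of all hits, so Stage 3 makes up most of the observed traffic.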
The attackers maintained a real-time repository of the entire product catalog on the e-commerce portal. They also regularly tracked the price changes to keep their database updated with the latest pricing information.

[Figure: Stage 3. Labels: attack vectors, exploit kit, crawler engine, 2,345 UIDs, 11,791 scraped category results, attack targeted categories, impacted vector (product pages: 652,567 hits), sync, delta engine, scrapers’ DB (651,999 products).]

Topology of The Attack: How Three Stages Work in Unison

[Figure: combined topology. Labels: sign in/sign up page, 2,345 UIDs, search results (374 hits), category pages (11,795 hits using multiple UIDs), 11,791 scraped category results, product pages (652,567 hits), exploit kit and attack vectors, crawler engine, sync, delta, scrapers’ DB (651,999 products).]

All three stages were part of a single large-scale scraping attack and worked together to perform real-time monitoring of product pages. During our analysis, we observed that rule-based systems are incapable of detecting such scraping attacks, which are organized in different stages and executed using perfectly legitimate user activities.

Recommendations

Action Plan for E-commerce Businesses to Combat Scraping

All large e-commerce platforms have sophisticated bot activity on their websites, mobile apps, and APIs that can expose them to scraping and loss of Gross Merchandise Value (GMV). E-tailers must be diligent in their approach to find and mitigate malicious sources of bot activity.
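One way to start looking for bot activity of the kind described above, where registered accounts generate very large numbers of category and product page hits, is to flag accounts that browse heavily but never purchase. The following is a minimal sketch under stated assumptions: the `Event` record, its field names, the `flag_browse_only_accounts` function, and the `min_page_views` threshold are all hypothetical and would need to be adapted to a real platform's logs. It is not the detection method used in this study.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    """Hypothetical log record: one action by one account."""
    account_id: str
    kind: str  # assumed values: "page_view" or "purchase"


def flag_browse_only_accounts(
    events: Iterable[Event], min_page_views: int = 200
) -> list[str]:
    """Return accounts with many page views and no purchases.

    The threshold is an arbitrary illustrative default, not a
    recommended value.
    """
    views: Counter[str] = Counter()
    buyers: set[str] = set()
    for event in events:
        if event.kind == "page_view":
            views[event.account_id] += 1
        elif event.kind == "purchase":
            buyers.add(event.account_id)
    # Keep only heavy browsers that never appear as buyers.
    return sorted(
        account
        for account, count in views.items()
        if count >= min_page_views and account not in buyers
    )


if __name__ == "__main__":
    sample = (
        [Event("bot-1", "page_view")] * 300
        + [Event("shopper-1", "page_view")] * 250
        + [Event("shopper-1", "purchase")]
        + [Event("visitor-1", "page_view")] * 5
    )
    print(flag_browse_only_accounts(sample))
```

A signal like this is only a starting point: a high threshold misses slow scrapers and a low one flags genuine window shoppers, so it would normally be combined with other indicators.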
Spot highly active new or existing user accounts that don’t buy

E-commerce portals must track old or newly-created accounts that are highly active on the platform but haven’t made any purchase in a long time. Such

Watch out for competitive price tracking and monitoring

Many e-commerce firms deploy bots or hire professionals to scrape product details and pricing information from their rival portals. You must regularly track competitors for signs of price and
