Evaluation of an Attack Approach on Google V3 Captcha

Mohit Mohit, Saksham Chawala, Sumit Mokashi
Department of Computer Science, University of Texas at Arlington, Arlington, Texas

Abstract— CAPTCHAs are used as the first line of defense against automated account creation and service abuse. Google’s reCaptcha is currently used by millions of websites to protect against automated attackers by testing whether a user is truly human. In this project we present an approach in which we try to compromise the Google V3 Captcha module using machine-learning techniques. We create a click-bot that generates and emulates human cursor movements on the screen in an attempt to fool the captcha module. Using our bot, we create a dataset that we then use to train our model. Our approach uses binary classification, and we conclude by describing a more comprehensive approach that involves Reinforcement Learning.

I. INTRODUCTION

CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) are systems designed to protect against automated account creation and service abuse by presenting users with a challenge that is easy for humans to solve but difficult for computers. Captchas are used extensively online as a defense against automated bots and Sybil attacks, as well as to prevent spam. For instance, many online registration platforms, from social media services to email to ticketing systems, require the user to solve a captcha during registration to prevent the automated creation of fake accounts. In a similar vein, some online services have recently begun requiring Tor clients to solve captchas before delivering web content. The security of captchas is therefore paramount to protecting Internet services from these attacks. For the remainder of the paper, we follow industry convention and write the acronym in lowercase, as “captcha”, for readability.

Because the spread of news and information is increasingly driven by user content on sites like Twitter, YouTube, and Reddit, bots that could defeat the captcha system and register a disproportionate number of accounts could theoretically control the flow of information. It is therefore unsurprising that captchas have been the target of researchers and attackers for years. Until recently, captchas featured distorted text that users had to type correctly to pass. Bursztein et al. showed these text-based captchas to be insecure by demonstrating a system with near-complete (98%) accuracy. As a result, text-based captchas have been largely phased out in favor of image captchas. However, visually impaired users cannot solve these visual captchas, which prompted the creation of audio captchas. Typical audio captchas consist of different speakers saying words or digits at randomly spaced intervals at a variable pitch or speed, often with an accent and distortion or noise. To solve the captcha, a user must correctly identify the digits or words spoken in the audio clip.

Attacks have been demonstrated on these audio captchas with varying degrees of success in the past. This is usually done by training local machine-learning models to identify the spoken words, a high-resource and time-consuming approach. Additionally, although researchers have explored using online speech recognition services, including Sphinx or Google Speech Recognition, these services have not been accurate enough to compete with offline services or to solve the captcha reliably. Further independent studies deployed a two-phase segment-then-classify approach and successfully broke older versions of Google and Yahoo audio captchas. These two-phase solvers usually operate by first extracting the portions of the captcha that contain a digit and then running pre-trained machine learning algorithms to classify those individual digits, rather than classifying them all at once.

The aim of this project is to study and evaluate the Google V3 captcha system in use today. With this study, we propose an attack scenario that uses automated scripts to compromise the system. Our work builds on a previous case study by Ismail Akrout, Amal Feriani, and Mohamed Akrout, Hacking Google reCAPTCHA v3 using Reinforcement Learning [1]. That paper presents a Reinforcement Learning (RL) methodology to bypass Google reCAPTCHA v3 in which the agent learns how to move the mouse and click on the reCAPTCHA button to receive a high score. The authors also used a divide-and-conquer strategy to defeat the reCAPTCHA system for any grid resolution. Their proposed method achieves a success rate of 97.4%.

In our research we also relied on another case study, by Suphannee Sivakorn, Iasonas Polakis, and Angelos D. Keromytis, I am Robot: (Deep) Learning to Break Semantic Image CAPTCHAs (12 May 2016). In this paper, the authors conduct a comprehensive study of reCaptcha and explore how each aspect of a request influences the risk analysis process. Through extensive experimentation, they identify flaws that allow adversaries to effortlessly influence the risk analysis, bypass restrictions, and deploy large-scale attacks. They design a novel low-cost attack that leverages deep learning for the semantic annotation of images. Their system is extremely effective, automatically solving 70.78% of the image reCaptcha challenges.

Another study presents unCaptcha, an automated system that can solve reCaptcha’s most difficult auditory challenges with a high success rate. Its authors evaluated unCaptcha on over 450 reCaptcha challenges from live websites and showed that it solves them with 85.15% accuracy.

II. BACKGROUND

The reCaptcha system relies on an advanced risk analysis engine. As the user interacts with reCaptcha (clicking buttons and typing), the system determines a level of suspicion for that user. Today, many users find that they simply need to click the checkbox to be verified without solving a captcha. This occurs when reCaptcha is fairly confident that the user is human and not an automated attacker (this is called the “noCaptcha reCaptcha”). If the system is unsure whether the user is human (but is not highly suspicious either), it delivers a moderate challenge (an easy image problem or a short audio string of numbers to transcribe). This often occurs when a user does not yet have a long enough history of interaction with Google. However, as the reCaptcha system becomes increasingly suspicious, it delivers harder challenges: 10 digits in the audio challenge, or prompting the user to solve multiple challenges. By default, a user with no past history with Google services is automatically given the most difficult challenge. It is these most difficult challenges that unCaptcha attempts to solve.

Although the new reCaptcha system was introduced in 2014 to replace the traditional “distorted text” captcha, not much is known about its inner workings. Google has heavily protected the inner design of reCaptcha, releasing few details about how the software works. The captcha system runs in an encrypted, isolated VM (Virtual Machine) in JavaScript with a unique bytecode language. To make reverse engineering even more difficult, the bytecode has direct access to the JavaScript variables of its own interpreter, and it changes its own decryption key and even its own opcode numbers at many points during execution. A fully working disassembler and decompiler for the system was released, and it was determined that the captcha system, in addition to checking the captcha solution itself, checked for the presence of: valid plugins; a valid user-agent string; a valid screen resolution; execution time; computer timezone; the number of click, keyboard, or touch actions in the iframe of the captcha; many browser-specific functions and CSS rules; canvas rendering properties; server-side cookies; and likely more.

In 2016, a further analysis of the reCaptcha system by Sivakorn et al. explored weaknesses in the initial implementation of the image captcha. The image captcha has changed since that paper, and their methodology is no longer sufficient to defeat it. However, their analysis of the captcha’s risk analysis system lends insight into its inner workings. In particular, Sivakorn et al. found that Google’s tracking cookies play an integral role in the captcha’s defenses: the captcha system is made aware every time a user interacts with a Google service (or with a page carrying Google’s tracking cookies, such as Google Analytics). After just 9 days of automated browsing across different Google services, their bots’ tracking cookies were sufficient to fool the risk analysis system into treating them as human and checking off the box. However, their experiments revealed that each cookie could immediately complete only 8 captchas per day before needing to solve additional challenges. Their results also showed that the reCaptcha system attempts to fingerprint the browser by using canvas rendering techniques, comparing the user-agent string to what the browser reports, and potentially more. Despite these extensive efforts by the risk analysis engine to identify a bot before the captcha is shown, reCaptcha still remains susceptible to low-resource attacks on its audio challenge.

Over the last decade, reCAPTCHA has continuously evolved. In reCAPTCHA v1, every user was asked to pass a challenge by reading distorted text and typing it into a box. To improve both user experience and security, Google introduced reCAPTCHA v2 and began to use many other signals to determine whether a request came from a human or a bot. This moved reCAPTCHA challenges from a dominant to a secondary role in detecting abuse, letting about half of users pass with a single click. Today, with reCAPTCHA v3, sites can distinguish human from bot activity through a returned score that indicates how suspicious an interaction is, eliminating the need to interrupt users with challenges at all.
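The score-based model of reCAPTCHA v3 described above can be illustrated from the site's side. The sketch below assumes the site has already forwarded the user's token to Google's verification endpoint (https://www.google.com/recaptcha/api/siteverify) and received the JSON reply, which carries `success`, `score`, and `action` fields. The 0.5 threshold follows Google's suggested starting value; the function name `assess_v3_response` is hypothetical, and no network call is made here.

```python
import json

# Google's reCAPTCHA v3 guide suggests 0.5 as a starting threshold; sites are expected to tune it.
DEFAULT_THRESHOLD = 0.5


def assess_v3_response(raw_json: str, expected_action: str,
                       threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Decide whether a siteverify reply should be treated as a human interaction."""
    reply = json.loads(raw_json)
    if not reply.get("success", False):
        return False  # e.g. token invalid, expired, or already used
    if reply.get("action") != expected_action:
        return False  # token was issued for a different action on the page
    # v3 scores run from 0.0 (very likely a bot) to 1.0 (very likely a good interaction).
    return float(reply.get("score", 0.0)) >= threshold


print(assess_v3_response('{"success": true, "score": 0.9, "action": "login"}', "login"))  # True
print(assess_v3_response('{"success": true, "score": 0.2, "action": "login"}', "login"))  # False
```

Checking `action` as well as `score` is a design choice: it prevents a token obtained on a low-risk page from being replayed against a more sensitive one.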
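The escalating behaviour of the risk analysis engine described in Section II (checkbox only, then a moderate challenge, then harder challenges, and the hardest challenge for users with no Google history) can be summarised as a decision function. This is only a toy model: Google does not publish its internal suspicion scale, so the suspicion value in [0, 1], the 0.3 and 0.7 cut-offs, the tier names, and the function `challenge_tier` are all hypothetical.

```python
def challenge_tier(suspicion: float, has_google_history: bool) -> str:
    """Toy model of how escalating suspicion maps to escalating challenges.

    `suspicion` is a hypothetical value in [0, 1]; the cut-offs are illustrative only.
    """
    if not has_google_history:
        return "hardest"        # no past history: most difficult challenge by default
    if suspicion < 0.3:
        return "checkbox_only"  # fairly confident the user is human ("noCaptcha reCaptcha")
    if suspicion < 0.7:
        return "moderate"       # e.g. an easy image problem or a short audio string
    return "hard"               # e.g. 10 audio digits or multiple challenges


for s in (0.1, 0.5, 0.9):
    print(s, challenge_tier(s, has_google_history=True))
print(challenge_tier(0.1, has_google_history=False))
```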
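The two-phase segment-then-classify approach mentioned in the introduction can be sketched in a few lines. The sketch assumes a mono signal given as a list of samples, finds the spoken segments with a simple frame-energy threshold (phase one), and then labels each segment independently (phase two). The nearest-template classifier on mean amplitude is a hypothetical stand-in for the pre-trained machine-learning models that real solvers use, and all function names and the synthetic clip are illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def frame_energies(signal: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame (a trailing partial frame is dropped)."""
    return [sum(x * x for x in signal[i:i + frame_len]) / frame_len
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def segment(signal: Sequence[float], frame_len: int, threshold: float) -> List[Tuple[int, int]]:
    """Phase 1: (start, end) sample ranges covering runs of frames above the energy threshold."""
    energies = frame_energies(signal, frame_len)
    segments, start = [], None
    for i, e in enumerate(energies):
        if e >= threshold and start is None:
            start = i * frame_len
        elif e < threshold and start is not None:
            segments.append((start, i * frame_len))
            start = None
    if start is not None:  # clip ended while still loud
        segments.append((start, len(energies) * frame_len))
    return segments


def nearest_template(templates: Dict[str, float]) -> Callable[[Sequence[float]], str]:
    """Hypothetical stand-in classifier: label whose template mean amplitude is closest."""
    def classify(seg: Sequence[float]) -> str:
        level = sum(abs(x) for x in seg) / len(seg)
        return min(templates, key=lambda label: abs(templates[label] - level))
    return classify


def segment_then_classify(signal, frame_len, threshold, classify) -> List[str]:
    """Phase 2: classify each extracted segment on its own, not the whole clip at once."""
    return [classify(signal[s:e]) for s, e in segment(signal, frame_len, threshold)]


# Synthetic clip: silence, a loud "digit", silence, a quieter "digit", silence.
clip = [0.0] * 100 + [1.0] * 100 + [0.0] * 100 + [0.5] * 100 + [0.0] * 100
digits = segment_then_classify(clip, frame_len=10, threshold=0.01,
                               classify=nearest_template({"1": 1.0, "5": 0.5}))
print(digits)  # ['1', '5']
```

Separating the two phases is what makes such solvers cheap: the classifier only ever sees one digit at a time, so it can be small.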
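The click-bot and binary-classification pipeline from the abstract is not specified in detail here, so the following is only a sketch under stated assumptions. "Bot-like" cursor paths are straight lines and "human-like" paths are quadratic Bezier curves with Gaussian jitter. The single feature is the maximum deviation from the start-to-end chord, and the classifier is a one-feature logistic regression trained by batch gradient descent. The 30 to 120 px control-point offset, the screen regions, and every function name are hypothetical.

```python
import math
import random


def straight_path(p0, p1, n=20):
    """Naive bot trajectory: evenly spaced points on the straight line p0 -> p1."""
    return [(p0[0] + (p1[0] - p0[0]) * i / (n - 1),
             p0[1] + (p1[1] - p0[1]) * i / (n - 1)) for i in range(n)]


def curved_path(p0, p1, rng, n=20, jitter=1.5):
    """Human-like trajectory: quadratic Bezier curve plus Gaussian jitter.

    The control point is pushed 30-120 px off the chord (a hypothetical range).
    """
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    length = math.hypot(dx, dy)
    nx, ny = -dy / length, dx / length  # unit normal to the chord
    offset = rng.uniform(30, 120) * rng.choice((-1, 1))
    cx = (p0[0] + p1[0]) / 2 + nx * offset
    cy = (p0[1] + p1[1]) / 2 + ny * offset
    path = []
    for i in range(n):
        t = i / (n - 1)
        x = (1 - t) ** 2 * p0[0] + 2 * (1 - t) * t * cx + t ** 2 * p1[0]
        y = (1 - t) ** 2 * p0[1] + 2 * (1 - t) * t * cy + t ** 2 * p1[1]
        path.append((x + rng.gauss(0, jitter), y + rng.gauss(0, jitter)))
    return path


def max_deviation(path):
    """Largest perpendicular distance (px) of any point from the start-end chord."""
    (x0, y0), (x1, y1) = path[0], path[-1]
    chord = math.hypot(x1 - x0, y1 - y0)
    if chord == 0:
        return 0.0
    return max(abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / chord for x, y in path)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(bot | x) = sigmoid(w * x + b) by batch gradient descent on log-loss."""
    w = b = 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / len(xs)
        b -= lr * gb / len(xs)
    return w, b


def build_dataset(n_per_class, seed=0):
    """Label 1 = straight (bot-like) path, 0 = curved (human-like) path."""
    rng = random.Random(seed)
    feats, labels = [], []
    for _ in range(n_per_class):
        p0 = (rng.uniform(0, 400), rng.uniform(0, 300))
        p1 = (rng.uniform(450, 800), rng.uniform(350, 600))
        feats += [max_deviation(straight_path(p0, p1)),
                  max_deviation(curved_path(p0, p1, rng))]
        labels += [1, 0]
    return feats, labels


feats, labels = build_dataset(100)
mean = sum(feats) / len(feats)
std = math.sqrt(sum((f - mean) ** 2 for f in feats) / len(feats))
xs = [(f - mean) / std for f in feats]  # standardise the single feature
w, b = train_logistic(xs, labels)
accuracy = sum((sigmoid(w * x + b) >= 0.5) == bool(y) for x, y in zip(xs, labels)) / len(xs)
print(f"training accuracy: {accuracy:.2f}")
```

The feature is deliberately crude: a real risk engine combines many signals (timing, velocity profiles, browser fingerprint), so a bot that only curves its paths would not necessarily score well.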
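The RL formulation of Akrout et al. [1], in which an agent learns to move the mouse across a grid and click the reCAPTCHA button, can be illustrated on a toy grid. This sketch uses tabular Q-learning for simplicity, which is not necessarily the algorithm used in [1]. In the cited work the reward comes from the reCAPTCHA score; here, as a hypothetical stand-in, the agent receives a reward of 1 only when the cursor reaches a fixed target cell. The grid size, learning rate, discount, exploration rate, and function names are illustrative, and the divide-and-conquer extension to larger grids is not shown.

```python
import random

# Cursor moves on the grid: right, left, down, up.
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def step(state, action, size, target):
    """Move one cell (clipped to the grid); reward 1.0 only when the target is reached."""
    x = min(max(state[0] + MOVES[action][0], 0), size - 1)
    y = min(max(state[1] + MOVES[action][1], 0), size - 1)
    nxt = (x, y)
    done = nxt == target
    return nxt, (1.0 if done else 0.0), done


def greedy(q, state):
    return max(range(len(MOVES)), key=lambda a: q[(state, a)])


def q_learning(size=5, target=(4, 4), episodes=2000, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; every episode starts at (0, 0)."""
    rng = random.Random(seed)
    q = {((x, y), a): 0.0
         for x in range(size) for y in range(size) for a in range(len(MOVES))}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(4 * size * size):  # cap the episode length
            if rng.random() < eps:
                action = rng.randrange(len(MOVES))
            else:
                action = greedy(q, state)
            nxt, reward, done = step(state, action, size, target)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in range(len(MOVES)))
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def greedy_rollout(q, size=5, target=(4, 4), max_steps=50):
    """Follow the learned greedy policy from (0, 0) and return the visited cells."""
    state, path = (0, 0), [(0, 0)]
    for _ in range(max_steps):
        state, _, done = step(state, greedy(q, state), size, target)
        path.append(state)
        if done:
            break
    return path


path = greedy_rollout(q_learning())
print(len(path) - 1, "moves:", path)
```

Because the discount factor is below 1, shorter cursor paths earn higher value, so the learned greedy policy should take a shortest route (8 moves on this 5 by 5 grid).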