I Am Robot: (Deep) Learning to Break Semantic Image CAPTCHAs Suphannee Sivakorn, Iasonas Polakis and Angelos D. Keromytis Department of Computer Science Columbia University, New York, USA fsuphannee, polakis,
[email protected] Abstract—Since their inception, captchas have been widely accounts and posting of messages in popular services. Ac- used for preventing fraudsters from performing illicit actions. cording to reports, users solve over 200 million reCaptcha Nevertheless, economic incentives have resulted in an arms challenges every day [2]. However, it is widely accepted race, where fraudsters develop automated solvers and, in turn, that users consider captchas to be a nuisance, and may captcha services tweak their design to break the solvers. Recent require multiple attempts before passing a challenge. Even work, however, presented a generic attack that can be applied simple challenges deter a significant amount of users from to any text-based captcha scheme. Fittingly, Google recently visiting a website [3], [4]. Thus, it is imperative to render unveiled the latest version of reCaptcha. The goal of their new the process of solving a captcha challenge as effortless system is twofold; to minimize the effort for legitimate users, as possible for legitimate users, while remaining robust while requiring tasks that are more challenging to computers against automated solvers. Traditionally, automated solvers than text recognition. ReCaptcha is driven by an “advanced have been considered less lucrative than human solvers risk analysis system” that evaluates requests and selects the for underground markets [5], due to the quick response difficulty of the captcha that will be returned. Users may times exhibited by captcha services in tweaking their design.