ESCAPT: Easy Strategies for Computers to Avoid the Public Turing Test

Andrew Dempsey
Mentor: Ming Chow
Fall 2014

Abstract

In this paper, we will examine the effectiveness of a variety of forms of the Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA). The test is in use on almost all popular websites today to differentiate human users from software, reducing the threat of bots exploiting the sites for malicious purposes. CAPTCHA generally requires users to interpret images or audio segments that are understandable by humans but are too distorted for algorithms to dissect. However, as computer vision and hearing technology advances, the efficacy of these techniques may be reduced. Here, we will study the mechanisms of programs used to deceive CAPTCHA and explore modern, more advanced methods of the test that can provide robust defenses against automated systems.

Contents

1 Introduction
  1.1 The Turing Test
  1.2 The Origin of CAPTCHA
  1.3 CAPTCHA Variants
    1.3.1 Visual
    1.3.2 Audio
    1.3.3 Semantic
2 Exploitation
  2.1 Optical Character Recognition
  2.2 Learning
  2.3 Farming
3 To the Community
4 Action Items
  4.1 Challenge Generation
  4.2 Network Security
    4.2.1 Same-Origin Policy
    4.2.2 HTTPS
  4.3 User Behavior
5 Conclusion

1 Introduction

1.1 The Turing Test

The eponymous Turing Test, proposed by Alan Turing in 1950, was a concept meant to illustrate how difficult it could become to determine whether one is interacting with a computer or a human as artificial intelligence progressed in quality.[1] Turing's initial idea was the following: a human user sits in front of a computer and is told to talk to some "entity" in a chatroom.
The "entity" could be either a human or an artificial intelligence; the subject's job is to determine which one.[1] If a computer is able to successfully dupe the subject into believing they are interacting with a human, the software is said to have passed the Turing Test. Along similar lines, there is another variant also of great interest: given a machine interacting with either a human or software, can the machine determine whether its partner is human? This question forms the basis of a modern, very difficult security problem.

1.2 The Origin of CAPTCHA

In the 1980s, online chat rooms had begun to rise in popularity. With these chat rooms came word detection, written by the site owners to raise alarms if members began talking about unscrupulous subjects.[2] To counteract these measures, many users adopted a style of typing that obscured the content of their messages from software while keeping them plainly readable to other humans. Users would substitute visually similar characters (or sets of characters) for normal letters, e.g., "/\/\" for "M", "3" for "E", etc.[2] This style of typing would later come to be known as leetspeak.

While this concept was originally invented for black-hat purposes, it was adopted at a more advanced level by AltaVista in the 1990s. AltaVista hoped to defend its search engine against malicious bots by devising a system to deflect requests from software while allowing requests from human users.[2] Their defense consisted of presenting a user with an image of distorted text and asking them to enter its contents.
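The substitution scheme described above amounts to a simple character mapping. The sketch below illustrates the idea; the particular substitutions chosen are illustrative assumptions, not a canonical leetspeak table:

```python
# Minimal leetspeak encoder: replace letters with visually similar
# characters so a naive keyword filter no longer matches the literal
# word, while a human can still read it. Mapping is an illustrative subset.
LEET_MAP = {
    "A": "4",
    "E": "3",
    "I": "1",
    "O": "0",
    "S": "5",
    "T": "7",
}

def to_leet(text: str) -> str:
    """Substitute each mapped letter; leave everything else untouched."""
    return "".join(LEET_MAP.get(ch.upper(), ch) for ch in text)

print(to_leet("ESCAPE"))  # 35C4P3
```

A filter scanning for the literal string "ESCAPE" would miss "35C4P3", yet a human reader recovers the word with little effort, which is exactly the asymmetry CAPTCHA would later exploit in reverse.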
The goal was to devise an image generator that would create text too obfuscated for optical character recognition engines to recognize, but that human users could still interpret correctly.[2] This system was the birth of the Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA), which has been in use for similar purposes on millions of sites on the web since its invention.[2]

1.3 CAPTCHA Variants

1.3.1 Visual

The most common form of CAPTCHA is the visual variant. The user is presented with an image of distorted text and must enter the image's content into a textbox. The letters in the image are warped and rotated such that optical character recognition (OCR) algorithms cannot process them correctly, but humans can still recognize them.[3] Additionally, the images often have multicolored backgrounds with irregular patterns meant to introduce noise and thus make the images even harder to decode. However, visual CAPTCHAs have been controversial ever since their inception, as many users have difficulty decoding the messages, especially those with visual disabilities.[3]

Figure 1: Example of a visual CAPTCHA image

An alternative form of visual CAPTCHA, known as the Gimpy method, presents several distorted words in an image, often overlaid on each other. The user is generally required to name some (but not all) of the words in the image to pass the test.[3]

Figure 2: Example of a Gimpy CAPTCHA

1.3.2 Audio

In an audio CAPTCHA, the user must transcribe characters spoken in an audio clip. The voice speaking the characters is often either distorted or of very low quality; the actual voice may change from character to character as well (e.g., from an older man to a younger woman).
Once again, audio CAPTCHAs are meant to be difficult for computers to parse and easy for humans to hear, but they too suffer from usability issues, as many human users have difficulty transcribing the audio.[2] Audio CAPTCHAs also take longer to complete (the user must listen to the entire clip) and depend on a suitable setting (an audio CAPTCHA in a crowded public area without headphones, for example, would be very difficult).

1.3.3 Semantic

Semantic CAPTCHAs differ from the previous two variants in that they do not rely on an ability to perceive. Rather, they require the user to connect abstract concepts; for example, many semantic CAPTCHAs ask the user to fill in the blank in a sentence so that the sentence "makes sense." Such a task is very abstract, and solving it would require fairly advanced artificial intelligence. Another variant of a semantic CAPTCHA requires the user to identify an object; this can take the form of 'identify the odd one out' in a series of images (e.g., 'click the dog' when presented with six pictures, five of which are cats and one of which is a dog) or choosing an object of a certain color or size.[4]

Figure 3: Example of a semantic CAPTCHA

2 Exploitation

In general, CAPTCHAs do a sufficient job of protecting websites from malicious robots. However, there is a critical flaw in their nature: the effectiveness of CAPTCHA tests relies on the rudimentary state of artificial intelligence, a field which grows more advanced every year.[5] While CAPTCHA tests are sufficiently obfuscated to do their job well today, this may not be the case in the future. Many researchers have developed methods of fooling CAPTCHA systems that are accurate enough to pose problems.[5] In this section, we will detail some of the methods researchers have used to exploit CAPTCHAs, focusing in particular on visual CAPTCHAs.
2.1 Optical Character Recognition

The supporting material of this paper includes a script, decode.py, that takes a CAPTCHA image as input and outputs its text content as a string. The script uses OpenCV, an open-source computer vision library with Python bindings. The algorithm is fairly rudimentary and is intended to be run on a simple variant of the SnapHost CAPTCHA, a fairly popular CAPTCHA service. While the implementation is not particularly advanced, it is capable of interpreting images with a fair amount of accuracy. The method is inspired by [6] and [7]. This is a typical SnapHost CAPTCHA:

Figure 4: SnapHost Visual CAPTCHA as presented to the user

The first step is to reduce the noise in the image. The dotted background can make it difficult for the program to detect the edges of individual letters, so we use a method known as Non-Local Means Denoising (NLMD) to 'clean' the image. NLMD works by finding similarly colored rectangles of an image, grouping them together, and tracking their values.[8] By averaging the values together, we can overwrite all of the sampled rectangles with this new color, producing a much smoother result.[8] While the colors and brightness may no longer be exact, this is acceptable for our purposes. The result of running NLMD on Figure 4 is as follows:

Figure 5: CAPTCHA image with Non-Local Means Denoising applied

From here, the next step is to extract each individual character into its own image. We first increase the contrast between the letters and the background to make edge detection easier. To do so, we use a method known as thresholding. Thresholding chooses a color value as the background color and blackens any pixels with that value; all other pixels are turned white, resulting in a very high-contrast image.[9] However, because some noise remains in our image, picking one exact threshold boundary will not work. Therefore, we use adaptive thresholding.
Similar to before, the algorithm samples smaller rectangles of the image and computes the mean of each local area.[9] Any pixels that differ significantly from the local mean are colored white. The result of adaptive thresholding run on Figure 5 is as follows: The image is now ready for segmentation.
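The local-mean comparison at the heart of adaptive thresholding can be sketched in a few lines. The toy below is a pure-Python illustration of the mechanics, not the paper's decode.py (which a real pipeline would implement with OpenCV's cv2.adaptiveThreshold); the 3x3 window and the offset of 10 are illustrative parameter choices:

```python
# Toy adaptive mean thresholding: each pixel is compared against the
# mean of its local window rather than one global cutoff, so uneven
# backgrounds and residual noise do not defeat the binarization.

def adaptive_threshold(img, window=3, offset=10):
    """Mark pixels significantly brighter than their local mean as
    white (255); everything else becomes black (0)."""
    h, w = len(img), len(img[0])
    half = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Gather the local neighborhood, clamped at the image borders.
            vals = [
                img[j][i]
                for j in range(max(0, y - half), min(h, y + half + 1))
                for i in range(max(0, x - half), min(w, x + half + 1))
            ]
            local_mean = sum(vals) / len(vals)
            # Pixels that differ significantly from the local mean
            # are colored white, as described above.
            out[y][x] = 255 if img[y][x] > local_mean + offset else 0
    return out

# A 5x5 grayscale "image": uniform background (50) with a brighter
# vertical stroke (200) standing in for part of a character.
img = [
    [50, 50, 50, 50, 50],
    [50, 50, 200, 50, 50],
    [50, 50, 200, 50, 50],
    [50, 50, 200, 50, 50],
    [50, 50, 50, 50, 50],
]
binary = adaptive_threshold(img)  # only the three stroke pixels become 255
```

Because the threshold follows the local mean, the stroke stands out even if the background brightness drifts across the image, which a single global cutoff could not guarantee. (The preceding denoising step likewise has a ready-made OpenCV counterpart, cv2.fastNlMeansDenoising.)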