Breaking Visual Captchas With Na?Ve Pattern Recognition Algorithms

Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms Chapter 1 Introduction

CAPTCHAs are short for Completely Automated Public Turing test to tell Computers and Humans Apart. The term "CAPTCHA" is coined in 2000 by Luis Von Ahn, Manuel Blum, Nicholas J. Hopper [1]. They are challenge-response tests to ensure that the users are indeed human. The purpose of a CAPTCHA is to block form submissions from spam bots – automated scripts that harvest email addresses from publicly available web forms. A common kind of CAPTCHA used on most websites requires the users to enter the string of characters that appear in a distorted form on the screen. CAPTCHAs are used because of the fact that it is difficult for the computers to extract the text from such a distorted image, whereas it is relatively easy for a human to understand the text hidden behind the distortions. Therefore, the correct response to a CAPTCHA challenge is assumed to come from a human and the user is permitted into the website. The need to create a test that can tell humans and computers apart is because of people trying to game the system -- they want to exploit weaknesses in the computers running the site. While these individuals probably make up a minority of all the people on the Internet, their actions can affect millions of users and Web sites. For example, a free e-mail service might find itself bombarded by account requests from an automated program. That automated program could be part of a larger attempt to send out spam mail to millions of people. The CAPTCHA test helps identify which users are real human beings and which ones are computer programs [2]. Spammers are constantly trying to build algorithms that read the distorted text correctly. So strong CAPTCHAs have to be designed and built so that the efforts of the spammers are thwarted. The easiest implementation of a CAPTCHA to a Website would be to insert a few lines of CAPTCHA code into the Website‘s HTML code, from an open source CAPTCHA builder, which will provide the authentication services remotely. Most such services are free. Popular among them is the service provided by www.captcha.net’s reCAPTCHA project and also Captchaservice.org. 1.1 Motivation:

Dept. of CSE, JNNCE. Page 1 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms The proliferation of the publicly available services on the web is a boon for the community at large. But unfortunately it has invited new and novel abuses. Programs (bots and spiders) are being created to steal services and to conduct fraudulent transactions. Some examples:  Free online accounts are being registered automatically many times and are being used to distribute stolen or copyrighted material.  Recommendation systems are vulnerable to artificial inflation or deflation of rankings. For example, EBay, a famous auction website allows users to rate a product. Abusers can easily create bots that could increase or decrease the rating of a specific product, possibly changing people’s perception towards the product.  Spammers register themselves with free email accounts such as those provided by Gmail or Hotmail and use their bots to send unsolicited mails to other users of that email service.  Online polls are attacked by bots and are susceptible to ballot stuffing. This gives unfair mileage to those that benefit from it. In light of the above listed abuses and much more, a need is felt for a facility that checks users and allows access to services to only human users. It is in this direction that such a tool like CAPTCHA is created.

1.2 Background: The need for CAPTCHAs rose to keep out the website/search engine abuse by bots. In 1997, AltaVista sought ways to block and discourage the automatic submissions of URLs into their search engines. Andrei Broder, Chief Scientist of AltaVista, and his colleagues developed a filter. Their method is to generate a printed text randomly that only humans could read and not machine readers. Their approach is so effective that in an year, “spam- add-ons” were reduced by 95% and a patent is issued in 2001. In 2000, Yahoo’s popular Messenger chat service is hit by bots which pointed advertising links to annoying human users of chat rooms. Yahoo, along with Carnegie Mellon University, developed a CAPTCHA called EZ-GIMPY, which chose a dictionary word randomly and distorted it with a wide variety of image occlusions and asked the user to input the distorted word. In November 1999, slashdot.com released a poll to vote for the best CS college in the US. Students from the Carnegie Mellon University and the Massachusetts Institute of

Dept. of CSE, JNNCE. Page 2 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms Technology created bots that repeatedly voted for their respective colleges. This incident created the urge to use CAPTCHAs for such online polls to ensure that only human users are able to take part in the polls.

1.3 CAPTCHAs and the Turing Test: CAPTCHA technology has its foundation in an experiment called the Turing Test. Alan Turing, sometimes called the father of modern computing, proposed the test as a way to examine whether or not machines can think -- or appear to think -- like humans [3]. The classic test is a game of imitation. In this game, an interrogator asks two participants a series of questions. One of the participants is a machine and the other is a human. The interrogator can't see or hear the participants and has no way of knowing which is which. If the interrogator is unable to figure out which participant is a machine based on the responses, the machine passes the Turing Test. Of course, with a CAPTCHA, the goal is to create a test that humans can pass easily but machines can't. It's also important that the CAPTCHA application is able to present different CAPTCHAs to different users. If a visual CAPTCHA presented a static image that is the same for every user, it wouldn't take long before a spammer spotted the form, deciphered the letters, and programmed an application to type in the correct answer automatically. Most, but not all, CAPTCHAs rely on a visual test. Computers lack the sophistication that human beings have when it comes to processing visual data. Humans can look at an image and pick out patterns more easily than a computer. But not all CAPTCHAs rely on visual patterns [10]. In fact, it's important to have an alternative to a visual CAPTCHA. Otherwise, the Web site administrator runs the risk of disenfranchising any web user who has a visual impairment. One alternative to a visual test is an audible one. An audio CAPTCHA usually presents the user with a series of spoken letters or numbers. It's not unusual for the program to distort the speaker's voice, and it's also common for the program to include background noise in the recording. This helps thwart voice recognition programs. Another option is to create a CAPTCHA that asks the reader to interpret a short passage of text. A contextual CAPTCHA quizzes the reader and tests comprehension skills. While computer programs can pick out key words in text passages, they aren't very good at understanding what those words actually mean. CAPTCHAs are by definition fully automated, requiring little human maintenance or intervention in administering the test. This has obvious benefits in cost and reliability. Dept. of CSE, JNNCE. Page 3 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms Modern Text based CAPTCHAS are designed in such a way that they require the simultaneous use of three separate abilities—invariant recognition, segmentation, and parsing —to correctly complete the task with any consistency.

Invariant recognition refers to the ability to recognize the large amount of variation in the shapes of letters. There are nearly an infinite number of versions for each character that a human brain can successfully identify. The same is not true for a computer and teaching it to recognize all those differing formations is an extremely challenging task. Segmentation, or the ability to separate one letter from another is also made difficult in CAPTCHAs, as characters are crowded together with no white space in between. Lastly, context is also critical. A complete holistic view of the CAPTCHA must be taken to correctly identify each character. For example, in one segment of a CAPTCHA, a letter might look like a “m.” It is only when the whole word is taken into consideration that it becomes clear that it is actually a “u” and an “n.” Each of these problems pose a significant challenge for a computer even in isolation. The presence of all three at the same time is what makes CAPTCHAS so difficult to solve.

There are various CAPTCHAs that would be insecure if a significant number of sites started using them. An example of such a puzzle is asking text-based questions, such as a mathematical question ("what are 1+1"). Since a parser could easily be written that would allow bots to bypass this test, such "CAPTCHAs" rely on the fact that few sites use them, and thus that a bot author has no incentive to program their bot to solve that challenge. True CAPTCHAs should be secure even after a significant number of websites adopt them [2].

Introduction, motivation and previous works on the field of CAPTCHA were described in this section of the report. Chapter 2 and 3 describes various types of CAPTCHA and also a scheme for breaking the CAPTCHA. Chapter 2 Types of CAPTCHAs

In this section, various classifications of CAPTCHAs are described. CAPTCHAs are classified based on what is distorted and presented as a challenge to the user. They are: 2.1 Text CAPTCHAs: These are simple to implement. The simplest yet novel approach is to present the user with some questions which only a human user can solve [6]. Examples of such questions are:

Dept. of CSE, JNNCE. Page 4 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms 1. What is twenty minus three? 2. What is the third letter in UNIVERSITY? 3. Which of Yellow, Thursday and Richard is a colour? 4. If yesterday is a Sunday, what is today? Such questions are very easy for a human user to solve, but it’s very difficult to program a computer to solve them. These are also friendly to people with visual disability – such as those with color blindness. Other text CAPTCHAs involves text distortions and the user is asked to identify the text hidden. The various implementations are:

2.1.1 Gimpy: Gimpy is a very reliable text CAPTCHA built by CMU in collaboration with Yahoo for their Messenger service. Gimpy is based on the human ability to read extremely distorted text and the inability of computer programs to do the same. Gimpy works by choosing ten words randomly from a dictionary, and displaying them in a distorted and overlapped manner. Gimpy then asks the users to enter a subset of the words in the image. The human user is capable of identifying the words correctly, whereas a computer program cannot. Fig. 2.1 depicts the Gimpy CAPTCHA used in yahoo messenger.

Fig 2.1 Gimpy CAPTCHA

2.1.2 Ez – Gimpy: This is a simplified version of the Gimpy CAPTCHA, adopted by Yahoo in their signup page. Ez – Gimpy randomly picks a single word from a dictionary and applies

Dept. of CSE, JNNCE. Page 5 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms distortion to the text. The user is then asked to identify the text correctly. Fig. 2.2 shows the Ez-Gimpy CAPTCHA used in yahoo in the signup page.

Fig 2.2 Yahoo’s Ez – Gimpy CAPTCHA

2.1.3 BaffleText: This is developed by Henry Baird at University of California at Berkeley. This is a variation of the Gimpy. This doesn’t contain dictionary words, but it picks up random alphabets to create a nonsense but pronounceable text. Distortions are then added to this text and the user is challenged to guess the right word. This technique overcomes the drawback of Gimpy CAPTCHA because, Gimpy uses dictionary words and hence, clever bots could be designed to check the dictionary for the matching word by brute-force. The fig.2.3 is an example for the Baffle text.

finans Fig 2.3 Baffle Text Example

2.1.4 MSN Captcha: Microsoft uses a different CAPTCHA for services provided under MSN umbrella. These are popularly called MSN Passport CAPTCHAs. They use eight characters (upper case) and digits. Foreground is dark blue, and background is grey as depicted in the fig. 2.4. Warping is used to distort the characters, to produce a ripple effect, which makes computer recognition very difficult.

XTNM5YRE

L9D28229B Dept. of CSE, JNNCE. Page 6 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms Fig 2.4 MSN Passport CAPTCHA

2.2 Graphic CAPTCHAs: Graphic CAPTCHAs are challenges that involve pictures or objects that have some sort of similarity that the users have to guess[6]. They are visual puzzles, similar to Mensa tests. Computer generates the puzzles and grades the answers, but is itself unable to solve it.

2.2.1 Bongo: BONGO is named after M.M. Bongard, who published a book of pattern recognition problems in the 1970s. BONGO asks the user to solve a visual pattern recognition problem. It displays two series of blocks, the left and the right. The blocks in the left series differ from those in the right, and the user must find the characteristic that sets them apart. A possible left and right series is shown in Figure 2.5

Fig 2.5 Bongo CAPTCHA These two sets are different because everything on the left is drawn with thick lines and those on the right are in thin lines. After seeing the two blocks, the user is presented with a set of four single blocks and is asked to determine to which group the each block belongs to. The user passes the test if s/he determines correctly to which set the blocks belong to. Programmer should be careful to see that the user is not confused by a large number of choices.

2.2.2 PIX: PIX is a program that has a large database of labeled images. All of these images are pictures of concrete objects (a horse, a table, a house, a flower). The program picks an object at random, finds six images of that object from its database, presents them to the user and then asks the question “what are these pictures of?” Current computer programs should not be able to answer this question, so PIX should be a CAPTCHA. However, PIX, as stated, is not a CAPTCHA: it is very easy to write a program that can answer the question “what are these pictures of?” Remember that all the code and data of a CAPTCHA should be publicly available; in particular, the image database that PIX uses should be public. Hence, writing a program that can answer the question “what are these pictures of?” is easy: search the

Dept. of CSE, JNNCE. Page 7 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms database for the images presented and find their label. Fortunately, this can be fixed. One way for PIX to become a CAPTCHA is to randomly distort the images before presenting them to the user, so that computer programs cannot easily search the database for the undistorted image. Fig 2.6 shows an example of the PIX CAPTCHA.

Fig 2.6 PIX CAPTCHA

2.3 Audio CAPTCHAs: This type of CAPTCHAs are based on sound. The program picks a word or a sequence of numbers at random, renders the word or the numbers into a sound clip and distorts the sound clip; it then presents the distorted sound clip to the user and asks users to enter its contents [7]. This CAPTCHA is based on the difference in ability between humans and computers in recognizing spoken language. Nancy Chan of the City University in Hong Kong is the first to implement a sound-based system of this type. The idea is that a human is able to efficiently disregard the distortion and interpret the characters being read out while software would struggle with the distortion being applied, and need to be effective at speech to text translation in order to be successful. This is a crude way to filter humans and it is not so popular because the user has to understand the language and the accent in which the sound clip is recorded. Fig 2.7 depicts the audio CAPTCHA.

Dept. of CSE, JNNCE. Page 8 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms

Fig 2.7 Audio CAPTCHA

2.4 reCAPTCHA and book digitization: To counter various drawbacks of the existing implementations, researchers at CMU developed a redesigned CAPTCHA aptly called the reCAPTCHA. About 200 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that's not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. What if one could make positive use of this human effort? reCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into "reading" books. To archive human knowledge and to make information more accessible to the world, multiple projects are currently digitizing physical books that were written before the computer age. The book pages are being photographically scanned, and then transformed into text using "Optical Character Recognition" (OCR). The transformation into text is useful because scanning a book produces images, which are difficult to store on small devices, expensive to download, and cannot be searched. The problem is that OCR is not perfect. reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.

Fig 2.8 reCAPTCHA But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known as shown in the Fig 2.8. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer is correct. Currently, reCAPTCHA is employed in digitizing books as part of the Google Books Project. Currently, there are mainly two kinds of methods to implement the CAPTCHA mechanism: OCR (Optical character recognition) visual method, non-OCR visual method. The CAPTCHA based on OCR visual method takes advantage of superiority in language barrier, security and easy use, becoming the most widely used CAPTCHA. However, with the fast development of OCR technology based on neural network, as well as the emergence of a variety of character segmentation technology, CAPTCHAs of lots of websites have been attacked. Though there are many different kinds of specific implementations for non-OCR visual method, it eventually comes down to the OCR problem in general, requiring users to identify images. It is not so widely used. Up to now, except some research sites, commercial sites rarely use it. Non-OCR visual method is designed for special occasions and certain user groups, thus it has very limited applications.

Chapter 3

Dept. of CSE, JNNCE. Page 10 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms

Visual CAPTCHAs have been widely used across the Internet to defend against undesirable or malicious bot programs. In this section, methods to break the visual schemes provided at Captchaservice.org, a publicly available web service for CAPTCHA generation is documented. These schemes were effectively resistant to attacks conducted using a high- quality Optical Character Recognition program, but were broken with a near 100% success rate by the novel attacks [3].

Captchaservice.org is the first web service that is designed for the sole purpose of generating CAPTCHA challenges. It supports multiple visual and non-visual CAPTCHA schemes. Using the API provided by this service, people can obtain a chosen type of CAPTCHA challenge generated on the fly to protect their blogs from comment spam attacks, or to defend against other type of bots.

Captchaservice.org supports the following four visual schemes:

• word_image: In this scheme, a challenge is a distorted image of a six-letter word. • random_letters_image: A challenge is implemented as a distorted image of a random six- letter sequence. • user_string_image: A challenge is a distorted image of a user-supplied string of at most 15 characters. • number_puzzle_text_image: This is a multi-modal scheme, which includes a distorted image of a random number, as well as a textual description of a puzzle involving the number. A user can solve such a challenge either by recognising the number in the image, or by solving the textual puzzle. The advantage of such a multimodal scheme is mainly to improve its usability and accessibility.

All these schemes use a random_shear distortion technique, which [4] describes, thus: “the initial image of text is distorted by randomly shearing it both vertically and horizontally. That is, the pixels in each column of the image are translated up or down by an amount that

Dept. of CSE, JNNCE. Page 11 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms varies randomly yet smoothly from one column to the next. Then the same kind of translation is applied to each row of pixels (with a smaller amount of translation on average)”. In this section, algorithms and enhancements for breaking the most commonly used CAPTCHAs created by using word image and random letter image are described under the headings breaking scheme1 and breaking scheme 2 respectively.

3.1 Breaking Scheme 1: The following empirical observations found from the captchaservice.org : • Only two colors were used in each challenge, one for background and another for foreground which is the distorted challenge text; the choice of colors is either random or specified by the user. Therefore, it is easy to separate the text from the background.

Table 3.1 pixel count of each letter used to create CAPTCHA

Only capital letters were used. Although a letter might be distorted into a different shape each time, it consisted of a constant number of foreground pixels in a challenge image. That is, a letter had a constant pixel count. The pixel count for each of the letters A to Z as in table 3.1, were calculated. As plotted in 3.1, most letters had a distinct pixel count. Few letters overlapped or touched with each other in a challenge, so many challenges were vulnerable to a vertical segmentation attack: the image could be vertically divided by a program into segments each containing a single character. The attack algorithm is largely based on the above observations. One of its key components is a vertical segmentation algorithm, which works as follows.

1. Obtaining the top-left pixel’s colour value, which defines the background colour of an image. Any pixel of a different colour value in this image is in foreground, i.e. part of the distorted text.

2. Identifying the first segmentation line. map the image into a coordinate system, in which the top-left pixel has coordinates (0, 0), the top-right pixel (image width, 0) and the bottom- left pixel (0, image height). Starting from point (0, 0), a vertical “slicing” process traverse pixels from top to bottom and then from left to right. This process stops once a pixel with a nonbackground colour is detected. The X co-ordinate of this pixel, x1, defines the first vertical segmentation line X = x1 -1.

Fig 3.1 Graph Of Pixel Count V/S Letters

3. Vertical slicing continues from (x1+1, 0), until it detects another vertical line that does not contain any foreground pixels – this is the next segmentation line.

4. Vertical slicing continues from a pixel to the right of the previous segmentation line. However, the next vertical line that does not contain any foreground pixel is not necessarily the next segmentation line. It could be a redundant segmentation line, which would be ignored by the algorithm. Therefore, only when the vertical slicing process cuts through the

Dept. of CSE, JNNCE. Page 13 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms next letter, the next vertical line that does not contain any foreground pixels is the next segmentation line.

5. Step 4 repeats until the algorithm determines the last segmentation line (after which, the vertical slicing will not find any foreground pixels). Once a challenge image is vertically segmented, the attack program simply counts the number of foreground pixels in each segment. Then, the pixel count obtained is used to look up Table 3.1, telling the letter in each segment. Fig 3.2 shows how the basic attack has broken a challenge. First, the vertical segmentation divided the challenge into 6 segments. Second, each segment is scanned to get the number of foreground pixels in it. Then, the pixel count obtained in the previous step is used to look up the mapping table, recognising a character Ci for each segment Si (i=1, …, 6). Finally, the string ‘C1 C2 C3 C4 C5 C6’ gives the result.

Fig. 3.2 Basic Attack To Break The CAPTCHA.

Enhancement: dictionary attack

The basic algorithm would fail to break some challenges completely. Fig. 3.3 gives a failing example, where the vertical segmentation method could not separate letters ‘S’ and ‘K’ because the vertical slicing line could not split the two letters without touching both of them. The basic attack could not do anything more than to give a partially recognized result “FRI**Y” (‘*’is used to represent one unrecognized character). However, since Scheme 1

Dept. of CSE, JNNCE. Page 14 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms challenges all used words, the basic attack is enhanced by the following “dictionary attack”. A dictionary of about 6,000 six-letter English words is introduced.

Fig 3.3 Failing Example Of Vertical Segmentation Method

Any partial result returned by the basic algorithm is used as a string pattern to identify candidate words in the dictionary that match the pattern. Since there could be multiple candidate words, a simple solution is introduced to find the best possible result as follows. For each dictionary entry, pixel sum is pre-computed (using Table 3.1), which is the total number of pixels this word could have when it is embedded in a CAPTCHA challenge. This pixel sum is stored along with the word in the dictionary. A pixel sum for the unbroken challenge is also worked out, which is the total number of all foreground pixels in the challenge. The first candidate word with the same pixel sum as the challenge is returned as the final recognition result.

Fig 3.4 Enhanced Algorithm

Fig 3.4 illustrates how the enhanced algorithm worked. In this case, the partial result ‘FRI**Y’ obtained by the basic algorithm is used to identify all words that start with ‘FRI’ and end with ‘Y’ in the dictionary. Five candidate words were found: ‘FRIARY’, ‘FRILLY’, ‘FRISKY’, ‘FRIZZY’ and ‘FRIDAY’. However, ‘FRISKY’ is returned as the best possible result, since it is the only candidate having a pre-computed pixel sum of 987, which equals to the pixel sum of the unbroken challenge (133+208+121+372+153=987).

To make the dictionary attack work properly, it is crucial to create a correct string pattern after the vertical segmentation process. For example, when the vertical segmentation divided an image into only four segments and the corresponding partial result is in the following form: ‘B�B�’, it is important to determine how many unrecognized letters were in each box ‘�’. Otherwise, ‘B*B***’, ‘B**B**’ or ‘B***B*’ would give totally different recognition results. If all these patterns were used to look up the dictionary, it would be likely to find many candidates with an identical pixel sum. This is a problem of indexing letters in their correct positions, and it is addressed using the following two step method.

1) For some cases, it is trivial to work out a string pattern with contextual information. For instance, if a segmented image contained only one unrecognized segment, e.g. the example in Fig 7, the number of unrecognized characters in the segment is six minus the number of all recognized characters. Another straightforward case is when no character is recognized in an

Dept. of CSE, JNNCE. Page 16 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms image – then the number of unrecognized segments in the image did not really matter. For example, an image segmented into three unrecognized segments ‘��’ would be no different to one for which the vertical segmentation completely failed.

2) When the above method did not work, e.g. in the case of ‘B�B�’, the number of pixels in each unrecognized segment in order to deduce how many characters the segment contained were considered. For example, when the number of pixels in a segment is larger than 239 (the largest pixel count in Table 3.1, i.e., ‘N’) but smaller than 2×239, it is likely that this segment had two unrecognized letters. There were exceptions that could not be handled this way. Although the average pixel count for letters A-Z is 178.80, ‘J’, ‘L’ and ‘I’ had a pixel count much smaller than the average. For example, the pixel sum of ‘ILL’ or ‘LIL’ is only 343; the pixel sum of ‘LI’ or ‘IL’ is a mere 232. On the other hand, the combination of ‘LLL’, ‘JK’ and ‘KJ’ never or rarely occurs in English words.

Further enhancements

The following enhancements were developed to handle typical “troublemakers” that could not be broken by the above techniques.

Letters with an identical pixel count. Letters having an identical pixel count could confuse the basic algorithm. For example, the challenge in Fig 3.5(a) is successfully segmented into 6 parts, but it is initially recognized as “OELLEY”, leading to an incorrect result. Since ‘O’ and ‘K’ have the same pixel count, the basic algorithm had only a 50% of chance for breaking this challenge.

To overcome this problem, the following “spelling check” method is considered: if a challenge includes a letter with a pixel count of 111 (‘J’ or ‘L’), 178 (‘K’ or ‘O’), or 162 (‘P’ or ‘V’), each variant is generated, and then carry out multiple dictionary lookups to rule out candidate strings that are not proper words. For example, in the above case, both ‘OELLEY’ and ‘KELLEY’ were looked up in the dictionary. Since only “KELLEY” is in the dictionary, it is returned as the best possible result. This “spelling check” technique is also used to enhance the string pattern matching in the dictionary attack. For example, if a partial result recognized by the basic algorithm is “V*B*IC”, then both “V*B*IC” and “P*B*IC” would be valid matching patterns for identifying candidate words in the dictionary.

Fig 3.5 Typical troublemakers: a) Letters with an identical pixel count; b) Broken letters; c) Letters with additional or less pixels.

Broken characters. A few challenges contained broken letters that misled the segmentation algorithm. As shown in Fig 3.5 b), due to a break in ‘H’, the letter is segmented into two parts instead of one. To overcome this problem, a two-step method as follows is introduced. First, once the vertical segmentation is done, the algorithm tested whether a segment is complete: if the number of foreground pixels in a segment is smaller than 111, the smallest pixel count in Table 3.1, then this segment is incomplete; if the number of foreground pixels in a segment is larger than 111 but smaller than 239 (the largest pixel count in Table 3.1, i.e., ‘N’) and this number could not be found in the lookup table, then this segment is incomplete. Second, an incomplete segment would be merged with its neighboring segment(s). A proper merging of segment is one for which the combined pixel count could lead to a meaningful recognition result, e.g. the combined count is equal to or less than 239, and it could be found in the lookup table. When multiple proper combinations existed (e.g. S3 can be combined either with S2 or with S4), spelling check could serve as the last resort to find the best possible result. Additional pixel(s). In a few cases, a letter might contain additional pixel(s) against its pixel count in the lookup table. For example, an additional pixel occurred above ‘A’ in Fig 3.5 c.

Dept. of CSE, JNNCE. Page 18 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms To address this problem, an approximate table lookup is used: when a pixel count for a segment could not be located in the lookup table, this segment would be recognized as the most likely letter. This method does not succeed all the time, since some letters have close pixel counts (e.g., V, E and U; D, Z and S; M and W). However, sometimes, we could resort to the spelling check technique to find the correct result. For example, when multiple candidate answers were returned by the approximate method, spelling check could be used to choose the best possible solution.

3.2 Breaking Scheme 2 In Scheme 2 (random_letters_image), each challenge is a distorted image of a random six-letter sequence, rather than an English word. However, the challenge images in Schemes 1 and 2 share many common characteristics, such as: • Each image is of the same dimension: 178 × 83 pixels. Only two colors are used in the image, one for background and another for foreground which is the distorted challenge text. • Only capital letters are used. Few letters overlap or touch with each other. • Each letter has an (almost) constant pixel count. The one-to-one mapping from a pixel count to a letter in Table 3.1 is still valid. The basic attack algorithm in the previous section is also applicable to Scheme 2. However, the dictionary attack did not work here. It is possible (but expensive) to build a dictionary of 6-random-letter strings (26^6 =308,915,776 dictionary entries). However, the pixel sum matching would often return multiple candidates. Moreover, the spelling check technique is no longer applicable to differentiate letters with an identical pixel count. To boost the success rate, a new method is developed, largely based on the following new ideas: a “snake” segmentation algorithm, which replaced the vertical segmentation since it could do a better job of dividing an image into individual letter components, and some simple geometric analysis algorithms that differentiated letters with the same pixel count.

Snake segmentation The snake segmentation method is inspired by the popular “snake” game, which is supported in most mobile phones. In this game, a player moves a growing snake on the screen, and tries to avoid collisions between the snake and dynamic blocks. In the algorithm, a snake is a line that separates the letters in an image. It starts at the top line of the image and ends at the bottom. The snake can move in four directions: Up, Right, Left and Down, and it

Dept. of CSE, JNNCE. Page 19 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms can touch foreground pixels of the image but never cuts through them. Often, a snake can properly segment a challenge that the vertical segmentation fails to do. The first step of the snake segmentation is to preprocess an image to obtain the first and last segmentation lines, as illustrated in Fig 3.6

The first segmentation line (X= xfirst) is obtained as in the vertical segmentation algorithm, and then the vertical slicing started at point (width, 0), moving leftwards to locate the last segmentation line (X= xlast). The top and bottom edges of the image between these two segmentation lines were starting and ending lines for a snake. Since each letter occupies some width, chose to refine the starting line by shifting 10 pixels to each segmentation lines. That is, for snakes, all possible starting points are between (xfirst+10, 0) and (xlast - 10, 0).

Fig 3.6 Snake Segmentation

Next, the snake segmentation is started to divide the pre-processed image into segments. The following heuristics control the movement of a snake: 1. Whenever feasible, a snake moves down vertically as much as possible. That is, Down is the direction that has the highest priority. 2. A snake moves down from its starting point until it is immediately above a foreground pixel. 3. When a snake can move Left and Up only, it moves left one pixel. And then moves down as much as possible. 4. When a snake can move Right and Up only, it moves right one pixel. And then moves down as much as possible. 5. When a snake can move right and left only, it goes right. (Priority order: D > R > L > U) 6. When a snake moves left, it cannot go to any point that is to the left of a previously completed segmentation line. 7. A vertical slicing line could be a legitimate segmentation line. 8. Distance control: when a snake reaches the bottom line, it is done. 9. If a snake cannot reach the bottom, it is aborted and all its trace is deleted. Dept. of CSE, JNNCE. Page 20 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms 10. No matter whether or not the previous snake succeeded in reaching the bottom, the next snake starts one pixel to the right of the previous starting point. There could be multiple snakes between two segments, see Fig 3.6(b), where for example the red block between ‘K’ and ‘S’ were in fact a set of snake lines that touched each other. Therefore, the last step is to finalize the segments. This process dealt with the following tasks: 1) Getting rid of redundant snakes: if there is no foreground pixel in a segment, then this is an empty segment and one of its segmentation lines is redundant; 2) When necessary, handling broken characters by merging neighboring segments using the normal vertical segmentation. Fig 3.6(c) shows the finalized segments of a challenge, one for which vertical segmentation would fail to segment overlapping letters T, J and K.

Enhancement technique To enhance the snake segmentation approach, simple algorithms were designed to tell apart letters with an identical pixel count by analyzing their geometric layouts.

Differentiating between ‘P’ and ‘V’. When a segment had a pixel count of 162, it could be either ‘P’ or ‘V’. To determine which letter it is, this segment would be first normalized: its left segmentation line would be adjusted to cross its left-most foreground pixel vertically and similarly for the right segmentation line. Then, a vertical line would be drawn in the middle of the normalized segment. If this middle line cut through the foreground text only once, this segment would be recognized as ‘V’; otherwise, it is recognized as ‘P’. Fig. 3.6 shows the condition. It is unlikely for the middle line to cut through ‘V’ twice, since it is rare to use a rotated ‘V’ in a challenge presumably due to a usability concern: it would be very difficult for people to differentiate a rotated ‘V’ from a distorted ‘L’ or ‘J’.

Fig 3.7 Differentiating Between ‘P’ And ‘V’

Dept. of CSE, JNNCE. Page 21 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms This method would not work when a ‘P’ or ‘V’ happened to have a crack in the middle of its normalized segment. However, it is trivial to address this exception: the middle line could be shifted horizontally a number of times, and each time the number of intersections it cut through the foreground would be checked. If two or more intersections occurred more often, then one can be sure this segment is ‘P’; otherwise it is ‘V’.

Telling ‘O’ and ‘K’ apart. When a segment had a pixel count of 178, it could be either ‘K’ or ‘O’. To determine which letter it is, a vertical line would be drawn in the middle of the segment. If this line cut through the foreground text only once, this segment would be recognized as ‘K’. If this cut-through line had two intersections with the foreground, the letter could be either ‘O’ or ‘K’. However, when observed that the distance between two intersections, denoted by d, is larger for ‘O’ than for ‘K’. In the algorithm, if this distance is larger than 14 pixels (an empirical threshold), the letter is recognized as ‘O’; else, it is recognized as ‘K’. Fig 3.7 shows a segment that is normalized and then successfully recognized as ‘O’ in this way.

Fig. 3.8 Calculating the diameter However, this method is not perfect. For example, if there is a break in letter ‘O’ and this break is exactly in the middle of the normalized segment, then the cutthrough line would only cross the foreground once. Thus, this letter would be wrongly recognized as ‘K’.

Differentiating between ‘L’ and ‘J’. To tell whether a segment is ‘L’ or ‘J’, the segment is first normalized. A horizontal line would then start to slice the segment horizontally from top to bottom, until it intersected the foreground text. If the intersection is closer to the left segmentation line, then the segment is recognized as ‘L’; if the intersection is closer to the right segmentation line, then the segment is recognized as ‘J’. Fig 3.8 depicts it. If the intersection is exactly in the middle, it is guessed by default as ‘L’.

Fig 3.9 Differentiating ‘L’ and ‘J’

Chapter 4 Applications

This section describes various applications of the CAPTCHA. CAPTCHAs are used in various web applications to identify human users and to restrict access to them [2]. Some of them are: 1. Online Polls: Bots can wreak havoc to any unprotected online poll. They might create a large number of votes which would then falsely represent the poll winner in spotlight. This also results in decreased faith in these polls. CAPTCHAs can be used in websites that have embedded polls to protect them from being accessed by bots, and hence bring up the reliability of the polls.

2. Protecting Web Registration: Several companies offer free email and other services. Until recently, these service providers suffered from a serious problem – bots. These bots would take advantage of the service and would sign up for a large number of accounts. This often created problems in account management and also increased the burden on their servers. CAPTCHAs can effectively be used to filter out the bots and ensure that only human users are allowed to create accounts.

3. Preventing comment spam: Most bloggers are familiar with programs that submit large number of automated posts that are done with the intention of increasing the search engine ranks of that site. CAPTCHAs can be used before a post is submitted to ensure that only human users can create posts. A CAPTCHA won't stop someone who is determined to post a rude message or harass an administrator, but it will help prevent bots from posting messages automatically.

4. Search engine bots: It is sometimes desirable to keep web pages unindexed to prevent others from finding them easily. There is an html tag to prevent search engine

Dept. of CSE, JNNCE. Page 23 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms bots from reading web pages. The tag, however, doesn't guarantee that bots won't read a web page; it only serves to say "no bots, please." Search engine bots, since they usually belong to large companies, respect web pages that don't want to allow them in. However, in order to truly guarantee that bots won't enter a web site, CAPTCHAs are needed.

5. E-Ticketing: Ticket brokers like Ticketmaster also use CAPTCHA applications. These applications help prevent ticket scalpers from bombarding the service with massive ticket purchases for big events. Without some sort of filter, it's possible for a scalper to use a bot to place hundreds or thousands of ticket orders in a matter of seconds. Legitimate customers become victims as events sell out minutes after tickets become available. Scalpers then try to sell the tickets above face value. While CAPTCHA applications don't prevent scalping; they do make it more difficult to scalp tickets on a large scale.

6. Email spam: CAPTCHAs also present a plausible solution to the problem of spam emails. Therefore one has to use a CAPTCHA challenge to verify that an indeed a human has sent the email.

7. Preventing Dictionary Attacks: CAPTCHAs can also be used to prevent dictionary attacks in password systems. The idea is simple: prevent a computer from being able to iterate through the entire space of passwords by requiring it to solve a CAPTCHA after a certain number of unsuccessful logins. This is better than the classic approach of locking an account after a sequence of unsuccessful logins, since doing so allows an attacker to lock accounts at will.

8. As a tool to verify digitized books: This is a way of increasing the value of CAPTCHA as an application. An application called reCAPTCHA harnesses users responses in CAPTCHA fields to verify the contents of a scanned piece of paper. Because computers aren’t always able to identify words from a digital scan, humans have to verify what a printed page says.

Chapter 5 Conclusion and Future Scope

CAPTCHAs are an effective way to counter bots and reduce spam. They serve dual purpose– help advance AI knowledge. Applications are varied– from stopping bots to character recognition & pattern matching. A good CAPTCHA system should give consideration both to computer security and human friendliness. Designing good CAPTCHAs is a tedious business. Text CAPTCHAs achieve a high level of practicality, but very often fall short of providing a good balance between usability and security. CAPTCHA plays an important role in protecting Internet resources from attacks by automated scripts.

However, CAPTCHA is believed to be vulnerable to various attacks to breakdown the CAPTCHA and thus breakdown the security system. Fatal design mistakes were exploited to develop simple attacks that could break, with near 100% success, two visual CAPTCHA schemes provided by Captchaservice.org -- these schemes all employed sophisticated distortions, and they were effectively resistant to OCR software attacks and appeared to be secure. It is alarming and because many of today’s CAPTCHAs are likely only to provide a false sense of security. Thus the systematically breaking representative schemes will generate convincing evidence and establish valuable insights that will benefit the design of the next generation of robust and usable CAPTCHAs. Thus breaking of CAPTCHA helps in building more robust and secured CAPTCHA and thus provides secured online authentication.

References

[1] L. von Ahn, M. Blum, N. J. Hopper, and J. Langford. CAPTCHA: Using hard AI problems for security. Proc. of Int. Conf. on the Theory and Applications of Cryptographic Techniques (EUROCRYPT 2003), vol. 2656 of LNCS, pp. 294– 311, May 2003

[2] Sarika et al., International Journal of Advanced Research in Computer Science and Software Engineering: Understanding Captcha: Text and Audio Based Captcha with its Applications, June - 2013, pp. 106-115

[3] Jeff Yan and A. S. E. Ahmad. Breaking visual CAPTCHAs with naive pattern recognition algorithms. Proc. of 23rd Annual Computer Security Applications Conference (ACSAC 2007), pp. 279–291, Dec. 2007.

[4] T Converse, “CAPTCHA generation as a web service”, Proc. of Second Int’l Workshop on Human Interactive Proofs (HIP’05), ed. by HS Baird and DP Lopresti, Springer-Verlag. LNCS 3517, Bethlehem, PA, USA, 2005. pp. 82-96

[5] Athanasopoulos.E and Antonatos.S. Enhanced CAPTCHAs: Using animation to tell humans and computers apart. Proc. of 10th Int. Conf. on Communicationsand Multimedia Security (CMS 2006), vol. 4237 of LNCS, pp. 97–108, October 2006.

[6] Ferzli, R.; Bazzi, R.; Karam, L.J.; A Captcha Based on the Human Visual Systems Masking Characteristics; IEEE International Conference on Multimedia and Expo, 2006,pp517-520.

[7] T.-Y. Chan. Using a text-to-speech synthesizer to generate a reverse Turing test. Proc. of 15th IEEE Int.Conf. on Tools with Artificial Intelligence (ICTAI 03), pp. 226–232, November 2003.

Dept. of CSE, JNNCE. Page 26 Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms [8] C Pope and K Kaur. “Is It Human or Computer? Defending E-Commerce with captcha”, IEEE IT Professional, March 2005, pp. 43-49

[9] HS Baird, MA Moll and SY Wang. “ScatterType: A Legible but Hard-to-Segment CAPTCHA”, Eighth International Conference on Document Analysis and Recognition, August 2005, pp. 935-939

[10] AL Coates, H S Baird and RJ Fateman. “PessimalPrint: A Reverse Turing Test”, Int'l. J. on Document Analysis & Recognition, Vol. 5, pp. 158-163, 2003.

[11] Gabriel Moy, Nathan Jones, Curt Harkless and Randall Potter. “Distortion Estimation Techniques in Solving Visual CAPTCHAs”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR'04), Vol 2, June 2004, pp. 23-28

[12] J. Yan and A. S. El Ahmad. Usability of CAPTCHAs or usability issues in CAPTCHA design. In SOUPS ‘08, pp. 44–52, New York, NY, USA, 2008.

[13] Jeremy Elson, John R. Douceur, Jon Howell, and Jared Saul. Asirra: a CAPTCHA that exploits interest-aligned manual image categorization. In ACM Conference on Computer and Communications Security, pages 366–374. ACM, 2007.

[14] B. Pinkas and T. Sander, “Securing passwords against dictionary attacks,” in Proc. of the 9th CCS, V. Atluri, Ed. ACM Press, Nov. 2002, pp. 161–170.

Dept. of CSE, JNNCE. Page 27