<<

STUDIES OF DYNAMIC COGNITIVE GAME CAPTCHA

USABILITY AND STREAM RELAY ATTACKS

A Thesis

Presented to the

Faculty of

California State Polytechnic University, Pomona

In Partial Fulfillment

Of the Requirements for the Degree

Master of Science

In

Computer Science

By

Tung T Nguyen

2016

SIGNATURE PAGE

THESIS: STUDIES OF DYNAMIC COGNITIVE GAME CAPTCHA USABILITY AND STREAM RELAY ATTACKS

AUTHOR: Tung T Nguyen

DATE SUBMITTED: Fall 2016

Computer Science Department

Dr. Mohammad Husain, Thesis Committee Chair, Computer Science Department

Dr. Tingting Chen, Computer Science Department

Dr. Yu Sun, Computer Science Department


ABSTRACT

CAPTCHAs are a detection mechanism widely used on the Internet to distinguish a legitimate human user from a computer program. Most CAPTCHA schemes are vulnerable to relay attacks, which relay CAPTCHA challenges to remote human solvers. Dynamic Cognitive Game (DCG) CAPTCHAs are a new CAPTCHA scheme that requires the user to play a small, simple object-matching game. Because of the dynamic and continuous interaction between the user and the game, DCG CAPTCHAs may offer strong resistance to relay attacks. In this thesis, we focus on a DCG CAPTCHA relay attack in which the game frames and responses are simply streamed between the attacker and a human solver. We also present a mechanism for detecting such a relay attack based on game statistics such as play duration, mouse clicks, and incorrect drags and drops. To demonstrate the correctness of our detection mechanism, we report on three aspects: (1) the performance of legitimate DCG users, (2) the performance of human solvers in a DCG CAPTCHA streaming attack, and (3) the detection of streaming relay attacks against DCG CAPTCHAs using game features and statistics. Our results show that it is possible to detect the streaming-based relay attack against DCG CAPTCHAs with high accuracy. With these three studies, DCG is positioned to be the first CAPTCHA scheme with a detection mechanism for relay attacks.


TABLE OF CONTENTS

SIGNATURE PAGE
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1: INTRODUCTION
1.1 Motivation
1.2 Statement of Problem
1.3 Research Goals
1.4 Structure of Thesis
CHAPTER 2: LITERATURE REVIEW
2.1 History of CAPTCHA
2.1.1 Definition
2.1.2 Applications
2.2 Classification of CAPTCHAs
2.2.1 Text-based CAPTCHA
2.2.2 Image-based CAPTCHA
2.2.3 Audio-based CAPTCHA
2.2.4 Motion Objects CAPTCHA
2.3 Captcha Features
2.3.1 Visual Features
2.3.2 Anti-segmentation Features
2.3.3 Anti-recognition Features
2.3.4 Suggested CAPTCHA Scheme
2.4 Attacks on CAPTCHA
2.4.1 Attacks on Text-based CAPTCHA
2.4.2 Attacks on Image-based CAPTCHA
2.4.3 Attacks on Motion-based CAPTCHA
CHAPTER 3: METHODOLOGY
3.1 Security in Design Prototypes of DCG CAPTCHA
3.2 DCG CAPTCHA Instances and Parameters
3.3 Design and Implementation of DCG CAPTCHA
3.4 Stream Relay Attack
3.5 Mechanical Turk
3.6 Usability Study Design, Goals, and Process
3.7 Stream Relay Attack Study
CHAPTER 4: EVALUATION OF RESULTS
4.1 Usability Study of DCG CAPTCHA
4.2 Relay Attacks
4.2.1 Difficulty of Relaying DCG CAPTCHA
4.2.2 Stream Relay Attack Study
4.3 Stream Relay Attack Detection Mechanism
4.3.1 Detection with Different Error Rates and Duration Play Time
4.3.2 Using K-Nearest Neighbor Algorithm for Detection Mechanism
CHAPTER 5: CONCLUSION
5.1 Summary
5.2 Future Work
5.2.1 Improving with Different DCG CAPTCHA Gameplays
5.2.2 Improving with Stream Relay Attack with Offline Feedback
5.2.3 Improving with Usability of DCG CAPTCHA
REFERENCES


LIST OF TABLES

Table 1: 5 points scale for usability study

Table 2: Demographics of usability Mechanical Turk workers

Table 3: Completion times, error rates in usability study

Table 4: Object speeds vs error rates in usability study

Table 5: User feedbacks on game features

Table 6: Stream relay attacks finish time and error rates

Table 7: High latency attack compared with low latency attack in time and error

Table 8: K-NN prediction in low latency connection

Table 9: K-NN prediction in high latency connection


LIST OF FIGURES

Figure 1: Wikipedia text based CAPTCHA

Figure 2: Gimpy CAPTCHA

Figure 3: EZ-GIMPY CAPTCHA

Figure 4: reCAPTCHA

Figure 5: ASIRRA image based CAPTCHA

Figure 6: NuCaptcha

Figure 7: Chandavale [2009] algorithm to break text-based CAPTCHA

Figure 8: Bursztein's pipeline to break text based CAPTCHA

Figure 9: Generic algorithm of breaking text based CAPTCHA by Bursztein [2014]

Figure 10: Match shapes DCG CAPTCHA game

Figure 11: Legitimate user gameplay

Figure 12: Human solvers gameplay in a stream relay attack

Figure 13: Front page of Mechanical Turk marketplace

Figure 14: Creating HITs to collect data for DCG CAPTCHA

Figure 15: Age distribution of Mechanical Turk workers

Figure 16: Gender distribution of usability study

Figure 17: Stream relay attacks

Figure 18: Duration time play graph

Figure 19: Difference in duration game play

Figure 20: Error rates in high latency connection

Figure 21: Error rates in low latency connection

Figure 22: Error rates in legitimate users’ gameplay

Figure 23: K-NN graph of high latency relay attacks


CHAPTER 1: INTRODUCTION

1.1 Motivation

Web-based and mobile services are used for banking, bidding, purchasing goods and services, government and healthcare services, and also for socializing and sharing data. Although these services are very important, they are often abused by automated means, such as denial-of-service attacks, registration of spam email accounts, and password brute-force attacks. These attacks are common security problems. To prevent such abuse, a primary defense mechanism called CAPTCHA [von Ahn 2003] was developed: a program that differentiates humans from malicious computer programs using a task that is easy for humans but hard for computers. CAPTCHAs are deployed by many online services, such as account registration, merchandise selling, and search engines, to limit different types of attacks by automated bots. For example, Facebook has used CAPTCHAs to reduce the registration of free social accounts by spammers and fraudsters.

Many CAPTCHAs are based on visual challenges, such as asking users to identify characters in a string of words or characters, but many other CAPTCHAs have also been proposed [Hidalgo and Alvarez 2011, Xu 2012]. Unfortunately, existing CAPTCHAs suffer from several problems and are not foolproof. Many CAPTCHA schemes used in real-life applications have been successfully attacked. Solving CAPTCHAs has been made easier by the commercial solving services that attackers have recently been using [Motoyama 2010]. There are two common categories of attacks: automated attacks and relay attacks.

First, automated attacks [Keizer 2008] have been successfully developed against many existing schemes. Pattern recognition and machine learning algorithms have been designed that can recognize CAPTCHAs with 99.80% accuracy [Tung 2014]. Real-world attacks have also succeeded against CAPTCHAs provided by big companies such as Microsoft, Yahoo, and PayPal. Automated attacks usually use image processing algorithms to solve the CAPTCHA. While they achieve high accuracy and pose a strong threat to many CAPTCHA schemes, automated attacks are expensive, requiring powerful servers and human resources.

Second, low-cost attacks have been developed whereby CAPTCHA challenges are relayed to, and solved by, users on different websites or paid human solvers in remote countries. This category, called relay attacks [Motoyama 2010], relies on the intelligence of remote human solvers. All recent CAPTCHA schemes are considered vulnerable to relay attacks using human solvers, including opportunistic solving and paid solving services. Relay attacks are more viable than automated attacks due to their simplicity and low economic cost [Motoyama 2010].

Third, the same character distortions that are used to hide the underlying content of a CAPTCHA from computer recognition algorithms also degrade human usability [Bursztein 2010]. Yan [Yan 2008] showed that distortion has a clear impact on the usability of CAPTCHAs, since users find it difficult or impossible to recognize over-distorted characters. Depending on how difficult a CAPTCHA is to solve, users can get frustrated and give up using the services that deploy it, and companies might lose customers as a result.

Given these three problems, there is a need to consider a new CAPTCHA design that places the human user at the center. Game CAPTCHAs, a new trend, offer a promising approach by making CAPTCHA solving a fun activity for users. Game CAPTCHAs are challenges built using games that are enjoyable and easy to play for humans, but hard for computers.

1.2 Statement of Problem

Understanding how CAPTCHAs work is needed to develop a new CAPTCHA scheme that prevents both automated attacks and human-solver attacks. As is common in computer science research, especially computer security research, a new CAPTCHA scheme needs to be fully tested with both an attack study and an attack detection study. Human-solver attacks are especially dangerous because it is difficult to distinguish between fake requests from attackers and authentic requests from users. Therefore, understanding how the new CAPTCHA scheme performs is essential.

In this thesis, we focus on a new CAPTCHA scheme, called Dynamic Cognitive Game (DCG) CAPTCHAs, which can detect relay attacks due to their dynamic and interactive characteristics. This CAPTCHA challenges the user to perform a game-like cognitive task involving dynamic, interactive, changing images. We present a DCG CAPTCHA scheme in which objects move around within the dynamic images. The user’s cognitive task is to match the objects with their respective targets or target areas.

1.3 Research Goals

This thesis studies the new DCG CAPTCHA scheme along three dimensions, using 40 Amazon Mechanical Turk participants. The first is DCG CAPTCHA usability. To evaluate the performance of DCG CAPTCHAs when solved by legitimate users, we conduct a usability study with 40 participants. This study is an important component of the Stream Relay Attack detection mechanism. Next, we formalize, design, and implement the Stream Relay Attack against DCG CAPTCHAs. We perform a second study, again with 40 participants, to measure performance on the streamed version of DCG CAPTCHA. We set up attacks in three settings: high-latency attacks, low-latency attacks, and reduced-game-size attacks. Third, based on the data collected in phases one and two, we design and evaluate the Stream Relay Attack detection mechanism. The mechanism feeds real-time game statistics, such as gameplay duration, mouse clicks, and incorrect drags and drops, to machine learning algorithms to differentiate between legitimate users and human solvers.
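As a rough illustration of the third phase, the sketch below labels a game session as coming from a legitimate user or a relayed human solver by a k-nearest-neighbor vote over the three statistics named above. The session values, the choice of k, and all function names are hypothetical and chosen only for illustration; they are not data or code from this study.

```python
import math
from collections import Counter

# Made-up example sessions: (play duration in seconds, mouse clicks,
# incorrect drag-and-drops) with a label. Not data from this study.
TRAINING = [
    ((8.0, 4, 0), "legitimate"),
    ((9.5, 5, 1), "legitimate"),
    ((7.2, 4, 0), "legitimate"),
    ((21.0, 9, 3), "relay"),
    ((25.5, 11, 4), "relay"),
    ((19.0, 8, 2), "relay"),
]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(session, training=TRAINING, k=3):
    """Label a session by majority vote among its k nearest training sessions."""
    nearest = sorted(training, key=lambda item: distance(session, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In practice the features would likely need to be scaled to comparable ranges before computing distances, since play duration can otherwise dominate the click and error counts.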

1.4 Structure of Thesis

This thesis is structured as follows. Chapter 2 contains a literature review of the various types of CAPTCHA and the current state of attacks on them. This includes text-based, image-based, audio-based, and moving-object CAPTCHAs. Chapter 3 contains the definition and implementation of the new Dynamic Cognitive Game CAPTCHA. Chapter 3 also covers the usability study, the first research goal of this thesis. Chapter 4 discusses the Stream Relay Attack experiments and evaluates their results. This chapter also presents the Stream Relay Attack detection mechanism. Chapter 5 contains our conclusions and future work related to this thesis.


CHAPTER 2: LITERATURE REVIEW

2.1 History of CAPTCHA

CAPTCHA stands for “Completely Automated Public Turing Test to Tell Computers and Humans Apart.” The term CAPTCHA was first introduced in 2003 by von Ahn et al. [von Ahn 2003], describing a test that can differentiate humans from computers. The original system was developed in the early 2000s to stop spam bots from pretending to be people. The P for Public means the CAPTCHA code and data should be public. It should be difficult to write a computer program that can pass the hard AI problem posed by a CAPTCHA, even for someone who knows how the CAPTCHA works. The T for “Turing Test to Tell” reflects the fact that a CAPTCHA is like a reverse Turing Test [Turing]. In the original Turing Test, there are a human, a computer, and a judge. The judge asks a series of questions of both the human and the machine (computer), and both claim to be human. The judge’s job is to decide which of the contestants is the human and which is the machine. A CAPTCHA is the same as a Turing Test with only one difference: the judge is now also a machine. This is the reason a CAPTCHA is called a reverse Turing Test. The challenge is now administered by a machine, and the human’s job is to convince the machine that he or she is human. Under the common definition, the test must be easily solved by humans and easily generated and evaluated, but not easily solved by computers.

2.1.1 Definition

A CAPTCHA is a cryptographic protocol whose underlying hardness assumption is based on an AI problem.


Definition 1: Define the success of an entity A over a test V by

$$\mathrm{Succ}_A^V = \Pr_{r,\, r'}\left[\langle A_r, V_{r'} \rangle = \text{accept}\right]$$

We assume that A can have precise knowledge of how V works; the only piece of information that A cannot know is r’, the hidden randomness of V.

A CAPTCHA is a test V over which most humans have success close to 1, and for which it is hard to write a computer program that has high success over V. We say that it is hard to write a computer program that has high success over V if any program that has high success over V can be used to solve a hard AI problem. Basing a CAPTCHA on a precisely stated hard AI problem creates a win-win situation: either the CAPTCHA is not broken and can differentiate humans from computers, or the CAPTCHA is broken and the hard AI problem is solved.

Definition 2: A test V is said to be (α, β)-executable when at least an α portion of the human population has success greater than β over V. A CAPTCHA is a test V that is (α, β)-executable. We require that V be publicly available.
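Definition 2 can be read as a simple check over measured success rates. The sketch below is a minimal illustration, assuming the definition means "at least an α fraction of users" and that each user's success rate has already been estimated from many challenges; the function name and the sample rates are hypothetical.

```python
def is_executable(success_rates, alpha, beta):
    """Return True if at least an alpha fraction of users has success > beta."""
    if not success_rates:
        raise ValueError("need at least one user")
    passing = sum(1 for rate in success_rates if rate > beta)
    return passing / len(success_rates) >= alpha


# Hypothetical per-user success rates, each estimated over many challenges.
rates = [0.98, 0.95, 0.99, 0.60, 0.97]
```

With these sample rates, four of the five users exceed β = 0.9, so the test would count as (0.8, 0.9)-executable but not (0.9, 0.9)-executable.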

For a CAPTCHA scheme to be successful, it must have the following properties:

Automated: it must be possible for a machine to automatically generate and grade the challenges.

Secure: the algorithm used to generate the challenges must be publicly available, and the scheme must remain difficult to break even so. Every CAPTCHA scheme should be based on a hard AI problem.

Usability: challenges should be easily and quickly solved by humans, but not by computer programs. A CAPTCHA should be solvable regardless of the user’s language, location, or background.

2.1.2 Applications

A typical CAPTCHA is an image containing several distorted characters that usually appears at the bottom of a web registration form. Users are asked to type in the characters to prove they are human. CAPTCHAs have many security applications, including the following.

Online Polls: In November 1999, the website Slashdot created an online poll asking which was the best graduate school in computer science. As with most online polls at that time, the IP address of each voter was recorded to prevent multiple votes from the same user. However, students from Carnegie Mellon and MIT wrote scripts to vote automatically for their schools many times. In the end, CMU won the poll with 21,156 votes; MIT took second place, while the other schools each had fewer than 1,000 votes. Situations like this require a program that can differentiate between humans and computer programs.

Another application of CAPTCHAs is the account registration process for free email services. Companies that offer free email services, such as Yahoo Mail and Microsoft Outlook Mail, require users to prove they are human by answering a CAPTCHA challenge during account registration. CAPTCHAs are also needed to prevent other types of attack involving automated bots, such as denial-of-service and password dictionary attacks.

CAPTCHAs are also used to prevent dictionary attacks on password systems. Any password system can be broken if the website allows an attacker to iterate through the entire password space. The password-guessing loop can be prevented by requiring that a real human enter each password.

Google improves its Gmail service by blocking access by automated spammers. eBay improves its marketplace by preventing bots from placing automated bids. Facebook prevents automated bots from creating fraudulent spam accounts.

CAPTCHAs are also used for tagging images and digitizing books. Tags for images can be improved by learning from the tags entered by legitimate users who solve image-based CAPTCHAs. It is important to make sure the tags come from legitimate users, not from human solvers. To digitize books, CAPTCHAs use input from legitimate users to transcribe scanned text images whose blurred text cannot be recognized by computer character recognition. These text images are considered a hard AI problem: hard for computers but easy for humans. Using CAPTCHAs to digitize books can save time and computing resources. According to our estimates, humans type hundreds of millions of CAPTCHAs every day, each time spending a few seconds typing the distorted characters. The number of working hours wasted is large enough to motivate a new kind of CAPTCHA.


The key to developing a successful CAPTCHA scheme is a human verification algorithm that balances security and usability. A successful CAPTCHA scheme is hard for computers, and even for human solvers, but easy for legitimate users.

2.2 Classification of CAPTCHAs

CAPTCHAs are classified based on what is distorted, whether it is characters, digits, audio or images. Some main types of CAPTCHAs are based on text, image, audio, video, and moving objects.

2.2.1 Text-based CAPTCHA

The most widely used CAPTCHA schemes use a combination of distorted characters and obfuscation techniques that humans can recognize but that may be difficult for automated scripts. This approach is very effective but requires a large test bank. In some text-based CAPTCHAs, simple questions based on arithmetic calculation are displayed in distorted characters, and users are required to give the correct answer.

Figure 1: Wikipedia text based CAPTCHA. Source: Wikipedia login website, 2016


Gimpy CAPTCHA, shown in Figure 2, was the first commercial implementation of a text-based CAPTCHA, built by Carnegie Mellon University (CMU) in cooperation with Yahoo for its Messenger service to keep automated bots out of its chat rooms. Gimpy was also created to prevent scripts from obtaining free Yahoo email addresses. Gimpy picks seven words from a dictionary pool and creates a distorted image containing the words. It then presents the user with a test consisting of the distorted image and the directions. Most humans can recognize three words from the distorted image, while current computer programs are unable to do the same. All of the words used in Gimpy are taken from an 850-word dictionary based on Ogden’s Basic English.
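The word-selection and grading logic described above can be sketched as follows. This is only an illustration: image rendering and distortion are omitted, the short word pool stands in for the 850-word list, and the pass threshold of three correctly typed words is an assumption of the sketch. All names are hypothetical.

```python
import random

# Small stand-in word pool; the scheme described above draws from 850 words.
WORD_POOL = ["apple", "bridge", "cloud", "door", "earth", "field",
             "glass", "horse", "island", "judge", "key", "light"]


def make_challenge(rng, pool=WORD_POOL, count=7):
    """Pick `count` distinct words that would be rendered into one distorted image."""
    return rng.sample(pool, count)


def grade(challenge, answers, required=3):
    """Pass if at least `required` distinct challenge words were typed correctly."""
    typed = {answer.strip().lower() for answer in answers}
    return len(typed & set(challenge)) >= required
```

Passing a seeded `random.Random` instance as `rng` keeps a challenge reproducible for testing; a deployed system would use an unpredictable source of randomness.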

Figure 2: Gimpy CAPTCHA. Source: www.captcha.net

EZ-Gimpy, shown in Figure 3, is a simpler version of Gimpy CAPTCHA. EZ-Gimpy was also implemented on the Yahoo sign-up page to prevent automated bots. EZ-Gimpy picks a random word from the same Ogden’s Basic English dictionary pool and distorts it. Both Gimpy and EZ-Gimpy add a cluttered, textured background to confuse modern OCR (Optical Character Recognition) software.


Figure 3: EZ-GIMPY CAPTCHA. Source: www.captcha.net

ReCAPTCHA is a new type of CAPTCHA proposed by von Ahn [von Ahn 2008] within a CyLab-funded project. The reCAPTCHA project builds on the original definition and concept of using distorted textual images by adding more layers of security to CAPTCHAs. In the process of scanning and digitizing books and other hand-written materials (by the Google Books project and the non-profit Internet Archive), approximately 20% of the words could not be recognized by modern OCR software. Physical books and other texts written before the computer age are currently being digitized to preserve human knowledge and to make information more accessible to the world. The books are mostly older prints with faded characters and yellowed pages. Digitization is useful because the books can be indexed, searched, and stored in a digital format that can be preserved and analyzed later on. Although this process is convenient, OCR is far from perfect at recognizing the words in the bitmap images of scanned texts. In contrast, humans can recognize this faded ink with high accuracy. Unfortunately, human transcribers are more expensive than computer OCR programs, so only important documents are manually transcribed. According to von Ahn [von Ahn 2008], more than 40,000 websites are currently using the reCAPTCHA scheme, demonstrating that old print material can be transcribed, word by word, by people solving CAPTCHAs around the world.

Taking advantage of this gap between human and OCR ability, reCAPTCHA displays an image that combines two words. The first word is taken from the pool of words that OCR programs could not recognize. This word is placed in an image along with a second, known word (the control word); both words are distorted further using random distortion algorithms. Finally, reCAPTCHA adds random warping and lines to the image. The end result is a CAPTCHA that is more complex and more difficult for modern recognition algorithms to read, but that still receives good usability feedback from users. The answers entered by users are used to improve the digitization process. To meet the goal of a CAPTCHA (differentiating between humans and computers), the system needs to verify the user’s answer, which is the purpose of the control word. If the user enters the control word correctly, the system assumes that the user is human. To reduce the risk of automated attacks, von Ahn [von Ahn 2008] made all control words appear with the same frequency, so common words appear as often as less popular words. The dictionary of control words also includes more than 100,000 words [von Ahn 2008], so a random guess would succeed only about 1 in 100,000 times.
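The control-word check described above can be sketched as follows. This is a simplified illustration rather than the actual reCAPTCHA implementation: the vote counting for the unknown word, the case-insensitive comparison, the image identifiers, and all names are assumptions made for the example.

```python
from collections import Counter

# Readings collected for each unknown word, keyed by a hypothetical image id.
votes = {}


def check_answer(control_word, unknown_id, typed_control, typed_unknown):
    """Accept the user if the control word matches; if so, keep their reading
    of the unknown word as one vote toward its transcription."""
    if typed_control.strip().lower() != control_word.lower():
        return False
    reading = typed_unknown.strip().lower()
    votes.setdefault(unknown_id, Counter())[reading] += 1
    return True


def best_guess(unknown_id):
    """Most frequently submitted reading of an unknown word, or None."""
    counter = votes.get(unknown_id)
    return counter.most_common(1)[0][0] if counter else None


# Chance of guessing a control word drawn uniformly from 100,000 words.
GUESS_PROBABILITY = 1 / 100_000
```

Only answers from users who pass the control-word check contribute a reading, which is what lets the unknown word be transcribed by people the system already believes to be human.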

Figure 4: reCAPTCHA. Source: google.com

ReCAPTCHA was first introduced in 2008, and it quickly became the most secure CAPTCHA mechanism for detecting and preventing automated bots and spammers. Furthermore, its large-scale deployment has enabled von Ahn and his team to save time and mental effort. Instead of thousands of hours of dedicated human effort being spent on transcribing unreadable words, millions of Internet users play an important part in the digitization and preservation of human knowledge. Von Ahn [2008] showed that reCAPTCHA images are deciphered and transcribed with 99.1% accuracy, which meets the “over 99%” industry standard for professional human transcription services. Von Ahn [von Ahn 2008] also showed that in just one year after the launch of reCAPTCHA, Internet users correctly deciphered and transcribed more than 440 million words, the equivalent of 17,600 books.

ReCAPTCHA was adopted quickly by websites for three reasons. First, reCAPTCHA only uses words from scanned books that OCR failed to recognize, which gives websites a security advantage. Second, the words used by reCAPTCHA are the hardest words from scanned books for computers to recognize, yet humans can recognize them with ease. ReCAPTCHA therefore has a usability advantage over other types of CAPTCHA: von Ahn [2008] showed that reCAPTCHA takes users less time to solve than other types of CAPTCHA. Third, websites adopt reCAPTCHA because they believe that using it helps boost the digitization efforts of the Google Books project and the Internet Archive. ReCAPTCHA presents a proof of concept of an idea: wasted human processing power can be used to solve hard problems that computers cannot solve [von Ahn 2008].


2.2.2 Image-based CAPTCHA

Chew and Tygar were the first to use images to create a new type of CAPTCHA, called image CAPTCHA [Chew and Tygar 2004]. They proposed three tasks for their image-based CAPTCHA mechanism. First, the naming images CAPTCHA presents six images to the user, who is required to type the common term associated with the images to pass the round. Second, the distinguishing images CAPTCHA presents two sets of images to the user, who is required to determine whether the two sets have the same subject in order to pass the round. Third, the identifying anomalies CAPTCHA presents six images to the user. Five images are of the same subject, and one image shows a different subject. The user must identify the anomalous image to pass the test.
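The third task, identifying the anomaly, can be sketched as follows. The labeled image database, file names, and function names are hypothetical placeholders; this is an illustration of the task structure, not Chew and Tygar's implementation.

```python
import random

# Hypothetical labeled image database: subject -> image file names.
IMAGES = {
    "apple": [f"apple{i}.jpg" for i in range(1, 7)],
    "chair": [f"chair{i}.jpg" for i in range(1, 7)],
    "horse": [f"horse{i}.jpg" for i in range(1, 7)],
}


def make_anomaly_challenge(rng, images=IMAGES):
    """Return six shuffled images (five of one subject, one of another)
    together with the position of the odd one out."""
    common, odd = rng.sample(sorted(images), 2)
    odd_image = rng.choice(images[odd])
    shown = rng.sample(images[common], 5) + [odd_image]
    rng.shuffle(shown)
    return shown, shown.index(odd_image)


def is_correct(odd_index, selected_index):
    """The user passes by selecting the position of the anomalous image."""
    return selected_index == odd_index
```

A user guessing uniformly at random among the six images would pass one time in six, which suggests why several rounds would be needed before granting access.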

The words come from a dictionary of 627 English words taken from Pdictionary [Pdictionary 2004], a picture dictionary. They chose Pdictionary because its words are easy to illustrate or visualize. Image CAPTCHAs are usually database-based and hence require a classified image database. Chew and Tygar collected the first 20 hits from Google’s image search for each word in the dictionary.

Microsoft’s ASIRRA (Animal Species Image Recognition for Restricting Access) is a well-known example of image CAPTCHA [Elson et al 2007] (as shown in figure 5).


Figure 5: ASIRRA image based CAPTCHA

ASIRRA is a Human Interactive Proof that works by asking users to identify photographs of cats and dogs, that is, to categorize each photograph as depicting either a cat or a dog. This task is difficult for computers, but people can finish it quickly and accurately. ASIRRA also has good usability because many users think that it is fun. As with reCAPTCHA, ASIRRA’s strength comes from a partnership, in this case with Petfinder.com [Saul 2009]. Petfinder has the world’s largest database of cat and dog images, accurately classified by thousands of volunteers working in pet shelters in North America. ASIRRA generates challenges by displaying 12 images from a database of millions of photographs that have been classified as cats or dogs. As with reCAPTCHA, this partnership allows ASIRRA to use “human computation” to classify images of cats and dogs. The partnership is mutually beneficial: ASIRRA helps Petfinder by displaying an “adopt me” hyperlink within the CAPTCHA, so that homeless pets can be adopted quickly by new owners. A Microsoft study [Elson et al 2007] showed that ASIRRA can be solved easily by humans, but not by computer programs. ASIRRA includes two components:

A JavaScript client component that the website owner adds inside a form on the web page. If the challenge is solved correctly, the client code receives a ticket from the server that allows access to continue.

A server-side validation check, which the website’s server must perform when the user submits the form.

ASIRRA also provides a social benefit by finding homes for homeless cats and dogs. The main disadvantage of ASIRRA is its database dependency. The image database is also not public, which violates the Public property of CAPTCHAs. The database needs to be updated frequently and pre-classified by humans; in this case, it depends mainly on Petfinder’s pre-classified database.
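To make the strength of a 12-image challenge concrete, the sketch below computes the chance that a bot labeling images uniformly at random passes. It assumes, as a simplification not stated above, that a challenge is passed only when every one of the 12 images is labeled correctly; the function name is hypothetical.

```python
from fractions import Fraction


def random_guess_success(num_images=12, num_classes=2):
    """Probability that uniform random labeling gets every image right."""
    return Fraction(1, num_classes) ** num_images


print(random_guess_success())  # 1/4096
```

Under this assumption, a purely random attacker succeeds about once in 4,096 attempts, so the scheme's security rests on how much better than chance an image classifier can do.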

2.2.3 Audio-based CAPTCHA

Audio CAPTCHAs were developed for users who have poor vision or difficulty with visual CAPTCHAs. An audio CAPTCHA is often combined in the same interface with a visual CAPTCHA. Audio CAPTCHAs are more difficult than regular CAPTCHAs, especially for blind users who use screen reader programs. In this type of CAPTCHA, the user first listens to words spoken in English and then submits them. Because most audio CAPTCHAs have been developed from English dictionary words, they are even harder for users who are not native English speakers. Audio CAPTCHAs therefore have poor usability, especially for people who speak languages other than English.


In his research paper, Bursztein [Bursztein 2010] showed that users agree on only 32.2% of audio CAPTCHAs and that 33.6% of users give different answers. The first reason is the distorting noise added to the audio, which makes it even harder for impaired users to hear the audio correctly. The second reason is that users need to remember all of the words before answering. Third, audio CAPTCHAs give no instructions on what kind of words or letters to expect. For users who come from different backgrounds and speak different languages, an audio CAPTCHA must offer its user interface in different languages to make sure users can understand the audio correctly. Audio playback is also linear. A solver listens to the audio and then focuses on the answer box to provide the answer. For regular users, focusing on the answer box takes a single mouse click, but for vision-impaired users, moving to the answer box is very hard because they need to use a screen reader. Bursztein [Bursztein 2010] also examined the effect of native language: can non-native speakers of English perform as well as native speakers on audio CAPTCHAs? Bursztein observed that non-native speakers of English are usually somewhat slower than native English speakers. Bigham and Cavender [Bigham and Cavender 2009] suggested a new interface for audio CAPTCHAs. The new interface combines the answer text box with the CAPTCHA playback controls, allowing users to control the playback directly while the answer text box is focused. The new interface received good feedback from impaired users, who solved CAPTCHAs 59% faster than with the regular audio CAPTCHA interface [Bigham and Cavender 2009].

2.2.4 Motion Objects CAPTCHA

While text-based CAPTCHAs, which require users to recognize distorted characters, have been the most popular form of CAPTCHA, motion-based or video CAPTCHAs that contain moving challenges have recently been proposed as the successor to static CAPTCHAs. One example of this new breed of CAPTCHA is NuCaptcha [NuCaptcha].

Figure 6: NuCaptcha

The main idea behind this CAPTCHA scheme is to exploit the human ability to recognize complex patterns against a dynamic background. In NuCaptcha, users are shown a video with characters moving across a dynamic scene from right to left, and they solve the CAPTCHA by identifying the characters of the codeword. The codeword is displayed as two words in two different colors, with added warping and clutter. Consecutive characters may also overlap so that adjacent letters act as clutter. Overlapping characters make the codeword very difficult to read but add security to the NuCaptcha scheme. NuCaptcha states in its security features that adding random letters to the codeword (security code) makes the scheme more secure, and that the network communication between users and the CAPTCHA server must be encrypted to enhance security. Computer vision techniques are not good enough to recognize the characters in video-based motion. According to NuCaptcha, real users are able to solve the challenges with little effort, in less than 5 minutes.

2.3 Captcha Features

There are three main classes of CAPTCHA features: visual features, anti-segmentation features, and anti-recognition features. This classification follows Bursztein [Bursztein 2010], who conducted a large-scale evaluation of the thirteen most used CAPTCHA schemes on the Internet: Authorize.net, Baidu, Captchas.net, Digg, eBay, Google, mail.ru, Microsoft, reCaptcha, Skyrock, Slashdot, Blizzard, and Yahoo. Visual features, such as text size and font color, are not related to the security of a CAPTCHA, but they are very important to its usability. Anti-segmentation features prevent programs from separating the CAPTCHA into individual characters. Anti-recognition features prevent character recognition by programs such as Optical Character Recognition (OCR) software.

2.3.1 Visual Features

Visual features largely determine the usability of any CAPTCHA scheme. They include the character set, number of characters, font size, font color, font type, foreground color, and background color. The character set can contain lowercase letters (a-z), uppercase letters (A-Z), digits (0-9), or any combination of these, for example lowercase letters and digits, all three classes together, or lowercase letters only. Some CAPTCHA schemes exclude confusable characters such as 0, o, l, 1, I, 9, and g from the character set. The use of English-based character distributions is debated because Latin-based characters can be hard for non-native English speakers to recognize. Familiar font families such as Arial, Times, and Verdana are recommended. The number of characters in CAPTCHA schemes ranges from 5 to 15, and the font size ranges from 6 to 36 pixels. Foreground and background colors are used both to add security to the CAPTCHA and to help users detect the CAPTCHA characters easily. Bursztein et al. [Bursztein, Martin, and Mitchell 2011] suggested that any CAPTCHA design should randomize the CAPTCHA length so that attackers cannot rely on a fixed length. They also suggested randomizing the character font and font size. These two suggestions do not degrade the usability of the CAPTCHA; instead, they limit what attackers can learn about the design principles of the scheme.

2.3.2 Anti-segmentation Features

Anti-segmentation features are designed to make it hard for a program to separate the CAPTCHA into individual characters. Preventing segmentation is the main security problem for any CAPTCHA scheme. There are three main anti-segmentation techniques: using a complex image background, using a background whose colors are similar to the font colors, and adding noise to the CAPTCHA. A complex background makes the CAPTCHA hard for segmentation programs because the real text is confused with the lines and shapes in the background, which prevents the program from isolating the characters. Another way to add security is to use colors that human eyes can tell apart but that are very close on the RGB color scale used by computers. Adding random noise to the CAPTCHA is regarded as the best anti-segmentation technique. The noise can have a size between 2px and 10px and must have the same color as the CAPTCHA characters. Another approach is to use straight or distorted lines that cross the characters. Bursztein et al. [Bursztein, Martin, and Mitchell 2011] suggested that collapsing characters and lines are the only two sound options for adding security to the CAPTCHA. A complex background with similar colors or added noise is optional in the design. Any CAPTCHA that relies on a complex background as its main feature is considered an insecure design.

2.3.3 Anti-recognition Features

Anti-recognition features are designed to make it harder for computer programs to recognize individual characters after segmentation. Anti-recognition techniques include blurring, waving, rotating, distorting, and vertically shifting characters. The number of rotated characters should be between 1 and 6, and they can be rotated between 10 and 350 degrees. The more characters are rotated, the harder it is for humans to recognize the words. Shifting characters vertically from their original location is also intended to add security. It has only a small effect on the readability of the CAPTCHA, but it is unclear whether it has any effect on the security of the scheme. Bursztein et al. [Bursztein, Martin, and Mitchell 2011] suggested that using too many anti-recognition features degrades the usability of the CAPTCHA. They recommended that new CAPTCHA designs should not rely on distortion because it is not sufficient to defeat recognition software. Compared with anti-segmentation features, anti-recognition features make it harder for legitimate users to read the CAPTCHA correctly. They also suggested using a smaller character set to improve users' accuracy instead of adding ineffective security features.

2.3.4 Suggested CAPTCHA Scheme

Bursztein [Bursztein 2011] suggests that a CAPTCHA scheme should use 6 or more lowercase letters and digits in a 20-pixel black Arial font on a white background. The character set should exclude confusable characters such as l, 1, o, and 0. Bursztein also claims that a larger character set does not improve the security of the scheme but may decrease its usability, so there is no good reason to use one. The CAPTCHA length should be randomized between 6 and 8 characters, and the character size should be randomized as well. Waving the CAPTCHA then makes it harder to find the cut points (slice lines) needed to segment it. The core design of the CAPTCHA scheme matters most; anti-segmentation and anti-recognition techniques only provide additional security. The more characters a CAPTCHA has, the less accurate users' input becomes, and complicated character sets do not make the scheme more secure. Bursztein therefore suggests switching from letters to digits. For anti-segmentation, lines and character overlap are the two recommended techniques for improving resistance to automated attacks. For anti-recognition, Bursztein [Bursztein 2010] notes that anti-recognition features can strengthen the overall security of the scheme but should not be relied on to protect it.
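The text-generation part of these recommendations can be illustrated with a minimal sketch. The function name, the exact exclusion set, and the size jitter of 4 pixels are assumptions made for illustration; rendering, waving, and overlap are not covered.

```python
import random
import string

# Confusable characters named in the text (l, 1, o, 0); the exact
# exclusion set is an assumption of this sketch.
CONFUSABLE = set("l1o0")
CHARSET = [c for c in string.ascii_lowercase + string.digits
           if c not in CONFUSABLE]

def generate_challenge(rng=None, min_len=6, max_len=8,
                       base_font_px=20, size_jitter_px=4):
    """Return (text, font_sizes) for one challenge.

    The length is randomized between min_len and max_len, and each
    character gets its own randomized font size around base_font_px.
    """
    # SystemRandom draws from the OS entropy source, which is preferable
    # to the default generator for anything security related.
    rng = rng or random.SystemRandom()
    length = rng.randint(min_len, max_len)
    text = "".join(rng.choice(CHARSET) for _ in range(length))
    sizes = [base_font_px + rng.randint(-size_jitter_px, size_jitter_px)
             for _ in range(length)]
    return text, sizes
```

Passing a seeded `random.Random` instance as `rng` makes the output reproducible for testing; a deployed generator would keep the default.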


2.4 Attacks on CAPTCHA

2.4.1 Attacks on Text-based CAPTCHA

Mori and Malik [Mori and Malik 2003] proposed an algorithm to recognize the words displayed in Gimpy CAPTCHA, in which users need to identify 3 of approximately 7 words in a cluttered image. Based on fast pruning, Mori and Malik [Mori and Malik 2003] present two algorithms that can identify the words in EZ-Gimpy and Gimpy 92% of the time. The algorithms consist of three steps: pruning the large space of letter locations inside the CAPTCHA to a small set of candidate locations, extracting strings of letters that might form CAPTCHA words, and using machine learning to choose the most likely words by evaluating a matching score. In 2004, Moy et al. [Moy et al 2004] used distortion estimation techniques to break EZ-Gimpy with a success rate of 85%.

In 2009, Chandavale et al. [Chandavale, Sapkal and Jalnekar 2009] proposed an algorithm, shown in Figure 7, to break Gimpy and EZ-Gimpy CAPTCHA. The algorithm has four phases: preprocessing, segmentation, feature extraction, and character recognition.


Figure 7: Chandavale [2009] algorithm to break text-based CAPTCHA

In the first phase, preprocessing, the input CAPTCHA image is converted to gray scale and then passed through a noise reduction step that removes the black mesh background, lines, dots, and the white mesh background. In the second phase, the cleaned gray scale image is segmented into separate characters. In the third phase, a set of distinguishing features is extracted from each segmented character. In the fourth phase, a machine learning algorithm recognizes the characters based on the features extracted in phase 3. Mori and Malik [Mori and Malik 2009] report a success rate of 80% on text-based CAPTCHA.

There are many automated software CAPTCHA solvers, which combine segmentation algorithms and optical character recognition to identify the text in a CAPTCHA. Building such algorithms is complicated and costly because CAPTCHAs are designed to defeat vision recognition techniques. Therefore, when we evaluate an automated CAPTCHA solver, we also need to evaluate its economic context. Xrumer is a well-known forum spamming tool, described as one of the most advanced tools for bypassing many anti-spam mechanisms, including CAPTCHA. The full price of Xrumer is $650 for a business account. Xrumer claims that the new ReCaptcha is not a problem as of version 12.0.11. Despite the high cost of Xrumer, many websites and forums will blacklist an IP address after multiple attempts in a short time. Moreover, if Xrumer is put into wide release, its high success rate in bypassing CAPTCHA will alert website administrators that a spamming tool is at work. Given its high price, Xrumer needs to turn a profit for its user before CAPTCHA designs change to block the tool. For this approach to be attractive, the cost of the software must be lower than the cost of human solving services, which is around $0.5 per 1,000 CAPTCHAs.
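The economic comparison can be made concrete with a small calculation using the two prices quoted above. This is a simplified sketch: it assumes the software solves every CAPTCHA it attempts and ignores running costs, neither of which the text states.

```python
def break_even_captchas(software_price_usd, human_price_per_1000_usd):
    """Number of solved CAPTCHAs at which a one-time software purchase
    costs the same as paying human solvers per 1,000 CAPTCHAs."""
    return software_price_usd * 1000 / human_price_per_1000_usd

# With the figures from the text: $650 for the tool, $0.5 per 1,000 solves.
print(break_even_captchas(650, 0.5))  # 1300000.0
```

Under these assumptions the tool only becomes cheaper than a human solving service after about 1.3 million solved CAPTCHAs, which is why blacklisting and design changes matter to the attacker's return.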

In 2011, Bursztein et al. [Bursztein, Martin and Mitchell 2011] developed a new CAPTCHA solver named Decaptcha. This solver uses a five-stage pipeline: preprocessing, segmentation, post-segmentation, recognition, and post-processing, as shown in Figure 8.


Figure 8: Bursztein's pipeline to break text based CAPTCHA

In the first stage, the CAPTCHA's background is removed using algorithms provided by Bursztein and his team, and the image is stored as a binary matrix (representing black and white pixels). The drawback of transforming an image into a binary matrix is that color intensity is lost; the advantage is that the rest of the process becomes easier to implement. In the next stage, Decaptcha segments the CAPTCHA using several segmentation techniques. In the post-segmentation stage, the segments are processed individually to make recognition easier, and their sizes are normalized. In the recognition stage, a classifier is first trained on what each letter looks like and is then used to recognize each letter. In the post-processing stage, spell checking is performed on the classifier's output because the CAPTCHA scheme is based on a dictionary.
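The first two stages can be sketched in a few lines. This is not Decaptcha's actual code: the fixed threshold and the split at empty columns (a vertical projection) are simple stand-ins chosen for illustration, and they only work when characters do not touch.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to a 0/1 matrix.
    1 marks dark (ink) pixels; color intensity is discarded."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def segment_columns(binary):
    """Split the image at columns that contain no ink.
    Returns a list of (start, end) column ranges, end exclusive."""
    width = len(binary[0]) if binary else 0
    ink = [any(row[x] for row in binary) for x in range(width)]
    segments, start = [], None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x                      # a new segment begins
        elif not has_ink and start is not None:
            segments.append((start, x))    # the segment ended at x
            start = None
    if start is not None:                  # segment touching the right edge
        segments.append((start, width))
    return segments
```

Anti-segmentation features such as crossing lines or overlapping characters are designed precisely so that no empty columns remain, which is why real solvers need the more elaborate techniques described in this section.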


In 2014, Bursztein et al. [Bursztein 2014] proposed a new generic algorithm to break text-based CAPTCHA with four components: the Cut-Point Detector, the Slicer, the Scorer, and the Arbiter.

Figure 9: Generic algorithm of breaking text based CAPTCHA by Bursztein [2014]

The cut-point detector finds all possible cuts that can segment a CAPTCHA into individual characters. This stage generates a large number of cuts that are processed later by the slicer. The potential cuts are found by computing the second derivative of the curves traced by the bottom pixels and the top pixels of the CAPTCHA. In the next stage, the slicer extracts potential segments based on the cut points and builds a graph of them. A potential segment is considered meaningful only if its left and right boundaries are far enough apart, but not too far. In the third stage, the scorer traverses the graph of potential segments, applies Optical Character Recognition to each potential segment, and assigns it a recognition score. In the final stage, the arbiter selects the output by giving each sequence of segments a vote weighted by the recognition scores from the third stage. Bursztein et al. [Bursztein 2014] thus introduce a novel approach to solving text-based CAPTCHA that uses machine learning both to separate the CAPTCHA into segments and to recognize each segment as a character. Although Bursztein's algorithm has a high rate of successful recognition, bad segmentation and bad recognition are still possible. These misclassifications require a re-learning method that uses human judgement: during classification, all segments that are hard to recognize are put into a separate set, which is then labeled manually by a human. Bursztein's algorithm achieves a high success rate when solving text-based CAPTCHA schemes from reCAPTCHA 2011, Baidu, Yahoo, and CNN…
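The interplay of slicer, scorer, and arbiter can be illustrated with a simplified sketch. It assumes the cut points are already known and replaces the weighted voting described above with a plain dynamic program that maximizes the summed recognition score. The function `score_segment` is a hypothetical stand-in for the OCR-based scorer, and the width limits are illustrative.

```python
def best_segmentation(cuts, score_segment, min_w=8, max_w=30):
    """Pick the chain of segments with the highest total score.

    cuts: sorted x positions, including the left and right image edges.
    score_segment(left, right) -> (char, score): hypothetical recognizer.
    Returns (text, total_score), or None if no valid chain exists.
    """
    n = len(cuts)
    best = [None] * n       # best[j]: (total_score, text) ending at cuts[j]
    best[0] = (0.0, "")
    for j in range(1, n):
        for i in range(j):
            if best[i] is None:
                continue
            width = cuts[j] - cuts[i]
            # Slicer rule: boundaries far enough apart, but not too far.
            if not (min_w <= width <= max_w):
                continue
            char, score = score_segment(cuts[i], cuts[j])
            candidate = (best[i][0] + score, best[i][1] + char)
            if best[j] is None or candidate[0] > best[j][0]:
                best[j] = candidate
    if best[-1] is None:
        return None
    total, text = best[-1]
    return text, total
```

Summing scores favors chains with more segments, so a practical arbiter would normalize or vote as the text describes; the sketch only shows how the width constraint prunes the segment graph and how scores select among the remaining paths.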

Typically, Optical Character Recognition (OCR) systems separate recognition into two main subtasks: segmentation and classification. Segmentation is much more difficult than classification for OCR systems, so text-based CAPTCHA schemes are built on hard segmentation problems. Although character classification is still required, the main challenge is correctly segmenting the string. Also, CAPTCHA generation is a combination task, which is easier than attacking a CAPTCHA, an analysis task. The creator of a CAPTCHA scheme can use randomness and creativity to generate hard CAPTCHA problems, while the analyzer must stay consistent with the main idea of the scheme.

Despite being a target for both researchers and attackers for a long time, text-based CAPTCHA is still a strong scheme that is widely used on the Internet. Although different research papers claim successful attacks on text-based CAPTCHA, the frontline scheme, reCAPTCHA, has strong resistance to attacks that use segmentation and recognition algorithms. reCAPTCHA was recently acquired by Google. Von Ahn [2008] currently works with Google under CyLab projects to improve reCAPTCHA security. This move will help both Google and reCAPTCHA go further in the CAPTCHA research field.

2.4.2 Attacks on Image-based CAPTCHA

The most popular image-based CAPTCHA is Microsoft's ASIRRA, proposed in 2007. The ASIRRA scheme relies on the problem of differentiating images of cats and dogs. Golle [Golle 2008] described a highly accurate classifier for telling apart images of cats and dogs. The attack has three phases: image collection, manual classification, and building a classifier. First, Golle [Golle 2008] collected distinct images from the ASIRRA implementation publicly available online, which shows the disadvantage of drawing challenges from a publicly accessible image database. He collected around 13,000 ASIRRA images with a basic method: refreshing the website multiple times. These images provided a large training database for the algorithm. Next, all of the images were manually classified into three classes: cats, dogs, and other. The images in the other class were discarded, and only images of cats and dogs were kept. Third, the images were used to train a support vector machine classifier on color and texture features. The classifier has an accuracy of 82.7%.

Frisch et al. [2010] proposed a new approach to attacking image CAPTCHA called PixelMap. First, the PixelMap of each test image is calculated and compared with the pre-calculated PixelMaps of the original image database. This comparison detects whether a learned image is the same as the current test image. Frisch et al. [2010] claimed that the PixelMap algorithm successfully identified single distorted images in HumanAuth CAPTCHA and several other CAPTCHA image databases. Although naïve attacks on image-based CAPTCHA are very hard, no image-based CAPTCHA scheme is efficient. ASIRRA from Microsoft is the most successful scheme, with wide deployment over the Internet. All image-based CAPTCHA schemes require a large image database that can be recognized easily by humans but not by automated recognition algorithms.

2.4.3 Attacks on Motion-based CAPTCHA

Xu et al. [2012] proposed a naïve attack on NuCaptcha, a moving-object CAPTCHA. First, k frames are extracted from the motion-based CAPTCHA, and the foreground pixels are identified in each frame. The attack assumes that the attacker has already converted the NuCaptcha video into frames. As with Optical Character Recognition on regular text-based CAPTCHA, each frame is then segmented by identifying the leftmost and rightmost pixels of each character. This naïve attack achieves a success rate of 36%, which is a good result for a method designed for regular text-based CAPTCHA.

Xu et al. [2012] then provide a new segmentation method designed for decoding moving objects, consisting of four stages. In the first stage, the algorithm detects important features and their motion. There are two ways to do this. The first is to match features across frames: the most interesting object in each frame is identified using a bounding box and an evaluation of interest point density, for which SIFT (Scale-Invariant Feature Transform) can be used. Combining the bounding box with SIFT allows the attacker to isolate the most interesting points in each frame and match them. The second way is to track the differences between two frames. In the second stage, the detected features are classified as belonging either to the background or to the codewords. Because the codewords in NuCaptcha move across the entire image, their motion path can be distinguished from the background. The moving direction of the codeword is also tracked in this stage. In the third stage, using the directions tracked in stage 2, Xu et al. [2012] segment each frame, bounding each candidate character in a rectangle of a different color. This stage is the most important because the attack depends on accurate segmentation. In the final stage, codewords are extracted and classified for each frame. Using a machine learning method, the extracted characters are compared with templates of each character and assigned a weighted score. The characters with the highest weights are taken as the CAPTCHA answer. As a final enhancement, a feedback mechanism uses feedback from legitimate users to weight the scores of all extracted codewords.
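The second feature-detection option, tracking the differences between two frames, can be shown in its simplest form. This is an illustrative sketch and not the published attack: frames are assumed to be grayscale matrices of equal size, and the difference threshold is arbitrary.

```python
def moving_pixel_mask(frame_a, frame_b, min_diff=30):
    """Return a 0/1 matrix with 1 where the intensity changes by at
    least min_diff between two grayscale frames of equal size."""
    return [[1 if abs(a - b) >= min_diff else 0
             for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]

def horizontal_extent(mask):
    """Return (leftmost, rightmost) column index that contains a moving
    pixel, or None if nothing moved."""
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (min(cols), max(cols)) if cols else None
```

On a dynamic background many pixels change between frames, so simple differencing alone does not separate the codeword from the scene; that limitation is what the motion tracking in the second stage addresses.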


CHAPTER 3: METHODOLOGY

3.1 Security in Design Prototypes of DCG CAPTCHA

Given the problems in text-based CAPTCHA, audio CAPTCHA, and moving-object CAPTCHA, there is a need for a new type of CAPTCHA that places the human user at the center of the design. Game CAPTCHA is a new and promising scheme that attempts to make CAPTCHA solving a fun activity for the user. Its challenges are built from small interactive games that are enjoyable and easy for humans to play but hard for computers to solve. A new form of game CAPTCHA is the Dynamic Cognitive Game (DCG) CAPTCHA, which challenges users to perform a game-like cognitive task with a series of dynamic images, such as a simple shape-matching game. We characterize a DCG CAPTCHA by the following features: it is dynamic because objects move continuously inside the image frames; it is cognitive because it involves the semantics of a continuous series of images, or visual recognition because it requires users to recognize images; and it is a game because it engages users in a fun task. The user's task is to match the floating objects with their respective targets and to drag and drop the objects onto the target locations. A start-up company named “are you a human” recently offered such DCG CAPTCHAs, but it has since required all of its users to take down the implemented CAPTCHAs in order to go private.

The design goal of the DCG CAPTCHA is the same as for a regular CAPTCHA: a bot (an automated solving program) must not be able to solve the challenges with a better success rate than a human, and humans should be able to solve the DCG CAPTCHA with a higher probability than any computer program. In a DCG CAPTCHA implementation, the movement of the objects inside the image frames should not be exposed to the client machine, just as the characters embedded in the images of a regular visual CAPTCHA should not be leaked to the client. All object movement should be computed on the server. To avoid such leakage, it is important to secure the underlying game platform of the implemented CAPTCHA. Web-based games are commonly developed using Flash with ActionScript 3 or HTML5 with JavaScript. Both Flash and HTML5 games operate by downloading the game code to the client machine and running it locally. Hence, if these platforms are used directly to develop the DCG CAPTCHA, the client machine will know the current positions of the objects and their corresponding target regions, which an automated program can use to solve the challenges easily. Even though solving the game and returning the correct response to the server is otherwise hard, this leakage undermines the security of the DCG CAPTCHA.

The above problem can be mitigated by encrypting the game source code, which makes it difficult for an attacker on the client machine to extract the code and to reconstruct the correct responses exchanged between the server and the client. Commercial tools, such as SWF Encrypt, can encrypt the source code of a Flash game to hinder decompilation by an attacker. This approach works under the assumption that the attacker cannot learn the keys used to decrypt the code. Another approach to enhancing the security of the DCG CAPTCHA scheme is to keep all of the game logic and source code on the server, which requires the server to stream the game output to users. Continuous streaming is expected to be more secure, but the resulting latency may degrade the gaming experience and lower the usability of the DCG CAPTCHA.

In our model, we assume that the implementation provides continuous feedback to the user as to whether objects have been dragged and dropped onto the target regions corresponding to the correct answers. Our Flash implementation also indicates when the game finishes successfully or times out. This feedback is important from a usability perspective because users may become confused if they cannot tell whether their answer is correct. Attackers are free to use all of this feedback in their attempts to solve the challenges, but only within the timeout. We also assume that the server can detect brute-force attacks, in which the attacker repeatedly drags and drops objects onto the target regions until the game completes successfully. Such attacks can be detected simply by capping the number of drag-and-drop attempts per moving object in each game.
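The capping rule can be sketched as a server-side check over the drop events of one game. The log format (one object identifier per drop attempt) and the cap of 3 are assumptions made for illustration, not values taken from our implementation.

```python
from collections import Counter

def exceeds_drop_cap(drop_events, max_drops_per_object=3):
    """Return True if any object was dropped more often than the cap.

    drop_events: iterable of object ids, one entry per drag-and-drop
    attempt recorded during a single game (hypothetical log format).
    """
    counts = Counter(drop_events)
    return any(n > max_drops_per_object for n in counts.values())
```

For example, `exceeds_drop_cap(["star"] * 4)` returns True, while a game in which each object is dropped once or twice passes the check.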

In addition to automated attacks, the security model for DCG CAPTCHA must also consider human-solver relay attacks. These attacks are easier to mount than automated attacks because of their simplicity and low cost. In a relay attack, the bot forwards the CAPTCHA challenges to a human solver elsewhere on the Internet, in either an opportunistic or a sweatshop setting. The human solves the challenges and sends the responses to the bot, and the bot simply relays these responses to the server. Unfortunately, most current CAPTCHA mechanisms are insecure under this relay attack model, and character recognition CAPTCHAs are broken via such relay attacks. DCG CAPTCHAs should therefore provide some resistance to human-solver relay attacks.


3.2 DCG CAPTCHA Instances and Parameters

Due to the legal restrictions on attacking commercial DCG CAPTCHAs, we developed our own animated DCG CAPTCHA prototypes for the purpose of our study. We implemented and analyzed our own versions of the DCG CAPTCHA using Adobe Flash. With our own versions, we can freely change the number of objects and parameters such as frames per second, refresh rate, and the speed and direction of answer objects. We can also freely change the size of the game and investigate the security and usability of the DCG CAPTCHA.

DCG CAPTCHAs can take many forms. They can be based on visual matching of objects, and the objects can be static or moving. Each DCG CAPTCHA can be characterized by the following distinct components:

• Answer object: a moving object that should be dragged to the corresponding target object in order to successfully complete the game.

• Target object: an object onto which the corresponding answer object should be dragged.

• Target area: the area within which the target objects are located.

• Activity area: the area within which the foreground objects move.


Figure 10: Match shapes DCG CAPTCHA game

In this paper, we consider one category of DCG CAPTCHA, with static targets, in two different instances, as shown in Figure 10 above:

• Single target object

• Multiple target objects

For each of these instances, different parameterizations affect security and usability. The parameters include (1) the number of foreground moving objects, including answer objects and other noise objects, and (2) the speed at which the objects move. The more moving objects there are and the faster they move, the more difficult and time-consuming it is for a human user to identify the objects and drag and drop them correctly, which degrades usability. However, a larger number and higher speed of moving objects also make it harder for a computer program to finish the game successfully, which may improve security.


3.3 Design and Implementation of DCG CAPTCHA.

One of the most popular existing DCG CAPTCHAs is the “are you a human” DCG CAPTCHA. For legal reasons, we were wary of directly attacking a commercial CAPTCHA. Developing an automated attack against these commercial CAPTCHAs would violate the company's asserted terms and conditions. It is also against the law in the United States. We therefore designed and analyzed our own versions of the DCG CAPTCHA.

The game frame is 360x130 pixels, which fits easily into a regular web page, such as a login form. Each game starts by placing the objects at pre-specified locations in the image. Each object then moves in a randomly chosen direction. Eight directions are used in total: N, S, E, W, NE, NW, SE, and SW. If the chosen direction is one of the four directions E, W, S, and N, the object moves along the X or Y axis at a speed of 1 pixel per frame. Otherwise, the object moves about 1.414 pixels per frame along the diagonal. The frame rate is set at 40 frames per second. Therefore, the foreground objects move at an average speed of ((1 + 1.414)/2 * 40) ≈ 48.3 pixels/second. The game starts when the user drags or clicks an object. Each game briefly explains the task to users, e.g., “Match the Shapes from the left to the correct position on the right hand side”. The game ends when the user has dragged all the correct objects to their corresponding targets. To successfully match an object with its target, the user clicks inside the bounding box around the shape of the object, drags the object, and drops it by releasing it inside the bounding box of its target. For each of the 2 games, we set 5 parameterizations, choosing object speed (low, medium, high) as (12, 36) frames per second and the number of moving objects as (1, 3). For each game, we used different combinations of speed and number of objects. In total, we have 18 different combinations of objects.
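The movement rule and the average speed quoted above can be checked with a short calculation. The sketch assumes that a diagonal step moves the object 1 pixel along each axis per frame, which yields the stated 1.414 pixels per frame; the names are illustrative and not taken from our Flash code.

```python
import math

FPS = 40  # frame rate stated in the text

# Per-frame displacement (dx, dy) for the eight directions.
# y grows downward, as in screen coordinates.
STEP = {
    "N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0),
    "NE": (1, -1), "NW": (-1, -1), "SE": (1, 1), "SW": (-1, 1),
}

def speed_px_per_frame(direction):
    """Distance travelled in one frame for the given direction."""
    dx, dy = STEP[direction]
    return math.hypot(dx, dy)

def average_speed_px_per_second():
    """Mean speed over the eight equally likely directions."""
    mean_per_frame = sum(speed_px_per_frame(d) for d in STEP) / len(STEP)
    return mean_per_frame * FPS
```

With four axis-aligned and four diagonal directions, the mean is (1 + 1.414)/2 pixels per frame, or about 48.3 pixels per second at 40 frames per second.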

3.4 Stream Relay Attack.

Web-based games developed using Flash or HTML5 with JavaScript operate by downloading the game code to the client machine and executing it locally. To make it difficult for the attacker to capture the answer objects and their respective locations from the downloaded game code, the DCG CAPTCHA security model must encrypt the game code. The server also needs to implement a method, or use third-party security software, to prevent the game from being downloaded and embedded on a different server.

Figure 11: Legitimate user gameplay.

In a normal setting, when a legitimate user U interacts with the web service W, the server W sends the encrypted DCG code to U, and U renders it locally on his or her machine and plays the game. When U successfully finishes the game, the log of all of U's mouse interactions with the game is sent to W. W then runs the detection algorithm on this log and responds by accepting or rejecting U. All communication between U and W takes place over a secure channel.


Figure 12: Human solvers gameplay in a stream relay attack

Under a stream relay attack, the attacker A captures the DCG CAPTCHA challenge from W, just like a legitimate user. The attacker runs a streaming server, and the human solver S connects to the attacker's machine through a streaming client, such as a VNC client embedded within a web browser. This streaming software is responsible for delivering the DCG CAPTCHA frames to S and for sending all of S's mouse interactions, such as drags and drops, mouse clicks, and positions, to A. A then simply forwards the log of the interaction between S and the game to the web server. Finally, W runs the detection algorithm on this log and responds by rejecting or accepting A. Because of network latency, S may experience degraded game quality at his or her end, which lowers S's performance in the game. This makes the interaction between the human solver and the game distinguishable from the interaction between a legitimate user and the game. Therefore, the server may be able to detect relay attacks that use human solvers.
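The kind of log processing the server W performs can be sketched as follows. The event format, the feature names, and especially the thresholds are hypothetical placeholders for illustration; they do not describe the detection algorithm or any measured values from our studies.

```python
def game_features(events):
    """Summarize one gameplay log.

    events: list of (timestamp_seconds, kind) tuples, where kind is
    'click', 'drop_correct', or 'drop_incorrect' (hypothetical format).
    """
    if not events:
        return {"duration": 0.0, "clicks": 0, "incorrect_drops": 0}
    times = [t for t, _ in events]
    return {
        "duration": max(times) - min(times),
        "clicks": sum(1 for _, k in events if k == "click"),
        "incorrect_drops": sum(1 for _, k in events
                               if k == "drop_incorrect"),
    }

def looks_relayed(features, max_duration=20.0, max_incorrect_drops=2):
    """Placeholder threshold rule; both limits are illustrative only."""
    return (features["duration"] > max_duration
            or features["incorrect_drops"] > max_incorrect_drops)
```

A practical detector would learn its decision boundary from logs of legitimate users and relayed sessions rather than use fixed limits; the sketch only shows how play duration, mouse clicks, and incorrect drops can be derived from the interaction log.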

3.5 Mechanical Turk

All of our studies were performed using the Amazon Mechanical Turk system. Amazon's Mechanical Turk is a novel, open online marketplace for getting work done by others. It connects researchers who need human participants to evaluate or survey their research with workers who do so for a small compensation fee. The front page of Amazon Mechanical Turk is shown in Figure 13 below:


Figure 13: Front page of Amazon Mechanical Turk marketplace

The site has a large and diverse workforce of over 100,000 users from over 100 countries who complete tens of thousands of tasks daily [Pontin 2007]. The front page allows users to choose one of two roles: worker or requester. A worker is a user who agrees to perform short tasks for a small compensation fee, while a requester is a user who creates these tasks. Amazon calls each task a Human Intelligence Task (HIT). HITs are generally tasks that cannot easily be completed by a computer. For example, requesters can create small tasks such as surveys, summarization, or experiments. Workers can browse available tasks and are paid upon successful completion of each task. Requesters deposit money in an account using a credit card and set the compensation for each HIT before posting it. Payment can be awarded to workers automatically or manually, based on the quality of each worker's submission. Amazon also takes a small percentage of the compensation. The Mechanical Turk marketplace is generally considered a monetary marketplace, but the amount awarded to workers is generally small (such as a few cents per task). Amazon provides sample templates for common tasks such as categorization, data collection, sentiment analysis, surveys, and image tagging, and it also allows requesters to create their own templates.

Figure 14: Creating HITs to collect data for DCG CAPTCHA.

Benefits of Mechanical Turk:

Amazon’s Mechanical Turk is relatively new compared with other data collection methods; it was released in 2005 [Paolacci et al. 2010]. However, many studies have shown that results obtained from Mechanical Turk experiments can be as reliable as those from traditional methods. According to Ipeirotis [Ipeirotis 2010], 50% of workers were from the United States and 40% were from India in 2010. The workers are also younger and have lower incomes than the general population. Buhrmester et al. [2010] also considered how the compensation rate affects the rate of participation in completing tasks.

Compared with traditional experiments, Mechanical Turk experiments can generally be completed faster and at a lower cost. Our studies have specific requirements and are not finished within a few minutes like other tasks. However, compared with traditional methods of data collection, Mechanical Turk is still generally considered faster.

3.6 Usability Study Design, Goals, and Process.

Our usability study involved 40 participants, who were primarily students from various backgrounds. The goal of this usability study is to measure legitimate users’ gameplay performance on our version of the DCG CAPTCHA. Forty Mechanical Turk workers were recruited and paid $0.05 for work that took them approximately five minutes. The study was web-based and comprised three phases. The first phase involved registering the participants and briefly explaining the usability study to them. All participants were required to sign the consent information before proceeding. In the next phase, the participants’ demographics were collected, and the participants then played the different DCG CAPTCHA games. In this phase, we let the users play around with the CAPTCHAs. We did not mention that the main purpose of the study concerned CAPTCHAs or security; we presented it instead as a study of the usability of a new web interface. In the last phase, all participants were required to answer questions about their experience with the tested DCG CAPTCHAs. We asked users to rate the DCG CAPTCHAs on a standard 5-point scale (‘1’ for ‘strongly disagree’ and ‘5’ for ‘strongly agree’). We also asked users additional questions about their experience with DCG CAPTCHA usability.

Table 1: 5-point scale for usability study

1  Select if you strongly disagree
2  Select if you disagree
3  Select if you neither disagree nor agree
4  Select if you agree
5  Select if you strongly agree

In the actual usability study phase, each participant played the different game instances discussed in the previous section. We collected the results to study how different parameterizations affect users’ ability to solve the CAPTCHA. Our goal is to study the following aspects of the DCG CAPTCHA:

• Efficiency: completion time of each game.

• Robustness: percentage of games not completed, and incorrect drag-and-drop attempts across all users.

• Effect of game parameters: the effect of object speed and number of objects on completion time and error rates.

• User experience: participants’ feedback about their experience with the games.

3.7 Stream Relay Attack Study.

Mechanical Turk workers were hired for the Stream Relay Attack study. The workers were asked to connect to our webpage under different latency conditions. As in the usability study, the workers were then asked to fill out a demographics form, play different versions of the DCG CAPTCHA, and answer some questions about their experience. The participants were all paid the same amount ($0.05) for their participation. We used three different experiments to test various relay attack scenarios, as described below:

High Latency Relay: This first scenario collected data only from participants residing outside the US. In a real-life relay attack, the human solvers are normally hired from sweatshops in remote countries by an attacker residing in the US, and this setting reflects that situation. All Mechanical Turk workers in this scenario were required to be located outside the US to maintain high latency when connecting to the DCG CAPTCHA.

Small Game Relay: The second attack scenario covers the case in which an attacker tries to minimize communication between the attacker and the solvers by reducing the amount of information transferred, using a reduced game size. In this experiment, we used DCG CAPTCHA games approximately ¼ the size of the original games used in the high latency and low latency relays. All participants in this test case were also required to reside outside the US.

Low Latency Relay: In this last case, we tested a setting in which the attacker has low latency when connecting to the server (located inside the US). All Mechanical Turk workers in this case were required to be located inside the US to maintain a low latency relay when connecting to the server.

All Amazon Mechanical Turk workers were required to connect to the server located at tungtnguyen.info/matchShapes.html to take part in the Stream Relay Attack study. When they fill out the worker form on the Amazon Mechanical Turk website, they need to provide their location. All workers located outside the United States are labeled as high latency relay, and all workers located inside the United States are labeled as low latency relay. Following the instructions provided on Amazon Mechanical Turk, these workers need to finish the Match Shapes game. After they finish, they are required to follow the instructions, go to the local storage on their computer, and upload the required log file. They must upload the log file to the Amazon Mechanical Turk website to get the full compensation. For each uploaded file, the Turkers receive $0.05. By uploading the log file to the website, the Turkers agree to the terms of the agreement. Some Amazon Mechanical Turk workers might refuse to upload the log file for privacy reasons.

CHAPTER 4: EVALUATION OF RESULTS.

In this chapter, we present the results of the usability and Stream Relay Attack studies. Our main goal was to analyze the efficiency, robustness, and user experience that the DCG CAPTCHAs provide to legitimate users and to human solvers in a relay attack.

4.1 Usability Study of DCG CAPTCHA.

Table 2 below shows the demographics of the Mechanical Turk workers in the usability study.

Table 2: Demographics of usability Mechanical Turk workers

Gender (%)
  Male          75.0%
  Female        25.0%
Age (%)
  <18           5%
  18-25         50%
  25-35         35%
  >35           10%
Education (%)
  High School   5%
  Bachelor      60%
  Master        20%
  Other         15%
Country (%)
  Outside US    35%
  US            65%

[Chart: Frequency of Age Distribution; x-axis: age (18 to 40), y-axis: total (0 to 6)]

Figure 15: Age distribution of Mechanical Turk workers.

[Chart: Gender Distribution; x-axis: gender (F, M), y-axis: total (0 to 35)]

Figure 16: Gender distribution of usability study.

The 40 Mechanical Turk workers were required to sign a consent agreement and complete a small demographics survey before the experiments. Each participant was paid around $0.05 per submission. There were 72.50% males and 27.50% females. The participants are largely young and educated, mostly holding a bachelor’s degree. Their ages range mostly from 18 to 35 years. Our usability demographics reflect the typical demographics of Amazon Mechanical Turk. For the usability study, we did not put any restriction on the location of the Mechanical Turk workers or on latency. The experiment results focus on the following metrics:

• Overall gameplay time: the DCG CAPTCHA gameplay time, including both completed and incomplete games.

• Successful gameplay time: the DCG CAPTCHA gameplay time for all games completed within the required time.

• Error Rate: the ratio of the number of incomplete games to the total number of games.

• Drag Error Rate: the ratio of the number of invalid drags and drops to the total number of drags and drops.

• Click Error Rate: the number of invalid clicks. Unlike a drag error, a click error occurs when the user clicks on an invalid area or an invalid object.
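
The metrics listed above could be computed from per-game records roughly as follows. This is a minimal sketch under stated assumptions, not the analysis code used in the study: the `GameRecord` fields and the `summarize` function are hypothetical names introduced for illustration, and the click metric is reported as the average number of invalid clicks per game, which is an assumption about how the click figures are normalized.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GameRecord:
    """One played game (all field names are hypothetical)."""
    duration_sec: float   # gameplay time, completed or not
    completed: bool       # True if finished within the allowed time
    drag_attempts: int    # total drag-and-drop attempts
    invalid_drags: int    # drops that missed the target area
    invalid_clicks: int   # clicks on an invalid area or object


def summarize(records):
    """Compute the five usability metrics for a non-empty list of games."""
    completed = [r for r in records if r.completed]
    total_drags = sum(r.drag_attempts for r in records)
    return {
        "overall_time": mean(r.duration_sec for r in records),
        "successful_time": mean(r.duration_sec for r in completed) if completed else None,
        "error_rate": (len(records) - len(completed)) / len(records),
        "drag_error_rate": (sum(r.invalid_drags for r in records) / total_drags
                            if total_drags else 0.0),
        "click_errors_per_game": mean(r.invalid_clicks for r in records),
    }


# Made-up records for illustration only (not data from the study).
example = [
    GameRecord(6.0, True, 1, 0, 1),
    GameRecord(40.0, False, 4, 2, 3),
]
print(summarize(example))
```

Keeping completed and incomplete games in the same record list makes it easy to report the overall time and the successful time side by side, as the result tables do.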

Table 3: Completion times, error rates in usability study

Game Name        Overall Time (sec)   Standard Deviation (sec)   Error Rate   Drag Error Rate   Click Error Rate
Single Objects   5.825                0.472                      0.000        0.325             0.700
MultiObjects     8.130                0.785                      0.000        0.575             1.000

As shown in Table 3, the completion times for both the single-object and multiple-object games are good for solving a CAPTCHA, especially with multiple objects. The overall time for single objects is 5.825 seconds, while the overall time for multiple objects is 8.130 seconds. The highest time for solving a single-object DCG CAPTCHA is 6.87 seconds, while the highest time for solving a multiple-object DCG CAPTCHA is 9.55 seconds. In all cases, the time needed to finish a DCG CAPTCHA is quite short, less than 10 seconds. The time to solve multiple objects is higher than for single objects, but only by 2.305 seconds. This increase in time might be due to the loading time of MultiObjects. For the multiple-object DCG CAPTCHAs, Mechanical Turk workers are required to drag and drop three objects to the corresponding target positions. If workers drop an answer object during the drag-and-drop process, they need more time to finish the game. The average time to complete three objects is about 1.4 times that of the single-object DCG CAPTCHA.

In the usability study, the error rate for both single objects and multiple objects is 0.00. This is an important result: all the tested games yielded 100% completion (an error rate of 0%). In other words, no participant failed to complete either the single-object or the multiple-object DCG CAPTCHA games. This suggests that the robustness of the DCG CAPTCHA games (matching shapes) is good, with few human errors. Under regular Internet connection conditions, all of the Mechanical Turk workers completed the game within the required time interval, so there were no incomplete games. The drag-and-drop error rates of both single objects and multiple objects are low. For single objects, the drag-and-drop error rate is 3.25%, while for multiple objects it is around 5.75%. In the Match Shapes games, Mechanical Turk workers are required to drag the answer objects and drop them exactly on the target area.

Therefore, the drag-and-drop error rate is lower than the click error rate. In the Match Shapes game, for both single and multiple objects, if a Mechanical Turk worker drops the answer object before successfully putting it into the target area, we count it as a drag error. If a worker cannot click exactly on the answer object and hold it, we count it as a click error. Both the drag error rate and the click error rate are higher when there are more answer objects. Looking at the average timing, we consider that the multiple-object DCG CAPTCHA (three answer objects) has better timing than the single-object one. For the user experience, we also collected ratings from the Mechanical Turk workers. The range is from ‘1’ for ‘strongly disagree’ to ‘5’ for ‘strongly agree’. The rating for DCG Multiple Objects is 4.3/5, which is higher than the rating for DCG Single Objects, 4/5. We consider these ratings high for our usability study.

Another aspect of the usability study is testing different object speeds in the DCG CAPTCHA game. Table 4 shows the performance of the DCG CAPTCHA in terms of overall time and error rate at different object speeds. At the fastest speed, 36 frames per second, the overall time to complete the DCG CAPTCHA game with three objects is higher than at 12 frames per second. The faster speed also has a higher error rate, with 0.758 drag-and-drop errors. The faster the answer objects move, the more difficult it is for Amazon Mechanical Turk workers to finish the CAPTCHA game.

Table 4: Object speeds vs error rates in usability study.

Speed   Overall Time   Error Rate
12fps   8.13           0.575
36fps   10.27          0.758

A further aspect of usability is testing different numbers of answer objects. Here, we can see the differences between DCG CAPTCHA games with only one answer object and games with three different answer objects. As shown in Table 3 above, the DCG CAPTCHA game with one answer object is easier for Amazon Mechanical Turk workers to finish. Compared with multiple answer objects (three in this test case), there is a difference of 2.305 seconds. At the end, we asked the Amazon Mechanical Turk workers two questions: whether the DCG CAPTCHA has good visual features, and whether the DCG CAPTCHA should replace the old CAPTCHA. The workers were required to answer both questions on a scale from 1 to 5, with ‘1’ for ‘strongly disagree’ and ‘5’ for ‘strongly agree’.

As shown in Table 5, Amazon Mechanical Turk workers generally agree that the DCG CAPTCHA has good features, with a score of 3.825/5.0 and a standard deviation of approximately 0.666. Most of the workers want to replace the old CAPTCHA schemes (including text-based CAPTCHAs, audio CAPTCHAs, and moving-object CAPTCHAs) with the new DCG CAPTCHA: this question received a high score of 4.575 out of a maximum of 5.0 in the user feedback at the end of the study.

Table 5: User feedback on game features

                           Scores
Good Feature               3.825 (0.666)
Replace Regular Captcha?   4.575 (0.627)

Our usability results suggest that the DCG CAPTCHA has very good usability, with low completion times. The DCG CAPTCHA also has a low error rate, with 100% of games completed within the permitted time interval. The DCG CAPTCHA scheme has a low drag-and-drop error rate, which is less than 10%, and a low click error rate. Increasing the number of answer objects or the object speed tends to decrease game performance. At a speed of 36 fps (frames per second), there is a significant increase in both game completion time and error rate. Overall, the DCG CAPTCHA provides good usability compared with the old CAPTCHA schemes. Given the young participant pool (Amazon Mechanical Turk workers), we should be careful about replacing the regular old CAPTCHA schemes with the DCG CAPTCHA scheme. We also need to consider the simplicity of the DCG CAPTCHA (including easy matching games, puzzles, or clicking tasks) when assessing the usability of the DCG CAPTCHA scheme.

4.2 Relay Attacks

Human-solver relay attacks are a significant and growing problem facing the CAPTCHA community. Most, if not all, of the current CAPTCHA schemes are completely vulnerable to these attacks [Motoyama 2010]. There are two different types of human solving: opportunistic solving and paid solving. Opportunistic solving relies on asking legitimate users to solve the CAPTCHA as part of unrelated work. Amazon Mechanical Turk is an open marketplace, and it might therefore be affected by bad requesters, who could open a HIT asking workers to answer CAPTCHAs. For this reason, the Terms of Service do not allow requesters to open any HIT that involves solving a CAPTCHA or transcribing the text inside an image. However, with this type of human solving it is hard to collect enough data, because the solvers are unwilling. Our focus is on paid solving. The rate for paid human solvers can range from $1 per 1,000 CAPTCHAs down to $0.5 per 1,000 CAPTCHAs or lower. With the rise of cheap labor from developing countries like India, China, and Vietnam, the rate for human solvers is lower than before. Since solving CAPTCHAs is an unskilled task, it can easily be outsourced over the Internet. There has recently been a lot of competition in the CAPTCHA-solving business.

4.2.1 Difficulty of Relaying DCG CAPTCHA

The attacker’s motivation behind a CAPTCHA relay attack is to completely avoid the computational overhead and complexity involved in breaking the CAPTCHA via automated attacks. A pure form of a relay attack only requires the attacker to relay the CAPTCHA challenges and the responses from human solvers back and forth. A relay attack on a text-based CAPTCHA simply requires the malware program to send the captured image of the CAPTCHA challenge to a human solver and forward the human solver’s answer back to the server. Similarly, even a motion-based CAPTCHA with moving objects can be broken with a relay attack by taking enough snapshots of the motion-based CAPTCHA to cover the challenge, relaying these snapshots to the human solver, and relaying the corresponding answers back to the attacker. Another way to relay a motion-based CAPTCHA is simply to record a short video of all incoming frames and relay this video to the human solver. Most of the current CAPTCHA schemes, including the new reCAPTCHA model from Google, are completely vulnerable to these relay attacks.

The Dynamic Cognitive Game CAPTCHA can offer some resistance to these relay attacks. The main attractions of relay attacks are their simplicity, low economic cost, and ease of practice. Therefore, automated attacks that require computational resources and involve complexity are not suitable. There appear to be a few mechanisms by which a Dynamic Cognitive Game CAPTCHA could potentially be subjected to a relay attack. In the first case, if the server sends the whole game code to the client (the attacker), the attacker can simply ship the code off to the human solvers, who can then complete the game as an honest user would. Therefore, the DCG CAPTCHA security model requires that the game code be embedded, hidden, and executable only in the specific domain/host authorized by the server. This would make relaying the whole game code difficult, if not impossible.

The second possibility is the Stream Relay Attack, which relays the incoming game stream from the server to the human solver (an Amazon Mechanical Turk worker). This streaming approach then relays the corresponding clicks made by the solver back to the server, as shown in Figure 17 below.

Figure 17: Stream relay attacks.

This kind of Stream Relay Attack might work, and it may be hard to prevent. However, maintaining the streaming communication between the DCG CAPTCHA and the attacker, while sending a large number of frames between the attacker and the human solvers (Amazon Mechanical Turk workers), is detectable. Streaming a large number of frames (12 or 36 frames per second) may degrade game performance, especially when the attacker needs to stream all of the game frames to human solvers located outside the United States. This is similar to streaming all of these frames over a high latency or slow connection. Such differences in game performance (overall game time, completion time, drag-and-drop error rate, and so on) can be significant factors that help us detect a Stream Relay Attack. The differences between the gameplay of human solvers (Amazon Mechanical Turk workers) and that of honest users allow such Stream Relay Attacks to be detected with high accuracy.

4.2.2 Stream Relay Attack Study.

For the Stream Relay Attack, we use the scenario of a high latency connection between attackers and human solvers. We use Amazon Mechanical Turk workers to collect the data for this stream relay attack. First, for the high latency connection between the attacker and the human solvers, we required all of the Amazon Mechanical Turk workers to be located outside the US, in a third country. Hence, we can exploit the high latency of the human solvers when they connect to our website, which is located in the United States. For these high latency stream relay attacks, we collected data from only 20 participants (20 Amazon Mechanical Turk workers, each paid $0.10 per submission).

Table 6: Stream relay attacks finish time and error rates

               Overall Time     Complete Time     Error Rate   Drag and Drop Error Rate   Click Error Rate
MultiObjects   33.435 (3.239)   29.9475 (2.154)   10%          10.50%                     17.50%

For the high latency stream relay attack results shown in Table 6, there is a significant difference between the overall time an honest user needs to complete a DCG CAPTCHA game and the time needed by high latency human solvers. The overall time to complete a DCG CAPTCHA rises from 8.130 seconds to 33.435 seconds. The actual completion time (not counting the timeouts of some high latency human solvers) is approximately 29.947 seconds. The completion rate drops from a perfect 100% to 90%. Both the drag-and-drop error rate and the click error rate are higher than the corresponding error rates of honest users in the usability study. In this high latency stream relay, there is a significant increase in completion time, with two timeouts (more than 40 seconds). The drag-and-drop error rate is 4% higher, rising from 5% to 9%. The longer time needed to complete the DCG CAPTCHA game and the increase in the drag-and-drop error rate are mainly caused by the high latency connection between the attacker and the Amazon Mechanical Turk workers. The attacker needs to maintain the high latency stream with all of the Turk workers located outside the United States.

Table 7: High latency attack compared with low latency attack in time and error.

                             Overall Time     Complete Time     Error Rate   Drag and Drop Error Rate   Click Error Rate
MultiObjects, high latency   33.435 (3.239)   29.9475 (2.154)   9%           10.50%                     17.50%
MultiObjects, low latency    22.087 (5.52)    20.874 (1.66)     5%           10.5%                      16%

For low latency stream relay attacks, we required all of the Amazon Mechanical Turk workers to be located inside the United States. This is not a true low latency stream relay attack, because an Amazon Mechanical Turk worker may be located on the East Coast of the United States while connecting to a server on the West Coast. However, it is still a lower latency connection than that of any human solver located outside the United States. Table 7 shows the decrease in overall time, which is about 34% lower than the overall time of the high latency stream relay attack. There is still one timeout, at approximately 45 seconds. This makes the standard deviation of the low latency stream relay attack higher than that of the high latency stream relay. The error rate is 50% lower, going down from two timeouts to one timeout between the two experiments. The drag-and-drop error rate and the click error rate are about the same for the high latency and low latency stream relay attacks. The completion time (not counting the timeout) averages 20.874 seconds, with a standard deviation of just 1.66 compared with 5.52 for the overall time. So, for low latency stream relay attacks, the overall time is not accurate enough, because the timeout setting is too high.

It appears that the low latency connection between the Amazon Mechanical Turk workers and the attacker improves game performance, but performance is still not as good as in the usability study. The error rate, drag-and-drop error rate, and click error rate in the low latency stream relay attacks are 5%, 5%, and 6% higher, respectively.
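
The relative comparisons between the two relay settings can be reproduced from the Table 7 figures with a short calculation. The sketch below is illustrative only: `relative_change` is a hypothetical helper, and the count of 20 workers per setting is an assumption (the text states 20 participants for the high latency study, and one timeout at a 5% error rate implies the same for low latency).

```python
def relative_change(old, new):
    """Fractional change from old to new (negative means a decrease)."""
    return (new - old) / old


# Overall time in seconds for the MultiObjects game, from Table 7.
high_overall, low_overall = 33.435, 22.087

# Timeouts reported in the text: two under high latency, one under low latency.
high_timeouts, low_timeouts = 2, 1
participants = 20  # assumption: same number of workers in both settings

overall_change = relative_change(high_overall, low_overall)
error_change = relative_change(high_timeouts / participants,
                               low_timeouts / participants)

print(f"overall time change: {overall_change:.1%}")
print(f"error rate change:   {error_change:.1%}")
```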

4.3 Stream Relay Attack Detection Mechanism.

In the previous sections, we examined DCG CAPTCHA game performance in the usability study and in the stream relay attack study under high latency and low latency settings. We consider the high latency and low latency stream relay attack scenarios with Amazon Mechanical Turk workers to be good test cases. Based on the results, we investigate whether the CAPTCHA service can detect such stream relay attacks based on differences in behavior and gameplay. We compare the human behavior data collected over high latency and low latency connections with the legitimate user data collected in the usability study. We consider the following human behavior data:

• PlayDuration: the overall gameplay time (in seconds) of a game instance for an honest or remote user.

• Drag and Drop Error: the number of drag-and-drop errors made by a legitimate user or human solver. This count increases by 1 when the user successfully clicks on the object but does not successfully drag it to the corresponding area.

• Click Error: the number of incorrect clicks in the active area of the DCG CAPTCHA. This number is higher when human solvers cannot click correctly on the objects because of the high latency connection.
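
As an illustration, the three behavioral features above could be derived from a game's mouse-event log along the following lines. This is a sketch under stated assumptions: the log format is not specified in the text, so `MouseEvent`, its `kind` labels, `GameFeatures`, and `extract_features` are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class MouseEvent:
    """One logged interaction (hypothetical log format)."""
    time_sec: float  # seconds since the game started
    kind: str        # "click_ok", "click_miss", "drop_ok", or "drop_miss"


@dataclass
class GameFeatures:
    play_duration: float  # PlayDuration, in seconds
    dnd_errors: int       # Drag and Drop Error count
    click_errors: int     # Click Error count


def extract_features(events):
    """Reduce an event log to the three features used for detection."""
    if not events:
        raise ValueError("empty event log")
    return GameFeatures(
        # Assumption: the last logged event marks the end of gameplay.
        play_duration=max(e.time_sec for e in events),
        # Object was grabbed but not dropped on its target area.
        dnd_errors=sum(1 for e in events if e.kind == "drop_miss"),
        # Click landed on an invalid area or object.
        click_errors=sum(1 for e in events if e.kind == "click_miss"),
    )


# Made-up log for illustration only.
sample_log = [
    MouseEvent(0.8, "click_miss"),
    MouseEvent(1.5, "click_ok"),
    MouseEvent(2.9, "drop_miss"),
    MouseEvent(3.4, "click_ok"),
    MouseEvent(5.1, "drop_ok"),
]
print(extract_features(sample_log))
```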

4.3.1 Detection with Different Error Rates and Duration Play Time.

We need to consider the overall time without counting any of the timeouts. We need a good indicator to make sure every game that exceeds the maximum time limit (e.g., 40 seconds) is excluded as a timeout. After excluding all of the timeouts, we calculate the differences in play duration between the honest users and the high latency or low latency stream relay. These two sets of differences will be considered as one main factor for training a classifier. As shown in Figure 18 below, the durations needed to finish the DCG CAPTCHA game under high latency and low latency are not very different in relative terms. Under high latency, the play duration is not consistent from one gameplay to the next. With the high latency connection from Mechanical Turk, we expected this result.

[Chart: Duration Time; series: High, Low, Honest; x-axis: 1 to 20, y-axis: 0 to 50]

Figure 18: Duration time play graph.

[Chart: Differences in Duration Game Play; series: High, Low, Honest, H-Honest, L-Honest; x-axis: 1 to 20, y-axis: 0 to 50]

Figure 19: Difference in duration game play.

We calculate the difference in gameplay duration between the high latency stream relay attacks and the honest users, and likewise between the low latency stream relay attacks and the honest users. This gives two data sets, called H-Honest and L-Honest:

H-Honest = HighLatency - Honest

L-Honest = LowLatency - Honest
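
The two difference sets defined above, with timeouts excluded, might be computed as in the following sketch. It assumes that relay and honest durations are paired by position (as the per-attempt charts suggest) and that a pair is dropped when either duration exceeds the 40-second limit; the function name and the sample values are hypothetical and are not data from the study.

```python
from statistics import mean

TIMEOUT_SEC = 40.0  # maximum time limit mentioned in the text


def duration_differences(relay, honest, limit=TIMEOUT_SEC):
    """Per-attempt differences (relay - honest), skipping timed-out games.

    Assumption: the i-th relay duration is compared with the i-th honest
    duration; extra entries in the longer list are ignored.
    """
    return [r - h for r, h in zip(relay, honest) if r <= limit and h <= limit]


# Made-up durations in seconds, for illustration only.
high_latency = [30.0, 45.0, 28.0]   # the 45.0 s game counts as a timeout
honest = [8.0, 9.0, 7.5]

h_honest = duration_differences(high_latency, honest)
print(h_honest)        # per-attempt H-Honest values
print(mean(h_honest))  # average extra time under relay
```

The same function applied to the low latency durations gives the L-Honest set.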

These two data sets include all of the differences in play duration between the high and low latency attacks and the honest users. The average difference between the high latency stream relay attacks and the honest users’ play time is approximately 26.13 seconds, while for the low latency stream relay attacks the difference is 14.105 seconds. With these two training data sets, we can build a classifier based on the time taken to finish a DCG CAPTCHA. For a high latency connection, we expect the finish time to be around 26.13 seconds longer than that of honest users. For a low latency connection, we expect the stream relay attack to take about 14.105 seconds longer than honest users. We also need to add one more attribute to our DCG CAPTCHA scheme, called invalid mouse drag time. When a legitimate user makes an invalid mouse drag, he or she can respond to it quickly: the user notices the error immediately and corrects it. In both high latency and low latency stream relay attacks, Amazon Mechanical Turk workers cannot respond quickly because of the delay in the stream connection between the server, the attacker, and the human solvers. We built two charts comparing the drag-and-drop error rates and the click error rates in the high latency and low latency scenarios, as shown below:

[Chart: Error Rates with High Latency; series: DnD, Click; x-axis: 1 to 20, y-axis: 0 to 7]

Figure 20: Error rates in high latency connection.

In the case of a high latency connection, the number of drag-and-drop attempts for each Turker is usually around 1. Whenever the number of drag-and-drop attempts is greater than 2, we can classify the user as a human solver. The difference between drag-and-drop errors and click errors lies in the success of the mouse “down” and “up” events. When a user attempts to drag an object, he or she needs to click and hold the object and drag it into the destination area; this counts as one drag and drop. If the user drops the object during the attempt, that is a drag-and-drop error. If the user does not click correctly on the object (before any drag has started), that counts as 1 click error. In both cases, the errors usually come from the high latency connection. When streaming a DCG CAPTCHA, it is hard for attackers to keep the connection between the human solvers and the server smooth. As shown in Figure 20, attempts number 9 and 12 are classified as human solvers based on their behavior. A high number of click errors means that the connection between the human solver and the server is not consistent. Legitimate users who click in the wrong place can easily adjust the mouse position to make sure the next click is correct. This is not the case for a human solver, because the attacker needs to capture the image and stream it to the human solver, and then maintain the connection while waiting for the human solver’s response. All attempts with more than three click errors can be classified as human solvers. As shown in Figure 21, the same classifier based on drag-and-drop errors and click errors applies to the low latency connection. User #13, who has a high number of drag-and-drop and click errors, can be classified as a human solver.
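
The threshold rule described above can be written as a small predicate. This is only a sketch of the rule as stated in the text (more than 2 drag-and-drop errors, or more than three click errors); the function name and the idea of exposing the limits as parameters are assumptions made for illustration.

```python
DND_ERROR_LIMIT = 2    # more than 2 drag-and-drop errors is suspicious
CLICK_ERROR_LIMIT = 3  # more than 3 click errors is suspicious


def looks_like_human_solver(dnd_errors, click_errors,
                            dnd_limit=DND_ERROR_LIMIT,
                            click_limit=CLICK_ERROR_LIMIT):
    """Flag a gameplay as a possible relayed (human solver) session."""
    return dnd_errors > dnd_limit or click_errors > click_limit


# Made-up error counts for illustration only.
print(looks_like_human_solver(1, 1))  # typical legitimate gameplay
print(looks_like_human_solver(3, 0))  # too many failed drags
print(looks_like_human_solver(0, 4))  # too many missed clicks
```

Keeping the limits as parameters makes it straightforward to tighten the click threshold if a stricter rule is preferred.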

[Chart: Error Rate with Low Latency; series: DnD, Click; x-axis: 1 to 20, y-axis: 0 to 7]

Figure 21: Error rates in low latency connection.

[Chart: Error Rates with Legitimate Users; series: DnD, Click; x-axis: 1 to 39, y-axis: 0 to 5]

Figure 22: Error rates in legitimate users’ gameplay

Compared with the high latency and low latency human solvers, the drag-and-drop error and click error rates of legitimate users are significantly lower, averaging 0.3% for drag-and-drop errors and 1% for click errors, as shown in Figure 22. Most of the legitimate users succeed on their first drag-and-drop attempt, and they make only 1 or 2 click errors. Any attempt with more than 2 drag-and-drop errors or click errors should be marked as a possible human solver in stream relay attack detection.

4.3.2 Using K-Nearest Neighbor Algorithm for Detection Mechanism.

Based on all the collected data, we divided the data sets into smaller subsets, such as drag-and-drop errors, high latency finish time, and low latency finish time. Next, we need to choose a classification method for the collected data. K-Nearest Neighbor is a simple algorithm that stores all of the available data and classifies newly collected data based on a similarity measure. K-Nearest Neighbor can use different distance functions to measure similarity. It has been used in statistical pattern recognition since the early 1970s. Using the K-Nearest Neighbor algorithm, we can predict the pattern of an attacker based on the drag-and-drop error rate and the time to finish the CAPTCHA game. Table 8 shows the result of K-Nearest Neighbor with a low latency connection.


Table 8: K-NN prediction in low latency connection

DnD   Low     Y   Distance   NN Sign
0     18.24   +   194
0     19.62   +   157
1     21.39   +   115
2     22.64   +   91
0     18.69   +   181
1     20.63   +   132
1     19.37   +   163
2     22.87   +   87
0     19.65   +   157
0     18.97   +   174
2     23.79   +   70         +
1     21.56   +   112
5     45.12   -   185
0     19.23   +   167
1     20.67   +   131
1     21.76   +   107
1     20.43   +   137
1     20.59   +   133
1     23.65   +   72
1     22.87   +   86

As shown in Table 8, the K Nearest Neighbor algorithm gives us a classification over two subsets: drag and drop error rates and the low latency finish time. Both subsets need to be pre-classified to determine what is negative and what is positive. We use a K value of 1, so that the case to be decided is compared with its single nearest neighbor. With a different K value, K Nearest Neighbor can produce different outcomes. For the low latency finish time, as shown in the table above, with a drag and drop error rate of 1 and a finish time of 32.12, K Nearest Neighbor succeeds in classifying the case as positive (not a human solver attacker). For all cases with a low latency finish time larger than 35 seconds, K Nearest Neighbor succeeds in classifying them as possible human solver attackers. The main idea behind the K Nearest Neighbor algorithm is to choose a suitable K value for the comparison with the nearest neighbors. For the low latency connection, we have a true positive rate for stream relay attacks of approximately 89% on the testing data set. Stream relay attacks are hard to recognize over a low latency connection because their duration times are not much different from those of legitimate users. The margin between the two duration time data sets is not big enough for K Nearest Neighbor to distinguish the legitimate users from the stream relay attackers.
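The K = 1 step for the low latency data can be sketched as follows. The rows are copied from Table 8, and the query point (1 drag and drop error, finish time 32.12) is the case discussed in the text. The squared Euclidean distance is an assumption: the thesis does not name its distance function, but rounding this quantity reproduces the Distance column of the table. The names `ROWS`, `sq_dist`, and `nearest_neighbor` are hypothetical.

```python
# Rows copied from Table 8: (DnD errors, finish time, class Y, Distance column).
ROWS = [
    (0, 18.24, "+", 194), (0, 19.62, "+", 157), (1, 21.39, "+", 115),
    (2, 22.64, "+", 91),  (0, 18.69, "+", 181), (1, 20.63, "+", 132),
    (1, 19.37, "+", 163), (2, 22.87, "+", 87),  (0, 19.65, "+", 157),
    (0, 18.97, "+", 174), (2, 23.79, "+", 70),  (1, 21.56, "+", 112),
    (5, 45.12, "-", 185), (0, 19.23, "+", 167), (1, 20.67, "+", 131),
    (1, 21.76, "+", 107), (1, 20.43, "+", 137), (1, 20.59, "+", 133),
    (1, 23.65, "+", 72),  (1, 22.87, "+", 86),
]


def sq_dist(a, b):
    """Squared Euclidean distance between two (dnd, time) points (assumed metric)."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest_neighbor(query, rows):
    """Return the stored row closest to the query, i.e. K = 1."""
    return min(rows, key=lambda row: sq_dist(query, row[:2]))


query = (1, 32.12)  # case to classify, taken from the text
nn = nearest_neighbor(query, ROWS)
print("nearest row:", nn[:3], "rounded distance:", round(sq_dist(query, nn[:2])))
print("predicted class:", nn[2])
```

Under this assumption the nearest row is (2, 23.79) with class "+", which agrees with the row carrying the NN Sign entry in Table 8.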

Table 9: K-NN prediction in high latency connection

DnD   High    Y   Distance   NN Sign
1     32.15   +   1
0     30.21   +   2
1     31.52   +   0          +
1     33.25   +   4
1     34.16   +   9
0     31.26   +   1
0     30.62   +   1
1     33.25   +   4
4     42.01   -   126
2     36.12   +   25
1     30.18   +   1
2     37.12   +   36
3     41.28   -   106
0     32.12   +   2
0     33.62   +   7
1     35.47   +   18
0     33.64   +   7
2     37.18   +   37
0     32.19   +   2
1     34.89   +   14


[Scatter plot: Duration Time (seconds) against Number of DnD rates. Legend: Positive, Negative, ?]

Figure 23: K-NN graph of high latency relay attacks.

As shown in Table 9, we again use the K Nearest Neighbor algorithm on drag and drop error rates and the high latency finish time. We use a K value of 1 to detect possible stream relay attacks. A higher K value may make the detection more precise, but this is not guaranteed. With a K value of 1, a case is compared with its nearest neighbor: its distance to each entry in the data subset is calculated. With K equal to 1, the sign column, as shown in the table above, decides whether the case is positive or negative. With a drag and drop rate of 1 and a high latency finish time of 31.20, K Nearest Neighbor succeeds in classifying the case as a legitimate user. With a drag and drop rate of 2, a K value of 3, and a finish time of 41.00, K Nearest Neighbor succeeds in classifying the case as a possible attacker. The classifier has a high success rate of 94% in detecting possible stream relay attacks.

The difference between the high latency and low latency subsets is the detection accuracy: the classifier is more accurate with the high latency finish time than with the low latency one. Because the high latency duration times differ much more, it is easier for the classifier to detect a possible stream relay attack. The low latency duration times do not differ significantly from each other. Hence, it is much more difficult for the classifier to detect a possible stream relay attack in the low latency subsets. We also need to account for the choice of K value. With K equal to 1, we can easily compare the case with each entry in the subset, but this does not give the most precise pattern recognition. We suggest K values between 3 and 5 for the most precise detection mechanism.
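The effect of the K value on the high latency data can be illustrated with a small majority-vote sketch. The rows are copied from Table 9 and the two query points are the cases discussed in the text. The squared Euclidean distance and the function names `sq_dist` and `knn_classify` are assumptions made for illustration; the thesis does not describe its implementation.

```python
from collections import Counter

# Rows copied from Table 9: (DnD errors, high latency finish time, class Y).
ROWS = [
    (1, 32.15, "+"), (0, 30.21, "+"), (1, 31.52, "+"), (1, 33.25, "+"),
    (1, 34.16, "+"), (0, 31.26, "+"), (0, 30.62, "+"), (1, 33.25, "+"),
    (4, 42.01, "-"), (2, 36.12, "+"), (1, 30.18, "+"), (2, 37.12, "+"),
    (3, 41.28, "-"), (0, 32.12, "+"), (0, 33.62, "+"), (1, 35.47, "+"),
    (0, 33.64, "+"), (2, 37.18, "+"), (0, 32.19, "+"), (1, 34.89, "+"),
]


def sq_dist(a, b):
    """Squared Euclidean distance between two (dnd, time) points (assumed metric)."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def knn_classify(query, rows, k):
    """Return the majority class among the k stored rows closest to the query."""
    neighbors = sorted(rows, key=lambda row: sq_dist(query, row[:2]))[:k]
    votes = Counter(label for _, _, label in neighbors)
    return votes.most_common(1)[0][0]


# Case from the text: 1 DnD error, finish time 31.20, K = 1.
print(knn_classify((1, 31.20), ROWS, k=1))
# Case from the text: 2 DnD errors, finish time 41.00, K = 3.
print(knn_classify((2, 41.00), ROWS, k=3))
```

With the Table 9 rows, this sketch returns "+" for the first query and "-" for the second, matching the outcomes described in the text. Because the table contains only two negative rows, larger K values let the positive class outvote them, so K should be chosen with the class balance of the stored data in mind.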


CHAPTER 5: CONCLUSION

5.1 Summary

Our study results offer insights into DCG captchas. Dynamic Cognitive Game captchas are generally usable. DCG captchas also resist relay attacks better than other well-known types of captchas. Text based, audio, and motion based captchas are all considered vulnerable to stream relay attacks, while Dynamic Cognitive Game captchas are not yet vulnerable to either high latency or low latency stream relay attacks.

The ability of Dynamic Cognitive Game captchas to prevent stream relay attacks is based on their dynamic scheme. A DCG captcha requires continuous, dynamic interaction between the legitimate user and the server. Over a stream relay attack, especially a high latency one, this requirement cannot be fulfilled. Hence, increasing the number of answer objects and the complexity of the required interaction can help increase the ability to detect stream relay attacks. DCG captchas that match shapes with multiple answer objects are considered harder for stream relay attacks than those with a single answer object. The more interaction between users and servers, the more protection DCG captchas provide. We also need to consider that with too many answer objects (more than 3), the usability of Dynamic Cognitive Game captchas can be degraded.

Our work uses Amazon Mechanical Turk to recruit all of the participants for this research. Amazon Mechanical Turk allows us to collect data with specific requirements, such as a location outside or inside the United States. Amazon Mechanical Turk also lets us control the data sets that the workers submit. On the other hand, Amazon Mechanical Turk prevents us from testing stream relay attacks carried out by different methods. If users connect to the servers using a remote connection program such as TeamViewer, Microsoft Remote Desktop Control, or RealVNC, we also consider that a possible streaming relay attack.

Our studies suggest that increasing the number of answer objects up to three can help prevent stream relay attacks. We can also increase the number of target objects to require more interaction between the legitimate users and the servers. We also consider specific target objects to be more secure than a target area, which only requires users to put the answer objects onto it. We suggest that adding more interaction, or more drag and drop operations, between the legitimate users and the servers can defend against not only automated attacks but also stream relay attacks, both high latency and low latency. Further research on a mechanism to detect such stream relay attacks with a good classifier is necessary.

Our studies also succeed in providing a possible detection mechanism for stream relay attacks that use human solvers. Using a pattern recognition algorithm, in this case K-Nearest Neighbor, we succeed in recognizing possible stream relay attacks with both high latency and low latency duration times. With three main subsets, drag and drop error rates, click error rates, and duration time, we succeed in creating different classifiers to recognize possible stream relay attacks.


5.2 Future Work

5.2.1 Improving with Different DCG CAPTCHA Gameplays.

For better results, we need to create more gameplays with different types of target areas and answer objects. We also need to consider creating gameplays with more than three answer objects. As the results in table 10 and 11 show, more answer objects in high latency gameplay make it easier for the pattern recognition algorithm to recognize possible stream relay attacks. We also need to create gameplays with no specific target areas, which we can use to test the usability of DCG CAPTCHA. With no specific target area, usability should be higher than with a specific target area gameplay, since it should be easier for legitimate users to drag and drop the answer object to the target area. One more feature to improve is the refresh rate of the gameplay. Different refresh rates give different experiences to legitimate users and possible stream relay attackers. Refresh rate and connection latency are two main feature subsets that can differentiate legitimate users from stream relay attackers.

5.2.2 Improving with Stream Relay Attack with Offline Feedback.

Stream relay attacks can be improved with offline feedback from legitimate users. Until now, detection mechanisms for stream relay attacks on CAPTCHAs have not been successful enough. With the detection mechanism in this paper, we now have a method to detect static stream relay attacks. If the attacker, or human solver, can constantly stream offline feedback from the client machine to the server to study the locations of the answer objects and the target areas, it is harder for any detection mechanism to detect a possible stream relay attack. Offline feedback will not be the main mechanism of a stream relay attack, but it will provide more precise locations of objects over time. Hence, the attack will be more likely to succeed against any CAPTCHA scheme without being detected.

5.2.3 Improving with Usability of DCG CAPTCHA.

We consider the usability of DCG CAPTCHA in order to offer users a more efficient experience without degrading security. We want to lower the reaction time between the legitimate users and the servers. We also want to reduce the duration of the interaction. We consider enhancing the usability of DCG CAPTCHA by providing it on touch screen mobile devices. Further research will study the usability of DCG CAPTCHA and the possibility of a stream relay attack detection mechanism on touch screen mobile devices.


REFERENCES

Ahn, L., Blum, M., Hopper, N., Langford, J. 2003. Captcha: Using Hard AI Problems for

Security. In Advances in Cryptology. EUROCRYPT

Ahn, L., Maurer, B., McMillen, Colin, Abraham, D., Blum, M. 2008. ReCAPTCHA:

Human-Based Character Recognition via Web Security Measures. ScienceMag.

VOL 321. Accessed November 17, 2016

Bigham, J. P. and Cavender A. C. 2009. Evaluating Existing Audio CAPTCHAs and an

Interface.

Buhrmester, M., Kwang, T. and Gosling, S. D. 2011. Amazon’s mechanical Turk: A

newsource of inexpensive, yet high-quality data? Perspectives on Psychological

Science6, 1, 3-5

Bursztein, E., Bethard, S., Fabry, C., Mitchel, J. C., Jurafsky, Dan., 2010. How Good Are

Humans at Solving CAPTCHAs? A Large Scale Evaluation.

Chandavale, A.A., Sapkal A.M. and Jalnekar R.M., 2009 Algorithm to Break Visual

CAPTCHA.

Chew, M., and Tygar, J. D. 2004. Image Recognition CAPTCHAs.

Elson, J., Douceur, J., Howell, J. and Saul, J. 2007. ASIRRA: a CAPTCHA that exploits

interest-aligned manual image categorization. In Proc. Of ACM CCS 2007, pp.

366-374

Golle, P., 2008. Machine Learning Attacks against the ASIRRA CAPTCHA.

74

Hidalgo J. M. G., and Alvarez G. 2011. Captchas: An Artificial Intelligence Application

to Web Security. Advances in Computers.

Ipeirotis, P. G. 2010. Demographic of mechanical Turk. http://hdl.handle.net/2451/29585

(accessed August 30, 2016)

Mori, G and Malik, J. 2003. Breaking a Visual CAPTCHA.

Motoyama, M., Levchenko, K., Kanich, C., McCoy, D., Voelker, GM., Savage, S. 2010.

Re: Captchas-Understanding CAPTCHA – Solving Services in an Economic

Context. In USENIX Security Symposium.

NuCaptcha. 2011. www.nucaptcha.com.

Paolacci, G. and Chandler, J. 2014. Inside the Turk: Understanding mechanical Turk as a

participant pool. In Current Directions in Psychological Science (San Francisco,

CA, August), 185-189.

Paolacci, G., Chandler, J. and Ipeirotis, P. G. 2010. Running experiments on Amazon

Mechanical Turk. Judgment and Decision Making 5, 5, 411-419

Pdictionary. The internet picture dictionary. http://www.pdictionary.com. 2004

Pontin, J. 2007. Artificial intelligence: With help from the humans. The

Times.http://www.nytimes.com/

Keizer, G., 2008. Spammers’ Bot Cracks Microsoft’s CAPTCHA. In Computer World.

http://www.computerworld.com/article/2536901/security0/spammers--bot-cracks-

microsoft-s-captcha.html Saul, J. Petfinder. www.petfinder.com

75

Tung, L., 2014. Google algorithm busts CAPTCHA with 99.8 percent accuracy. In

ZDNet. http://www.zdnet.com/article/google-algorithm-busts-captcha-with-99-8-

percent-accuracy/

Turing, A. M. 1950. Computing machinery and intelligence.

Xu, Y., Reynaga, G., Chiasson, S., Frahm, J., Monrose, F. and van Oorschot, P. Security

and Usability Challenges of Moving Object CAPTCHAs: Decoding Codewords

in Motion.

Yan, Jeff. and El Ahmad, Ahmad Salah. 2008. Usability of CAPTCHAs or usability

issues in CAPTCHA design.

76