Handbook of Research on Social and Organizational Liabilities in Information Security

Manish Gupta State University of New York at Buffalo, USA

Raj Sharman State University of New York at Buffalo, USA

Information Science Reference
Hershey • New York

Director of Editorial Content: Kristin Klinger
Director of Production: Jennifer Neidig
Managing Editor: Jamie Snavely
Assistant Managing Editor: Carole Coulson
Typesetter: Jeff Ash
Cover Design: Lisa Tosheff
Printed at: Yurchak Printing Inc.

Published in the United States of America by
Information Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue, Suite 200
Hershey PA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: [email protected]
Web site: http://www.igi-global.com

and in the United Kingdom by
Information Science Reference (an imprint of IGI Global)
3 Henrietta Street
Covent Garden
London WC2E 8LU
Tel: 44 20 7240 0856
Fax: 44 20 7379 0609
Web site: http://www.eurospanbookstore.com

Copyright © 2009 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher. Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.

Library of Congress Cataloging-in-Publication Data

Handbook of research on social and organizational liabilities in information security / Manish Gupta and Raj Sharman, editors.

p. cm.

Includes bibliographical references and index.

Summary: "This book offers insightful articles on the most salient contemporary issues of managing social and human aspects of information security"--Provided by publisher.

ISBN 978-1-60566-132-2 (hardcover) -- ISBN 978-1-60566-133-9 (ebook)

1. Computer security--Management--Handbooks, manuals, etc. 2. Data protection--Management--Handbooks, manuals, etc. 3. Computer crimes--Prevention--Handbooks, manuals, etc. 4. Human computer interaction--Handbooks, manuals, etc. I. Gupta, Manish, 1978- II. Sharman, Raj.

QA76.9.A25.H365 2008

658.4'78--dc22

2008035140

British Cataloguing in Publication Data A Cataloguing in Publication record for this book is available from the British Library.

All work contributed to this book set is original material. The views expressed in this book are those of the authors, but not necessarily of the publisher.

If a library purchased a print copy of this publication, please go to http://www.igi-global.com/agreement for information on activating the library's complimentary electronic access to this publication.

Chapter XI

A Multistage Framework to Defend Against Phishing Attacks

Madhusudhanan Chandrasekaran SUNY at Buffalo, USA

Shambhu Upadhyaya State University of New York, USA

Abstract

Phishing scams pose a serious threat to end-users and commercial institutions alike. E-mail continues to be the favorite vehicle for perpetrating such scams, mainly because it is widely used and easy to spoof. Several approaches, both generic and specialized, have been proposed to address this growing problem. However, phishing techniques, growing in ingenuity as well as sophistication, render these solutions weak. To overcome these limitations, we propose a multistage framework: the first stage aims at detecting phishing e-mails based on their semantic and structural properties, whereas the second stage uses a proactive challenge-response technique to establish the authenticity of a Web site. Using live e-mail data, we demonstrate that our approach with these two stages is able to detect a wider range of phishing attacks than existing schemes. Also, our performance analysis shows that the implementation overhead introduced by our tool is negligibly small.

Introduction

Phishing is a form of Web-based attack in which attackers employ deceit and social engineering to defraud users of private and confidential information such as passwords, credit card numbers, social security numbers (SSNs), and bank account numbers. As the Internet becomes the de facto medium for online banking and trade, phishing attacks are gaining notoriety, especially amongst hacker communities. Anonymity over the Internet, coupled with the potential for large

Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited. A Multistage Framework to Defend Against Phishing Attacks

financial gains, serves as a strong motivation for attackers to perpetrate such seemingly low-risk yet high-return scams. The first recorded mention of phishing attacks was in AOL forums (“Phishing - Wikipedia,”), wherein attackers posing as system administrators tricked registered users into disclosing their account information. Since then, phishing attacks, growing in sophistication and ingenuity, have affected millions of users and caused heavy monetary damage. For example, in the year 2006 alone, phishing attacks cost $2.8 billion in losses to consumers and commercial organizations worldwide (Gartner Press Release, 2006).

Due to its widespread adoption and the ease with which it can be spoofed, email continues to be the favorite vehicle for perpetrating such scams. Email-based phishing attacks are usually carried out as a three-step process: (i) in the first step, phishers harvest the email addresses of their potential victims from Web pages, online forums, and other social engineering mechanisms; (ii) in the second step, a large volume of specially crafted emails appearing to originate from legitimate domains is dispatched to the assembled list using open SMTP servers and compromised machines. These emails contain hyperlinks that redirect users to a fake Web site similar in appearance to the legitimate domain; (iii) finally, account details and other personal information are collected from users who unsuspectingly enter them into the fake Web site, thinking it to be the legitimate one. Phishing attacks, like other social engineering attacks, depend for their success upon users’ lack of system knowledge. Phishers adopt a variety of visual deception agents to imitate the legitimate Web site’s look-and-feel (Drake, Oliver, & Koontz, 2004). The mimicry of a legitimate Web site is usually achieved by spoofing the URLs with non-ASCII Unicode characters, using customized images to mask fake URLs, and embedding the fake Web sites within images that resemble a browser window. Recent studies (Dhamija, Tygar, & Hearst, 2006) show that naïve users are inept at identifying common browser-based cues such as the address bar, status bar, SSL certificates, and toolbar indicators, and often fall prey to such imitation sites.

Until recently, anti-spam techniques were employed to detect phishing emails. However, as phishing emails closely resemble their legitimate counterparts, they do not share the features of spam emails. Also, there exists a vast number of readily available tools that can bypass both statistical and rule-based spam filters. Several browser extensions and plug-ins have been proposed to detect phishing attacks. Although these techniques act as a first line of defense, they suffer from many limitations. First, as these approaches operate on the fake Web site, they take the users a step closer to the attack, giving little leeway for suspicion. Second, most of the existing defense mechanisms are not automated and delegate the onus of decision making to the users. Third, as these tools embrace the authenticity of the IP address as an important classification criterion, they fail to protect against attacks launched from within a legitimate domain. For example, an attacker could compromise a Web server and launch phishing pages from the domain itself.¹

To overcome these limitations, we build on our prior work (Chandrasekaran, Chinchani, & Upadhyaya, 2006; Chandrasekaran, Narayanan, & Upadhyaya, 2006) and present a two-stage solution to protect users against email-based phishing attacks. The first stage aims at detecting phishing emails based on their semantic and structural properties, whereas in the second stage a proactive approach using a challenge-response technique is presented to test the authenticity of links present in the email. The essential driving force for this two-stage approach is that cleverly fabricated emails can evade even the smartest spam filters put in place for phishing detection. In the first stage, the existing phishing email corpus is analyzed and context models are constructed that encapsulate the underlying meaning of the emails using their syntactic and


semantic properties. These context models then serve as discrimination indicators and are used to tell legitimate and phishing emails apart. The context models are constructed from various language-specific features such as the usage of certain emotional words, patterns of vocabulary usage, unusual language usage, underlying content, and other stylistic features that appear frequently across phishing emails. These feature vectors constitute the message carried by the phishing emails, which is used primarily to trick the recipients. Since there exists a large number of such relevant features, Simulated Annealing (SA) (Debuse & Rayward-Smith, 1997) is employed to identify the set of relevant features and thereby reduce the curse of dimensionality. Once the relevant features are identified, for the testing phase, a Support Vector Machine (SVM) (Drucker, Wu, & Vapnik, 1999) is used to classify phishing emails even before they reach the users’ inboxes.

Although most present-day phishing emails can be detected using our first-stage solution, phishing emails, similar to , can be cleverly devised to omit the relevant features used for training in our scheme. In the second stage, to detect such outliers, a more proactive standpoint is adopted by using fake responses that mimic real users, essentially reversing the roles of the victim and the adversary. The key idea behind our approach is to protect real users’ identities by providing fake information to Web sites requesting critical information until their authenticity is verified. Here, the detection rests on the premise that just as an end user cannot tell legitimate and spoofed emails apart, phishers cannot tell the responses of legitimate and phantom users apart. As the mail user agent (MUA) receives emails, their content is analyzed for the presence of embedded links and attached HTML forms. If an email contains no such suspicious characteristics, further investigation is skipped. Otherwise, the MUA parses the appropriate tokens and supplies fake values for the information requested by the Web site. Based on our assumption, we note that the fake Web site cannot verify the credibility of the supplied information and is indifferent to both the real and the contrived response. Using live email data, it can be demonstrated that this two-stage approach is able to detect a wider range of phishing attacks than other existing schemes. Also, a preliminary performance analysis reveals that the implementation overhead introduced by the proposed tool is negligible.

The remainder of this chapter is organized as follows. First, we briefly discuss the existing antiphishing techniques and their strengths and weaknesses. Then we describe commonly adopted attack vectors that attackers use to launch phishing attacks successfully. As classification algorithms rely on identifying invariant features intrinsic to phishing emails, we dissect the phishing email structure for the purpose of extracting such features. This also helps in training the classifier to detect phishing emails even before they reach the user’s inbox. Next, a two-stage approach is presented that can detect both phishing emails and Web sites. Finally, to bring out the efficacy of our approach, we evaluate both stages against live phishing sites and emails and report their performance.

Readers may note that the flavor of our exposition is rather technical; for a terminological, business, and legal exposition of phishing and pharming, one could refer to Kierkegaard (2007).

Existing Detection Techniques

A large number of defense efforts have been undertaken in academia and industry to tackle the phishing scourge. The Anti-Phishing Working Group (APWG) (“Anti-Phishing Working Group,”) is a consortium of pan-industrial and law enforcement agencies formed to record and analyze the trends


in current-day phishing attacks. It comprises more than 1600 companies worldwide, including 8 of the top 10 US banks. On the academic front, to date, the CMU Usable Privacy and Security Laboratory (CUPS), the Stanford computer security lab, and Indiana University’s anti-phishing work group have been involved with projects that focus on antiphishing efforts. In this section, we summarize these existing defense solutions according to the functionality they offer to detect and eliminate phishing attacks.

Browser Plug-ins and Antiphishing Toolbars

Several commercial and open source toolbars have been proposed to test and validate the Web sites visited by users. Spoofstick (Spoofstick, 2004) is one such browser add-on, which displays the IP address of the connecting Web site in its toolbar even if the URL displayed in the browser is spoofed. As Spoofstick requires the user to know the IP addresses of legitimate Web sites, it does not provide an automated solution to detect phishing attacks. The NetCraft antiphishing toolbar (NetCraft, 2004) is another monitoring tool, which employs a client-server architecture. Each toolbar subscriber acts as a client and is responsible for reporting suspicious Web sites to a central server. The server then processes every incoming request by checking the domain age, hosting location, and URLs, and then puts the reported Web site into either a whitelist or a blacklist. These lists are propagated to other clients to assist them with their decision making. A disadvantage of such an approach is that, as phishing Web sites are ephemeral, it might not be possible to propagate the generated blacklists to the clients in time. SpoofGuard (Chou, Ledesma, Teraguchi, & Mitchell, 2004) is a browser plug-in that examines the visited Web site using stateful and stateless evaluation. The stateless evaluation includes checks for invalid links, URL obfuscation attacks, a valid https connection, and the authenticity of SSL/TLS certificates. It also checks whether the images present in legitimate sites are imported by unknown suspicious domains. Unlike stateless evaluation, the stateful page evaluation monitors all outgoing data using site-specific salts so that a user does not provide his username and password to a site he has never visited before. In most cases, the final result of these toolbars is either binary (phishing or safe) or ternary (where a score or color is displayed on the toolbar to warn users about the site’s suspiciousness). Despite their advantages, a recent study (Zhang, Egelman, Cranor, & Hong, 2007) that experimented with 10 popular antiphishing toolbars revealed that the toolbars failed to identify 15% of the phishing Web sites used for testing. Also, as discussed earlier, these toolbars depend on the validity of the IP address as an important detection criterion and fail to protect against attacks launched from legitimate Web sites. Lastly, these toolbars ignore the weak human factor and require users to make the final decision, i.e., whether or not to trust a suspicious Web site.

Digital Signing and PKI Based Schemes

Digital signing and trust propagation schemes have been proposed to make email secure and reliable. These schemes employ publicly available standards such as S/MIME, PGP, and GPG to encrypt, decrypt, and validate email messages. The main purpose of digital signing is to guarantee two basic security properties: non-repudiation and integrity (Kapadia, 2007). Non-repudiation is used to ensure that the sender and the recipient are, in fact, the parties who claimed to have sent or received the message, respectively. Integrity, on the other hand, is used to verify that no part of the message was modified during transit. Sender Policy Framework (SPF), Certified Server Validation (CSV), and DomainKeys have also been proposed as alternative mechanisms to authenticate emails based on their sender’s domain name. DomainKeys uses digital signatures


to authenticate the domain name and the entire content of a message, whereas SPF and CSV look at the email headers to identify . Even though these schemes act as an effective anti-spoofing solution, they suffer from several disadvantages. First, adopting these techniques involves a steep learning curve that may put them out of reach of everyday users. Second, these techniques require the installation of additional software to support S/MIME, PGP, GPG, etc. These provisions are not readily available in most of the popular Web-based email clients such as Yahoo Mail, Hotmail, and Gmail. Finally, these techniques suffer from the key distribution problem, where a trusted medium is needed to exchange the keys used to sign and encrypt/decrypt messages. In the case of PGP/GPG schemes, as there is no central authority server, a phisher can infiltrate the Web of trust by digitally signing his emails. Another drawback of these PKI-based and authentication-based approaches is that both the sender and the receiver need to have the same signing and verification mechanisms.

Hardware Tokens and Session Based Authentication

Strong token-based authentication mechanisms can be enforced using physical devices such as smartcards, Bluetooth devices, USB tokens, and other devices that generate one-time passwords. As these generated passwords expire after a single logon, it is not possible for the phisher to impersonate the user at a later point in time. Some commercial implementations of such hardware tokens are RSA’s SecurID, Aladdin’s eToken, and ActivIdentity tokens. Even though these schemes provide multi-way authentication by having an out-of-band (OOB) authentication channel, they are vulnerable to man-in-the-middle attacks in which phishers, to avoid detection, replay the obtained password to the legitimate Web site and then present the received information back to the users. Moreover, these approaches incur high setup and management costs, and their large-scale adoption is not always feasible. Other session-based authentication mechanisms include email sequence numbering and email personalization. Unfortunately, like hardware tokens, they impose an additional burden on users, who must manually verify the validity of the sequence number and personal information. Moreover, these session-based authentication schemes also suffer from man-in-the-middle attacks.

Content Based Phishing Attack Detection

Several research efforts employing machine learning and pattern recognition techniques have been proposed to classify phishing emails. Most of the earlier algorithms were tailored to detect spam emails and did not perform well when applied in the context of phishing attacks. These approaches were naïve in the sense that they essentially focused on detecting the presence of uncommon words that appear in spam emails. As phishing emails closely imitate their legitimate counterparts, they, unlike spam, do not contain such random and junk words. In order to classify phishing emails, Fette et al. (Fette, Sadeh, & Tomasic, 2007) employ a set of 16 different machine learning algorithms operating on a predefined feature set. The feature set consists of structural elements that indicate the presence of illegitimate hyperlinks, IP-based URLs, non-matching URLs, and other characteristics intrinsic to phishing emails. CANTINA (Zhang, Hong, & Cranor, 2007) is another tool, which uses term frequency and inverse document frequency (tf-idf) to identify commonly appearing words in phishing Web pages. These words, along with other structural elements, are used as features for classification. Wenyin et al. (Wenyin, Huang, Xiaoyue, Min, & Deng, 2005) propose a technique to detect phishing Web pages by evaluating their resemblance to other Web pages previously visited by the users. Normally, as two Web pages do not share the same content and layout, three metrics based on block level,


layout level, and overall similarity are used to tell the fake and the legitimate apart. The main drawback of these approaches is that they fail to consider the underlying message conveyed in the phishing emails, by which an attacker lures his victims to visit the fake Web site. Here, we also argue that, to decrease the false negative rate and to promote overall awareness of phishing attacks, it is important to also include the message’s tone as an essential classification feature.

Commonly Adopted Attack Vectors

Attack vectors are the means by which an attacker can gain access to a computer or network server for malicious purposes. Attack vectors enable hackers to exploit system vulnerabilities as well as the human element, which is regarded as the ‘weakest link’ in the security chain. Attack vectors include viruses, email attachments, spoofed URLs, pop-up windows, instant messages, and deception. In this section, we review some of the commonly adopted attack vectors that phishers use to trick users into visiting the fake Web site.

URL and Host Name Obfuscation Attacks: Phishing attacks use spoofing techniques to conceal fake URLs such that an unsophisticated user cannot distinguish them from legitimate ones. One trivial deception method is to register a fake domain that is a misspelled variant of the legitimate one. For example, a phisher can host the phishing site at www.paypai.com to imitate www.paypal.com (replacing the lower case ‘L’ with ‘i’). Third-party services exist that shorten URLs so that they are compatible with existing email and Web application systems. Such shortened URLs lose their identity and are represented as if hosted within their providers’ domain. For example, www.google.com/accounts and http://moneycentral.msn.com/banking/accounts are mapped to http://tinyurl.com/8ydws and http://tinyurl.com/2lmgd2, respectively. Another form of visual deception is brought about by replacing ASCII characters with specially encoded characters that use DWORD, HEX, or UTF-8 encoding (“How to obscure any URL - How Spammers And Scammers Hide and Confuse,” 2002).

Browser Vulnerabilities: Browsers, in their quest to support extended features and functionalities, provide hooks to install unauthorized third-party plug-ins and add-ons. As these add-ons usually operate with the same security privilege as the browser, attackers can exploit vulnerabilities present in them to hijack the browser. With new vulnerabilities being discovered and patches released, it is extremely difficult for a naïve user to constantly update and protect against the attacks. Poorly configured browsers installed with third-party plug-ins are susceptible to homographic attacks such as Internationalized Domain Name (IDN) spoofing and to pop-up hijacking attacks. Also, vulnerabilities in ActiveX controls and browser helper objects (BHO) can be exploited to install trojans that modify the system’s hosts file to redirect users to a fake Web site. Disabling features such as ActiveX controls, unauthorized third-party plug-ins, and IDN support is often viewed as a trade-off between extended functionality and security.

Cross-site Scripting (XSS) and Session Hijacking Attacks: Phishers exploit security loopholes in Web applications and Web server software to make users unknowingly execute malicious scripts. These scripts are usually embedded through encoded characters in the URL for the purpose of redirecting users to a malicious server. For example, a user might click on www.legit.com/account?URL=www.fakebank.com, assuming it to be part of the legitimate bank itself. Here the user is first directed to the legit.com Web site. But due to coding flaws, the account page accepts arbitrary URLs and redirects control to www.fakebank.com. Also, by installing packet sniffers and extracting session IDs through server-side exploits, it is possible for the


phishers to hijack the user’s current session.

Pharming and Host Redirection Attacks: Pharming is a domain redirection attack wherein an attacker modifies the victim’s Domain Name System (DNS) (Kerner, 2005) infrastructure so that users are redirected to a fake Web site even when the legitimate URLs are keyed in correctly. “DNS cache poisoning”, exploitation of vulnerabilities in DNS software, and surreptitious modification of hosts files on victims’ computers and routers can be used to carry out pharming attacks. Phishing attacks that use domain redirection are difficult to detect, as they do not depend upon obfuscation techniques that trick users into visiting the fake Web site.

Botnets and Malware based Phishing Attacks: Botnets are a collection of compromised machines that act under the attackers’ command- the dynamic DNS service. As these botnets do not directly trace back to an attacker, they can be used to surreptitiously send out phishing emails and launch fake Web pages. Malware, besides recording users’ keystrokes and input data, can capture users’ browsing history to send out targeted phishing attacks known as ‘spear phishing.’

Structure of Phishing Emails

In order to devise defense solutions that can detect phishing emails, it is important to chart out invariant properties that are present in most, if not all, phishing emails. These invariant properties are mostly visually deceptive agents employed by the phisher to trick users. Identifying these invariant properties also helps in building discriminators that are accurate and less prone to false positives.

Spoofing of Online Banks and Retailers: Phishing emails closely imitate online banks and retailers to gain the trust of users. The companies spoofed most often are Citibank, eBay, and PayPal. The most targeted industries are financial services, Internet retailers, and Internet Service Providers. Phishers adapt quickly and target organizations, such as the Internal Revenue Service (IRS) and charities, that are not safeguarded. Usually, the company’s image and links referring to the company’s Web site are spoofed in the fake email to deceive its customers.

Non-Matching URLs: In spoofed email messages, the link text seen in the email is usually different from the actual link destination. In the following example, though the email refers to “http://www.chase.com”, it discreetly redirects the user to the site http://www.climagro.com.ar/agro/chase.htm, which is the actual referred Web site http://www.chase.com.

Age of Referred Domains: Most phishing Web sites are hosted using free Web hosting services or on compromised machines that run a dynamic DNS service. In general, these fake sites are short-lived; they are detected automatically by monitoring bots or taken down as a result of users’ complaints. Hence, the active uptime of most phishing domains is short compared with that of their legitimate counterparts. A WHOIS query on a domain gives the date on which it was registered along with its location information. The date and location can then be used to determine whether it is a phishing site or not. For example, Fette et al. (Fette et al., 2007) mark domains registered in the past 60 days as phishing.

Using IP Addresses instead of URLs: Frequently, phishers attempt to conceal the destination Web site by scrambling the URLs so that it is hard for normal users to tell that it is fake. One method of concealing the destination is to use the IP address of the Web site rather than the hostname. An example of an IP address used in a fraudulent email message’s URL is “http://210.14.228.66/sr/.”
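As a rough illustration of two of the URL cues described above (non-matching URLs and IP-based URLs), the following Python sketch extracts the anchors from an HTML email body and flags both conditions. It is not the authors' implementation: the names `LinkCollector`, `host_of`, `is_ip_host`, and `url_features` are hypothetical, only the Python standard library is assumed, and the domain-age cue is left out because it needs a WHOIS lookup.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import ipaddress


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(url):
    """Return the lower-cased host of a URL, tolerating a missing scheme."""
    if "//" not in url:
        url = "//" + url
    return (urlparse(url).hostname or "").lower()


def is_ip_host(host):
    """True if the host is a literal IPv4/IPv6 address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(html_body):
    """Return one feature dictionary per link found in the email body."""
    collector = LinkCollector()
    collector.feed(html_body)
    features = []
    for href, text in collector.links:
        target = host_of(href)
        # Treat the anchor text as a URL only if it looks like one.
        looks_like_url = text.lower().startswith(("http://", "https://", "www."))
        shown = host_of(text) if looks_like_url else ""
        features.append({
            "href": href,
            "non_matching_url": bool(shown) and shown != target,
            "ip_based_url": is_ip_host(target),
        })
    return features
```

Applied to an anchor whose visible text is http://www.chase.com but whose destination is the climagro.com.ar address quoted above, the sketch sets `non_matching_url`; for a link to http://210.14.228.66/sr/ it sets `ip_based_url`. Hosts hidden behind DWORD or HEX encoding and URL-shortening services are not handled here and would need extra normalization.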


Generalization in Addressing Recipients: As the success of email-based phishing attacks depends on reaching a vast number of recipients, most phishing emails do not contain personalized content when addressing their potential victims. Unlike legitimate business communication, they do not address customers using their names or identifiers, and they lack embedded scrambled information, such as the ‘last four digits of account information’, that is used to establish authenticity. Although it might be possible for a phisher to include this information by employing social engineering and other malpractices, the success rate of such attacks is limited; it is hard to target a wide range of users.

Usage of Well Defined Situational Contexts to Lure Victims: As the objective of phishing emails is to trick users into divulging their confidential information, phishers modify the tone and underlying meaning of the message body to (i) invoke a sense of false urgency: a user may be instructed to revalidate his account information on the masqueraded Web site within a 24-hour period; (ii) invoke a sense of threat: phishing messages may threaten users into divulging their confidential information to prevent account revocation; (iii) invoke a sense of concern: in their emails, phishers may make false security claims, such as the need to change a weak password, to trick users into changing their passwords on the fake Web site; (iv) invoke a sense of opportunity or reward: phishers might lure victims into revealing their information as part of a survey that credits money to their accounts.

A Two Stage Approach to Detect Phishing Attacks

In this section, we present a two-stage approach to mitigate phishing attacks. The first stage aims at detecting phishing emails based on their semantic and structural properties, whereas in the second stage we propose a proactive technique based on a challenge-response technique to establish the authenticity of a Web site. Unlike traditional solutions that focus on detecting either phishing emails or Web sites, our approach is encompassing, as the combination of these proactive and reactive approaches can be used to protect users from falling prey to both fake emails and Web sites. Here, we discuss the workings of these stages in detail and analyze their performance.

Stage I: Detecting Phishing E-mails through Structural Properties

For our first stage, we perform a linguistic analysis of the phishing email content in order to detect the ‘tone’ or the implied message of the email. We consider the identification of the implied sense of threat or lure, or more generally the ‘tone’ of the email, a critical factor not only in identifying phishing emails but also in communicating the importance of the email to the end user. Consider the phishing emails that get past the standard phishing filter: if our framework can provide a meaningful communication to the user regarding the intentions of the email originator, it would not only be an effective methodology to defeat the attack but would also educate the naïve user about the potential harmful effects, which, after all, is the key to defeating these attacks. An important issue is to use characteristics such as the language, layout, and structure of phishing emails so that the framework is able to capture all the different contexts of phishing emails with a high degree of confidence. For this purpose, we propose using certain features relevant to language, composition, and writing, such as particular syntactic and structural layout traits, patterns of vocabulary usage, unusual language usage, and stylistic and sub-stylistic features that are present in most phishing emails. The identification and learning of these features with sufficiently high accuracy is the most difficult challenge in automated phishing email classification. Although the identification process is time consuming, it is a one-time effort and can be

achieved with the help of domain experts. Furthermore, we use feature selection and ranking metrics that assist in pruning away the unwanted features that have no significance in the actual classification. These derived features, together with a one-class Support Vector Machine (SVM), are used to stop phishing emails even before they reach the users’ inbox, essentially reducing human exposure.

Feature Selection

One of the main issues in classifying phishing emails is improving the accuracy of the underlying algorithm by pruning the unwanted features that do not contribute towards accurate prediction. The trivial and weak features, termed ‘classification noise’, therefore have to be removed for proper functioning. Formally, feature selection is defined as follows. Given a set of labeled data points $\{(x_1, y_1), \dots, (x_N, y_N)\}$, where each $x_i \in \mathbb{R}^n$ and $y_i \in \{\pm 1\}$, choose a subset of $m$ ($m < n$) features that achieves the lowest classification error. As deterministic selection of the $m$ best features is an expensive task, exponential in the number of features, many heuristic search based algorithms are used for feature selection. Here, we apply simulated annealing for feature selection, which is a well suited approximation method for locating a global optimum in a large search space. A subset of the features identified by the domain experts is randomly chosen and evaluated as a part of the algorithm. A control parameter is varied to narrow down the random selection of the features such that the classification error is reduced. Thus, the algorithm converges to yield a subset of features that performs near accurate classification.

Classification

The support vector machine is very well suited for linear binary classification. The concept of SVM is based on the idea of structural risk minimization, which minimizes the generalization error. Suppose we have $N$ training data points $\{(x_1, y_1), \dots, (x_N, y_N)\}$, where each $x_i \in \mathbb{R}^n$ and $y_i \in \{\pm 1\}$; we would like to learn a linear separating hyperplane classifier that separates the positive and negative examples. The points which lie on the hyperplane satisfy $w \cdot x + b = 0$, where $w$ is normal to the hyperplane, $|b| / \|w\|$ is the perpendicular distance from the hyperplane to the origin, and $\|w\|$ is the Euclidean norm of $w$. Let $d_+$ ($d_-$) be the shortest distance from the separating hyperplane to the closest positive (negative) example. The margin of the separating hyperplane is defined as $d_+ + d_-$. Suppose that all the training data satisfy the following constraint:

$$y_i (x_i \cdot w + b) - 1 \geq 0 \quad \forall i$$

The support vector machine attains better classification by maximizing this margin. We adopted SVM as our underlying algorithm because it has been successfully used in text classification applications, and especially in the field of computer security in the context of spam detection, hidden email construction, authorship attribution and masquerade detection. The main advantage of using SVM as a learning algorithm is that it is completely oblivious to the number of input features, and rather focuses on increasing the separating margin.

Evaluation

Dataset: The dataset used for evaluation consists of 400 emails, of which 200 were phishing emails and the rest normal emails, also known as ham emails in the information retrieval (IR) community. The phishing emails were collected over a period of six months. In order to maintain uniqueness in the dataset, redundant copies of phishing emails were discarded. The ham email set consists of two parts: (i) 140 of the ham emails were collected from postings on public newsgroups, bulletin boards, and users’ inboxes; (ii) The

remaining 60 emails were obtained from eight users who volunteered to provide emails sent to them by legitimate business organizations, such as credit card statements, online purchase receipts from Amazon, and so on. In order to abate privacy concerns, we provided the participants with a cleaning tool to sanitize all identifying information such as account numbers, account balances, etc.

Experimentation: A total of 25 features, consisting of a mixture of style marker and structural attributes, are used, as shown in Tables 1 and 2. Here, we explain how some of these features are derived. All words of length less than 2 are omitted for the sake of simplicity. The total number of words (W) is calculated as the sum of all the words from both the header and the body of the email document. A similar method is adopted for the calculation of the features C and U, as shown in Table 1. Also, we have chosen 18 different function words as features for classification. These words were collected by observing repositories of phishing emails and then extracting their common properties. As these words are intended to capture the characteristics that are unique to phishing emails, such as a sense of threat, concern or urgency, we select several keywords that are associated with this context, such as risk, suspended, identity, etc. A complete listing of these commonly used words is provided in Table 3. We believe that all such function words put together give rise to a character that is not very likely to be exhibited by authentic emails.

Two simple structural properties described in Table 2 are used as features. The first one is derived from the subject line of the email by checking whether it exhibits a certain pattern that makes it more suspicious of being a phishing email. The second one is derived in a similar way by looking at the salutation/greeting used in the first line of the email body. In both cases, only the presence or absence of the pattern is used as a binary feature. The experimentation was carried out in two steps. In the first step, emails were pre-processed to remove all the blank lines. In the second step, style marker and structural features were extracted.

We have used SVMlight, the support vector machine classifier developed by Joachims (Joachims, 1998). Since SVM is a linear classifier that provides only a two-way categorization, it directly maps to our purpose. A total of 200 emails, 100 phishing and 100 non-phishing, were used in the training phase. The confusion matrix obtained from multiple runs of the experiment is documented in Table 4. In runs I and II, all 18 function words were used. The only difference between the two runs is the usage of the 2 structural features. Inclusion of the structural features results in 100% detection accuracy. In runs III & IV, the total number of function words used as features was reduced from 18 to 5, and the five most frequent words that appear in the phishing emails were picked. In run III, both structural attributes were used, and in run IV no structural attributes were used. The results show that the classification tends to get better when the words that are unique to phishing emails are carefully picked and used as features. Removal of the structural attributes from the training set resulted in a drop of accuracy by 20%. This is a good indication that studying the structural properties of phishing emails and employing them in classification has a positive effect. Runs V & VI follow the same methodology as runs III & IV but with a different set of function words. This type of experiment was conducted to highlight the point that the selection of function word features has a profound impact on the obtainable accuracy level. The experiment was conducted multiple times by varying the number of style and structural attributes used. Also, we applied simulated annealing to select a proper subset of the most relevant features. This was repeated until a particular feature configuration recurred during simulated annealing. We believe this provides a complete analysis of the role played by the function words and attributes that are unique to phishing emails in classifying them apart from the genuine ones.
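The simulated annealing step mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `anneal_feature_subset`, the geometric cooling schedule, the fixed step count, and the toy error function are all hypothetical. In practice the error function would be the classification error of the SVM trained on the candidate subset, and the chapter stops when a feature configuration repeats rather than after a fixed number of steps.

```python
import math
import random

def anneal_feature_subset(features, error_fn, steps=500, t_start=1.0,
                          cooling=0.99, seed=0):
    """Search for a low-error feature subset with simulated annealing."""
    rng = random.Random(seed)
    # Start from a random, non-empty subset of the candidate features.
    current = {f for f in features if rng.random() < 0.5} or {features[0]}
    current_err = error_fn(current)
    best, best_err = set(current), current_err
    temperature = t_start  # the "control parameter" that is lowered over time
    for _ in range(steps):
        # Neighbouring subset: toggle one randomly chosen feature in or out.
        candidate = set(current)
        candidate.symmetric_difference_update({rng.choice(features)})
        if candidate:
            delta = error_fn(candidate) - current_err
            # Always accept improvements; accept a worse subset with a
            # probability that shrinks as the temperature decreases.
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_err = candidate, current_err + delta
                if current_err < best_err:
                    best, best_err = set(current), current_err
        temperature *= cooling
    return best, error_fn(best)

# Hypothetical stand-in for "classification error of an SVM on this subset".
def toy_error(subset):
    useful = {"account", "suspended", "subject_pattern"}
    return 0.1 * len(useful - subset) + 0.02 * len(subset - useful)

FEATURES = ["account", "suspended", "subject_pattern", "the", "and", "minutes"]
best_subset, best_error = anneal_feature_subset(FEATURES, toy_error)
print(sorted(best_subset), round(best_error, 3))
```

Toggling a single feature per step keeps neighbouring subsets close to each other, so the lowering temperature gradually narrows the random selection in the way the text describes.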

The performance of our model was measured using the common techniques of information retrieval and text categorization, namely, the calculation of the Precision (P) and Recall (R) metrics (Baeza-Yates & Ribeiro-Neto), as shown in Table 5. Precision gives the proportion of documents retrieved that are relevant, whereas Recall indicates the proportion of relevant documents that are actually retrieved. Formally, precision and recall are calculated as follows:

$$P = \frac{\#\ \text{phishing emails classified as phishing}}{\#\ \text{phishing emails classified as phishing} + \#\ \text{ham emails classified as phishing}}$$

$$R = \frac{\#\ \text{phishing emails classified as phishing}}{\#\ \text{phishing emails classified as phishing} + \#\ \text{phishing emails not classified as phishing}}$$

In other words, Precision is the number of relevant documents retrieved divided by the total number of retrieved documents, and Recall is the number of relevant documents retrieved divided by the total number of relevant documents. In terms of the cells labeled in Table 4, these are $P = A/(A+C)$ and $R = A/(A+B)$. In addition, we use two metrics, Accuracy and the Combined F1 Statistic, which are defined below. Accuracy is calculated as:

$$\text{Accuracy} = \frac{\text{total \# of emails classified correctly}}{\text{total \# of emails to classify}}$$

$$\text{Combined } F_1 \text{ Statistic} = \frac{2RP}{R + P}$$

The emails used in our experiments had little relation to each other. The topics picked were largely independent of one another. This fact, coupled with the results obtained above, is a clear indicator that phishing emails have characteristics which, when identified and isolated, can be used to demarcate them from genuine emails. The challenge lies in picking features that distinguish a phishing email from the closely resembling non-phishing one it spoofs. The results of our experiment show that an appropriate choice of structural and style attributes can aid such finer classification via the application of Support Vector Machines.

Stage II: Mimicking User Response to Detect Phishing Attacks

Most of the present day phishing emails can be detected using our first stage, yet phishing emails,

Table 1. Style marker attributes extracted from the email document; a total of 23 style marker features are used

Style Marker Attribute
  Total number of words (W)
  Total number of characters (C)
  Total number of unique (distinct) words (U)
  Vocabulary richness, i.e., W/C
  Function word frequency distribution (18 features) (see Table 3)
  Total number of function words/W

Table 2. Two structural attributes extracted from the email document

Structural Attribute
  Structure of the Email Subject line
  Structure of the Greeting provided in the email body
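The 25-dimensional feature vector summarized in Tables 1 and 2 could be assembled roughly as in the sketch below. Several details are assumptions made only for illustration: the two regular expressions for the subject line and the greeting are hypothetical (the chapter does not state the actual patterns), the subject line stands in for the email header, and C is taken as the raw character count of subject plus body.

```python
import re

# The 18 function words listed in Table 3.
FUNCTION_WORDS = [
    "account", "access", "bank", "credit", "click", "identity",
    "inconvenience", "information", "limited", "log", "minutes", "password",
    "recently", "risk", "social", "security", "service", "suspended",
]

# Hypothetical patterns; the chapter does not specify the ones it used.
SUBJECT_PATTERN = re.compile(r"\b(verify|suspend|update|alert|urgent)", re.I)
GENERIC_GREETING = re.compile(r"^\s*dear\s+(customer|user|member|client)", re.I)

def extract_features(subject, body):
    """Return 23 style marker features followed by 2 structural features."""
    text = f"{subject}\n{body}"
    # Words shorter than 2 characters are omitted, as in the chapter.
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text) if len(w) >= 2]
    w_total = len(words)            # W
    c_total = len(text)             # C
    u_total = len(set(words))       # U
    counts = [words.count(fw) for fw in FUNCTION_WORDS]
    first_line = next((ln for ln in body.splitlines() if ln.strip()), "")

    features = [w_total, c_total, u_total,
                w_total / c_total if c_total else 0.0]  # richness as W/C
    features += counts                                  # 18 frequencies
    features.append(sum(counts) / w_total if w_total else 0.0)
    features.append(1 if SUBJECT_PATTERN.search(subject) else 0)
    features.append(1 if GENERIC_GREETING.search(first_line) else 0)
    return features

vector = extract_features(
    "Urgent: verify your account",
    "Dear customer,\nYour account has been suspended. Click here.",
)
print(len(vector))  # 25
```

Each structural attribute is reduced to a single 0/1 value, matching the text's statement that only the presence or absence of the pattern is used.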

Table 3. List of 18 function words used in the experiment

Keywords
  ACCOUNT
  ACCESS
  BANK
  CREDIT
  CLICK
  IDENTITY
  INCONVENIENCE
  INFORMATION
  LIMITED
  LOG
  MINUTES
  PASSWORD
  RECENTLY
  RISK
  SOCIAL
  SECURITY
  SERVICE
  SUSPENDED

similar to spam, can be cleverly devised by omitting the relevant features that are used for training the classifier. To detect such outliers, a more proactive standpoint is adopted in the second stage by using fake responses which mimic real users, essentially reversing the roles of the victim and the adversary. Here, a phishing attack is viewed as a two-round game between the user and the adversary. In the first round, the attacker sends email messages pretending to represent legitimate business domains in order to trick users into divulging their personal information. The success of the attack lies in the phisher’s ability to craft the attack in such a manner that a naive user is unable to differentiate between the legitimate and the masqueraded messages, as shown in Figure 1. As a first step, the incoming messages are analyzed for the presence of embedded links and attached HTML forms. If the email does not contain such features, further investigation is safely skipped. Otherwise, a set of “phantom” users is assigned to actively communicate with these Web sites using appropriate random values, as shown in Figure 2. The random/fake information supplied to the Web sites acts as active honeytokens (Spitzner, 2003), and the Web sites’ responses are forwarded to the decision engine for further

Table 4. 2-way confusion matrix between the actual and predicted categories

Experiment Run   Actual Category   Predicted: Phishing   Predicted: Not phishing
I                Phishing          90 (A)                10 (B)
                 Not phishing      0 (C)                 100 (D)
II               Phishing          100                   0
                 Not phishing      0                     100
III              Phishing          60                    40
                 Not phishing      0                     100
IV               Phishing          80                    20
                 Not phishing      0                     100
V                Phishing          50                    50
                 Not phishing      0                     100
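The evaluation metrics follow directly from the four cells of a confusion matrix such as those in Table 4. The short sketch below uses the cell labels A, B, C and D from run I; the function name `classification_metrics` is introduced here only for illustration.

```python
def classification_metrics(a, b, c, d):
    """Compute metrics from confusion matrix cells.

    a: phishing classified as phishing    b: phishing classified as not phishing
    c: ham classified as phishing         d: ham classified as not phishing
    """
    precision = a / (a + c) if (a + c) else 0.0
    recall = a / (a + b) if (a + b) else 0.0
    accuracy = (a + d) / (a + b + c + d)
    denom = recall + precision
    f1 = 2 * recall * precision / denom if denom else 0.0
    return precision, recall, accuracy, f1

# Run I of Table 4: A = 90, B = 10, C = 0, D = 100.
p, r, acc, f1 = classification_metrics(90, 10, 0, 100)
print(f"P={p:.1%} R={r:.1%} Accuracy={acc:.1%} F1={f1:.1%}")
# prints: P=100.0% R=90.0% Accuracy=95.0% F1=94.7%
```

These values agree with the run I row reported in Table 5.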

Table 5. Results of SVM classification

Run No.   Precision (P)   Recall (R)   Accuracy   Combined Statistic (F1)
I         100%            90%          95%        94.7%
II        100%            100%         100%       100%
III       100%            60%          80%        75%
IV        100%            80%          90%        88.8%
V         100%            50%          75%        66.7%

analysis. The key idea here is to shield the user from giving out critical personal information until the authenticity of the Web site is verified. Since the attacker cannot distinguish between fake and legitimate responses, the attacker's site reacts in the same way to both real and contrived responses.

The detailed working of Stage II is as follows. First, the preprocessor probes the mail server for incoming messages. Once an email arrives, its content is examined for the presence of embedded links and HTML forms. Emails with HTML forms that request sensitive information are directly tagged malicious. In the presence of an embedded URL, control is passed to the content parser, which constructs the HTML document object model (DOM) (W3C, 1998) tree of the referred Web page. The HTML DOM tree is the commonly used standard to encapsulate HTML documents in a hierarchical structure and to provide interfaces for accessing and manipulating them. The DOM tree is traversed to determine whether it has forms containing variables with names such as username, password, credit card number, social security number, etc. Depending on the nature and type of these variables, appropriate honeytokens are supplied to the Web site by phantom users. The behavior of the Web site in response to the honeytokens is recorded and analyzed for any activity not conforming to a reasonable response. The decision engine is formalized as a rule based system, which relies on a set of pre-determined propositions and inference rules to deduce whether the process has terminated in any of the known attack instances.

Case Studies

To illustrate the efficacy of Stage II, we have evaluated it against 200 different phishing emails. At the time of testing, however, only 87 of these emails contained links referring to live phishing Web sites. In order to measure the false positive rate, we also tested against emails that contain embedded URLs of legitimate domains. Based on

Figure 1. Defense-centric view: Who is the real sender - legitimate or adversary?

Figure 2. Offense-centric view: Who is the real respondent - the real or Phantom USER?

the tests, we show that Stage II was able to successfully detect all email based phishing attacks with zero false alarms. Furthermore, to the best of our knowledge, our tool can detect most of the email based attacks listed in the www.antiphishing.org archive. For illustrative purposes, we show three different scenarios which exemplify the working of our tool. The interaction between the phantom user and the phisher’s Web site is captured by hooking the detection engine into Internet Explorer as an ActiveX control.

Scenario 1: In the first example, we look at a simple email based phishing attack against the Regions Bank. First, the phisher sends an email in HTML format, requesting the users to verify their account data by following the embedded link. Here, the visible link in the email, https://secure.regionsnet.com/EBanking/logon/user?a=defaultAffiliate, masks the reference to the phisher’s Web site, http://www.club-daich.com/.checking/regions/. Such attacks can be easily detected by the preprocessing engine, as shown in Figure 3, which bases its decision solely on the visual differences. Also, to further validate our claim, the phisher’s Web site is supplied with fake information. Upon automatic submission of fake authentication values, as shown in Figure 4, the site predictably leads to a page asking for credit card related information, thereby triggering our tool to raise an alarm. As most of the observed email based phishing scams adopt a similar attack model, they can be easily detected.

Scenario 2: In the second example, we show the working of our tool on an email mimicking the eBay Web site. The email had a URL which redirected users to the phishing Web site http://www.cba.or.th/member/. There were two noticeable differences in this phishing site: (i) The site attempted to spoof its URL as that of a legitimate site using an IE vulnerability. On our test machine, this spoofed URL was clearly detected since the machine was patched. (ii) The behavior of this site was also different from the other cases. Upon submission of any value, the user was asked to enter his/her information again. Only when the submission was made a second time in the same browser session was the user directed to another page asking for more information. This is an excellent social engineering tactic, where the phisher assumes that a naive user receiving an email about account suspension would hastily type in wrong credentials. However, as the Web site accepts the

Figure 3. Preprocessor analyzing the referred Web page and auto submitting with fake random values
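The preprocessing checks described for Stage II (a visible link that masks a different target, and forms asking for sensitive values) can be sketched with Python's standard `html.parser`. This is only an illustration: the original tool ran as an ActiveX control inside Internet Explorer, and the class name `EmailScanner`, the `SENSITIVE_NAMES` list, and the example hosts are assumptions made here.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical substrings that mark a form field as sensitive.
SENSITIVE_NAMES = ("user", "pass", "card", "ssn", "social")

class EmailScanner(HTMLParser):
    """Collect (href, visible text) pairs and sensitive input field names."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.sensitive_inputs = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._href, self._text = attrs["href"], []
        elif tag == "input":
            name = (attrs.get("name") or "").lower()
            if any(key in name for key in SENSITIVE_NAMES):
                self.sensitive_inputs.append(name)

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def scan_email(html):
    """Return links whose visible URL names a different host, plus sensitive fields."""
    scanner = EmailScanner()
    scanner.feed(html)
    masked = []
    for href, text in scanner.links:
        if text.lower().startswith(("http://", "https://")):
            if urlparse(text).hostname != urlparse(href).hostname:
                masked.append((text, href))
    return masked, scanner.sensitive_inputs

# Hypothetical email body in the style of Scenario 1.
sample = (
    '<a href="http://phisher.example/.checking/">'
    "https://secure.bank.example/logon</a>"
    '<form><input name="password"></form>'
)
masked, fields = scan_email(sample)
print(masked, fields)
```

Comparing only host names is a deliberately simple rule; the chapter says the preprocessor decides on the visual difference between the displayed and the referenced link but does not give the exact comparison.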

Figure 4. Redirection to a page asking for more information even after submission of fake values
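The behaviour observed in the scenarios suggests a simple decision rule: a site that accepts contrived credentials and moves on is suspicious, whereas a site that keeps rejecting them behaves like a legitimate one. The sketch below is a much simplified stand-in for the rule based decision engine, whose actual propositions and inference rules are not given in the chapter; `classify_site`, `make_site`, and the "accepted"/"rejected" outcome labels are hypothetical.

```python
def classify_site(submit, honeytokens):
    """Submit fake credentials in turn; any acceptance marks the site as phishing."""
    for token in honeytokens:
        if submit(token) == "accepted":
            return "phishing"
    return "legitimate"

def make_site(accept_on_attempt):
    """Simulated site that accepts any input from the given attempt onwards.

    Pass None to simulate a legitimate site that always rejects fake input.
    """
    attempts = {"n": 0}

    def submit(_token):
        attempts["n"] += 1
        if accept_on_attempt is not None and attempts["n"] >= accept_on_attempt:
            return "accepted"
        return "rejected"

    return submit

# The same honeytoken is submitted twice, since some sites only accept
# the second submission made within one session.
tokens = [{"username": "phantom01", "password": "x9!kQ2"}] * 2
verdict_second_try = classify_site(make_site(accept_on_attempt=2), tokens)
verdict_legitimate = classify_site(make_site(accept_on_attempt=None), tokens)
print(verdict_second_try, verdict_legitimate)  # phishing legitimate
```

Submitting more than once reflects the case where a site asks for the information again before accepting it.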

typed information the second time, we can repeat the same process of supplying honeytokens to the fake Web site to ensure correctness.

Scenario 3: The third example shows the working of our system against emails received from legitimate domains. Here, we test our tool with an email containing a URL referring to the hotmail login page. Though our tool correctly identified this as a legitimate email, there are two caveats with hotmail. Usually, when users type their user name in hotmail and move to the password field, a script automatically fills in ‘@hotmail.com’. With our tool, however, no such action happens. Submission of the contrived values resulted in a pop-up JavaScript box asking for the information to be entered again. Our tool nevertheless detects that fake inputs lead to the same behavior and infers that this is a legitimate site. We again note in passing that it is trivial to maintain a list of such domains and to supply them with appropriate random values. While the tool is able to detect legitimate domains correctly, it is possible for an attacker to launch denial-of-service attacks by sending emails with URLs of real domains. Though this poses a serious threat, during real-time deployment we can force the traffic through our own server, which maintains the list of all the tested Web sites, thereby eliminating the need to test previously tested domains again.

Evaluation

An evaluation of the second stage was conducted to quantify the performance overhead incurred during detection. The overhead introduced by our detection system depends mainly on two parts: (a) phantom user instantiation overhead; (b) response analysis overhead. We performed our experiments on an Intel Pentium M 1.3 GHz processor with 512 MB RAM. The five attacks illustrated in the www.antiphishing.org archive were replicated on an Intel Pentium 4 2.60 GHz processor running Apache HTTP Server version 1.3.33. The operating system was Redhat Linux running kernel version 2.4.20. We also benchmarked the execution time of each of the subcomponents, using customized auditing scripts.

Phantom User Instantiation Overhead: The overhead involved in the instantiation of a phantom user is the time taken by the preprocessing component plus the time needed to generate the fake values. However, generation of the fake values on the fly can be avoided by storing the probable variable names along with

their values beforehand. Instantiation of phantom users took 1.2 seconds on average, with a standard deviation of 510 milliseconds.

Response Analysis Overhead: The total time taken by the response analysis subsystem is the time taken to post the response of the phantom users plus the time taken for analysis. The average delay due to response analysis was of the order of 2.35 seconds, with the exception that links whose Web site did not exist took far longer because of the timeout policy. From our observations, we can conclude that our detection framework does not introduce any significant computation overhead in the system. Also, the modular nature of the individual subcomponents provides hooks to replace existing modules with more efficient variants without affecting the overall performance.

Limitations

The technique given in Stage II has a few limitations. First, if this tool is widely adopted, phishers can circumvent the defense mechanism by replaying the response of the legitimate site for spurious inputs. However, such behavior is disastrous from the phisher’s standpoint, as it may arouse suspicion in users if they consistently observe an invalid data error despite providing authentic information. Second, phishers can include robot detecting schemes like CAPTCHA (completely automated public Turing tests to tell computers and humans apart) in their Web sites to subvert the tool’s effort to enact the responses of legitimate users. Currently, this is not a problem, as CAPTCHA is widely used for preventing automated registration rather than for user validation. Finally, there might also be legal ramifications of our tool consuming the sites’ bandwidth and computation power for its detection purposes. Though the traffic can be contained by the use of distributed lists, such tools, like Web crawlers, should operate with caution so as not to violate any Web site’s terms of usage.

Conclusion

Several antiphishing toolbars and browser extensions have been proposed to address the phishing scourge. Although these techniques provide a first line of defense in preventing phishing attacks, they suffer from several limitations. Most of these tools require server-side assistance to function, incurring high setup and operational cost. Existing client-side defenses also bear the burden of keeping up with server-side IP changes, making them vulnerable to server level exploits. Also, phishing techniques, growing in ingenuity as well as sophistication, render these solutions weak. Unless new solutions are devised to defeat phishing attacks, this scourge will continue to hurt unsuspecting Internet users in their day-to-day lives. To address these deficiencies, in this chapter we present a two-stage approach which employs a combination of content analysis and a challenge-response mechanism to detect phishing attacks. In the first stage, phishing emails are identified using the linguistic and structural properties present in the content. As the number of such features is usually large, we employ simulated annealing as a feature selection heuristic. The actual classification is performed using SVM. Experimental results show that our technique is effective in identifying phishing emails with minimum errors. Since the first stage is purely based on content analysis, it is possible for an attacker to fabricate phishing emails which can bypass the content analysis filter. Hence, in our second stage, we designate phantom users to mimic the responses of legitimate users. The response obtained back from the site is analyzed to check whether the site is able to tell the phantom user and the legitimate user apart. Lack of such discerning capability is used as grounds to deem the site a phishing site. Even though these stages have some limitations when deployed in isolation, it is their tandem working that can detect the most sophisticated phishing attacks.

References

Anti-Phishing Working Group. Retrieved July 11, 2007, from http://www.antiphishing.org/

Baeza-Yates, R., & Ribeiro-Neto, B. Modern Information Retrieval.

Chandrasekaran, M., Chinchani, R., & Upadhyaya, S. (2006). PHONEY: Mimicking User Response to Detect Phishing Attacks. Paper presented at the World of Wireless, Mobile and Multimedia Networks (WoWMoM), Niagara Falls, Canada.

Chandrasekaran, M., Narayanan, K., & Upadhyaya, S. (2006). Phishing Email Detection Based on Structural Properties. Paper presented at the New York State Cyber Security Conference (NYS), Albany, NY.

Chou, N., Ledesma, R., Teraguchi, Y., & Mitchell, J. (2004). Client-Side Defense Against Web-Based . Paper presented at the 11th Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA.

Debuse, J., & Rayward-Smith, V. (1997). Feature subset selection within a simulated annealing data mining algorithm. Journal of Intelligent Information Systems.

Dhamija, R., Tygar, D., & Hearst, M. (2006). Why Phishing Works. Paper presented at the SIGCHI conference on Human Factors in computing systems, ACM Special Interest Group on Computer-Human Interaction.

Drake, C., Oliver, J., & Koontz, E. (2004, July). Anatomy of a Phishing Email. Paper presented at the First Conference on Email and Anti-Spam, Mountain View, CA.

Drucker, H., Wu, D., & Vapnik, V. (1999). Support vector machines for Spam categorization. IEEE-NN, 10, 1048-1054.

Fette, I., Sadeh, N., & Tomasic, A. (2007). Learning to detect phishing emails. Paper presented at the 16th international conference on World Wide Web (WWW), Banff, Alberta, Canada.

Gartner Press Release. (2006). Gartner says number of phishing e-mails sent to U.S adults nearly doubles in just two years. Retrieved from http://www.gartner.com/it/page.jsp?id=498245

How to obscure any URL - How Spammers And Scammers Hide and Confuse. (2002). Retrieved July 11, 2007, from http://www.pc-help.org/obscure.htm

Joachims, T. (1998). Text Categorization With Support Vector Machines: Learning With Many Relevant Features. Paper presented at the 10th European Conference on Machine Learning (ECML-98).

Kapadia, A. (2007, March/April). A Case (Study) For Usability in Secure Email Communication. IEEE Security and Privacy, 5, 80-84.

Kerner, S. M. (2005). DNS-Based Phishing Attacks on the Rise. Retrieved July 11, 2007, from http://cws.internet.com/article/2792-.htm

NetCraft. (2004). NetCraft Anti-phishing Toolbar. Retrieved July 11, 2007, from http://toolbar.netcraft.com/

Phishing - Wikipedia. Retrieved July 10, 2007, from http://en.wikipedia.org/wiki/Phishing

Slashdot. (2006). Phishing in Yahoo! Geocities? Retrieved from http://ask.slashdot.org/article.pl?sid=06/07/12/0028254

Spitzner, L. (2003). Honeytokens: The Other Honeypot. Retrieved July 11, 2007.

Spoofstick. (2004). Spoofstick Toolbar. Retrieved July 11, 2007, from http://www.spoofstick.com/

W3C. (1998). HTML Document Object Model (DOM). Retrieved from http://www.w3c.org/DOM

Wenyin, L., Huang, G., Xiaoyue, L., Min, Z., & Deng, X. (2005). Detection of phishing webpages based on visual similarity. Paper presented at the Special Interest Tracks and Posters of the 14th International Conference on World Wide Web (WWW), Chiba, Japan.

Zhang, Y., Egelman, S., Cranor, L., & Hong, J. (2007, 28 February-2 March). Phinding Phish: Evaluating Anti-Phishing Tools. Paper presented at the 14th Annual Network and Distributed System Security Symposium (NDSS 2007), San Diego, CA.

Zhang, Y., Hong, J., & Cranor, L. (2007). Cantina: a content-based approach to detecting phishing web sites. Paper presented at the 16th international conference on World Wide Web (WWW), Banff, Alberta, Canada.

Key Terms

Challenge-Response Analysis: Challenge-response analysis is an authentication mechanism where either one or both of the communicating parties adhere to a pre-agreed protocol used in verifying their identities. The party which desires to prove its identity has to provide the correct response to the challenge posed by the party with which it desires to communicate.

Context Models: Context models encapsulate the messages conveyed in the phishers’ emails to attract potential victims to the fake Web sites. Phishers usually employ some kind of threat, fake reward, or false pretext in their email messages to trick the users.

Email / Web site Spoofing: Email/Web site spoofing is the process by which the look-and-feel and the behavior of fake Web sites/emails are forged to mimic their legitimate counterparts.

Feature Selection: Feature selection is the process of selecting a subset of relevant features so that the net performance of the underlying classifier is increased. Feature selection helps to minimize the presence of “noise” that adversely affects model building.

Linear Binary Classification: The process of separating a set of $m$ examples $\{(x_1, y_1), \dots, (x_m, y_m)\}$ into two regions by a hyperplane parameterized by $w$ and $b$ such that $y_i (x_i \cdot w + b) > 0$ for all $i = 1, \dots, m$. Such a hyperplane is called a separating hyperplane.

Phishing: Phishing is a form of Web based identity theft where attackers employ deceit and social engineering to defraud users of their private and confidential information such as passwords, credit card numbers, social security numbers (SSN), and bank account numbers.

Phishing Email Structural Properties: Phishing email structural properties represent the set of invariant features that are present in most, if not all, phishing emails. These invariant properties are mostly visual deceptive agents employed by the phisher to trick the users. They also help in building discriminators that are accurate and less prone to false positives.

Endnote

1 A recent phishing attack on Yahoo Inc. (Slashdot, 2006) showed that phishing attacks could be launched within the same domain. The phisher had actually hosted the fake login page at www.geocities.com/login_group_auth, where geocities.com is Yahoo Inc.’s Web hosting service and login_group_auth is a Yahoo account created by the phisher.
