Client-side defense against web-based

Neil Chou Robert Ledesma Yuka Teraguchi Dan Boneh John C. Mitchell Computer Science Department, Stanford University, Stanford CA 94305 {neilchou, led242, yukat, dabo, jcm}@stanford.edu

Abstract experiment with this approach using a browser plug-in called SpoofGuard. The plug-in monitors a user’s In- ternet activity, computes a spoof index, and warns the Web spoofing is a significant problem involving - user if the index exceeds a level selected by the user. ulent and web sites that trick unsuspecting users While Internet-savvy users who watch the address bar, into revealing private information. We discuss some status bar, and other information carefully may not need aspects of common attacks and propose a framework SpoofGuard, the current level of accuracy and effec- for client-side defense: a browser plug-in that exam- tiveness may be sufficient to help many unsophisticated ines web pages and warns the user when requests for web users. If the methods we propose become widely data may be part of a spoof attack. While the plug- deployed, through our plug-in or through other client- in, SpoofGuard, has been tested using actual sites ob- side defensive software, then phishers will certainly take tained through government agencies concerned about steps to circumvent them. However, we expect further the problem, we expect that web spoofing and other effort and study to produce correspondingly better de- forms of identity theft will be continuing problems in fenses. Moreover, if synergistic server-side methods are coming years. deployed by concerned companies, it seems possible to thwart increasingly sophisticated attacks. SpoofGuard uses domain name, url, link, and 1 Introduction image checks to evaluate the likelihood that a given page is part of a spoof attack. For ex- Web spoofing, also known as “” or “” ample, a page with a suspicious url such as [CNN03, FBI03], is a significant form of Internet crime etrade-maintenance.suspicious.org that is launched against hundreds or thousands of indi- or [email protected]/ viduals each day. The US Secret Service and the San maintainance.asp and an E*Trade logo will have Francisco Electronic Crimes Task Force report that ap- a higher spoof index than a page with neither of these proximately 30 attack sites are detected each day. Each characteristics. SpoofGuard also uses history, such as attack site may be used to defraud hundreds or thousands whether the user has visited this domain before and of victims, and it is likely that many attack sites are whether the referring page was from an email site never detected. A typical web spoof attack begins with such as Hotmail or Yahoo!Mail. Most importantly, bulk email to a group of unsuspecting victims. Each is SpoofGuard intercepts and evaluates user posts in told that there is a problem with their account at a site light of relevant history and the spoof index of a form such as E*Trade. Victims of the spoofing attack then fol- page. SpoofGuard examines post data user name and low a link in the email message to connect to a spoofed password fields and compares posted data to previ- E*Trade site. Once a victim enters his or her user name ously entered passwords from different domains. This and password on the spoof site, the criminal has the mechanism warns a user against sending her E*Trade means to impersonate the victim, potentially withdraw- password to a site with an E*Trade logo but outside ing money from the victim’s account or causing harm in the etrade.com domain, for example. Password other ways. comparisons are done using a cryptographically secure We describe some common characteristics of recent hash, so that plaintext passwords are never stored by web spoofing attacks and propose a framework for SpoofGuard. client-side countermeasures. Like other inexact detec- Stopping web spoofing bears some similarity to intru- tion mechanisms, including virus detection and email sion detection, spam filtering, and thwarting traditional spam filtering, the approach we explore involves look- social engineering attacks. Intrusion detection systems ing for characteristics of previously detected attacks. We [Pax99, Sno03] typically monitor network and host ac-

1 tivity, compute statistical or other indices, and attempt tegrity of browser indicators such as the url indicator in to detect intrusions by comparing the index of current the status bar, not analyzing user behavior, web pages, activity against previous statistics. While web spoofing and html post data to stop leakage of sensitive user in- may be regarded as a special case of intrusion detection, formation. While we considered using an alternate term the browser seems like that appropriate place to combat such as “phishing” in this paper, we use “web spoofing” web spoofing. A browser plug-in is relatively easy to in- since this currently appears to be the term most com- stall and has access to honest and spoof pages sent over monly used by law enforcement and concerned compa- https, giving SpoofGuard a better chance of catching an nies. attack than a network proxy or other external http traffic The goals of this paper are to raise awareness of the monitors. While a plug-in alone does not have full infor- web spoofing problem and propose a framework for mation from email programs such as Outlook or Eudora client-side protection. While sophisticated and deter- that may contain the messages that launch an attack, the mined attackers will be able to circumvent our current browser does provide an indication of the referring page tests (through simple techniques we explain later in the or application, and it is possible to scan and parse pages paper), there is plenty of room for improving specific from email sites such as Hotmail or Yahoo!Mail. There- tests and tuning the coefficients of our spoof index func- fore, for non-expert users who read email through their tion. Furthermore, the web spoofing problem is impor- browser, SpoofGuard has the potential to examine every tant and we believe our SpoofGuard experience will be step of a standard web spoof attack. useful for developing more sophisticated defenses. We Like other intrusion detection efforts, it is appropriate discuss the web spoofing problem in more detail in Sec- to evaluate SpoofGuard by measuring its effective at pre- tion 2, and our solutions in Section 3. The SpoofGuard venting attacks, the false alarm rate (number of unnec- implementation and user interface are described in Sec- essary warnings), and its performance impact. Spoof- tion 4. Some SpoofGuard evaluation information ap- Guard will only be useful if it detects attacks without pears in Section 5, followed by suggestions for server- raising too many false alarms, since users will almost side methods in Section 6, some more speculative client- certainly reject any method that interferes with normal side methods in Section 7, and concluding remarks in browsing activity. We have evaluated the false-alarm Section 8. rate by using SpoofGuard ourselves over a period of Throughout the paper we use the following terminol- time, and we have evaluated its effectiveness for pre- ogy. venting attacks using actual spoof sites brought to our • Spoof site or Spoof page: the site or page that is a attention by members of the San Francisco Electronic malicious copy of some legitimate web page. Crimes Task Force. While this is not an extensive • Attacker: the person or organization who sets up enough test to draw broader conclusions, SpoofGuard the spoof site. does catch the sample attacks found in the wild and does • Honest site or honest page: the legitimate site or not add any noticeable delay to ordinary web browsing. page that is being spoofed. Since web spoofing attacks begin with bulk email, a • Spoof index: a measure of the likelihood that a spe- good general spam solution [Bri03, Din03] could reduce cific page is part of a spoof attack, described in Sec- the incidence of web spoofing attacks. However, current tion 3. spam solutions are only partly effective at blocking un- wanted email, and we are not aware of any spam efforts A prototype version of SpoofGuard will be made pub- aimed specifically at identify theft. While the browser- licly available shortly. based techniques we explore in this paper are comple- mentary and independent of spam filtering, there may be 2 Theproblem additional ways of combining email scanning with web According to Agents of the U.S. Secret Service San page analysis that will lead to better spoof prevention in Francisco Electronic Crimes Task Force [Von03], the the future. U.S. Government’s Complaint Center re- Previous efforts by the Princeton Secure Internet Pro- ceived over 75,000 complaints in 2002. Of this number, gramming group and others [FBDW97, EY01] have ad- 48,000 cases resulted in further action requests. This is dressed another form of “web spoofing” in which an a three-fold increase over 2001. The total dollar losses attacker causes all html page requests from a victim are estimated at more than $54 million compared to $17 to pass through the attacker’s site. This form of web million for 2001. A majority of these fraud complaints spoofing allows the attacker to monitor all of the vic- are intrusions, auction fraud, credit card/debit fraud, and tim’s activities, including posted passwords or account computer intrusion. Agents of the U.S. Secret Service numbers. However, previous methods for countering San Francisco Electronic Crimes Task Force report that this form of attack have focused on maintaining the in- web spoofing was first noticed in late 2001 and grew in

2 popularity in 2002, correlating with the large increase in gan Stanley’s Discover unit, eBay Inc. and its Pay- Internet Fraud. Further, a majority of the $37 million Pal unit, Wachovia Corp.’s First Union unit and the increase in losses from 2001 to 2002 can be attributed Massachusetts State Lottery reported phishing scams in to web spoofing. Agents working fraud cases in the Bay recent months. Some general information about web Area also report that a majority of their Internet cases spoofing, including additional news articles and records involve web spoofing. of actual attacks, may be found at http://www. One factor that adds to the severity of web spoofing antiphishing.org/, a web site provided by Tum- attacks is that many users use the same username and bleweed Communications. password at several sites. This allows a phisher who reels in a victim to use this information on more than one 2.2 Properties of recent attacks site. For this reason, companies that provide password- We describe common properties of ten spoof web sites protected services are dependent on each other for their recently found in the wild. Figure 4 gives an example of security. This is not only true with regard to web spoof- an Ebay spoof (partially obscured by a SpoofGuard pop- ing, but for other kinds of attacks as well. If passwords up warning the user). from one site can be stolen by attacking the site itself, • Logos. The spoof site uses logos found on the hon- these may also be used at other sites that protect their est site to imitate its appearance. password database more effectively. • Suspicious urls. Spoof sites are located on servers that have no relationship with the hon- 2.1 Sample attack est site. The spoof site’s url may contain A recent attack described in a New York Times ar- the honest site’s url as a substring (http: ticle [HF03] actually mentioned fraudulent email, indi- //www.ebaymode.com), or may be simi- cating some level of public awareness of spoof attacks. lar to the honest url (http://www.paypaI. On June 18, 2003, thousands of fraudulent e-mails with com). IP addresses are sometimes used to dis- the subject “Fraud Alert” were sent out, hoping to reach guise the host name (http://25255255255/ Best Buy customers. The e-mails attempted to con- top.htm). Others use @ marks to ob- vince customers that Best Buy’s fraud department re- scure their host names (http://ebay.com: quired additional customer information, “in our effort to top@255255255255/top.html), or contain deter fraudulent transactions.” To further lure unsuspect- suspicious usernames in their urls (http:// ing victims, the e-mail provided a link that purported middleman/http://www.ebay.com.) to reach a “special Fraud Department” at the Best Buy • User input. All spoof sites contain messages to fool web site. Instead, the link actually pointed to a fraudu- the user into entering sensitive information, such as lent page unrelated to Best Buy. The Best Buy attacker’s password, social security number, etc. Some suc- page resembled an official Best Buy page, using the Best cessful spoofs have even been so bold as to ask for Buy logo, incorporating elements from an official Best name, address, mother’s maiden name, driver’s li- Buy page, and providing links to other Best Buy re- cense, and so on. sources. The page requested a customer’s social security • Short lived. Most spoof sites are available for only number and credit card information. a few hours or days – just enough time for the at- A web page from the Michigan Attorney General tacker to spoof a high enough number of users. The [Cox03] cites “a few giveaways to this particular scam:” implication is that defensive methods that alert the • The [email] message did not issue from an user to a spoof site are more effective than reactive @bestbuy.com address, methods that attempt to shutdown the site. • The link embedded in the message does not • Copies. Attackers copy html from the honest site take the user to a “special Fraud Depart- and make minimal changes. Two consequences ment page” on Best Buy’s site, but to a are: (i) some spoof pages actually contain links to page hosted under a completely different do- images (e.g. logos and buttons) on the honest site, main name (such as digitalgamma.com or rather than storing copies, (ii) the names of fields your-instant-credit-reporter.org), and html code remain as on the honest site. We • The “National Credit Bureau” mentioned in the note that when a spoof site refers to the honest site scam does not exist. for embedded images it gives the honest site an op- portunity to detect the spoof: the honest site detects The Michigan Attorney General also points out that an http request for an embedded image where the the Best Buy spoof is similar to spoofs imitating Pay- referral header is not the honest site. Such requests Pal and eBay. A more recent Dow Jones Newswires should not occur unless the honest site is being pla- story [Ber03] states that EarthLink, Citibank, Mor- giarized.

3 • Sloppiness or lack of familiarity with English. commonly used in intrusion detection systems and spam Many spoof pages have silly misspellings, gram- filters [Din03]. matical errors, and inconsistencies. In the Best Buy The scoring function not only sums individual tests, scam, the fake web page listed a telephone number but also sums products of pairs, triples, and larger sub- with a Seattle area code for a Staten Island, NY, sets of tests. The reason for product terms is that when mailing address. certain combinations of events occur the likelihood of • HTTPS is uncommon. Most spoof web sites do not the page being a spoof increases dramatically. For exam- use https even if the honest site does. This simpli- ple, if a company logo appears on an unauthorized page fies setting up the spoof site. and the page contains password and creditcard fields, the page is very likely to be a spoof. Consequently, the term 3 Solutions corresponding to the product of these three tests is given substantial weight. A number of tests can be used to distinguish spoof pages from honest pages. We present the tests we im- 3.2 Stateless page evaluation plemented and evaluated in three groups: stateless meth- ods that determine whether a downloaded page is suspi- We begin by describing a collection of tests that work cious, stateful methods that evaluate a downloaded page by examining the current page only. in light of previous user activity, and methods that eval- uate outgoing html post data. Our browser plug-in ap- Url check There are various methods that attackers plies these tests to all downloaded pages and combines can use to produce misleading urls. For example, an the results using a scoring mechanism described below. @ in a url causes the string to the left to be disregarded, The total spoof index of a page determines whether the with the string on the right treated as the actual url for plug-in alerts the user and determines the severity and retrieving the page. Combined with the limited size of type of alert. Since pop-up warnings are intrusive and the browser address bar, this makes it possible to write annoying, we attempt to warn the user through a passive urls that appear legitimate within the address bar, but toolbar indicator in most situations. A user checkbox actually cause the browser to retrieve a page from an ar- can eliminate all pop-ups if desired. bitrary site. We note that server-side methods, such as tracking server image requests, may also be effective in identi- fying spoof sites. However, the focus of this paper is on Image check Spoof sites usually contain images taken client-side browser solutions. In section 6, we comment from the honest site. For example, the eBay logo ap- on some ways that server-side modifications may make pears on spoofed eBay pages to give the user the im- our client-side methods more reliable and effective. pression that they are communicating with eBay. If the eBay logo appears on a login page unrelated to eBay, 3.1 Scoring that page is suspicious. The same applies to other iden- tifiably eBay-specific images such as banners and but- Given a downloaded web page and some browser state tons. We note that corporate logos often legitimately ap- as input, our plug-in applies tests T1,...,Tn, with test pear on many e-commerce sites (e.g., the Amazon logo Ti producing a number Pi in the range [0, 1]. By con- appears on sites that sell products through Amazon) and vention, Pi = 1 indicates that the page is likely to be a therefore we only count this test for pages that ask for spoof and Pi = 0 indicates the opposite. Most of our private user input. tests return either 0 or 1, but some can return a value In order to apply this check in a stateless way, the between 0 and 1. SpoofGuard plug-in is supplied with a fixed database of We combine the test results into a total spoof score, images and their associated domains. Since attackers TSS, using a standard aggregation function: generally do not have email lists for customers of spe- n cific sites, they must try to spoof sites that are used by a TSS(page) = Pi=1 wiPi n significant fraction of web users. Thus SpoofGuard can + Pi,j=1 wi,j PiPj n be useful even if we only account for relatively small + Pi,j,k=1 wi,j,kPiPjPk + . . . number of frequently spoofed domains such as eBay, PayPal, AOL, and so on. When the browser downloads The w’s are preset weights selected to minimize the false a login page all images on the page are compared to im- alarm rate. Note that most of the w’s are set to zero so ages in the SpoofGuard database. The spoof-score for that the actual number of terms in the expression is rel- the page is increased if a match is found but the page’s atively small. This approach of applying multiple tests domain is not a valid domain for the image. and combining the results using a scoring mechanism is What if the spoof page contains a slight modification

4 of the real image? The image comparison test might fail Referring page When a user follows a link, the to detect the spoof. Fortunately, as noted earlier, attack- browser maintains a record of the referring page. Since ers often directly copy or link to images on the honest the typical web spoofing attack begins with an email site. Nevertheless, we defend against small image mod- message, a referring page from a web site where the user ification by storing an image hash rather than the ac- may have been reading email (such as Hotmail) raises tual image. Image hashing refers to a hashing algorithm the level of suspicion. One complication associated with that produces the same hash for similar images. While Hotmail, for example, is that Hotmail uses numeric IP present technology does not provide ideal image hashes, addresses instead of symbolic host names. Therefore, there has been some progress in this area [VKJM00]. In when a user clicks on a link in a Hotmail message, the our case, image hashing can be strengthened by asking browser provides a numeric IP address to SpoofGuard e-commerce sites to use images that are especially well as the referring page. In this situation, SpoofGuard uses suited for image hashing. For example, in many cases reverse DNS to find the domain name associated with a we could use optical character recognition (OCR) as the numeric address, allowing us to identify Hotmail as the image hashing algorithm. An added benefit of image referring site. hashing is that storing an image hash rather than the full image reduces plug-in storage requirements. We discuss Image-domain associations The image check de- other aspects of this test in Section 5.3. scribed above (in section 3.2) relies on a database as- sociating images such as corporate logos with domains. Link check The links contained within a page are ex- The initial static database can be assembled using a web amined. The link check fails for a page if at least one- crawler or other tool, or it can be augmented using an in- fourth of the links fail the url check described above. dividual’s browsing history. An early version of Spoof- Guard used a fixed database; the current SpoofGuard Password check Pages that request a password merit implementation uses a hashed image history file. closer scrutiny than pages that do not. If a page re- quests a password (or other sensitive information), we 3.4 Evaluating post data also check whether https is used and, if so, whether the Evaluating post data is a critical part of any client-side certificate check succeeded or failed. defense against web-spoofing attacks, since the point of any defense is to prevent malicious sites from gain- 3.3 Stateful page evaluation ing confidential information from an honest web user. In stateful page evaluation, the browser history file When a user fills in form data, SpoofGuard intercepts and additional history stored by SpoofGuard are used to and checks the html post data, allowing the actual post evaluate the referring page. Since it is important to min- to proceed only if the spoof index is below the user- imize the number of false alarms, SpoofGuard does not specific threshold for posts. If a user confines pop-up issue any warnings for visiting a site that is in the user’s warnings to posts only, then the user will never be inter- history file. The rationale for this is that if the user is rupted when reading web pages, only (possibly) when warned the first time, and decides to proceed, the user is filling in forms. Note, however, that even if no warnings assumed to have sufficient reason to trust the site. are generated on pages leading up to the spoof form, the page checks described above are used, in combination with analysis of the post data, to determine the spoof Domain check If the domain of a page closely resem- index associated with an html post. bles a standard or previously visited domain, the page may be part of a spoof. Although crude, we currently compare domains by Hamming (edit) distance. For ex- Outgoing password check SpoofGuard maintains a ample efrade.com will raise the domain check if database of hdomain, user name, passwordi triples. If etrade.com is in the file of commonly spoofed sites the user reuses a password on a new domain, this or in the user history. Clearly, it is possible to improve trips the password check. To avoid the possibility of our comparison algorithm by studying the way people leaking sensitive information, the stored passwords are are fooled; this is a significant direction for future work. hashed using SHA-1 and the comparison is performed A related issue is that some businesses outsource some on hashed values. of their web operations to contractors with different do- main names. This poses an interesting challenge that Interaction with image check In our spoof index cal- we believe can be addressed. However, outsourced web culation, the image check interacts with the outgoing activity leads to false alarms in the current version of password check non-linearly. For example, if a user en- SpoofGuard. ters her E*Trade user name and password to a site that is

5 not at etrade.com, this raise the spoof index a certain 4.1 Plug-in interface and access to browser amount (determined in part by user-selected weights). data The spoof index is raised multiplicatively higher if the site also contains the E*Trade logo. SpoofGuard consists of a COM component that ex- tends IDeskband, an interface that causes IE to load SpoofGuard as a registered toolbar, and a few other modules that run in response to actions of the toolbar Check of all post data There are several ways that component. SpoofGuard is written in Visual C++, and a web page might request a password. For example, a uses both Windows Template Library (WTL) 7.0 and clever spoof site might use an image of the word “pass- Microsoft Foundation Class Library (MFC). Two Spoof- word” instead of html text to request the user’s pass- Guard window classes implement the CWindowImpl word. To protect against this form of spoof attack, all interface to define the appearance and user interaction outgoing data in an html post can be hashed and checked of the toolbar. The interaction between the main mod- against a database of passwords and other information ules, described below, is shown in Figure 1. deemed sensitive. In this way, we can still detect pass- word leakage, even if the spoof page does not contain • WarnBar: This is a COM component that houses the text “password.” the SpoofGuard toolbar. All site evaluations and post data checks are carried out here. • ReflectionWnd: This CWindowImpl class imple- ments a transparent window that sits on top of the Exception for search engines Since a user may enter toolbar and reflects user messages (e.g. mouse any data into a search engine, SpoofGuard is not suspi- clicks) to UWToolBar. WarnBar requests Reflec- cious of known search engines at known domains. It tionWnd to pop-up a warning message when the is also possible to ignore data posted into a “search” user tries to send sensitive information to a suspi- or “find” field in an arbitrary page. This allows for cious server. shopping sites, for example, that allow a customer to • UWToolBar: This CWindowImpl class defines the search the catalog by keyword or product name. Of appearance of the toolbar. UWToolBar stores the course, good password practice would prevent an intel- user settings (e.g. check index, threshold, etc.) dur- ligent user from using a product name or other English ing runtime. WarnBar requests UWToolBar for word as an important password, rendering this exception these settings to determine the traffic lights color unnecessary. and the warning messages that appear in the Cur- rent Page Status dialog. User settings are stored in 4 SpoofGuard architecture the registry when SpoofGuard closes. • ConfigDlg opens an Options window when the SpoofGuard is an Internet Explorer browser helper user clicks the Options button. UWToolBar up- object, or “plug-in.” A browser helper object is a COM dates the user settings based on the result that Con- component that is loaded when IE starts up; it runs figDlg returns when the window terminates. in the same memory context as the browser. In gen- • DomainDlg opens the Current Page Status window eral, a browser helper object may perform any action when the user clicks on the traffic light icon. It con- on IE windows and modules, including manipulating the tains the warning messages specific to the current browser menu and toolbar, detecting and responding to page. browser events, and creating additional windows. SpoofGuard accesses the Internet Explorer history file Browser data and events When Internet Explorer and uses three additional files stored in the user profile launches, it calls the SetSite method in the IOb- directory. One is a read-only file of host names of email jectWithSite interface to initialize WarnBar. WarnBar sites such as Hotmail and Yahoo!Mail, used in the re- receives a pointer to the web browser object, and passes ferring page check. The other two files are the file of the object to the ReflectionWnd and UWToolBar, al- hashed password history (domain, user name, and pass- lowing SpoofGuard to constantly check the browser word) and the file of hashed image history. SpoofGuard contents during execution. Internet Explorer’s DWeb- can use reverse DNS to find domain names for numeric BrowserEvents2 class exports BeforeNavigate2 and IP addresses, but does not otherwise send or receive any DocumentComplete event handlers, which are both information on the network. The browser history file implemented in WarnBar class. A BeforeNavigate2 can be reset using a browser dialog box and the addi- event occurs before navigation. This gives WarnBar the tional SpoofGuard histories can be reset using a button url that the browser is attempting to navigate to, the out- on the SpoofGuard configuration panel. going post, and a chance to cancel the navigation. The

6 ReflectionWnd ConfigDlg

Sends user messages (e.g. button click) Requests Options window (user settings) to pop up. Requests a pop up warning if WarnBar shows UWToolBar that a site’s total alert value is higher than the Requests Current Page user’s threshold. Status window to pop up. Sends current user settings (e.g. weight values, alert level)

WarnBar DomainDlg

Figure 1. SpoofGuard architecture

7 providers (e.g., Hotmail) provide links to other parts of their service using numeric IP addresses instead of sym- bolic host names. Given a numeric address, SpoofGuard performs a reverse-DNS lookup to reveal the host name. Figure 2. SpoofGuard toolbar Another way to estimate the likelihood that an email link was used is to see whether the referrer field is empty. url check using the history list, the domain name check, The referring page field is empty with the browser is ini- the email referring page check and post data check are tially started, and when the user types in a url. Although carried out after a BeforeNavigate2 event. A Docu- an empty referrer field does not always imply that the mentComplete event occurs when a web site finishes user has clicked on an e-mail link, the field is empty if loading completely. The image check, link check, and the user launches the browser through a link in his or password check are carried out after a DocumentCom- her e-mail software. While it is conservative to treat an plete event. empty referrer field in the same way as a link from Hot- mail, this may give false alarms since the referrer field 4.2 User interface is empty whenever a new Internet Explorer window is opened. The SpoofGuard toolbar is shown in Figure 2. The options button can be used to configure the tool, while the traffic light (green, yellow, or red) provides an in- Different input names for usernames and passwords dication to the user about the current page. Clicking Since different sites have different input field names on the traffic light also pops up additional information for usernames and passwords, twenty username and ten about the current page. When the spoof rating is above password variations are predefined in SpoofGuard, and the user-specified threshold, SpoofGuard will pop-up an they are used to identify sensitive information in the additional warning window that requires user consent to obtained post data structure. These predefined names send user web form input (or other http post data) out to are used by many online bank forms, commercial sites a web site. such as Amazon and eBay, and web-based e-mail sites. The configuration pop-up, shown in Figure 3, lets the SpoofGuard currently does not recognize username and user select a spoof rating threshold, and set independent password combinations from sites that use other input weights and sensitivity levels for the domain name, url, field names. link, password, and image checks. The user may also disable pop-ups, set history cache, enable image hash Frames Frames have historically proven troublesome caching, and enable highlighting of suspicious links. to both web browsers and users. For simplicity, Spoof- SpoofGuard has two methods to convey its analysis to Guard currently treats frames as independent pages, the user. To keep SpoofGuard as unobtrusive as possi- without parsing the frameset to determine its frames. ble, we use a traffic light symbol on the browser bar to For example, if a frameset includes frames located on indicate the degree of spoof by color: red, yellow, and different hosts, SpoofGuard may flag this situation as green. The actual colors displayed are determined by a possible malicious redirect. Related to this is a cus- the user’s threshold settings. Should the user want to tomizable security setting in Internet Explorer, “Navi- read the details of SpoofGuard’s analysis, he or she can gate sub-frames across different domains”, which gives click the traffic light, and more information appears. In the user a choice to allow or disallow this behavior. We extreme situations, SpoofGuard may also halt a post and expect to improve the handling of frames in a future ver- ask the user if she wishes to continue. In combination, sion of SpoofGuard. the traffic light and popup provide an effective means of alerting the user to suspicious web pages, while avoiding POST data vs AutoComplete SpoofGuard compares the annoyance of consistent popup windows. outgoing passwords with its database of triples rather than with values stored in Internet Explorer’s AutoComplete repository. Detecting whether the user has clicked on an e-mail When a user submits a form, SpoofGuard obtains link A typical phisher sends unsolicited e-mails that the post data as a SafeArray structure, which Internet contain links to the spoof site. SpoofGuard therefore Explorer passes to SpoofGuard via an event handler. attempts to determine whether the user was directed to SpoofGuard then scans it for sensitive information, and a site from an e-mail message. One way is to check stores a resulting hash into a file. There are two ad- the referrer field against a list of host names associ- vantages to using post data. First, post data checking ated with web-based e-mail providers. However, some is more secure, because SpoofWatch hashes the pass-

8 Figure 3. SpoofGuard configuration pop-up

9 words, whereas AutoComplete encrypts them with a Explorer’s history and restarted the browser before load- known key that is stored on the user’s computer. Sec- ing each spoof page; this kept the history of previous ond, many Internet Explorer users turn off AutoCom- tests from biasing the analysis of another spoof page on plete either due to frequent pop-up windows that ask for the same spoof server (our test server). Finally, in order the users permission before storing the data, or for pri- to provide SpoofGuard with some information about the vacy. Therefore, SpoofGuard is more secure while ef- honest site, we visited eBay’s web site and navigated to fectively serving a larger user base. the sign-in page before each eBay spoof. At the honest eBay sign-in, we performed a mock sign-in using ‘hello’ Redirects Currently, a page that redirects to another and ‘test’ as the username and password, respectively. page may cause SpoofGuard to flash yellow or red or Although eBay did not accept these as a legitimate user pop-up a small post data warning box, depending on name and password pair, they were recorded by Spoof- user configuration. In the next version of SpoofGuard, Guard, which was all that we needed for the test. we plan to recognize redirects in html pages more effec- The results of our test were: tively and eliminate these spurious warnings. • All fourteen spoof pages have password input fields and SpoofGuard successfully noted this. Spoof- 5 Evaluation Guard also noted that the form submissions were insecure because the pages were retrieved from our We evaluated the effectiveness of our plug-in using web server without using secure http (https). several criteria. First, does the plug-in detect the sample • All fourteen pages include inlined images, such spoofs found in the wild? Second, is the false alarm rate as the eBay logo, that are retrieved directly from sufficiently small? Third, how difficult is it to write a eBay servers. These images were already in Spoof- spoof page that is not detected by our plug-in? Finally, Guard’s image file as a result of the initial naviga- how does our plug-in affect browser performance. We tion to the honest site. In the test, SpoofGuard cor- discuss each of these below. rectly noted that the spoof pages with eBay images matched those images from the honest eBay site. 5.1 Detection of spoof attacks • We performed a mock sign-in on the spoof pages In addition to debugging tests to make sure each to test SpoofGuard’s outgoing password check. For SpoofGuard measurement works properly, we evaluated each page, we used ’hello’ and ’test’, the same pair SpoofGuard’s overall effectiveness by testing it against used on the honest eBay site in the initialization fourteen actual spoof web pages sent to us by the U.S. part of the experiment. SpoofGuard successfully Secret Service. Nine of the fourteen pages are spoofs identified the user name and password from the of eBay’s sign-in page. Two spoof pages purport to honest site and popped up a warning to the user, be “identity and billing verification” pages that request as shown in Figure 4. a large amount of personal information, such as eBay We believe that the SpoofGuard image check and out- username and password, residence information, credit going password check are important strengths, since to- card information, ATM card PIN, bank account rout- gether these checks stop outgoing data and they are not ing number, social security number, mother’s maiden redundant with information that may be found in the name, date of birth, and driver’s license number and is- browser address bar, status bar, or rendered html. While suing state. One spoof site states that because of “regu- the image checker’s hashing algorithm can be improved lar maintenance of our security measures, your account to detect slight modifications to the images, the current has been randomly selected for this maintenance,” and checks successfully catch spoofs observed in the wild. requests a username and password. The last two suggest that the user could win a car if a username and password 5.2 False alarm rate are provided. We tested SpoofGuard on all fourteen spoof pages The false alarm rate depends in part on how frequently using the default settings and recorded all SpoofGuard the user establishes new accounts and how frequently messages for each page. Since SpoofGuard does not an- the user clears the browser history cache. We have used alyze html files stored locally on a user’s computer, we SpoofGuard ourselves over several weeks. With de- set up a web server that hosted the fourteen spoof pages. fault settings, there are occasional spurious yellow lights Since most spoof web sites do not use https, our server while browsing, and sometimes the first use of a legiti- used ordinary insecure http. Each page was retrieved mate site with user name and password input will trigger from our web server by entering the url directly into a false post warning. Many of the unnecessary warnings the address bar in Internet Explorer. In order to force are the result of frame or redirection problems (noted in SpoofGuard to analyze each page, we cleared Internet section 4.3) that we expect to resolve in the next version

10 Figure 4. SpoofGuard detects honest user name and password on spoof site.

11 of SpoofGuard. If the user opens a new account, and in- to performance with a dummy plug-in containing only tentionally uses the same password as another account, the timing checks. To try to account for the fact that net- this will also produce an unwanted warning. However, work latency shows up in the timing numbers, we also second and subsequent visits (without clearing the his- ran a second round of tests with all pages in the cache. tory cache) do not lead to additional false alarms for this Although the measurements do not capture SpoofGuard situation. timing with great accuracy, the numbers we obtained seem to support our belief that there is no noticeable 5.3 Security degradation of user browsing experience. The performance tests were carried out on TrafficMar- The solutions described in Section 3 are certainly not ketplace’s list of 30 most visited sites, using a 1GHz fool-proof. An attacker with a reasonable understanding Pentium III with 128MB of RAM, connected through of web-site construction and a day or two of time can a 10 Mbps Ethernet card. Retrieving pages over the net- circumvent our current tests. For example, here are sim- work, it took an average of 779 milliseconds to navigate ple ways of fooling the SpoofGuard password and image from one page to another without SpoofGuard installed, checks: and 911 milliseconds with SpoofGuard. With pages in • Some of our tests compare user input on a particu- the cache, these numbers dropped to 484 milliseconds lar page to passwords that the user used at previous and 601 milliseconds, respectively. These measure- sites. An attacker could fool these tests by break- ments suggest that the sequence of checks carried out by ing the password input field on the spoof page into SpoofGuard take on the order of 100–250 milliseconds two adjacent fields that would look contiguous to on an older processor. Furthermore, the CPU usage was the user, but would cause our password compari- 30% without SpoofGuard, and 40% with SpoofGuard, son tests to fail. Similarly, javascript on the spoof although the variance in CPU usage was high while the page could encode post data sent from the page so variance in timing numbers was low. Overall, however, as to defeat our post data tests. it seems safe to conclude that SpoofGuard does not im- • Some of our checks compare images (logos) on a pose a significant performance penalty. The non-expert spoof page to images that appear on honest pages. users for whom the plugin is designed are unlikely to An attacker could defeat these tests by slicing an notice the computation overhead; they are particularly image into adjacent vertical slices and presenting unlikely to notice the SpoofGuard overhead if their net- these slices one next to the other. None of the indi- work connections are slow. vidual slices would match images in the plug-in’s database, but to the user the complete image would 6 Server-side assistance look authentic. This would defeat some of our im- age tests. The techniques we have implemented and tested are designed to detect web-spoofing attacks without any co- These limitations notwithstanding, our methods clearly operation from web sites that are spoofed. However, we make it harder for attackers to setup effective spoof sites. could do much more with the help of e-commerce web Given the extremely low level of sophistication we’ve sites. For example, the two methods suggested below seen so far in actual spoof attacks, it is difficult to pre- add simple tags to honest web pages. The additional in- dict how quickly phishers would respond to deployment formation gathered from honest sites can be used detect of SpoofGuard or related methods. In addition, should spoofs more effectively. more sophisticated spoof sites appear, the framework we have adopted in SpoofGuard can be extended with more 6.1 Mark form fields with confidentiality tags sophisticated checks and defensive password manage- ment. The outgoing password check compares outgoing data to stored (and hashed) sensitive data sent previously to 5.4 Performance honest login pages. E-commerce sites can help Spoof- Guard identify sensitive fields by marking them with an As SpoofGuard users, we have not noticed any per- additional html attribute. formance degradation as a result of the browser plug-in. We propose adding a CONFIDENTIALITY attribute to We attempted to confirm this subjective impression by the html element. For a sensitive field (pass- making rudimentary measurements. We inserted tim- word, username, creditcard) the html element would ing checks at the browser BeforeNavigate2 and Docu- look like mentComplete events, measuring CPU usage and nav- We compared performance with SpoofGuard installed where CONFIDENTIALITY is one of username, pass-

12 word, creditcard, SSN and possibly other pre-specified images referenced with the SPOOFGUARD attribute are strings. The confidentiality attribute helps SpoofGuard stored in the database. determine how to process the field. Note that we do not When the browser downloads a login page containing use the NAME attribute to infer confidentiality so as to an image whose hash is in the database it does the fol- give the site complete freedom over its field names. lowing: Confidentiality tags could improve the detection rate • Check to see if the page’s domain is in the list of and reduce the false alarm rate. Specifically, if data not domains associated with the image. If so, let F be currently tracked by SpoofGuard is marked confidential, the frequency for the domain. If not, set F to 0. SpoofGuard will be able to warn the user when this con- • Let M be the maximum frequency in the image’s fidential data is exposed. If confidentiality tags become record. The test return the value p = 1 − F/M. widely used, then SpoofGuard could become less likely to spuriously track information that is not confidential, To see how this works, consider the eBay login page reducing the likelihood of false alarms. The proposed and consider an image marked with SPOOFGUARD on html confidentiality tags may also have other uses be- that page. Now, consider the record in the plug-in’s im- yond spoof identification. For example, a kiosk browser age database corresponding to this image. Most likely, could close a window and flush short-term cookies after the eBay login page will have the highest frequency in a certain idle time if entry to the site involved confiden- the image record. Consequently, the eBay login page tial data. will have p = 0 indicating that it is not likely to be a spoof. Other pages containing this image with a SPOOF- 6.2 Image tagging GUARD attribute (either set up by an attacker or by someone attempting to DoS eBay) will have a much The image check described in Section 3.2 is useful lower frequency and therefore result in a p value close in identifying spoof login pages since these pages need to 1. This technique prevents abuse of this test for denial to reproduce the look-and-feel of the honest site. We of service. already mentioned in Section 3.2 that e-commerce sites can help strengthen this mechanism by choosing images 6.3 Password hashing and site-specific salt that can be hashed robustly. In addition, e-commerce sites can help make this test stateful rather than stateless. Users often use the same password at many different To do so we propose adding a new attribute to the IMG sites. For example, the same password may be used element in an html page. The attribute enables honest for an E*Trade account as for a newspaper site. We sites to identify images on their login page that are not can combat this problem using site-specific password supposed to appear on login pages outside the site. For salt. Password salt, or other improvements of the stan- example, the IMG element pointing to the eBay logo on dard password mechanism, also help with other secu- eBay’s login page would look like: rity problems. In particular, when attackers break into name/password combinations at various financial sites. The SPOOFGUARD attribute indicates that if this image As a result, a web site implementing proper security appears on a non-ebay web page requesting sensitive policies suffers when other sites do not apply recent user input then it is likely the page is a spoof. Poten- patches and store passwords in the clear. tially, the SPOOFGUARD attribute could include a value Passwords at one site can be made independent of (low, high) thus giving the site administrator some con- passwords at other sites by adding a new SALT attribute trol over the score added to the total spoof score for the to the html element. This attribute lets a site page. We note that this attribute should only be used on specify per-server salt; per-user salt is not possible since sensitive html pages such as a login page. it is supplied before the user is identified. With this new Next, we describe how the SpoofGuard plug-in would attribute, password fields would look like: use this attribute. The difficulty is in ensuring that this reorganize the plug-in’s image database. Each record in where the site developers ensure that their salt is unique the database is as follows (one record per image): to their site. For example, one could use the domain- image-hash (d1, f1) (d2, f2) . . . (ds, fs) name as a salt. where di is a domain on which this image was found, When processing a password field the browser first and frequency fi is the number of times the user visited computes Epwd[salt], where E is block cipher, pwd is the page. The plug-in either maintains the frequency the password entered by the user, and salt is the salt value itself (adding one every time the page is loaded) from input html element. The browser transmits the or uses the browser’s history file to compute it. Only resulting value rather than the user’s password. If the

13 salt attribute is not present in the html page, the browser function. Second, this mechanism could potentially be uses 0 as the salt. The main point is that with the block used to launch a denial of service attack against an hon- cipher, it is hard to compute Epwd[X] from Epwd[Y ] est site. for X 6= Y . Consequently, a newspaper break-in will not compromise an E*Trade password. Note that this is Search engines to the rescue. Spoof sites are often similar to challenge-response , except that direct copies of pages on the honest site. Therefore, each site uses a fixed and unique challenge rather than when viewing a sensitive page (a page that requests a a random challenge. This way the site need only store user password), our browser plug-in could do a Google in its database a hash of the submitted password value search on some key phrases or links on the page. If a rather than the plaintext submitted password. page similar to the current page is found at a different One difficulty in deploying site-specific salt is that all domain, the plug-in would increase the page’s spoof- browsers must be simultaneously modified and web sites score. must re-authenticate their users after this mechanism is We consider this method to be speculative since pages deployed. Another problem is that this mechanism it- are often cached at various sites on the web and it would self is susceptible to spoofing. As presented, a spoof be difficult to distinguish a spoof from a cached page. site need only contact the honest site to obtain the site- In addition, if every browser in the world automatically specific salt, then pass the same salt on to the victim. issued a Google query for every password page it en- This will cause the victim’s browser to send the spoof counters, the resulting traffic would likely overwhelm site the exact password needed to gain access to the hon- Google. There is also no business incentive for Google est site. Although we have not done a thorough study, to support such a service. site-specific salt may still be useful when the request comes over https and the certificate check establishes Forensics. Suppose internal tests at E*Trade indicate a reliable association between the salt and the request- that a user’s password has been compromised. E*Trade ing domain. With this limitation, site-specific salt will suspects that this is the result of a spoof E*Trade site. produce distinct passwords for distinct sites, and pre- They wish to quickly determine where the site is. One vent a phisher who sets up an insecure (i.e., non-https) option is to examine the user’s history file since it con- site from obtaining a password associated with a more tains all the sites the user visited recently, including the secure (https) site. spoof site. However, a well-minded user would likely refuse to hand over the browser’s history file due to pri- 7 More Speculative Techniques vacy concerns. To reduce the user’s exposure, Spoof- We describe a few techniques that might be useful in Guard could keep track of sites where the user entered a combating spoof sites. We did not experiment with these password identical to his E*Trade password. Only those since we consider them to be more speculative at the sites would be handed over to E*Trade. moment. We consider this method to be speculative since most spoof sites are active for only a few days. Most likely the process required for obtaining data from the user would Collaborative Methods. Several projects [Bri03] use take more time than that. Nevertheless, the problem of collaborative methods to identify spam email. A similar quickly locating spoof sites is important and deserves mechanism might apply to blocking spoof sites. Con- attention. We may experiment with using web crawlers sider a user who uses our plug-in, but ignores the warn- for this task in the future. ing issued by the plug-in when visiting a site. The user enters his identifying information, submits the data, and 8 Conclusion then realizes that he just entered private information on a spoof site. At that point the user might want to alert Most of the $37 million increase in losses from In- the authorities as well as alert other users to avoid the ternet fraud observed between 2001 to 2002 has been site. By providing a “send alert” button in our browser attributed to web spoofing [Von03]. While web spoof- plug-in the user could notify a central server that the cur- ing (or phishing) may become more sophisticated in the rent page is a spoof. If enough users identify the page future, we propose a set of methods that appear effective as a spoof the server could alert all plug-ins to block for the kind of simple attacks observed by law enforce- the page. This might dramatically reduce the number of ment and affected companies. SpoofGuard uses a com- users who get duped by the spoof site. bination of stateless page evaluation, stateful page evalu- We consider this method to be speculative for two rea- ation, and examination of outgoing post data to compute sons. First, user’s who are duped by spoof sites are also a spoof index. When a user enters a username and pass- likely to be unaware of the “send alert” button and its word on a spoof site that contains some combination of

14 suspicious url, misleading domain name, images from for example, were widely deployed, then a spoof site an honest site, other measures discussed in section 3, authenticating a user would not have any way to imper- and a username and password that have previously been sonate the user on the honest site. In this sense, Spoof- used at an honest site, SpoofGuard will intercept the post Guard helps patch over a weakness in current web prac- and warn the user with a pop-up that foils the attack. We tices that could be solved more effectively by stronger have tested SpoofGuard with actual attacks found in the known technology. However, the history of the Internet wild and found the mechanisms generally unobtrusive suggests that once a convention is widely adopted, it is and effective. While technically savvy Internet profes- very difficult to introduce new standards. sionals probably do not need SpoofGuard themselves, there are many less sophisticated users who may benefit Acknowledgments from this tool. In order to effectively reduce the impact of Internet Thanks to Alissa Cooper, Greg Crabb, Tom Pageler, fraud based on web spoofing, SpoofGuard must be dis- Robert Rodriguez, and Chris Von Holt. tributed and deployed, or the mechanisms tested here must be adopted by browser companies and integrated References into standard browser security mechanisms. While the [Ber03] Tara Siegel Bernard. Citigroup’s initial tests from our research effort are promising, we logo used in identity-theft attempt. expect to continue to refine SpoofGuard and subject SmartMoney.com, August 18, 2003. the components of our method to more rigorous sta- http://www.smartmoney. tistical testing. Especially if some of the server-side com/bn/ON/index.cfm?story= methods described in section 6 are adopted by compa- ON-20030818-000809-1407%. nies subject to web spoofing fraud, such as EarthLink, Citibank, Morgan Stanley’s Discover unit, eBay, Pay- [Bri03] Brightmail inc. http://www. Pal, banks and state lotteries [Ber03], we believe that brightmail.com, 2003. SpoofGuard methods will reduce fraud. In addition to reducing the direct loss figure mentioned above, good [CNN03] ‘phishing’ scams reel in your iden- protection against web spoofing would significantly re- tity. http://www3.cnn.com/ duce customer support costs. 2003/TECH/internet/07/21/ A second consequence of deploying the methods de- phishing.scam/, July 22, 2003. scribed in this paper is that phishers will have to work harder to spoof web users into revealing sensitive infor- [Cox03] Mike Cox. Fraudulent - thieves mation. As discussed in section 5.3 and elsewhere, many intend to steal your personal informa- of our tests can be circumvented by relatively simple tion 6/2003, 2003. Posting from the http: modifications to spoof pages. Like virus detection and Michigan Attorney General, //www.michigan.gov/ag/0, spam filtering, we expect that any serious effort to com- 1607,7-164--70494--,00.html bat web spoofing will lead to more sophisticated spoofs . and the need for more sophisticated defenses. As men- [Din03] Theo Van Dinter. Spamassassin, 2003. tioned throughout the paper, the methods currently im- http://useast.spamassassin. plemented in SpoofGuard can be improved. Individual org/. page tests can be improved, more page test can be added, and the formula for computing the spoof index can be [EY01] S.W. Smith E.Z. Ye, Y. Yuan. Web refined. Furthermore, if e-commerce sites act on their spoofing revisited: Ssl and beyond, 2001. concern about the problem, server-side techniques offer http://www.cs.dartmouth.edu/ significant promise for combating web spoofing. ˜pkilab/demos/spoofing/. From a broader perspective, web spoofing takes ad- vantage of the unauthenticated email and weak web-site [FBDW97] Edward W. Felten, Dirk Balfanz, Drew authentication. As a Tumbleweed Communications web Dean, and Dan S. Wallach. Web spoofing: site regarding phishing [Tum03] points out, one counter- An internet con game. In Proceedings of measure is “the use of digitally signed email to protect 20th National Information Systems Security against phishing hacker attacks and spam email.” While Conference, 1997. this is certainly true, digitally signed email has been [FBI03] FBI web spoofing warning, 2003. technically feasible for many years, yet the adoption rate http://www.fbi.gov/pressrel/ remains small. Strong web site authentication could also pressrel03/spoofing072103. eliminate web spoofing. If challenge-response methods, htm.

15 [HF03] Katie Hafner and Laurie J. Flynn. E-mail swindle uses false report about a swindle. NY Times, June 21, 2003.

[Pax99] Vern Paxson. Bro: a system for detect- ing network intruders in real-time. Com- puter Networks (Amsterdam, Netherlands: 1999), 31(23–24):2435–2463, 1999.

[Sno03] Snort: The open source network intrusion detection system, 2003. http://www. snort.org/.

[Tum03] Tumbleweed Communications. Dig- itally signed email to protect against phishing hacker attacks, 2003. http://www.tumbleweed.com/ en/solutions/phishing.html.

[VKJM00] R. Venkatesan, S.-M. Koon, M. H. Jakubowski, and P. Moulin. Robust im- age hashing. In Proceedings of the Inter- national Conference on Image Processing, 2000.

[Von03] C. T. Von Holt. Resident Agent In Charge, US Secret Service, San Jose, CA. Private communication, 2003.

16