The Extension and Customisation of the Maltego Data-Mining Environment into an Anti-Phishing System

Submitted in partial fulfilment of the requirements of the degree of

Bachelor of Science (Honours)

of Rhodes University

Matthew Marx

Grahamstown, South Africa

November 2, 2014

Contents

1 Introduction

1.1 Problem Statement and Research Goals

1.2 Scope

1.3 Document Structure

2 Background

2.1 History and background

2.1.1 Phishing and Pharming

2.1.2 The Anatomy of a phishing attack

2.2 The cost of a phishing attack

2.2.1 Phishing and Data Breaches

2.3 Online Identity

2.3.1 ICANN

2.3.2 WHOIS

2.3.3 Certificate Authorities

2.3.3.1 The role of Certificate Authorities in Phishing and Anti-Phishing

2.3.4 PhishTank

2.4 Types of phishing attacks

2.4.1 Clone Phishing

2.4.2 Tabnabbing

2.4.3 Spear Phishing

2.5 Anti-Phishing methodologies

2.5.1 Anti-Phishing Collectives

2.5.1.1 Anti-Phishing Work Group

2.5.1.2 US-CERT

2.5.1.3 PhishTank

2.5.2 Website take down

2.5.3 Browser Anti-Phishing mechanisms

2.5.3.1 Anti-Phishing Heuristics

2.5.3.2 Phishing Blacklists

2.5.4 Email Filtering and Content Filtering

2.6 Abuse Reporting Mechanisms

2.6.1 Blacklisting services

2.6.2 Placing an abuse report with domain registrar

2.7 Summary

3 Design

3.1 System Goals

3.2 Underlying Architecture

3.2.1 Maltego

3.2.2 Maltego Machines

3.2.3 Programming Languages

3.2.4 Transforms

3.3 Entities

3.3.1 Domain

3.3.2 Email Address

3.3.3 IPv4

3.3.4 Email Source

3.3.5 Abuse Report Email

3.3.6 EmailSourceDirectory

3.3.7 Potential Phishing URL

3.3.8 Suspicious Email Address

3.3.9 Phishing Target

3.3.10 Confirmed Phishing URL

3.3.11 Phishing Kit

3.4 Transforms

3.4.1 Verify Phishing Link

3.4.2 Generating Abuse Report Emails

3.4.3 Directory Monitoring

3.4.4 Link Extraction and analysis

3.4.5 WHOIS

3.5 Automating the process

4 Case Studies

4.1 An attack launched from a compromised server

4.1.1 Background

4.1.2 Exploration and Fingerprinting

4.1.3 Analysis

4.2 Correlating relationships between larger data-sets

4.2.1 Background

4.3 Automated monitoring

5 Conclusion

5.1 Analysis of Goals

5.2 Future Work

5.2.1 Introducing additional online services

5.2.2 Extension into analysis of attachments

5.2.3 Reporting Mechanism

5.2.4 Tool Integration

References

A Appendix

List of Figures

2.1 The mechanics of a Phishing attack

2.2 A Typical Phishing URL

2.3 An example of a certificate verifying the identity of an online service

2.4 A typical phishing email

2.5 Tabnabbing

2.6 Mechanics of a spear-phishing attack

3.1 Creating a new graph

3.2 Running a transform

3.3 The transform produces a new IPv4 entity

3.4 A more complex set of entities and relationships

3.5 Block Layout

3.7 Circular Layout

3.6 Hierarchical Layout

3.8 Regular expression used to extract links and URLs

3.9 Regular expression used to extract email addresses

4.1 Creating an email source entity

4.2 Analysis of the emailSource entity

4.3 Exploring the domain involved in the attack

4.4 http://www.venisetours.com

4.5 The redirected page

4.6 The redirected page

4.7 Multiple emails represented in Phishtego

4.8 Closed Systems

4.9 Related Attacks

4.10 Related Attacks with Malicious Links reported

4.11 Closed Systems

4.12 Closed Systems

4.13 Automated email retrieval and transforms

List of Tables

3.1 Maltego: Minimum Hardware Requirements

3.2 A Summary of Phishtego Entities

3.3 Verify Phishing URL

3.4 Generating abuse report emails

3.5 Monitoring a local directory for emails

3.6 Link extraction and analysis

3.7 WHOIS

Abstract

Phishing attacks remain one of the most serious threats to data assets. In particular, the ease and low cost of setting up and running a successful attack mean that there is no substantial barrier to entry into the phishing world. One of the most important means of understanding and combating a phishing attack is to fingerprint the attack by extrapolating from the information contained in a phishing email. This includes a substantial amount of information in the email’s headers that is often ignored when an email is viewed. This project extends the Maltego framework to support the exploration of, and reaction to, a phishing campaign. In doing so it provides abuse reporting mechanisms and integration with both Google’s SafeBrowsing and the Phishtank API.

ACM Computing Classification System Classification

Thesis classification under the ACM Computing Classification (2012 version, valid through 2014)

I.4.3.2 [Security and privacy]: Phishing

M.2.6.1 [Social and professional topics]: Computer Crime

General terms: Phishing, Abuse Reporting, Attack Fingerprinting

Acknowledgements

This work was undertaken in the Distributed Multimedia CoE at Rhodes University, with financial support from Telkom SA, Tellabs, Genband, Easttel, Bright Ideas 39, THRIP and NRF SA (TP13070820716). The authors acknowledge that opinions, findings and conclusions or recommendations expressed here are those of the author(s) and that none of the above mentioned sponsors accept liability whatsoever in this regard.

I would like to thank everyone who has supported me in this long journey. In particular, I would like to thank the following people: Professor Barry Irwin, for his outstanding guidance, resources and valuable input during the writing of this thesis; my parents, Paul and Cathy Marx, for their input and support, both financial and otherwise, and for allowing me the opportunity to attend such a prestigious university; and MWR InfoSecurity, for their financial support as well as for funding and supporting other independent research throughout the course of this year.

Chapter 1

Introduction

The ushering in of the Digital Age has presented the world with previously unimagined inter-connectivity, sharing of information and the ability to process enormous volumes of data very quickly. However, these developments bring with them new challenges. One of the challenges at the forefront of this development is the need to store sensitive data safely. Data assets are as important as physical assets to most companies, and protecting them is complex and difficult. One of the most serious threats posed by malicious external actors to a company’s assets is an attack on individuals within the company in order to extract information that will result in access to its data assets. Phishing is an example of this kind of attack.

Financial loss due to phishing campaigns is one of the more worrying concerns within information security. This is largely due to two characteristics of a phishing campaign. The first is that a successful phishing campaign requires very little technical know-how to conduct. The underground black market sells hundreds of variations of pre-made phishing kits that can be purchased and require very little effort to set up to a functional state. The second is how successful such attacks are. Given the relative ease and low cost involved in setting up a phishing campaign, the rate at which credentials are gathered makes it economically viable for an attacker.

1.1 Problem Statement and Research Goals

This situation places the people responsible for defending their data assets at a disadvantage. The ability to track, correlate and gather information about potential phishing campaigns in an automated fashion would put those people in a better position to make informed decisions about an attack, including the counter measures and reporting mechanisms available.

As companies and organisations expand, it becomes increasingly likely that they will hold some form of data asset. The asset could be one of many kinds of data, ranging from intellectual property to credit card information. As the stockpile of data assets grows, the company becomes increasingly attractive to attackers. This research looks primarily into developing a piece of software that can be used to analyse and explore phishing campaigns, with the hope of providing useful information that will inform decisions regarding counter measures. The goal is not to produce a one-size-fits-all solution for monitoring and tracking a phishing campaign, but rather to provide what would in general prove useful in gathering intelligence around an attack. Visualisation plays an important role in making the intelligence gathered from available data useful and understandable, given the sheer volume of data that can be derived from an attack. Bearing this in mind, the project sets out to achieve three primary goals:

• Create a system that models phishing attacks that can be deployed locally on a machine

• Produce meaningful information from the large volumes of raw data that can be gathered from a phishing campaign

• Provide a means of facilitating decision making around reacting to phishing campaigns and automating this response where possible

1.2 Scope

Outlining the scope of the project is important, as the scope has the potential to grow quickly and often needlessly given the volume of information that can be derived from an email. This is largely due to the sheer number of different technologies involved in performing even a relatively simple information gathering procedure on some information in an email. As such, the scope is limited by the functionality that the project can realistically look to deliver. There is ultimately little intelligence built into the application that looks to make macro decisions about the best way to react to a given attack while taking the whole of the situation into account. Instead, the idea is to provide the user with as much useful information as possible, leaving decisions in the hands of the user, who is then better informed to act intelligently on an attack.

1.3 Document Structure

This thesis explores the development of a system and as such is structured in the following manner:

• Chapter 2 serves as an introduction to the relevant areas of research and technologies that are addressed by the project.

• Chapter 3 explains the design and development of the platform. It contains an outline of all of the necessary dependencies and software required for the project.

• Chapter 4 contains a number of case studies that look to present the usefulness and validity of the system in tracking and describing a phishing attack.

• Chapter 5 contains a conclusion and ideas for future work and summarises results and the system.

It is important to note that this paper makes significant use of technical reports and whitepapers. This is largely because annual statistics and global reports are often published as whitepapers and technical reports by companies. Additionally, many of the interactions between systems and existing APIs are done at a non-academic level, or at least are not widely documented at an academic level, which means that technical reports are often among the latest and best documented approaches to anti-phishing techniques. Furthermore, some of the more interesting counter measures proposed in academic papers that are a year or older are already being circumvented by current phishing techniques, for example by including content as attachments to bypass the filters outlined by fet (2007), and so recently published technical reports are in some senses more valuable.

Chapter 2

Background

“Phishing is a form of deception in which an attacker attempts to fraudulently acquire sensitive information from a victim by impersonating a trustworthy source” (Tom Jagatic, 2007). This definition, though broad, serves as a good starting point for the exploration of phishing attacks. Phishing is by no means a new attack vector, but it has certainly developed in new and increasingly complex ways. Of particular concern is that very little expertise, technology or manpower is required to carry out a successful phishing attack. Very little beyond a list of email addresses and freely available software is needed to carry out a fully fledged and successful phishing attack.

2.1 History and background

Attackers have attempted to elicit information such as passwords and usernames from users for as long as these have been used as an authentication mechanism. Initially, this was largely conducted through social engineering. In the 1990s, with the explosion of interconnected networking and the internet, there was a definite move by attackers away from effort-intensive social engineering and toward automated systems that attacked the mass consumer market (Watson et al., 2005). The combination of social engineering with these technological advances has created what we refer to today as phishing.


2.1.1 Phishing and Pharming

Phishing and Pharming are terms that are often found together in the literature. For the sake of clarity, it helps to define the distinction between the two practices before continuing.

Phishing Phishing is the merger of social engineering attacks and technological advances. It manifests itself as an attempt to draw information out of an unsuspecting user, typically usernames, passwords or other private information that can be used to perform fraudulent actions under the guise of the user. Phishing attacks are typically carried out through bulk emailing.

Pharming Pharming attacks ultimately look to achieve the same end result as phishing attacks but are typically more complex and technical in nature. There are several possible means through which attackers pharm a victim, but the majority rely on manipulating the mechanism a computer uses to resolve a domain name to an IP address (DNS) so that a fraudulent IP address is substituted for the one actually associated with the domain name. Delving into the full complexities of pharming is beyond the scope of this discussion, but this level of understanding is sufficient for the rest of the paper (Karlof et al., 2007).
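The substitution described above can be illustrated with a minimal sketch. It assumes the defender already holds a list of addresses known to belong to a domain; the function name and the sample addresses (taken from documentation-only ranges) are hypothetical and are not part of the system described in this thesis. In practice the resolved addresses could come from a call such as `socket.getaddrinfo`, and a mismatch is only a hint, since legitimate services may change or load-balance their addresses.

```python
import ipaddress


def unexpected_addresses(resolved, known_good):
    """Return the resolved addresses that are not in the known-good set."""
    # Parse into address objects so that equivalent spellings compare equal.
    good = {ipaddress.ip_address(a) for a in known_good}
    return [a for a in resolved if ipaddress.ip_address(a) not in good]


# Hypothetical data: 192.0.2.10 is the address on record for the domain,
# 203.0.113.5 is what a manipulated resolver might hand back instead.
suspicious = unexpected_addresses(["203.0.113.5"], ["192.0.2.10"])
print(suspicious)
```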

2.1.2 The Anatomy of a phishing attack

The attack typically begins with a large number of spam messages sent over various mediums to targets. The messages include content that aims to entice a user into following a link to a website that the attacker controls. Besides email, attackers also use SMS and telephone calls; these variants are referred to as SMiShing and vishing respectively. The message that is distributed is usually crafted to resemble closely, if not exactly, an authentic message that the authority being impersonated is likely to send. Typically, the message will contain something that requires the user’s immediate attention such as an “impending account suspension, a payment for a marketing survey or a report of a transaction that the user will know to be fake and therefore want to be cancelled” (Moore & Clayton, 2007b).

In a successful attack the user then connects to the URL supplied in the message and is directed to the website under the attacker’s control, as indicated by stage 1 in figure 2.1.

Figure 2.1: The mechanics of a Phishing attack

Notably, at this stage in the process, browsers will typically check the requested URL against large public blacklists, which are often crowd sourced, and apply heuristic checks to it. Next, the user will typically be met with a website that looks identical to the site the attacker is impersonating. This leads the user to suspect nothing to be out of order and to proceed with entering personal information into the website, as demonstrated in stages 2 and 3 of the attack.

The credentials are often stored locally on the website itself to be collected at a later stage, or are sometimes mailed to an email address owned by the attacker. The website used in an attack is on occasion hosted on free webspace where anybody can register an account and upload data. At other times, the attacker uses a hijacked machine that he previously compromised through a security vulnerability. Stages 4 and 5 involve the attacker using the newly acquired credentials to access the user’s account and steal money.

The URL of a phishing page is typically constructed to look similar to the domain name of the body being impersonated in the attack. Such a URL often closely resembles http://www.impersonatedname.freehostprovider.com/passwordreset, which at first glance appears to the average user to be close enough to the URL the user might expect from a legitimate source, and so the user is successfully ‘phished’ (Moore & Clayton, 2007b). In this example, the name of the impersonated entity would replace the impersonatedname portion of the URL and the name of the free webhosting provider would occupy the freehostprovider portion of the URL. There are other interesting mechanisms that attackers currently use to bypass traditional filters and deceive the user; these are beyond the scope of this discussion but can be found in research conducted by Garera et al. (2007). Often the user finds the URL convincing purely because of a lack of understanding of how URLs are actually constructed. An example taken from an actual phishing attack is shown in figure 2.2, in which the URL, far from looking suspicious, appears at first glance to be completely legitimate to the unsuspecting user.

Figure 2.2: A Typical Phishing URL
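How such a URL is actually constructed can be made concrete with a short sketch that separates the part of the hostname the hosting provider controls from the part the attacker is free to choose. The helper name `split_host` is hypothetical, and the sketch assumes the public suffix is a single label such as `.com`; a robust implementation would need the Public Suffix List to handle suffixes such as `.co.za`.

```python
from urllib.parse import urlsplit


def split_host(url, suffix_labels=1):
    """Split a URL's hostname into (attacker-chosen prefix, registered domain)."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # The registered domain is the public suffix plus one label to its left.
    keep = suffix_labels + 1
    registered = ".".join(labels[-keep:])
    prefix = ".".join(labels[:-keep])
    return prefix, registered


prefix, registered = split_host(
    "http://www.impersonatedname.freehostprovider.com/passwordreset"
)
print("chosen by the attacker:", prefix)
print("actually responsible  :", registered)
```

Only the registered domain identifies who is responsible for the page; everything to its left can be set to any text, including the name of the impersonated body.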

2.2 The cost of a phishing attack

In 2013 alone, there were nearly 450 000 phishing attacks worldwide, resulting in an estimated USD 5.9 billion in damages. Phishing attacks have been, and remain, a seemingly ever-present and ominous threat. A separate study conducted in 2013 that observed 72 758 unique recorded phishing attacks found that the average uptime of an attack was 44 hours and 39 minutes (Rasmussen et al., 2013). One of the reasons for the continued and intensified barrage of phishing attacks lies in the ‘commoditized marketplace’, which drives vendors to maintain and provide competitive and accurate prices and services (RSA, 2014).

The Ponemon Institute releases an annual report which studies data breaches across 9 different countries. The countries considered in the study are the United Kingdom, United States, Germany, France, Australia, India, Italy, Japan and Brazil. The study “examines the costs incurred by 277 companies in 16 industry sectors after those companies experienced the loss or theft of protected personal data” (Ponemon-Institute, 2013). There were several aspects to what was involved in the total costing of the attack. These included “outlays for detection, escalation, notification, and after-the-fact (ex-post) response.”

There were a number of interesting distinctions between countries when it came to data breaches. On average, Australian and US companies had the largest number of exposed records, averaging 34 249 and 28 765 records per breach respectively, while Japanese and Italian companies had the smallest, averaging 18 285 and 18 237 exposed records respectively. There is also a negative change in the volume of customers that use a service following a data breach, which was especially severe in Australia and France. A number of factors decrease the cost of a data breach, including having a strong security scheme and policy in place, having an incident response scheme and appointing a Chief Information Security Officer (Ponemon-Institute, 2013).

2.2.1 Phishing and Data Breaches

Data breaches encompass a broader sphere than just phishing attacks, but it is important to note that phishing attacks play a large role in data breaches (FACTS, 2006). The Anti-Phishing Working Group published a preliminary report in 2008 that looked at the cost of a phishing attack to an organisation in particular. The study found that the “duration of the phishing attack is a key factor” in determining the cost of the attack but that “most costs are incurred during the first 24 hours of the attack” (Cyveillance, 2008). The study splits the cost of a phishing attack into two kinds of costs.

Hard Costs

These are financial costs that can be directly measured in terms of time, money, manpower and effort. The study outlines the following as the central hard costs involved with a phishing attack:

1. Fraudulent charges associated with the compromised payment mechanism (e.g. credit card).

2. Cash withdrawals from compromised accounts.

3. Time spent by employees dealing with the fraudulent transaction.

4. Customer service and support calls.

Soft Costs

Soft costs are the intangible costs that a compromise imposes on an institution, which are typically much harder to quantify and measure. These include:

1. The loss of customer trust in online applications.

2. A decline in customer satisfaction.

3. Reputation damage.

Phishing attacks form a significant part of the criminal attacks that lead to data breaches and, as such, are of particular concern to organisations that hold any form of financial or sensitive personal data (Wu et al., 2006).

2.3 Online Identity

Understanding phishing requires a broad understanding of a large number of the mechanisms and protocols that make up the internet, since a phishing attack often comprises several stages, each of which exists in a different but related sphere of the internet. It is important to understand how domain registration is handled so as to better understand the challenges faced in counteracting phishing attacks.

2.3.1 ICANN

The Internet Corporation for Assigned Names and Numbers (ICANN) is a private, non-profit organisation that performs a variety of important jobs involved in ‘maintaining’ the internet. Broadly speaking, ICANN performs three main functions¹:

1. The coordination of the assignment of technical protocol parameters.

2. The administration of certain responsibilities associated with internet DNS root zone management.

3. The allocation of internet numbering resources.

¹ https://www.icann.org/en/about/welcome

² http://www.icann.org/registrar-reports/accredited-list.html

It works to coordinate the allocation of Internet Protocol address space, allocating both IPv4 and IPv6 address space. Originally ICANN was solely responsible for the distribution of address space and domain name registration; however, ICANN now allows ‘resellers’ to operate as sellers of domain names on the condition that the reseller has signed the 2009 Registrar Accreditation Agreement², which looks to provide additional levels of protection for registrants and requires a greater level of accountability from registrars. For the purpose of understanding phishing attacks, it is most useful to know the requirements placed upon the registrar. As of May 2009, ICANN requires that registrars provide the following information³:

1. The name of the Registered Name being registered;

2. The IP addresses of the primary nameserver and secondary nameserver(s) for the Registered Name;

3. The corresponding names of those nameservers;

4. Unless automatically generated by the registry system, the identity of the Registrar;

5. Unless automatically generated by the registry system, the expiration date of the registration;

6. Any other data the Registry Operator requires be submitted to it.

Understanding the role that ICANN plays in the broader scheme of the internet is an important part of piecing together some of the mitigations that are employed to counter phishing attacks.

2.3.2 WHOIS

“WHOIS is a TCP-based transaction-oriented query/response protocol that is widely used to provide information services to Internet users” (iet, 2004). ICANN is committed to enforcing its current WHOIS policy, which looks to “maintain timely, unrestricted and public access to accurate WHOIS information including registrant, technical, billing and administrative contact information”⁴. The WHOIS information plays an important role in facilitating abuse reporting mechanisms with regard to phishing, allowing fast and efficient website take-downs. Listing 1 is an example of the WHOIS information obtained for the Phishtank.org domain:

Listing 1 The whois information available for the Phishtank.org domain

Domain Name:PHISHTANK.ORG
Domain ID: D128067610-LROR
Creation Date: 2006-08-30T23:19:41Z
Updated Date: 2013-10-02T00:20:25Z
Registry Expiry Date: 2014-08-30T23:19:41Z
Sponsoring Registrar:PDR Ltd. d/b/a PublicDomainRegistry.com (R27-LROR)
Sponsoring Registrar IANA ID: 303
Domain Status: ok
Registrant ID:DI 2954579
Registrant Name:OpenDNS Hostmaster
Registrant Organization:OpenDNS
Registrant Street: 410 Townsend st.
Registrant City:San Francisco
Registrant State/Province:California
Registrant Postal Code:94105
Registrant Country:US
Registrant Phone:+001.4153443118
Registrant Email:[email protected]
Admin ID:DI 2954579
Admin Name:OpenDNS Hostmaster
Admin Organization:OpenDNS
Admin Street: 410 Townsend st.
Admin City:San Francisco
Admin State/Province:California
Admin Postal Code:94105
Admin Country:US
Admin Phone:+001.4153443118
Admin Email:[email protected]
Tech ID:DI 2954579
Tech Name:OpenDNS Hostmaster
Tech Organization:OpenDNS
Tech Street: 410 Townsend st.
Tech City:San Francisco
Tech State/Province:California
Tech Postal Code:94105
Tech Country:US
Tech Phone:+001.4153443118
Tech Email:[email protected]
Name Server:AUTH1.OPENDNS.COM
Name Server:AUTH2.OPENDNS.COM
Name Server:AUTH3.OPENDNS.COM
DNSSEC:Unsigned

A number of important fields are provided in this WHOIS listing. In the event that this site was compromised and was being used to launch a phishing attack, it provides the relevant contact information needed to contact the registrant and resolve the matter in a timely fashion. There is an ‘Admin Phone’ field and an ‘Admin Email’ field, both of which provide a means of contacting the owner of the domain.

³ http://www.icann.org/en/resources/registrars/raa/ra-agreement-21may09-en.htm

⁴ http://whois.icann.org/en/history-whois
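Because a WHOIS response of the kind shown in Listing 1 is a sequence of `Field:Value` lines, the contact fields needed for an abuse report can be pulled out with very little code. The following is a minimal sketch rather than the transform used in this project: the function names are hypothetical, the sample record uses placeholder values, and real WHOIS output varies between registries and may not follow this layout.

```python
def parse_whois(text):
    """Parse 'Field:Value' lines into a dict mapping each field to a list of values."""
    record = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their own colons.
        key, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue
        record.setdefault(key.strip(), []).append(value.strip())
    return record


def contact_points(record, role="Admin"):
    """Collect the email and phone fields for one contact role."""
    return {
        "email": record.get(f"{role} Email", []),
        "phone": record.get(f"{role} Phone", []),
    }


# Placeholder record in the same layout as Listing 1.
sample = """Domain Name:EXAMPLE.ORG
Creation Date: 2006-08-30T23:19:41Z
Admin Phone:+001.4153443118
Admin Email:hostmaster@example.org
Name Server:NS1.EXAMPLE.ORG
Name Server:NS2.EXAMPLE.ORG
"""
print(contact_points(parse_whois(sample)))
```

Values are kept in lists because some fields, such as `Name Server`, legitimately appear more than once.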

2.3.3 Certificate Authorities

Certificate authorities play an important role in providing identity assurance between both clients and servers on the internet. Certificate authorities play the role of “binding a public key to a particular entity” (Kurose & Ross, 2013). A Certificate authority (CA) serves two primary roles.

1. A CA verifies that an entity is who they say they are. There is no standardised procedure as to how this is to be achieved and so a large degree of unchecked trust must be placed in the authority. As such, the CA is only as good as the verification techniques that it employs (Kurose & Ross, 2013).

2. Once the CA has undertaken its verification procedure, it generates a certificate that binds the public key of the entity to the identification information of the entity. The certificate is then signed by the CA (Kurose & Ross, 2013).

Certificate authorities play an important role in assuring users of the identity of the entity that they are interacting with. They are of particular importance to implementations of the Secure Sockets Layer (SSL) protocol and its successor, Transport Layer Security (TLS) (Irish et al., 2001), which operate on top of the transport layer of the IP stack. Notably, with regard to phishing, CAs play an integral role in Hypertext Transfer Protocol Secure (HTTPS), which is a combination of TLS/SSL and the Hypertext Transfer Protocol. A certificate is an attempt to provide credibility to a website: it lends weight to the claim that the website makes about its identity, namely that it is who it claims to be.

2.3.3.1 The role of Certificate Authorities in Phishing and Anti-Phishing

Recent attacks on Certificate Authorities have resulted in breaches that allow an attacker to generate and obtain fraudulent certificates (Turner et al., 2012). There are four primary methods of compromising the integrity of a certificate-based system.

Figure 2.3: An example of a certificate verifying the identity of an online service

1. Impersonation: This entails a person persuading the CA that he or she is someone else and being issued a certificate with the impersonated person or system’s name in it.

2. Registration Authority: The registration authority (RA) is an entity that exists between the end user and CA and reviews and approves all certificate requests. An attack on the RA would entail the attacker being able to authorise the issuing of new fraudulent certificates.

3. CA System Compromise: If the attacker is able to gain access to the CA systems then the attacker can issue fraudulent certificates.

4. CA Signing Key Compromise: In this scenario, the attacker gets access to a copy of the CA signing key and is able to sign fraudulent certificates.

In each of these attack scenarios, the attacker is able to gain the undeserved trust of the end user, which undermines one of the primary roles of a CA: to reliably assure an end user of the identity of another end point on the internet.

2.3.4 PhishTank

“PhishTank is a free community site where anyone can submit, verify, track and share phishing data”⁵. Additionally, it is “free to everyone, both the website and the data”⁶. It is run by OpenDNS. PhishTank provides the internet community with a means of sharing data pertaining to phishing attacks, both current and historical. This is an incredibly valuable resource in tracking and monitoring phishing activity between groups and targets. It maintains the URL of each reported phishing website and the status of the URL, indicating whether the URL is still available online. It provides a free API which allows developers to build tools and software that interface with PhishTank’s data. This potentially allows for worldwide, real-time collaboration in the tracking and monitoring of phishing attacks.
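To give a sense of what interfacing with such an API involves, the sketch below prepares a URL lookup and interprets a JSON reply. It is an illustration under stated assumptions, not the integration built in this project: the endpoint address, the form fields (`url`, `format`, `app_key`) and the response fields (`in_database`, `verified`, `valid`, assumed here to be booleans) are recalled from PhishTank’s public developer documentation and should be checked against it before use. The function names are hypothetical, and no request is actually sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; verify against the current PhishTank developer documentation.
CHECK_URL = "https://checkurl.phishtank.com/checkurl/"


def build_lookup(url, app_key=None):
    """Prepare (but do not send) a POST request asking about one URL."""
    fields = {"url": url, "format": "json"}
    if app_key:
        fields["app_key"] = app_key
    body = urlencode(fields).encode("ascii")
    return Request(CHECK_URL, data=body, method="POST")


def classify(response_text):
    """Map an assumed JSON reply onto three coarse outcomes."""
    results = json.loads(response_text).get("results", {})
    if not results.get("in_database"):
        return "unknown"
    if results.get("verified") and results.get("valid"):
        return "confirmed phish"
    return "reported, unconfirmed"


request = build_lookup("http://example.test/login")
print(request.get_method(), request.full_url)
```

Separating request construction from interpretation keeps the parsing logic testable without network access; the prepared request could be passed to `urllib.request.urlopen` once the assumptions above have been confirmed.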

2.4 Types of phishing attacks

Security researchers have identified and classified several variations of phishing attacks. The distinction between the variations of phishing attacks comes not from the overall objective of the attack, but rather from the way in which the attack is conducted. The following discussion looks to dissect three of the most commonly seen attacks. It is important to bear in mind that the period of the attack considered is the most critical phase of the phishing campaign - the deception of the user.

2.4.1 Clone Phishing

Clone phishing is the most commonly seen phishing attack and requires the least skill and technical know-how to execute (Kirda & Kruegel, 2006). In this type of phishing, the attacker creates a cloned email from a legitimate email that was historically, or is presently, used by the authority he is trying to imitate. To the user, the cloned email looks for all intents and purposes identical to the original, most often bearing the images, layout and font used in the original email. Additionally, the attacker replaces the sender field of the email with that of the institution or body he is impersonating (Shi & Saleem, 2012). An example of this can be seen in Figure 2.4.

5 http://www.Phishtank.com/faq.php#whatisphishtank
6 http://www.Phishtank.com/faq.php#doesphishtankcostany

Figure 2.4: A typical phishing email

Figure 2.5: Tabnabbing

2.4.2 Tabnabbing

Tabnabbing is a relatively new and creative twist on the tried and tested phishing attack. It was disclosed by Aza Raskin, who serves as the Creative Lead of Firefox (Suri et al., 2012). The idea behind tabnabbing is to take advantage of two fundamental features of web browsing. The first is that the typical user today has multiple ‘tabs’ open at a time. A user might simultaneously have tabs open for online shopping, an online email client and social media websites. This allows the user to switch quickly and conveniently between the sites he frequents. The second is that, with so many tabs open, the user is often unaware of which accounts are open on which tab, and indeed which accounts he is currently signed into. To exploit this, an attacker directs a user to a seemingly legitimate site. This site need not imitate an institution; instead it waits for the page to lose focus. Once the page has lost focus, its content is dynamically replaced with a phishing page. This poses a serious threat to the user, who will often, without much conscious thought, proceed to log in to the page, handing the credentials for his account over to the attacker.

2.4.3 Spear Phishing

Spear Phishing is “highly targeted phishing aimed at specific individuals or groups within an organisation” (Trend Micro, 2012). These attacks differ from the traditional phishing attack in that they are most often more personal: they address their targets by name, position, rank or job role rather than using a generic title such as ‘Sir’ or ‘Madam’. They are often used to get more significant targets, such as high-ranking management, to open phishing emails. Spear phishing “significantly raises the chances that targets will read a message that will allow attackers to compromise their networks” (Trend Micro, 2012). In most cases, a spear-phishing email will contain an attachment of a file type that is likely to be used in the business or organisation being targeted, such as a PDF or Microsoft Office document. Spear phishing attacks usually include a period of reconnaissance in which the attacker seeks out as much publicly available information on the target as possible before tailoring a phishing email to be as enticing as possible to the recipient (Trend Micro, 2012).

Figure 2.6: Mechanics of a spear-phishing attack

2.5 Anti-Phishing methodologies

There has been a concerted effort to fight back following the rapid growth and prominence of phishing attacks. Techniques have been developed to counter phishing attacks at different layers of the internet. Some have focused on preventing phishing emails from ever reaching the user, while others have focused on correlating ranges of IP addresses to phishing syndicates and domains and adding these to globally accessible and maintained blacklists. Encouragingly, a number of collective bodies have sprung up in response to the surge in phishing attacks. Such efforts are likely to play a key role in stemming the damage caused by phishing attacks.

2.5.1 Anti-Phishing Collectives

There are a number of groups that have formed in recent times to combat phishing attacks, typically using some form of crowd-sourced initiative. The following discussion looks at some of the most prominent collectives at the time of writing.

2.5.1.1 Anti-Phishing Work Group

The APWG is the worldwide coalition unifying the global response to cybercrime across industry, government and law-enforcement sectors7. They work to provide the global community with resources and a knowledge base from which to combat phishing attacks and organise a number of public awareness initiatives that look to educate and inform the public of their role in combating phishing. Additionally, APWG’s projects have created new institutions such as the eCrime Researchers Summit8 which publishes peer reviewed anti-phishing articles in IEEE.

2.5.1.2 US-CERT

The United States Computer Emergency Readiness Team leads efforts to improve cybersecurity posture, coordinate cyber information sharing and proactively manage cyber risks9. It provides a means of reporting phishing attacks by submitting either the phishing email or, at the very least, the malicious URL in the email associated with the attacker’s machine. The submitted websites and email messages are analysed, and the resulting information is distributed to assist in a global anti-phishing effort.

7 http://apwg.org/about-APWG/
8 http://ecrimeresearch.org/events/eCrime2013/
9 http://www.us-cert.gov/about-us

2.5.1.3 PhishTank

PhishTank is a globally available, crowd-sourced project that looks to track phishing attacks in real time. It provides the ability to report phishing domains by URL; each report is then verified either positively or negatively by peers using the PhishTank website. The up-to-date database of current phishing campaigns is freely available and can be integrated into tools and products free of charge.

2.5.2 Website take down

Website take down is the process of removing a website from the publicly accessible internet. In South Africa, the Internet Service Providers Association (ISPA) is a not-for-gain Internet industry body10. The ISPA is formally recognised as an Industry Representative Body. There are also commercial entities that perform this function.

A study conducted by Moore & Clayton in 2007 assessed the effectiveness of website take-down in combating phishing attacks. Their study of a collection of phishing attacks, some of which were conducted by a single group, yielded some important statistics. The mean lifetime of a phishing website was found to be 61.69 hours. Interestingly, only 28% of the websites involved lasted more than 2 days, but the longest was available for over 17 weeks (Moore & Clayton, 2007b).

2.5.3 Browser Anti-Phishing mechanisms

“Internet service providers, mail providers, browser vendors, registrars and law enforcement” all have a significant role to play in mitigating the damage incurred through phishing attacks. However, web browser vendors play a “key role” due to the “strategic position of the browser and the concentration of the browser market” (Sheng et al., 2009). Browsers have the potential to act as a final buffer between the user and the phishing site. There is a tangible and direct interaction between the user and the browser, and thus the browser has potentially the best chance not only of informing the user of the risks he or she is taking when navigating to a certain website but also of reporting and preventing additional attacks (Sheng et al., 2009). Two primary methods have been employed in attempting to integrate anti-phishing mechanisms into browsers.

10 http://ispa.org.za/about-ispa/

2.5.3.1 Anti-Phishing Heuristics

There are a number of heuristics that modern browsers have built into them in order to identify phishing attempts. Machine learning algorithms play an important part in drawing up identification mechanisms and rule-sets (Sheng et al., 2009). One of the advantages of this method is that it allows phishing websites to be identified immediately, without having to wait for a public blacklist to be updated. The danger of relying solely on heuristics for phishing detection is that a phishing attack may be designed to bypass a given heuristic rule-set. Additionally, heuristics may produce false positives.
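As a rough illustration of how a heuristic rule-set might score a URL, the sketch below applies three simple lexical checks. The particular rules, the threshold for sub-domain depth and the function name `heuristic_score` are assumptions made for this example; they are not taken from any specific browser.

```python
import ipaddress
from urllib.parse import urlparse


def heuristic_score(url: str) -> int:
    """Count the suspicious traits found in a URL (illustrative rules only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    score = 0

    # Rule 1: the host is a raw IP address rather than a domain name.
    try:
        ipaddress.ip_address(host)
        score += 1
    except ValueError:
        pass

    # Rule 2: an '@' in the authority part can disguise the real host.
    if "@" in parsed.netloc:
        score += 1

    # Rule 3: an unusually deep chain of sub-domains (assumed threshold).
    if host.count(".") >= 4:
        score += 1

    return score
```

A legitimate site can trip any of these rules, and an attacker who knows the rules can avoid them, which mirrors the false-positive and bypass risks described above.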

2.5.3.2 Phishing Blacklists

Blacklisting has become one of “the predominant spam filtering techniques” (Sheng et al., 2009). There are a number of publicly available blacklists. Browsers use blacklists in part to block the domains and IP addresses of known phishing attempts. However, this is not always a viable means of blocking attacks: phishing attacks are often launched through compromised servers, and blocking an entire domain on the grounds of a single phish hosted on it is not always possible. Another issue with this technique is that even if a phishing campaign is identified early, there is a lag between the time at which the phish is reported and the time at which the blacklist prevents users from accessing the content.
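The difference between blocking a whole domain and blocking a single page can be sketched as follows. The blacklist entries and the function name `is_blocked` are hypothetical and serve only to illustrate the trade-off discussed above.

```python
from urllib.parse import urlparse

# Hypothetical blacklist entries; real lists are far larger and change constantly.
BLOCKED_URLS = {"http://shared-host.example/~user/bank/login.html"}
BLOCKED_DOMAINS = {"phish.example"}


def is_blocked(url: str) -> bool:
    """Return True if the exact URL or its whole domain is blacklisted."""
    host = (urlparse(url).hostname or "").lower()
    # Domain-level entries block every page on the host and its sub-domains.
    if any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS):
        return True
    # URL-level entries block a single page, leaving the rest of a
    # compromised but otherwise legitimate site reachable.
    return url in BLOCKED_URLS
```

In this sketch a compromised shared host is listed by URL only, so its other pages remain accessible, whereas the attacker-owned domain is blocked in its entirety.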

2.5.4 Email Filtering and Content Filtering

Another means of combating phishing lies between the user and the attack. The idea is that by blocking the malicious email directed at the user, the user never has the chance to get phished. The suggestion is that, for some users, by the time they have received a phishing email it is already too late (Sheng et al., 2009). Many email providers such as Google have integrated phishing “detection, prevention and notification” into their email services (Goodman et al., 2009). These use a combination of machine learning and heuristics to pinpoint phishing or spam emails and remove them from the user’s mailbox.

With the development and growing efficiency of such mechanisms, the complexity and creativity of the spammers have also grown. Their response has been to change the content of messages so as not to fit too neatly a ‘traditional’ phishing email template, to increase message volume, to adopt new delivery mechanisms and to attack the anti-spam groups themselves (Wittel & Wu, 2002). Essentially, a content based spam filter “distills” a document “into a set of features such as words, phrases, meta-data et cetera” which is then represented as a vector. From this point, “the classification algorithm uses the feature vector as a basis upon which the document is judged” (Wittel & Wu, 2002). The algorithm uses a rule set which can either be hand-crafted or automatically generated. Machine learning algorithms are primarily driven by statistics derived from the feature vectors. Bayesian classification is one of the most widely used methods; it “attempts to calculate the probability that a message is spam based upon previous feature frequencies in spam and legitimate email” (Wittel & Wu, 2002).
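To make the idea of Bayesian classification over word features concrete, the following is a minimal sketch of a naive Bayes filter. The class name `NaiveBayesFilter`, the whitespace tokeniser and the use of Laplace smoothing are assumptions chosen for brevity; production filters use much richer features and training data.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Distil a message into a simple bag-of-words feature list."""
    return text.lower().split()


class NaiveBayesFilter:
    """Toy classifier; assumes at least one training message per class."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, words: list[str], label: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_msgs = sum(self.message_counts.values())
        # Prior: how often this class was seen during training.
        score = math.log(self.message_counts[label] / total_msgs)
        for word in words:
            # Laplace smoothing avoids zero probabilities for unseen words.
            count = self.word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        return score

    def is_spam(self, text: str) -> bool:
        words = tokenize(text)
        return self._log_score(words, "spam") > self._log_score(words, "ham")
```

After training on a handful of labelled messages, `is_spam` compares the two log-probabilities and reports whichever class the previously seen word frequencies favour.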

2.6 Abuse Reporting Mechanisms

Once a phishing attack is identified and traced back to a domain there are a number of steps that a body looking to stop the attack can take.

2.6.1 Blacklisting services

There are a number of blacklisting services available, including those provided by Symantec11, Google’s Safe Browsing12 and PhishTank13. There is typically some form of verification that a submitted phish must undergo in order to be confirmed and blacklisted. This is because the ramifications for a domain that is incorrectly blacklisted as a phishing site are severe and could take a substantial period of time to be completely reversed.

PhishTank, for example, uses a form of crowd-sourced verification in which members of the PhishTank community either agree or disagree that the submitted website has been correctly identified as a phishing website.

2.6.2 Placing an abuse report with domain registrar

WHOIS information kept by the registrar includes an abuse contact email, telephone number and address. This provides a means of contacting the owner of the domain when a webpage that is currently involved in a phishing attack needs to be taken down. There are, however, a number of cases in which this is not a useful route to take in combating phishing. Take for example the scenario in which an attacker has registered a domain with a registrar. It is unlikely that the attacker has registered with a company that enforces stringent policies for verifying the information provided by a registrant. It is therefore quite likely that the attacker registered the domain with false or fraudulent information, in which case attempting to file a complaint against them is completely ineffectual. This mechanism is useful, however, in the case of an attack being launched through a compromised, legitimate website. Examining the mechanics of a phishing attack shows us that phishing campaigns are often launched from compromised systems whose legitimate owners are completely unaware of the compromise. In this case, the owners of the domain can take swift steps to locate and halt the hosting of the phishing website on their infrastructure.

11 https://submit.symantec.com/antifraud/phish.cgi
12 http://www.Google.com/safebrowsing/report_phish/?rd=1
13 http://www.Phishtank.com/

One of the problems with this approach is that it takes time for each of the notifying and compromised parties involved to respond to and act on this information. During this period a large number of users may still be compromised (Moore & Clayton, 2007b).

2.7 Summary

Phishing attacks remain a significant threat to internet users. They prey upon the susceptibility of the end user to divulge sensitive information to a seemingly trustworthy source. This makes phishing exceptionally difficult to combat at a user level, because creating an educated user base across the entirety of the internet seems a difficult, if not impossible, task. A large number of the anti-phishing techniques currently employed experience a lag between correctly identifying and blocking phishing campaigns. However, promising strides have been made in recent developments in email and content filters, as well as in heuristics for identifying phishing attacks. The underlying technologies play an important role in understanding the overall structure of the system, and the discussion in the following chapters will rely heavily upon the information discussed in this chapter.

Chapter 3

Design

This chapter outlines some of the fundamental design decisions made in the creation of the Phishtego framework. It first revisits the goals of the system before examining the underlying architecture, including the mechanics of the Maltego system. The sections that follow consider the transforms and entities associated with Phishtego before looking at the process of automating the system.

3.1 System Goals

The overall goal of the project is to provide the user with a software solution that allows for the modelling, correlation and exploration of a phishing attack which can be broken down into more specific goals as were outlined in Chapter one.

Several design decisions needed to be made in order to meet the goals of the project as outlined in Chapter one. A large number of technologies and concepts are involved in something as simple as sending an email. With this in mind, attempting to track and model even a part of a phishing attack becomes a complex task, both to implement and to engage with usefully as a user. There is a host of data that can be gathered from an attack, but the challenge is to streamline this data and extract information that is meaningful from a user’s point of view. A second challenge lies in the fact that many of these technologies overlap. An email is only as good as the transport layer that transports it and the mail servers that forward and receive it. In turn, the mail servers are likely addressed by a host-name which must first be resolved to its IP address through the DNS system, which in turn opens the door to a whole new collection of technologies and concepts.

This is a good example of how complex pivoting on provided information can be both in terms of the connecting of various underlying technologies as well as the sheer volume of data that is generated when doing so. In this way, one of the underlying philosophies of the development of Phishtego is to hide as much complexity as possible from the user of the application while still performing complex and useful back-end transformations on data.

3.2 Underlying Architecture

Phishtego consists of a combination of several technologies and programming languages. The significant design decisions made around the system implementation are documented in this chapter.

3.2.1 Maltego

Maltego is a powerful graphing software solution that places an emphasis on relationships between nodes in the graph.

“Maltego uses a client/server architecture for the purposes of data collection to determine the relationships and real world links between pieces of data especially Internet infrastructures”1. As such, Maltego proved to be a natural choice of existing platform into which to integrate the phishing monitoring platform. Maltego generates a node graph in which nodes, called entities, are plotted and relationships between nodes are represented with directional arrows. In this way, both obvious and, more importantly, previously unrecognised relationships between entities on a graph can be revealed.

The Maltego application comes in two forms:

1. Commercial edition : This can be used for commercial uses. It has no restrictions on the volume of results that can be returned by applying transforms on entities and includes frequent updates and support.

1 http://www.paterva.com/malv3/303/M3GuideGUI.PDF

2. Community edition : The community edition is at its core the same application with the same functionality, but with some limitations imposed. It receives less frequent updates, requires registration and an API key to use and, significantly, is limited to returning only 12 entities for any given transform.

Since the transforms of the Phishtego system exist completely independently of the Maltego edition, the decision was made that the Community edition would be sufficient, as the transforms subsequently developed would run equally well on both editions. Additionally, there is no restriction on the transforms that are a part of Phishtego and they may be integrated into either edition in the future.

Maltego is a Java-based application, which affords it portability across a large number of operating systems. This was important in developing the system, as it allows widespread use across the IT world without lending preference to one operating system over another. The Maltego platform is also relatively undemanding with regard to its hardware requirements.

Table 3.1: Maltego : Minimum Hardware Requirements

Hardware                     Minimum Requirement
RAM                          2 GB
Processor                    2 GHz
Internet Connection speed    64 Kb
Hard-drive Space             100 MB
Display                      1024x768

Two of the key components of the Maltego framework are the entities and transforms. Entities represent objects which transforms are run on. They are used to represent a number of things ranging from telephone numbers to IPv4 addresses. Transforms are then pieces of code that take in an entity as input and perform some kind of manipulation on this entity and often return as output a new entity. Figure 3.1 illustrates the process involved with executing a transform on an entity.

The first step in using Maltego is creating a new graph. A graph can be saved locally and loaded for editing at a later stage. A new graph is created by selecting the new graph icon in Maltego.

Figure 3.1: Creating a new graph

The next step is then dragging an entity into the graph. In this example, a website entity was dragged into the graph. Selecting the entity is done with a single left click. Right clicking the entity brings up a menu from which transforms can be selected. In this example, the transform resolve to IP via DNS was selected.

Figure 3.2: Running a transform

The application then passes the website entity to the transform code at the back end, which performs some manipulation on the entity and programmatically creates and returns a new entity. In this case, the website is passed to the back end as an entity, where a look-up procedure is performed to return an IPv4 address in the form of an IPv4 address entity.

Figure 3.3: The transform produces a new IPv4 entity
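The entity-in, entity-out pattern described above can be sketched in a few lines of Python. The `Entity` class and the `resolve_to_ip` function are hypothetical stand-ins written for this illustration; they do not reproduce Maltego's own entity types or its built-in DNS transform.

```python
import socket
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for a Maltego entity: a type name plus a value."""
    type: str
    value: str


def resolve_to_ip(website: Entity,
                  resolver: Callable[[str], str] = socket.gethostbyname) -> Entity:
    """Take a website entity and return a new IPv4 address entity."""
    if website.type != "Website":
        raise ValueError("this transform expects a Website entity")
    # The look-up itself is delegated to the resolver (DNS by default).
    return Entity("IPv4Address", resolver(website.value))
```

Passing the resolver as a parameter is a design choice made for the sketch: it allows the transform to be exercised with a fixed answer instead of a live DNS query.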

In this example, it is significant not only that we have generated previously unknown information about a given entity, but also that we have mapped the relationship between the two entities. This is fairly obvious in the case of a single transform performed on a single entity; however, the Maltego framework was developed to handle thousands of complex entities and the relationships between them. In a more complex example, we might run a number of transforms on the same IPv4 entity that we derived from the website. In this case, since the website is part of a shared hosting scheme, we find that a number of domains resolve to this IP address. In the next example, a total of 16 transforms were run on the IP address, which produced a total of 49 entities including:

• People

• DNS Names

• Domains

• IPv4 Addresses

• Websites

• Email Addresses

• NS Records

• MX Records

• Geographical Locations

More so than the useful information it generates, this example shows how the relationships between the various underlying technologies can be useful in considering the web of relationships that binds each layer of technology together.

Figure 3.4: A more complex set of entities and relationships

In addition to providing an easy-to-navigate, relational graphing system, Maltego gives the user the ability to change the way in which the graph is laid out. There are four modes in which a Maltego graph can be represented. These are:

1. Organic (Figure 3.4)

2. Block (Figure 3.5)

3. Hierarchical (Figure 3.6)

4. Circular (Figure 3.7)

Figure 3.5: Block Layout

Figure 3.7: Circular Layout

3.2.2 Maltego Machines

Figure 3.6: Hierarchical Layout

“Maltego machines allow you to string together transforms to work with entities on a graph”2. Machines can be set to run automatically at intervals, extracting new information dynamically as it is generated. This is of particular interest to us as a means of automating the process of modelling and monitoring ongoing phishing attacks. Because transforms can be called programmatically in this way, they can run without interaction from the user and also complete more quickly than if a user were to call each transform manually.

2 http://www.paterva.com/web6/documentation/developer.php
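The notion of stringing transforms together can be illustrated with a small Python sketch. This is plain Python rather than Maltego's own machine definition format, and the `run_machine` function and `Transform` type are assumptions introduced purely for this example.

```python
from typing import Callable, Iterable

# A transform maps one entity value to zero or more new entity values.
Transform = Callable[[str], Iterable[str]]


def run_machine(seed: str, transforms: list[Transform]) -> list[tuple[str, str]]:
    """Apply each transform in turn to the output of the previous one.

    Returns the (parent, child) edges discovered along the way.
    """
    edges: list[tuple[str, str]] = []
    current = [seed]
    for transform in transforms:
        next_level = []
        for entity in current:
            for child in transform(entity):
                edges.append((entity, child))
                next_level.append(child)
        current = next_level
    return edges
```

Each edge corresponds to an arrow on the graph, so a chain such as website to IP address to co-hosted domains is built up without the user invoking each step by hand.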

3.2.3 Programming Languages

The transforms that were developed are written in Python. This choice was made because of the interpreted nature of the language as well as the speed of the development cycle that can be achieved using Python. It also allows the transforms to work across various platforms, in keeping with the portability of the Maltego platform. With relatively little effort, the entire Phishtego system can be set up and run on Windows and most Unix desktop platforms. The speed penalty that an interpreted language such as Python imposes does not impact on the overall experience with the system: a large number of the transforms involve expensive I/O operations, including interactions with the underlying network, so any performance increase gained with a compiled language would be insignificant given the overall time required to execute the transform.

3.2.4 Transforms

In the interests of outlining the internal workings of a transform, the following describes how Maltego interacts with the transforms developed for Phishtego. As a working example, Listing A.1 is the Phishtank.py class, which was developed for the Phishtego framework to handle interactions with the Phishtank API. This example is taken from the transform which will be outlined in Table 3.3 shortly.

This class contains a constructor which requires a number of parameters, including an API key and optional updateInterval and web parameters, as can be seen on line 13 of Listing A.1. The class contains several methods, including ones for updating the local copy of the Phishtank data and validating a link against the database. For the sake of brevity, this will be the only code listing, serving to illustrate the relationships between external code, transforms and the Maltego framework. Now that we have a stand-alone class that can be instantiated and used to validate URLs, we look to integrate this into a transform.

The following illustrates a simple transform that makes use of the class listed.

Listing 3.1: A Maltego Transform

 1 #Phishing Verification
 2 from MaltegoTransform import *
 3 import sys
 4 import phishtank
 5 #-----
 6
 7 #The Maltego Framework passes the entity to perform
 8 #the transform on as an argument to the application
 9 obj = sys.argv[1]
10
11 #We instantiate a new phishtank object with an API key
12 #and 60 minute update interval
13 phishtank = phishtank.phishtank("xxxx", updateInterval=60)
14
15 #We instantiate a MaltegoTransform object
16 me = MaltegoTransform()
17
18 def check():
19     #We validate the URL against the phishtank API
20     if phishtank.checkURL(obj):
21         #If the URL is indeed malicious, we add an
22         #entity to the Transform object
23         me.addEntity("phishtego.MaliciousURL", obj)
24
25 check()
26
27 #Upon completion of the transform, we return the MaltegoTransform
28 #object to the Maltego Framework
29 me.returnOutput()

In order to implement a transform, the Maltego library is imported, as can be seen on line 2 of Listing 3.1. The entity on which the transform is to be performed is passed to the transform at a system level as an application argument. This is accessed by Python on line 9 of Listing 3.1 and saved as a variable. The transform then creates a new instance of the class listed in Listing A.1. Next, a MaltegoTransform object is created and saved as a variable, as can be seen on line 16 of Listing 3.1. A function is then defined on line 18 of Listing 3.1 which calls a method of the class defined in Listing A.1; if this returns a true boolean value, an entity is added to the MaltegoTransform object created on line 16. Finally, the transform returns the MaltegoTransform object at the end of the program on line 29.

For each of the transforms included, there are often multiple Python classes that interact in the back end of the application to produce meaningful information and represent this information in the form of an entity. The following transforms are not outlined in as much detail in the interests of brevity, but they fundamentally operate in the same way: each is executed by the Maltego framework, performs some sort of manipulation on the entity passed to it and responds appropriately.

3.3 Entities

There are a number of entities that ship with Maltego out of the box. In some cases, these have been integrated into the Phishtego project; however, the majority of the project required the development of completely new entities. This is because Maltego is not designed first and foremost to explore phishing attacks. It seeks to provide intelligence gathering while offering extensibility through a rich API. The Phishtego framework makes use of a number of entities that attempt to best model the information that would be useful to a user wanting to gather additional information around an attack. Table 3.2 lists the entities that are integrated or created as a part of the Phishtego framework, along with a description of each entity. The sections that follow motivate, at an abstract level, the usefulness of representing each entity from the point of view of the user.

Entity Name                 Description
Domain                      An entity that represents an internet domain
Email Address               An entity that represents an email address
IPv4 Address                An entity that represents an IPv4 address
Email Source                An entity that represents the source of an email including headers
Abuse Report Email          An entity that represents an email address to contact with regard to abuse of the domain
EmailSourceDirectory        An entity that represents a local directory that stores a number of email source files
Potential Phishing URL      An entity that represents a link extracted from a phishing email that is potentially malicious
Suspicious Email Address    An entity that represents an email address that appears as a contact in a phishing email
Phishing Target             An entity that represents a target in a phishing campaign
Confirmed Phishing URL      An entity that represents a verified malicious URL
Phishing Kit                An entity that represents a phishing kit used by an attacker

Table 3.2: A Summary of Phishtego Entities

Each entity shares a relationship with one or more other entities in the framework. The following serves to expand upon each of the above entities and consider potential relationships that can be shared between entities.

3.3.1 Domain

A domain plays a centrally important role in the internet. A domain might serve as the host of a phishing campaign or command and control centre. Alternatively a domain might be used to represent a domain under the control of a user from which the user can evaluate and correlate information regarding the attack and the domain that the user controls. There are often mail servers as well as a number of potentially interesting services associated with a domain that might lead to discovering further useful information surrounding a domain.

3.3.2 Email Address:

Naturally, an email address is one of the most important elements in making sense of a phishing attack. Not only are victims most often targeted through their email addresses, but email addresses also play an important part in the correspondence that occurs between an attacker and a victim. Email addresses may also in turn yield interesting information about which domains are involved in the attack.

3.3.3 IPv4

The Internet Protocol version 4 addressing scheme provides possibly one of the most useful means of identifying actors, in the form of end points, in a phishing attack. It is unlikely that an IP address will provide us with any real information as to the physical identity or situation of the attacker behind the attack. It does, however, provide a useful indication of how large an attack may be, how many domains are involved in the attack or the number of mail servers involved in an attack. Extracting information about IP addresses often provides some of the most telling information around a phishing attack, both for identifying threats and for mitigating them.

3.3.4 Email Source

The source of the email is simply the body of text that makes up an email, including the headers that are for the most part not displayed by email clients. This body of text contains a wealth of useful information about where the email originated, its attachments, and the mail servers and names involved.
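As an indication of what can be recovered from raw email source, the sketch below uses Python's standard `email` package to pull out a few fields of interest. The function name `summarise_source` and the choice of fields are assumptions made for this illustration and do not reflect the Phishtego implementation.

```python
from email import message_from_string


def summarise_source(raw_source: str) -> dict:
    """Extract a few investigation-relevant fields from raw email source."""
    msg = message_from_string(raw_source)
    return {
        "from": msg.get("From"),
        "reply_to": msg.get("Reply-To"),
        # Each relaying mail server prepends its own Received header.
        "received": msg.get_all("Received", []),
        # Attachment file names, where any part declares one.
        "attachments": [part.get_filename() for part in msg.walk()
                        if part.get_filename()],
    }
```

The Received headers in particular trace the mail servers an email passed through, which is the kind of detail that email clients usually hide from the user.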

3.3.5 Abuse Report Email

Abuse report email addresses allow abuse of a given domain to be reported to the people responsible for running that domain. In the event that the domain being used to orchestrate a phishing campaign was not registered by the attacker, but is one that the attacker has taken control of by compromising a host, this provides a vital means of alerting the owner of the domain to the illegal activity being launched from the host. It gives the owner of the host the opportunity to take steps to remove the threat and perhaps to begin a forensic investigation into the matter.

3.3.6 EmailSourceDirectory

Though simple in concept, this entity provides the opportunity to build some complex external functionality into Phishtego. It is a simple entity in that it merely represents a directory on the local machine running Phishtego in which the source of emails can be stored.

3.3.7 Potential Phishing URL

This is a URL that was included in a malicious email and may require further investigation. It is interesting to note both the commonality between links and the information that is posted to a server along with the link. The link has not been marked as malicious by either Google Safe Browsing or Phishtank, but is nevertheless to be treated with extreme caution.

3.3.8 Suspicious Email Address

In the event that the attacker, rather than tricking the victim into following a malicious link, tries to elicit information from the user via email correspondence, it is useful for the user to be able to identify some of the addresses that the attacker might use to facilitate that correspondence. This entity represents email addresses extracted from the links, the body and the sender of the email in order to identify potential threats.

3.3.9 Phishing Target

This entity represents the institution or person that is the target of a phishing attack. This might be a bank, financial institution, insurance company or indeed an individual. This is significant because the Phishtank API reports who the target was in each attack, which allows connections to be drawn between targets that may previously have been thought to be unrelated.

3.3.10 Confirmed Phishing URL

A URL contained within an email is checked against two services. The first is Phishtank and the second is Google Safe Browsing. If either of these services confirms that a URL is a phishing page or serves malware, this entity visually alerts the user to the fact. The visual impact of this entity is important in providing an immediate and intuitive warning to the user. It also confirms the suspicions of a user who was correct in treating the URL with caution.

3.3.11 Phishing Kit

In the event that Phishtank positively identifies a URL as a part of a phishing campaign, it may be possible to determine the phishing kit that was used to create and carry out the attack. This information could prove to be useful in correlating the groups behind phishing attacks.

3.4 Transforms

In the hierarchical graphical structure within Maltego, “transforms should be thought of as pieces of code that change one type of information to another”3. The following sections detail some of the transforms that Phishtego implements or makes use of. Each transform is identified by name, together with its input and output entities.

3 https://www.paterva.com/web6/documentation/developer-local.php
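To make the idea of a transform concrete, a local transform can be pictured as a small program that receives an entity value on the command line and writes the resulting entities to standard output as XML. The following is a minimal sketch in Python, assuming the Maltego local-transform response layout (a MaltegoMessage element wrapping a MaltegoTransformResponseMessage). The function name build_response and the example of turning an email address into a maltego.Domain entity are illustrative only and are not taken from Phishtego.

```python
import sys
import xml.etree.ElementTree as ET


def build_response(entities):
    """Build a local-transform response for (entity_type, value) pairs.

    The element names follow the assumed Maltego local-transform layout;
    the helper itself is hypothetical.
    """
    message = ET.Element("MaltegoMessage")
    response = ET.SubElement(message, "MaltegoTransformResponseMessage")
    entity_list = ET.SubElement(response, "Entities")
    for entity_type, value in entities:
        entity = ET.SubElement(entity_list, "Entity", Type=entity_type)
        ET.SubElement(entity, "Value").text = value
    ET.SubElement(response, "UIMessages")
    return ET.tostring(message, encoding="unicode")


if __name__ == "__main__":
    # The input entity's value is assumed to arrive as the first argument.
    address = sys.argv[1] if len(sys.argv) > 1 else "user@example.com"
    # Illustrative transform: change an email address into its domain.
    domain = address.rsplit("@", 1)[-1]
    print(build_response([("maltego.Domain", domain)]))
```

Each output entity becomes a new node on the graph, linked to the entity the transform was run on.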

3.4.1 Verify Phishing Link

Table 3.3: Verify Phishing URL

Transform Name          verifyURL
Maltego Input Entity    suspiciousLink
Maltego Output Entity   MaliciousURL
Average Run-time        8s

Verifying a suspicious link invokes two sub-processes. The first is a look-up against the Phishtank API. If the URL appears there as malicious or previously reported, Phishtego confirms this by creating a malicious URL entity. The second is a look-up against the Google Safe Browsing API. This verification is a vitally important part of the exploration process and leads the user to one of two outcomes.

If the URL is confirmed to be malicious, this confirms the user's suspicion and provides third-party evidence that the URL is part of a broader malicious campaign. The alternative is that the URL is not identified as malicious by either of the third parties. This says nothing about the credibility of the email, since there are a number of reasons why a URL might not be flagged by either service. The first possibility is that it is malicious and simply has not been reported yet, in which case the end user has the opportunity to report it. The second possibility is that the link is in fact not malicious, in which case the user can be more confident in deciding that the link is not a real threat and proceed accordingly.

An entity representing the absence of a given URL from both services was intentionally not incorporated. This avoids presenting the URL as safe, even when this is the case, as there are a number of scenarios in which a URL will not be flagged by either of these services but is still unsafe. Two examples of this might be when:

1. The URL is malicious but is part of a new attack that has not yet been reported to either of the services by another end point on the internet.

2. The URL is malicious but is not flagged as such by either of the services because it is part of a narrower spear-phishing attack (see chapter 2) that is not likely to have been encountered by other institutions.

The design decision was therefore made to return no output at all when a URL is not flagged by either Phishtank or Google Safe Browsing, so as not to inspire false confidence in the safety of such a URL.

In the case of the Phishtank API, Phishtego has a relatively thorough back-end for interacting with the API. In order to reduce the bandwidth demands of the application, both locally and on the Phishtank servers, the system periodically downloads a database from Phishtank and maintains it locally. This local copy can be queried extremely quickly, and more thoroughly than would be practical with a large volume of API calls.
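The decision logic described above can be sketched as follows. This is an illustrative outline only, not the Phishtego implementation: the function names are hypothetical, the locally maintained Phishtank data is assumed to be available as a plain collection of URLs, and any remote service (such as a Safe Browsing look-up) is represented by a caller-supplied function rather than a real API call.

```python
from typing import Callable, Iterable, List, Set


def normalise(url: str) -> str:
    """Crude normalisation so trivially different spellings compare equal."""
    return url.strip().rstrip("/").lower()


def load_blacklist(urls: Iterable[str]) -> Set[str]:
    """Build the local look-up set from a periodically downloaded URL dump."""
    return {normalise(u) for u in urls}


def verify_url(url: str,
               local_blacklist: Set[str],
               remote_checks: Iterable[Callable[[str], bool]] = ()) -> List[str]:
    """Return [url] if any service flags the URL, otherwise an empty list.

    An empty list means "not flagged", never "safe": no entity is created
    for URLs that are absent from every service.
    """
    if normalise(url) in local_blacklist:
        return [url]
    if any(check(url) for check in remote_checks):
        return [url]
    return []
```

Consulting the local set first means that remote services are only contacted for URLs that the cached data does not already identify.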

3.4.2 Generating Abuse Report Emails

Table 3.4: Generating abuse report emails

Transform Name          Prepare Abuse Report Email
Maltego Input Entity    IPv4 Address
Maltego Output Entity   None
Average Run-time        0.006s

An abuse report email address is an email address that is often required by the domain registrar in the process of registering a domain. It is accessible through the WHOIS protocol and allows an internet user to lodge complaints about abuse with the owner of the domain. This is significant in the case of phishing because, as was previously discussed, phishing campaigns are often controlled or launched from compromised servers on the internet. These servers originally ran, and probably still run, completely legitimate services. In these cases, it is often possible to disrupt or completely halt a phishing attack at the source once the owner of the domain has been made aware of it. This is of course not true of an attack in which the attacker is also the registrant of the domain, as the attacker is unlikely to be sympathetic toward requests to stop the campaign.
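As an illustration of what preparing such a report might involve, the sketch below composes a message with Python's standard email library. The function name, the parameters and the wording of the message are assumptions made for this example and do not reproduce the text that Phishtego generates. The message is only prepared, not sent.

```python
from email.message import EmailMessage


def prepare_abuse_report(abuse_address, ip_address, phishing_url, reporter):
    """Compose (but do not send) an abuse report for a suspected phishing host."""
    msg = EmailMessage()
    msg["To"] = abuse_address
    msg["From"] = reporter
    msg["Subject"] = f"Abuse report: suspected phishing content at {ip_address}"
    msg.set_content(
        "To whom it may concern,\n\n"
        f"A suspected phishing page is being served from {ip_address}:\n\n"
        f"    {phishing_url}\n\n"
        "The host may have been compromised without your knowledge.\n"
        "Please investigate and remove the content if appropriate.\n"
    )
    return msg
```

Keeping composition separate from delivery leaves the final decision to send the report with the user.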

3.4.3 Directory Monitoring

Table 3.5: Monitoring a local directory for emails

Transform Name          Directory Monitor
Maltego Input Entity    EmailSourceDirectory
Maltego Output Entity   EmailSource
Average Run-time        0.005s

This transform is possibly one of the most interesting and powerful features built into Phishtego. The function that it performs is simple: it periodically checks a local folder for new email source files. While this in itself has some immediately obvious use cases, it also presents the opportunity to integrate more complex systems with Phishtego. A simple use case would be to place a number of suspicious emails in a directory and use the transform to load each email individually for analysis. A more interesting use case might be a script or application that automatically retrieves suspicious emails from spam filters, or from emails submitted by users in a large corporation, and saves them in the given directory for automatic parsing and analysis. In this way, a relatively simple transform gives Phishtego a simple but effective and generic means of integrating with systems that may already be in place in an organisation.
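The core of such a monitor can be sketched in a few lines. The example below is an assumption about how new files might be detected, namely by remembering the paths already seen, and the function name is hypothetical. It is intended to be called once per polling interval.

```python
import os


def find_new_sources(directory, seen):
    """Return paths of files not seen before and record them in `seen`."""
    new_paths = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Only regular files are treated as email sources.
        if os.path.isfile(path) and path not in seen:
            seen.add(path)
            new_paths.append(path)
    return new_paths
```

Each returned path would correspond to one new EmailSource entity. Because the set of seen paths is supplied by the caller, any external process that writes files into the directory is picked up on the next call without further coordination.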

3.4.4 Link Extraction and analysis

Table 3.6: Link extraction and analysis

Transform Name          LinkExtractionAnalysis
Maltego Input Entity    EmailSource
Maltego Output Entity   Potential Phishing Link, Confirmed Phishing Link
Average Run-time        0.005s

This transform parses the body of an email using regular expressions to extract links. Isolating links in an email can be more challenging than it sounds, as strangely formed URLs occasionally slip through the regular expression matching. Nonetheless, after several iterations, the regular expressions that Phishtego uses in its back-end when parsing the email body are those shown in figure 3.8.

Figure 3.8: Regular expression used to extract links and URLs

www.(?:[a-zA-Z]|[0-9]|[$-@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+

http[s]?://(?:[a-zA-Z]|[0-9]|[$-@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+

Figure 3.9: Regular expression used to extract email addresses

[\w\.-]+@[\w\.-]+

The second part of the extraction involves extracting email addresses included in the email. These may be email addresses in the FROM field in an email, email addresses included as a part of the message body or email addresses that might occur in the subject line of the email. Again, using a simple regular expression proved to be the most efficient means of extracting email addresses. The regular expression used in the extraction is listed in figure 3.9.
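The two extraction steps can be sketched in Python as follows. The patterns are transcribed from figures 3.8 and 3.9 as best they can be read, so the exact character classes should be treated as an assumption. The function names and the handling of duplicates are illustrative and not taken from Phishtego.

```python
import re

# Patterns transcribed from figures 3.8 and 3.9 (transcription is an assumption).
URL_RE = re.compile(
    r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
)
WWW_RE = re.compile(
    r"www.(?:[a-zA-Z]|[0-9]|[$-@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
)
EMAIL_RE = re.compile(r"[\w\.-]+@[\w\.-]+")


def extract_links(source):
    """Return links found in an email source, scheme-qualified matches first."""
    links = URL_RE.findall(source)
    for match in WWW_RE.findall(source):
        # Skip bare "www." matches already contained in a full URL.
        if not any(match in link for link in links):
            links.append(match)
    return links


def extract_emails(source):
    """Return the distinct email addresses found anywhere in the source."""
    return sorted(set(EMAIL_RE.findall(source)))
```

Because both functions scan the whole email source, addresses and links in the headers, the subject line and the body are all picked up in a single pass.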

3.4.5 WHOIS

Table 3.7: WHOIS

Transform Name          phishtegoWHOIS
Maltego Input Entity    IPv4 Address, Domain
Maltego Output Entity   Abuse Report emails
Average Run-time        5.05s

The WHOIS data referred to in Chapter 2 contains information about the owner of a given website. This is particularly useful in the event that the owner of a domain is unaware of, and not involved in, the malicious activity. WHOIS records often include an abuse report email field which, if present, provides a means of contacting the relevant owner. This abuse report email is the value used by the transform described in section 3.4.2. The WHOIS information gathered in Phishtego is drawn from a number of services, including regional internet registries, available at the following addresses:

• whois://whois.ripe.net

• whois://whois.apnic.net

• whois://whois.lacnic.net

• whois://whois.afrinic.net

• whois://whois.cymru.com
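A WHOIS look-up of this kind can be sketched with the standard library alone. The example below assumes the plain-text protocol of RFC 3912 (a query line sent to TCP port 43, with the reply read until the server closes the connection) and assumes that abuse contacts appear on lines that mention the word "abuse". Both function names are hypothetical, and real registries format their records differently, so the parsing is a heuristic rather than a definitive implementation.

```python
import re
import socket

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def whois_query(server, query, timeout=10.0):
    """Send one WHOIS query and return the server's full text reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def extract_abuse_emails(whois_text):
    """Return distinct addresses found on lines that mention 'abuse'."""
    found = []
    for line in whois_text.splitlines():
        if "abuse" in line.lower():
            for address in EMAIL_RE.findall(line):
                if address not in found:
                    found.append(address)
    return found
```

A call such as extract_abuse_emails(whois_query("whois.ripe.net", "192.0.2.1")) would combine the two steps. Only the abuse addresses are kept, which matches the design decision to display the abuse report email rather than the full record.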

3.5 Automating the process

The transforms in Phishtego work together to produce a useful collection of information and relationships between entities. As previously mentioned, the transform in table 3.5 provides a simple but powerful means of integrating Phishtego into any number of existing systems, simply by storing suspected phishing or malicious emails to disk and representing this location in Phishtego using an EmailSourceDirectory entity. If both the saving of such emails and the running of Phishtego can be automated, the end user has a self-sufficient system that can be consulted at any time to see at a glance the kinds of threats that ongoing phishing attacks currently pose to their company or organisation.

Integrated into Phishtego is a machine which performs a number of automated transforms on an EmailSourceDirectory as well as subsequent entities that are derived from each transform. The machine performs the following tasks:

• On EmailSource entities, extract all information from them using multiple transforms, including extracting links, email addresses and domains present in the email.

• On all email entities, verify that they do in fact exist and are not simply dummy addresses.

• Validate all SuspiciousLink entities against the online APIs previously discussed.

• On all domains, check if a website exists on port 80 for the domain. If it does, create a website entity.

• On all website entities, resolve these to IPv4 addresses and represent these as entities.

• On all IPv4 entities, resolve these to WHOIS information and extract the abuse report email from the data if it is present.

• On all AbuseReportEmail entities, prepare automated emails to be sent to report the phishing attack.
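The control flow of the steps listed above can be illustrated with a small worklist loop. This is a Python sketch of the idea only: it is not how a Maltego machine is written, and the function name, the representation of entities as (type, value) pairs and the dispatch table are all assumptions made for the example.

```python
from collections import deque


def run_machine(seed_entities, transforms):
    """Apply per-type transforms until no new entities appear.

    `seed_entities` is an iterable of (entity_type, value) pairs and
    `transforms` maps an entity type to a list of functions, each taking a
    value and returning new (entity_type, value) pairs.
    Returns the graph edges as a set of (parent, child) pairs.
    """
    queue = deque(seed_entities)
    visited = set(queue)
    edges = set()
    while queue:
        entity = queue.popleft()
        entity_type, value = entity
        for transform in transforms.get(entity_type, []):
            for child in transform(value):
                edges.add((entity, child))
                # Each entity is expanded once, even if reached repeatedly.
                if child not in visited:
                    visited.add(child)
                    queue.append(child)
    return edges
```

Entities that are reached from several parents gain several edges but are only expanded once, which is what allows shared domains or addresses to appear as common nodes on the graph.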

This machine and the ability to integrate arbitrary systems into Phishtego are perhaps the most powerful and useful features of the framework.

Chapter 4

Case Studies

In order to provide some examples of how best to utilise the system, this chapter explores some case studies relating to phishing attacks. Section 4.1 looks at the use of Phishtego in fingerprinting and identifying an attack. In order to illustrate the system, Phishtego needs some data. Due to the sensitivity of phishing attacks, obtaining real data from actual phishing campaigns is difficult and probably provides little benefit over generic phishing data.

4.1 An attack launched from a compromised server

4.1.1 Background

In this light, the test data that Phishtego uses in the following case studies is all derived from generic phishing campaigns, randomly selected from the online ‘throwaway’ email service Mailinator1. Such services are typically not used as part of a genuine online identity, but rather to sign up for services that users suspect may send them unnecessarily large volumes of correspondence in the form of SPAM. Additionally, some of the less savoury websites and services that people sign up for are run by entities that trade email addresses to third party SPAM houses. This makes the service an ideal source of dummy data, with each address typically receiving hundreds of phishing related emails each hour.

1 http://www.mailinator.com


Listing 2: A Phishing email

Subject: uBuyaPills Todayi documenter ...
From: "Sterling Green"
Date: Thu, 11 Sep 2014 00:07:57 +0700
To: redacted

uBuyuTablets Herek unconscionable iGetExclusive dMedicaments Todayw
http://venisetours.com/calceolaria.php

The first case study performs the process on a single phishing email. The email is shown in Listing 2.

For all intents and purposes, the email appears to be a common phishing email. It promises to sell some sort of medicine, probably illegally peddled, in exchange for a credit card number and some other personal information. It is a relatively standard phishing email and provides a good starting point for demonstrating the usefulness of the system.

4.1.2 Exploration and Fingerprinting

The first stage in making sense of the attack is to pull out and identify information surrounding the email. We begin by adding the email source to Phishtego in the form of an emailSource entity, as demonstrated in figure 4.1.

We then perform a transform on the email source, which produces a number of new entities. In this case, the transform run is the Link extraction and analysis transform described in table 3.6. The transform in this example creates four new entities. These entities are illustrated in figure 4.2 and include:

• Two suspicious email addresses that were included in the email (A and C in figure 4.2)

Figure 4.1: Creating an email source entity

• A suspicious link entity that was also included in the email (B in figure 4.2)

• The domain associated with the suspicious link in question (D in figure 4.2)

The next step was to look up the included URL against the online services that are integrated into the framework. In this instance, it was not recognised as a known malicious domain or link. This poses an interesting situation, as the domain associated with the link appears at first glance to be legitimate. This is a good example of how useful visually representing information can be. From this point, we further explored the domain in question, which was confirmed to be hosting a website. Using a set of transforms, we resolved the website to its IPv4 address. From this address, further exploration retrieved the abuse report email from the WHOIS information of the registrant of the domain. This would, however, only be useful in the event that the website serving the phishing content was not intentionally involved in the attack but had been compromised and was being used as a front for it, because we can reasonably assume that a malicious actor would not show much concern for an abuse report email.

Figure 4.2: Analysis of the emailSource entity

Figure 4.3: Exploring the domain involved in the attack

4.1.3 Analysis

Before visiting the full URL in question locally, it was worth visiting the landing page of the website. This landing page appears to be a legitimate website, even if the site itself is dormant and under construction.

Figure 4.4: http://www.venisetours.com

However, visiting the URL that was originally included in the email gives quite a different result. The page redirects to the actual phishing page. The domain that the user is redirected to is far more interesting and immediately looks suspicious: http://kztefobn.com. This is a common tactic used by phishers to bypass SPAM filters, complicate blacklisting procedures and present the user with something that looks like a legitimate link dha (2006).

Figure 4.5: The redirected page

Based on this, we might conclude that the site has indeed been compromised and set up as a front for a phishing campaign, tricking users into following a legitimate looking link rather than the quite obviously suspicious http://kztefobn.com. Thus, in order to speed up the process of contacting the abuse report contact, we run the abuseReportEmail transform, which generates an automated complaint ready to send to the abuse report email.

Figure 4.6: The redirected page

This first use case is a good example of how using the Phishtego system on a single suspicious email informed and facilitated a measured and sensible response to a phishing campaign, first by identifying the architecture of the attack and then by automating a response mechanism. The hope is that the owner of the domain reacts appropriately and acts to stop the malicious content being served from the server.

4.2 Correlating relationships between larger data-sets

The purpose of the second case study is to illustrate the value of graphing and examining multiple phishing related emails on the same graph. This is not particularly easy to illustrate without several targeted phishing attacks. However, as the following case study shows, even with completely randomly selected emails from Mailinator, and without a number of phishing emails aimed at a single organisation, it is possible to draw links and relationships between what appear to be completely independent phishing attacks.

4.2.1 Background

For the purposes of this study, 20 phishing emails targeted at Mailinator users were randomly selected. These emails were loaded into Maltego and multiple transforms were run on them in order to look for relationships that might exist between them. The hope is that even between completely independent email addresses there might exist some commonality, and that by using the framework we are able to detect this commonality and represent it visually. Figure 4.7 shows a number of these emails represented alongside each other. It is also interesting to note that at least one of the emails has already been confirmed as malicious by one or both of our external crowd sourced anti-phishing services. This is labelled in figure 4.7 as ‘Malicious Link’.

Figure 4.7: Multiple emails represented in Phishtego

As might be expected, the sample data consists mainly of ‘closed systems’ which have no relationship with any of the other phishing attacks. In this sense, a closed system is a set of links, domains, emails and targets that is separate from all others, as is the case in figure 4.8.

Figure 4.8 shows two closed systems. Each email is a part of a distinctly different attack. Without performing any additional exploration into the closed systems it became apparent that they were less interesting for this example than exploring emails which appeared to have shared commonalities. Further exploration provided some useful insight into some common threads shared across a couple of the attacks. The emails detailed in figure 4.9 are an example of a correlation that was found after analysing a number of seemingly unrelated attacks aimed at different recipients. It is evident from the transformations that the domain in question, doctorttdf.ru, is common across all four of the emails analysed.

Figure 4.8: Closed Systems

Figure 4.9: Related Attacks

By exploring a number of seemingly random and unrelated emails in the Phishtego framework, we have been able to correlate the emails and derive relationships between them. These emails, which were addressed to different recipients, share commonalities that suggest a common origin.
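The kind of correlation found here, a hostname shared by several otherwise unrelated emails, can be expressed compactly. The following sketch assumes that the links have already been extracted per email. The function name and the input layout are hypothetical, and on the graph itself the same result emerges visually from shared Domain entities rather than from a separate computation.

```python
from collections import defaultdict
from urllib.parse import urlparse


def shared_domains(links_by_email):
    """Map each hostname seen in two or more emails to those emails."""
    emails_by_domain = defaultdict(set)
    for email_id, links in links_by_email.items():
        for link in links:
            host = urlparse(link).hostname
            if host:
                emails_by_domain[host].add(email_id)
    # Hostnames seen in only one email belong to a closed system.
    return {
        domain: sorted(emails)
        for domain, emails in emails_by_domain.items()
        if len(emails) > 1
    }
```

Hostnames that occur in only one email are dropped from the result, which mirrors the distinction drawn above between closed systems and related attacks.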

In this case, the domain appears to have been set up and registered by a malicious actor. The domain name is not consistent with one that someone might register for legitimate purposes. It is unlikely that contacting the abuse report email would produce anything useful. After visiting the links included in the emails and finding that they were in fact malicious and attempted to elicit personal data from users, the links were reported to both Phishtank and Google Safe Browsing. At present, these actions have to be done manually, but this could possibly be automated to some extent in the future, which would further streamline the process. Several minutes after reporting the links as malicious, the links were checked against the online validation services again. The graph reflects the result by giving each link a relationship with a malicious link entity, as the links are now confirmed as malicious by the online services. Figure 4.10 illustrates the same graph, with MaliciousLink entities now sharing a relationship with the original links.

Figure 4.10: Related Attacks with Malicious Links reported

This use case is a good example of how Phishtego can be used to find and derive relationships between phishing campaigns that might not originally be obvious. In this way, the system would be a valuable asset to anyone interested in monitoring and tracking phishing campaigns in general, and more specifically in monitoring and understanding campaigns surrounding a specific organisation or target.

4.3 Automated monitoring

As has been previously mentioned, one of the possible uses of Phishtego is as a completely automated monitoring system for a local directory. After creating a Maltego machine for Phishtego, the following chain of events occurs programmatically every 30 seconds:

• The email source directory is checked for any new emails. If there are new emails, create new entities in the framework representing them.

• On EmailSource entities, extract all information from them using multiple transforms, including extracting links, email addresses and domains present in the email.

• On all email entities, verify that they do in fact exist and are not simply dummy addresses.

• Validate all SuspiciousLink entities against the online APIs previously discussed.

• On all domains, check if a website exists on port 80 for the domain. If it does, create a website entity.

• On all website entities, resolve these to IPv4 addresses and represent these as entities.

• On all IPv4 entities, resolve these to WHOIS information and extract the abuse report email from the data if it is present.

• On all AbuseReportEmail entities, prepare automated emails to be sent to report the phishing attack.

In this example, the email source directory initially contained 4 phishing emails. Figure 4.11 shows the graph after running the machine over a single iteration.

Figure 4.11: Closed Systems

Figure 4.11 shows the first iteration of the Phishtego monitoring machine. The machine has automatically run transforms from the EmailSourceDirectory and generated entities and relationships from this starting point. To simulate the arrival of new suspicious emails, which in a real deployment might be taken from a spam filter or reported manually by an employee of a company, we manually add several new emails to the directory. This need not have been done by hand: the emails could equally have been inserted into the directory automatically after being pulled from a spam filter or flagged as suspicious by a user on the network. The following iteration of the machine demonstrates the effectiveness of having integrated a means of automating the procedure.

Figure 4.12: Closed Systems

Upon detecting the addition of the new emails, the framework creates new EmailSource entities on the graph. The machine then automates the running of resulting transforms on each entity.

Figure 4.13: Automated email retrieval and transforms

This example serves to illustrate the power of automating the exploration process. It allows the user to leave the system unattended and intermittently check the status of phishing related attacks on an institution.

Chapter 5

Conclusion

5.1 Analysis of Goals

As was stated in the first chapter, there were three primary project goals:

1. Create a system that models phishing attacks that can be deployed locally on a machine

2. Produce meaningful information from the large volumes of raw data that can be gathered from a phishing campaign

3. Provide a means of facilitating decision making around reacting to phishing campaigns and automating this response where possible

The first goal was achieved in the creation of the Phishtego framework. Not only can it be deployed as a part of the free community edition of the Maltego framework, but the transforms can all be deployed and run locally on a machine without depending on an external server.

The second goal was achieved by carefully choosing which data to return as relevant. The system does this by making intelligent decisions about what information is relevant in an email. Consider, for example, a WHOIS transform that displays only the abuse report email, which facilitates the remediation aspect of the framework, instead of plotting volumes of meaningless data. This and other design decisions help to keep the framework light and fast, as well as shielding the user from being overwhelmed by information. This leaves the user with a much clearer idea of the structure of the phishing attack without having to worry about the complex mess of data that lies beneath it.

The final goal was to facilitate decision making around possible reactions to an ongoing phishing attack. Key to achieving this was to provide the user with an understanding of the kind of attack in question. Some of the use cases in Chapter 4 highlight this process of first exploring and analysing an attack and then shifting proactively into deciding how best to deal with it. Phishtego also provides a means of automatically generating and inserting content into an email to send to an abuse report email address.

The Phishtego framework provides a means of exploring, analysing, correlating and reacting to phishing campaigns by illustrating relationships between actors in a phishing campaign, deriving useful information from existing data and facilitating response mechanisms. In addition to these goals, the mechanisms behind the framework were written in such a way as to be easily automated, which includes the addition of machines into the framework.

Additionally, one of the most versatile and potentially powerful features of the framework is the ability to integrate existing solutions through directory monitoring. This means that the project has the potential either to perform the central role within an anti-phishing system, or merely to complement an existing solution, perhaps by providing a visual representation of phishing related attacks.

5.2 Future Work

During the course of this research, a number of areas were identified into which the project could be expanded but which were beyond the scope of this work. Some of these are suggested below as possible future extensions to the project.

5.2.1 Introducing additional online services

One major improvement would be to integrate additional phishing services. There are a number of new and growing services that provide online APIs which would increase the accuracy and effectiveness of the identification of phishing attacks. Some suggestions for this include:

• Webroot Real-Time Anti-Phishing API

• ISIT Phishing

The more services the system can correlate a given link against, the more accurate its validation of that link can be.

A further area to build upon here is the automated reporting of links to the relevant online services. At present, it is not possible to report a link completely autonomously. This is largely due to the potential for abuse of this functionality. This area warrants further exploration.

5.2.2 Extension into analysis of attachments

More and more frequently, phishers attempt to bypass phishing filters and phishing protection mechanisms by presenting their content through attachments, including images, PDF documents and office documents. There is a large volume of work that can be done with regard to analysing, processing, identifying and classifying attachments included in an email. Analysis of malware is a large field in and of itself, and the extension of the environment into this field of information security would certainly be challenging. However, this would further enhance the framework's ability to identify more complex phishing attacks.

5.2.3 Reporting Mechanism

It would be particularly useful to be able to generate reports from an existing graph. These could be generated at set intervals and would present possible suggestions and the general ‘health’ of the organisation with regard to phishing attacks. They could potentially also highlight problem areas and areas of concern. This would involve a fair amount of artificial intelligence to run efficiently, but would no doubt be a useful addition to the framework.

5.2.4 Tool Integration

Another possibility would be to shift from dynamically modelling individual attacks aimed at organisations towards drawing up relationships between much larger sets of data. The PhishTank service provides a means of caching data offline. Providing a means of analysing, correlating and expanding on a specified file format would be worth investigating; for example, a CSV file containing phishing-related data could be specified and integrated into the system. This would further enhance the system's ability to integrate with data provided by other existing systems (once formatted according to our specification) and would be a valuable addition to the framework.

References

2004. RFC 3912 - WHOIS Protocol Specification.

2006. Why Phishing Works. Montréal: Conference on Human Factors in Computing Systems, for ACM.

2007 (May). Learning to Detect Phishing Emails. Vol. 16. World Wide Web Conference.

Binsalleeh, H., Ormerod, T., Boukhtouta, A., Sinha, P., Youssef, A., Debbabi, M., & Wang, L. 2008. On the Analysis of the Zeus Botnet Crimeware Toolkit. Technical report. National Cyber Forensics and Training Alliance Canada and the Computer Security Laboratory, Concordia University.

Carmel, David, Mishne, Gilad, & Lempel, Ronny. 2005 (May). Blocking Blog Spam with Language Model Disagreement. Technical report. Informatics Institute, University of Amsterdam.

Central Intelligence Agency. 2009. Country Comparison :: Internet users. Online. Available from: https://www.cia.gov/library/publications/the-world-factbook/rankorder/2153rank.html.

Cyveillance. 2008. The Cost of Phishing: Understanding the True Cost Dynamics Behind Phishing Attacks. Technical report. Cyveillance.

EMC. 2013 (June). Fraud Report Bugat Trojan Joins the Mobile Revolution. Technical report. EMC.

FACTS, PHISHING. 2006. Phishing mongers and posers. Communications of the ACM, 49(4), 21.

Garera, Sujata, Provos, Niels, Chew, Monica, & Rubin, Aviel D. 2007. A framework for detection and measurement of phishing attacks. Pages 1–8 of: Proceedings of the 2007 ACM workshop on Recurring malcode. ACM.

Goodman, J.T., Rehfuss, P.S., Rounthwaite, R.L., Mishra, M., Hulten, G.J., Richards, K.G., Averbuch, A.H., Penta, A.P., & Deyo, R.C. 2009 (Dec. 15). Phishing detection, prevention, and notification. US Patent 7,634,810.

Gu, Guofei, Perdisci, Roberto, Zhang, Junjie, & Lee, Wenke. 2008. BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection. Technical report. Georgia Institute of Technology.

Irish, John, Morgan, Stephen, Pittelli, Frank, & Varga, Michael. 2001. Internet authentication with multiple independent certificate authorities.

Karlof, Chris, Tygar, J.D., Wagner, David, & Shankar, Umesh. 2007. Dynamic Pharming Attacks and Locked Same-origin Policies for Web Browsers. In: Fourteenth ACM Conference on Computer and Communications Security.

Kirda, Engin, & Kruegel, Christopher. 2006. Protecting Users against Phishing Attacks. The British Computer Society.

Kurose, James, & Ross, Keith. 2013. Computer Networking : A top down approach. Pearson.

Milletary, Jason. 2005. Technical Trends in Phishing Attacks. Technical report. United States Computer Emergency Readiness Team.

Moore, Tyler, & Clayton, Richard. 2007a. An Empirical Analysis of the Current State of Phishing Attack and Defence. In: WEIS.

Moore, Tyler, & Clayton, Richard. 2007b. Examining the impact of website take-down on phishing. Pages 1–13 of: APWG eCrime Researchers Summit.

Ponemon-Institute. 2013. 2013 Cost of Data Breach Study: Global Analysis. Technical report. Symantec.

Rasmussen, Rod, Aaron, Greg, & Routt, Aaron. 2013 (September). Global Phishing Survey: Trends and Domain Name Use in 1H2013. Technical report. Anti-Phishing Working Group.

RSA. 2013 (February). Fraud Phishing Report - The same wolf just different sheeps clothing. Technical report. RSA.

RSA. 2014. RSA MONTHLY FRAUD REPORT FRAUD REPORT 2013 A YEAR IN REVIEW. Technical report. RSA.

Sheng, Steve, Wardman, Brad, Warner, Gary, Cranor, Lorrie Faith, Hong, Jason, & Zhang, Chengshan. 2009. An Empirical Analysis of Phishing Blacklists.

Shi, Junxiao, & Saleem, Sara. 2012. Computer Security Research Reports : Phishing. Technical report. University of Arizona.

Suri, Rableen Kaur, Tomar, Deepak Singh, & Sahu, Divya Rishi. 2012. An Approach to Perceive Tabnabbing Attack. INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH, 1(6), 90–94.

Jagatic, Tom, Johnson, Nathaniel, Jakobsson, Markus, & Menczer, Filippo. 2007. Social Phishing. Communications of the ACM, 50(10), 72–80.

Trend Micro. 2012. Spear-Phishing Email: Most Favored APT Attack Bait. Technical report. Trend Micro.

Turner, Paul, Polk, William, & Barker, Elaine. 2012. Preparing for and Responding to Certification Authority Compromise and Fraudulent Certificate Issuance. Technical report. National Institute of Standards and Technology.

Watson, David, Holz, Thorsten, & Mueller, Sven. 2005 (May). Know your Enemy: Phishing. Technical report. The Honeynet Project & Research Alliance.

Wittel, Gregory, & Wu, S. Felix. 2002. On Attacking Statistical Spam Filters. Department of Computer Science University of California.

Wu, Min, Miller, Robert C, & Little, Greg. 2006. Web wallet: preventing phishing attacks by revealing user intentions. Pages 102–113 of: Proceedings of the second symposium on Usable privacy and security. ACM.

Appendix A

Appendix

Listing A.1: An External Python Class

    # PhishTank integration (Python 2: urllib2 and print statements)
    import urllib2
    import requests
    import time
    import os
    import pandas


    class phishtank:

        # Constructor. Initialise and set program parameters, including the API key
        def __init__(self, key, updateInterval=45, web=True):
            self.runstate = True
            self.api_key = key
            self.updateInterval = updateInterval  # minutes between updates
            self.data = ""
            self.web = web
            self.currentAttacks = {}
            try:
                f = open('lastupdate', 'r')
                self.lastUpdate = float(f.read().replace("\n", ""))
                f.close()
            except (IOError, ValueError):
                # No usable timestamp yet: download the data, which also
                # writes the 'lastupdate' file
                self.updatePhishTankData()
            self.update()

        # Initiate the run sequence
        def run(self):
            while self.runstate:
                self.update()
                print "sleeping 60 seconds"
                time.sleep(60)

        # Read the time of the last update from disk
        def checkLastUpdate(self):
            f = open('lastupdate', 'r')
            self.lastUpdate = float(f.read().replace("\n", ""))
            f.close()

        # Update the PhishTank repository locally
        def updatePhishTankData(self):
            print "updating phishtank"
            url = ("http://data.phishtank.com/data/{0}/online-valid.csv"
                   .format(self.api_key))
            response = urllib2.urlopen(url)
            self.data = response.read()
            f = open("phishtank.data", 'a')
            f.write(self.data)
            f.close()
            f = open("lastupdate", 'w')
            f.write(str(time.time()))
            f.close()
            print "finished updating phishtank"

        # Determine whether or not the PhishTank data needs to be updated
        def update(self):
            self.checkLastUpdate()
            print "Checking update"
            now = time.time()
            print ("last update was " + str(int(now - self.lastUpdate)) +
                   " seconds ago. Interval is " + str(self.updateInterval * 60))
            if int(now - self.lastUpdate) > self.updateInterval * 60:
                print "updating"
                self.updatePhishTankData()
                print "Done"
                return
            print "Not updating"

        # Verify against the database whether a URL is present
        def checkURL(self, url):
            if self.web:
                # The request URL is truncated in the source listing
                x = urllib2.urlopen("http://checkurl.phishtank.com/checkurl/..."
                                    .format(url))
                return "true" in x.read().lower()
            self.checkLastUpdate()
            # Assumption: the offline check counts occurrences of the URL in
            # the locally cached data; the source listing does not define r
            f = open("phishtank.data", 'r')
            r = f.read().count(url)
            f.close()
            return r > 0
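The offline lookup performed by checkURL can also be sketched independently of the class. The Python 3 sketch below assumes that the cached data is CSV text with a header row containing a `url` column, which should be verified against the actual download. The function names and the sample records are hypothetical.

```python
# Hypothetical sketch of an offline lookup against cached PhishTank-style data.
# Assumption: the cache is CSV text with a header row that includes a "url" column.
import csv
import io


def load_known_urls(csv_text):
    """Parse cached CSV text and return the set of URLs it lists."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["url"].strip() for row in reader if row.get("url")}


def is_known_phish(url, known_urls):
    """Return True if the URL appears in the cached data (exact match)."""
    return url.strip() in known_urls


# Illustrative data only; these are not real PhishTank records.
sample_cache = (
    "phish_id,url,verified\n"
    "1,http://phish.example/login,yes\n"
    "2,http://bad.example/account,yes\n"
)
known = load_known_urls(sample_cache)
```

Parsing the file once into a set keeps each lookup cheap and avoids the false positives that a plain substring search over the raw file could produce. Exact matching is a deliberate simplification: a real implementation would probably normalise URLs before comparing them.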