Inside the Scam Jungle: a Closer Look at 419 Scam Email Operations Jelena Isacenkova1*, Olivier Thonnard2,Andreicostin1, Aurélien Francillon1 and David Balzarotti1
Total Page:16
File Type:pdf, Size:1020Kb
Isacenkova et al. EURASIP Journal on Information Security 2014, 2014:4 http://jis.eurasipjournals.com/content/2014/1/4 RESEARCH Open Access Inside the scam jungle: a closer look at 419 scam email operations Jelena Isacenkova1*, Olivier Thonnard2,AndreiCostin1, Aurélien Francillon1 and David Balzarotti1 Abstract 419 scam (also referred to as Nigerian scam) is a popular form of fraud in which the fraudster tricks the victim into paying a certain amount of money under the promise of a future, larger payoff. Using a public dataset, in this paper, we study how these forms of scam campaigns are organized and evolve over time. In particular, we discuss the role of phone numbers as important identifiers to group messages together and depict the way scammers operate their campaigns. In fact, since the victim has to be able to contact the criminal, both email addresses and phone numbers need to be authentic and they are often unchanged and re-used for a long period of time. We also present in detail several examples of 419 scam campaigns, some of which last for several years - representing them in a graphical way and discussing their characteristics. Keywords: Email spam; Nigerian scam; 419 scam; Cyber crime; Campaigns 1 Introduction human factors: pity, greed, and social engineering tech- Nigerian scam [1], also called ‘419 scam’ as a reference to niques. Scammers use very primitive tools (if any) com- the 419 section in the Nigerian penal code [2], has been pared with other forms of spam where operations are a known problem for several decades. The name encom- often completely automated. A distinctive characteristic passes many variations of this type of scam, like advance of email fraud is the communication channel set up to fee fraud, fake lottery, black money scam, etc. Originally, reach the victim: from this point of view, scammers tend the 419 scam phenomenon started by postal mail, and to use emails and/or phone numbers as their main con- then evolved into a business run via fax first, and email tacts [5], while other forms of spam are more likely to later. 419 scam is a popular form of fraud in which the forward their victims to specific URLs. For instance, a fraudster tricks the victim into paying a certain amount of previous study of spam campaigns [6] (in which scam money under the promise of a future, larger payoff. The was considered a subset of spam) indicates that 59% of prosecution of such criminal activity is complicated [3] spam messages contain a URL. However, even though 419 and can often be evaded by criminals. As a result, reports scam messages got eclipsed by the large amount of spam of such crime still appear in the social media and online sent by botnets, they still pose a persistant problem that communities, e.g. 419scam.org [4], exist to mitigate the causes substantial personal financial losses for a number risk and help users to identify scam messages. of victims all around the world. Nowadays, 419 scam is often perceived as a particu- The traditional spam and scam (not 419) scenarios have lar type of spam. However, while most of the spam is been already thoroughly studied (e.g. [6,7]), where a big sent today in bulk through botnets and by compromised part of existing unsolicited bulk email identification tech- machines, 419 scam activities are still largely performed niques rely on high volumes of similar messages. However, in a manual way. Moreover, the underlying business and 419 messages are more likely to be sent in lower copies and operation models differ. Spammers trap their victims from webmail accounts. Thus, criminals aim to stay unno- through engineering effort, whereas scammers rely on ticed by the traditional spam filters and avoid drawing attention to abused webmail accounts. The exact distribu- *Correspondence: [email protected] tion methods of 419 scam messages have not been stud- 1Eurecom, Route des Chappes, Sophia Antipolis, France ied as deeply as, for example, the distribution of botnet Full list of author information is available at the end of the article spam. However, based on Microsoft Security Intelligence © 2014 Isacenkova et al.; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Isacenkova et al. EURASIP Journal on Information Security 2014, 2014:4 Page 2 of 18 http://jis.eurasipjournals.com/content/2014/1/4 Reports [8], 419 scam messages constitute on average to Therestofthepaperisorganizedasfollows:Westartby 8% of email spam traffic (data over the last 5 years). describing the scam dataset (Section 3), to which we apply A recent study by Costin et al. [5] describes the use of our cluster analysis technique to extract scam campaigns, phone numbers in a number of malicious activities. The and compare the usage of email addresses and phone authors show that the phone numbers used by scammers numbers (Section 4). In Section 5, we focus on a number are often active for a long period of time and are reused of individual campaigns to present their characteristics. over and over in different emails, making them an attrac- Finally, we draw our conclusions in Section 6. tive feature to link together scam messages and identify possible campaigns. In this work, we test this hypothe- 2 Related work sis by using phone numbers and other email features to Scammers employ various techniques to harvest money automatically detect and study scam campaigns by using from ingenuous victims. Tive [13] introduced the tricks a public dataset. To the best of our knowledge, this is the of 419 fee fraud and the philosophy of the tricksters first in-depth study of 419 email campaigns. While a pre- behind. Stajano and Wilson [14] studied a number of scam liminary version of this study was published in [9], this is techniques and demonstrated the importance of security an extended version of the study. engineering operations. Herley [15] analysed attack deci- Our goal is to study how scammers orchestrate their sions as binary classification problems studying the case scam campaigns, by looking at the interconnections of 419 scammers. The author looks into the economical between email accounts, phone numbers, and email topics aspects of adversaries trying to understand how scammers used by scammers. To this aim, we use a novel multi- find viable victims out of millions of users, so that their criteria decision algorithm to effectively cluster scam business still remains profitable. A brief summary of 419 emails that are sharing a minimal number of common- scam schemes was presented by Buchanan and Grant [3] alities, even in the presence of more volatile features. indicating that Internet growth has facilitated the spread Because of this set of commonalities, scam emails origi- of cyber fraud. They also emphasized the difficulties of nating from the same scammer(s) are likely to be grouped adversary prosecution - one of the main reasons why 419 together, enabling us to gain insights into scam cam- scam is still an issue today. A more recent work by Oboh paigns. Additionally, we also evaluate the quality and et al. [16] discussed the same problem of prosecution in a consistency of our clustering results. To this aim, we per- more global context taking the Netherlands as an example. form a threshold sensitivity analysis, as well as evaluate the Another work by Goa et al. [17] proposed an ontol- homogeneity of clusters using the graph compactness and ogy model for scam 419 email text mining demonstrating Adjusted Rand Index as metrics. high precision in detection. A work by Pathak et al. [6] In our analysis, we have identified over 1,000 differ- analysed email spam campaigns sent by botnets, describ- ent campaigns and, for most of them, phone numbers ing their patterns and characteristics. The authors also represent the cornerstone that allows us to link the differ- showed that 15% of the spam messages contained a phone ent pieces together. We also discovered some larger-scale number. A recent patent has been published by Coomer campaigns (so-called ‘macro-cluster’), which are made of [18] on a technique that detects scam and spam emails loosely inter-connected scam clusters reflecting different through phone number analysis. This is the first mention- operations of the same scammers. We believe these can ing of phone numbers being used for identifying scam, be attributed to different scam runs orchestrated by the but with no technical implementation details. Costin et al. same criminal groups, as we observe the same phone [5] studied the role of phone numbers in various online numbers or email accounts being reused across different fraud schemes and empirically demonstrated it’s signifi- sub-campaigns. cance in 419 scam domain. Our work extends the study As demonstrated by our experiments, our methods and by focusing on scam email and campaign characterization findings could be leveraged to pro-actively identify new that relies on phone numbers and email addresses used by scam operations (or variants of previous ones) by quickly scammers. associating a new scam to ongoing campaigns. We believe that this would facilitate the work of law enforcement 3 Dataset agencies in the prosecution of scammers. Our approach In this section, we describe the dataset we used for ana- could also be leveraged to improve investigations of other lyzing 419 scam campaigns and provide some statistics cybercrime schemes by logging and investigating various of the scam messages. There are various sources of scam groups of cybercriminals based on their online activities. often reported by users and aggregated afterwards by In this regard, our methodology already proved its util- dedicated communities, forums, and other online activ- ity in the context of other security investigations, such as ity groups.