Finding Fraud in Large and Diverse Data Sets
Business white paper

Applying real-time, next-generation analytics to fraud detection and prevention using the HP Vertica Analytics Platform

A solution for real-time fraud detection

Fraud saps hundreds of billions of dollars each year from the bottom line of industries such as banking, insurance, retail, healthcare, government social services, and service providers. In the United States alone, estimates range as high as $994 billion in annual fraud-related losses.[1] A practice of pattern recognition that combines the analysis of current transactional data with known fraudulent activities, in addition to other statistical techniques, can yield predictive models for real-time prevention. This paper examines how the HP Vertica Analytics Platform suits real-time fraud detection, and works through an example of a credit card fraud scenario called skimming.

Developments in data mining

In the effort to identify and deter fraud, conventional wisdom still applies: follow the money. That simple adage notwithstanding, the task of tracking fraud and its perpetrators continues to vex both private and public organizations. Advancements in information technology have made it possible to capture transaction data at the most granular level. In the retail trade alone, transmissions of up to 500 megabytes daily between individual point-of-sale sites and their data centers are typical.[2] Logically, such detail should result in greater transparency and greater capacity to fight fraud. Yet the sheer volume of data that organizations now maintain, pulled from so many sources and stored across a range of locations, has made the same organizations more vulnerable.[3] More points of entry amount to more opportunities for fraud. In its annual Global Fraud Report, The Economist found that 50% of all businesses surveyed acknowledged they were vulnerable to fraud; 35% of North American companies specifically cited IT complexity as increasing their exposure to risk.

Accordingly, the application of data mining as a security measure has become increasingly germane to modern fraud detection. Historically, data mining as a means of identifying trends from raw statistics can be traced back to the 1700s with the introduction of Bayes' theorem. Since then, statistical analysis has evolved as a means for institutions to measure outcomes, identify customer behavioral trends, and make forecasts to support management decisions. Most recently, the development of advanced algorithms has made it possible to rapidly discern patterns, even from vast stores of disparate data. As a result, data mining can identify patterns within transaction records to shed light on potentially fraudulent actions by vendors, customers, or employees. The fraudulent pattern can then become a valuable point of reference; business parties that conduct the same type of transactions can then be scored based on their likelihood of fraud.

There are three relatively recent developments in statistical algorithms that support pattern matching for fraud detection. They include:

• Front-end outlier detection in multivariate data streams[4]
• Neural networks[5]
• Social network analysis[6]

Taking a toll: the figures on fraud

In all its forms, fraud hits business hard. The Association of Certified Fraud Examiners (ACFE) estimates that companies worldwide lose 5% of revenues to fraud, or about $3.5 trillion.[7] U.S. organizations alone lose about $994 billion per year. And roughly half of the organizations surveyed by the ACFE fail to recover any of their fraud-related losses. In its annual Global Fraud Report,[8] The Economist found that 75% of all organizations were victims of fraud at some level, including 66% in North America.

Meanwhile, in the U.S., the volume of consumer fraud complaints has escalated dramatically over the past decade. The Consumer Sentinel Network, a service of the U.S. Federal Trade Commission, recorded 990,242 fraud-related complaints submitted to various authorities during 2011,[9] up from 137,306 in 2001. Cybercrime, which is reflected in the trend of consumer complaints, also continues to grow in severity. The Internet Crime Complaint Center (IC3) reported 314,246 complaints in 2011, up 3.4% over 2010, representing $485.3 million in losses. On average, IC3, a partnership between the FBI, the National White Collar Crime Center, and the Bureau of Justice Assistance, fields 26,000 complaints per month.[10]

External fraud vs. internal fraud

Fraud's impact is felt across a vast spectrum of industries, and is manifested in 30 categories of victimization, including bank credit cards, mail, loans, and utilities. Complaints are registered by government institutions, including legal and criminal justice organizations, as well as businesses of all sizes and individual citizens. The targets of these complaints are both external (consumers and organized criminals) and internal (employees of the victimized organizations).

Historically, internal fraud has generated the greatest attention, notably due to the big-ticket crimes carried out by rogue securities traders or corporate procurement staff. Among the recent cases involving a trader, Kweku Adoboli of Switzerland's UBS bank was accused of stealing $2.03 billion through false accounting practices. Other forms of internal fraud that subject institutions to large losses include the theft of trade secrets and technology, and asset misappropriation by employees.

While big-ticket fraud grabs the headlines, it's a relatively small slice of the overall casebook. In fact, less than 5% of all fraud complaints involve more than $5,000 in losses to the victims. Rather, it's small-scale external fraud that represents the significant majority of losses. These crimes come in the form of credit card fraud, medical reimbursement claims, social welfare benefits, check scams, or false invoicing.

Of these external fraud crimes, credit card fraud is a study in scope and severity. Globally, fraudulent use of payment cards (including general purpose and private label credit cards, debit cards, and prepaid payment cards) generated $7.6 billion in losses in 2010,[11] up 10.2% from the previous year. The United States sustained a disproportionate share of those losses; while the U.S. registered 27% of worldwide payment card business in 2010, it reported nearly half (47%) of all losses, or $3.56 billion.

Although internal and external fraud are often linked, with internal fraud sometimes setting the stage for external fraud, this paper keeps the two distinct. It concentrates on external fraud and on how the HP Vertica Analytics Platform helps detect and prevent such incidents.[12]

Proactive vs. reactive: the HP Vertica Analytics Platform

Analyzing transaction data after the fact can detect fraud retroactively, but using the same data in real time makes it possible to step in and stop fraud as it happens. Faced with the complexities of Big Data, traditional relational database systems can be ill equipped to turn data analysis into quick decisions, because the cost of designing, building, and deploying them can be prohibitive.[13] The HP Vertica Analytics Platform, based on a grid-enabled, column-oriented system, is specifically designed to provide real-time intelligence from data warehouse operations. HP Vertica Analytics Platform includes the capability to run queries 50 to 1,000 times faster than traditional row-oriented RDBMSs.

In the following scenario, HP Vertica Analytics Platform will be applied to one of the most insidious forms of crime described above: credit card fraud and, more specifically, credit card skimming.

The scoop on skimming

Skimming accounts for roughly 15% of all credit card fraud, and bankrate.com reports that it results in about $1 billion in annual losses.

Skimming is the term for a fraudulent charge made on a credit card number by someone other than the card holder. A skimming incident is commonly recognized after a customer makes a series of transactions with a credit card, only to report at a later date that one or more of the charges on the bill was fraudulent. Typically, the complaint reflects that someone copied the victim's credit card information and used it to make the unauthorized transaction. Skimming crimes are commonly traced to unscrupulous merchants or their employees who have the opportunity to collect card information during a legitimate purchase. Consequently, a pattern-matching approach to data analysis can determine which merchants establish a track record of being associated with fraudulent transactions. While skimming can be the work of isolated operators, it's also known to be run by sophisticated scam artists on an international scale.

The truth is in the transactions

Identifying likely credit card skimmers starts with a comprehensive set of transaction data combined with a list of known fraud events. In this example, we'll present two methods, one approximate and one more precise, to use in conjunction with transaction records and thus yield the probability that a particular transaction source (a store, online merchant, etc.) is skimming. For either approach, the required information includes a list of shopping transactions (who shopped at which merchants) and a list of which shoppers reported fraudulent transactions. Such data can be culled from basic transaction history that bears a date stamp. From this historical data, it's then possible to determine the typical fraud rate for each merchant and apply that rate as a score, or risk factor.

On that basis, let's launch this example with a data set of transactions from purchases made at three merchants:

Merchant 1. Honest Abe's, which has no skimming record
Merchant 2. Sketchy's, which represents a 50% chance of skimming
Merchant 3. McFraudster, which always skims

Note: Because our data is synthetic, we already know who is skimming. Our challenge in this exercise is to see if the results match up with that knowledge.
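
The per-merchant fraud rate described in this example (the share of a merchant's shoppers who later reported fraud, applied as a risk score) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's HP Vertica implementation: the function name `merchant_fraud_rates`, the in-memory data layout, and the shopper names are hypothetical, and the sample records are invented to mirror the three merchants named in the example.

```python
from collections import defaultdict

# Hypothetical sample data: (shopper, merchant) pairs, i.e. who shopped where.
TRANSACTIONS = [
    ("alice", "Honest Abe's"),
    ("bob", "Honest Abe's"),
    ("carol", "Honest Abe's"),
    ("dave", "Honest Abe's"),
    ("erin", "Sketchy's"),
    ("frank", "Sketchy's"),
    ("grace", "Sketchy's"),
    ("heidi", "Sketchy's"),
    ("ivan", "McFraudster"),
    ("judy", "McFraudster"),
    ("alice", "McFraudster"),
]

# Hypothetical list of shoppers who later reported a fraudulent charge.
FRAUD_REPORTERS = {"alice", "erin", "frank", "ivan", "judy"}


def merchant_fraud_rates(transactions, fraud_reporters):
    """Return, per merchant, the fraction of its distinct shoppers who reported fraud."""
    reporters = set(fraud_reporters)
    shoppers_by_merchant = defaultdict(set)
    for shopper, merchant in transactions:
        shoppers_by_merchant[merchant].add(shopper)
    return {
        merchant: len(shoppers & reporters) / len(shoppers)
        for merchant, shoppers in shoppers_by_merchant.items()
    }


rates = merchant_fraud_rates(TRANSACTIONS, FRAUD_REPORTERS)

# Print merchants from highest to lowest risk score.
for merchant, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{merchant}: {rate:.2f}")
```

In this invented data, Honest Abe's still receives a nonzero score because one of its shoppers also bought from McFraudster and later reported fraud. A raw rate of this kind cannot tell which merchant actually leaked the card, which is one reason to treat it only as a rough score.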