
International Journal on Advanced Computer Theory and Engineering (IJACTE)

Cybercrime Analysis and Mining Methodologies

¹Deepti Gaur, ²Neha Aggarwal
Department of [...] & Information Technology, ITM University, Gurgaon, Haryana, India.
Email: [email protected]

Abstract: In this paper the authors discuss crime analysis, a recently emerging area in the field of [...]. The paper also includes a survey of the available mining methodologies, along with the [...] of the data mining steps involved in crime analysis. Crime can be national or international, but it is always a destructive process in society.

Index Terms: Crime Data Mining, Precision, Recall, Hotspots, Techniques, CRISP-DM methodology.

I. INTRODUCTION

Crime is defined as an act that is punishable by legislation, according to Thakur[9]. However, an act that is considered a crime in one place and time may not be considered one in another place or time. According to Andargachew (1988), a criminal is an individual who has committed a legally forbidden act. In [...], there are some factors that have to be taken into account to decide whether a person should be considered a criminal or not. Among these, the individual should be of competent age in the eyes of the law, and there must be a well-defined punishment for the particular act committed.

Crime has increasingly become as complex as human [...]. Contemporary technological improvement and the huge development in communication have enabled criminals in every part of the world to commit a crime using sophisticated equipment in one place and then escape to another place[9]. Nowadays the world is facing a proliferation of problems such as illicit drug trafficking, smuggling, hijacking, kidnapping, and terrorism.

The level of crime also depends on the situation and varies from state to state.

Crime Prevention

The causes of the growing rate of crime include unemployment, economic backwardness, overpopulation, illiteracy and inadequate equipment of the police force. The form, seriousness and size of crime may depend on the form of a society, and thus its nature changes with the growth and development of the social system. Every generation has its own most critical, new and special problems of crime, although the crime problem is as old as man himself. In addition, the techniques employed to commit crime are new in the sense that they make use of modern knowledge and technique. The rise in crime, both national and international, is generally thought to be the result of the interplay between socio-economic changes. The circumstances surrounding the individual offender, such as personality, physical characteristics, family background, and environmental surroundings such as peer groups and neighbors, have been the subject of the study of crime (Andargachew, 1988). So by [...], the attributes of criminals will be helpful in designing and implementing proper crime prevention strategies. Governments usually establish organizations such as courts, prosecution services and police, which are responsible for the maintenance of law and order in their respective countries. These agencies and other related organizations are responsible for curbing the rate and occurrence of crimes. The crime prevention agencies need to issue and implement crime prevention strategies[8]:

• Prevention safeguards the life and property of the society that the authorities have a duty to protect.
• It spares the victim much hardship, both physical and mental.
• Crime prevention rules out the litigation that follows the detection of a crime.
• Prevention also saves the authorities from the trouble of recording crimes at all odd hours of the day and night and of taking immediate action for the investigation.

II. DATA MINING

Data mining is the computational process of discovering patterns in large data sets, involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Besides the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, [...], and online updating.

The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection) and dependencies (association mining). This usually involves database techniques such as spatial indices. These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis, for instance in machine learning and predictive analytics. For example, the data mining step may identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a decision support system. Neither data collection, data preparation, nor result interpretation and reporting is part of the data mining step, but they do belong to the overall KDD process as additional steps[3][7]. This paper contains the following sections: Data Generation, which describes the data set; Handling Information; and the techniques involved in data mining.

1. Data Generation

The research data was gleaned from multiple city agencies. Every data entry is a record of a crime or related event. Each record contains the type of crime, the location of the crime in longitude and latitude, and the date and time at which the incident happened. Before data mining begins, preprocessing is required to make the data suitable for classification.

A. Data Grid

For deployment of this crime prediction model, the police department's requirement is to forecast crime such as residential burglary over space and time. Accordingly, the model classifies burglaries monthly across a uniform grid. The grid divides the city into checkerboard-like cells. Each cell contains data combined into six categories, namely Arrest, Residential Burglary[4], Commercial Burglary, Motor Vehicle Larceny, Street Robbery, and Foreclosure. Each cell is populated on a monthly basis. The research data was prepared at two resolutions: the first is a grid of 24-by-20 square cells and the other is 41-by-40. Each cell in the 24-by-20 grid is one-half mile square; each cell in the 41-by-40 grid is a little over one-quarter mile square. In both cases, the data set is a monthly matrix of the six categories mentioned above. Two resolutions are used because the finer resolution allows the grid to be interrogated in more detail for the spatial information inherent in the dataset. Conversely, the lower resolution has the effect of generalizing the spatial knowledge.

B. Empty Grid Cells

Empty grid cells need to be removed from the datasets because they have a detrimental yet counterintuitive side effect: they improve the apparent performance of the classifiers. It is easy for any given classifier to predict accurately that nothing will happen in an empty grid cell. That "intelligence" is artificial. An empty grid cell is defined as a cell with no count in any of the investigated categories over the entire time period being analyzed. Empty grid cells have two explanations. One, the limits of the city are not rectangular like the grid being used, and two, there are many places within the city limits, such as airport runways, bodies of water, and public open spaces, where these events simply do not happen. The result is empty grid cells that have to be removed[5][6].

2. Handling Information

One challenge in crime prediction, as in other rare-event prediction, is that hotspots and coldspots are unbalanced: coldspots are far more common than hotspots. In our dataset, this is especially true for the higher-resolution 41-by-40 grid. This distorts the measures of precision, recall, and F1. In particular, the F1 score of hotspots is far lower than the F1 score of coldspots, because the classifiers are well trained on coldspots. The F1 score in our study is computed as follows:

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

where

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

and

TP = number of true positives, i.e. cells correctly predicted as hotspots
FP = number of false positives, i.e. cells falsely predicted as hotspots
FN = number of false negatives, i.e. cells falsely predicted as coldspots

To solve this problem, we adjust the weights of hotspots and coldspots. By raising the weight of hotspots on the basis of the ratio between hotspots and coldspots, the data set can be balanced before the classification process. The weight function is defined in terms of the following quantities [equation missing in source]:

C = total number of coldspots
H = total number of hotspots
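The precision, recall and F1 definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the 0/1 label encoding (1 = hotspot, 0 = coldspot) and the zero-division handling are assumptions. Because the weight function itself is not recoverable from the text, `hotspot_weight` uses the plain ratio C/H purely as a stand-in; the paper only states that the weight is based on the proportion between hotspots and coldspots.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class labelled `positive`.

    y_true and y_pred are equal-length sequences of labels; here 1 is assumed
    to mean hotspot and 0 coldspot.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (an assumption; the paper does not say
    # how undefined cases are treated).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def hotspot_weight(n_cold, n_hot):
    """Assumed stand-in weight for hotspots: the coldspot/hotspot ratio C/H.

    This is NOT the paper's weight function, which is missing from the source.
    """
    if n_hot == 0:
        raise ValueError("cannot weight hotspots when there are none")
    return n_cold / n_hot


if __name__ == "__main__":
    truth = [1, 1, 0, 0, 0, 0, 1, 0]
    predicted = [1, 0, 0, 0, 1, 0, 1, 0]
    print(precision_recall_f1(truth, predicted))
    print(hotspot_weight(n_cold=truth.count(0), n_hot=truth.count(1)))
```

Computing the score separately for `positive=1` and `positive=0` makes the imbalance described above visible: the coldspot F1 is typically much higher than the hotspot F1.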

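The grid construction described in the Data Grid and Empty Grid Cells subsections can be illustrated with a short Python sketch. It assumes each incident is a `(latitude, longitude, month, category)` tuple and that the city is enclosed by a known bounding rectangle; both the record layout and the function names are hypothetical, since the paper does not specify them.

```python
from collections import Counter


def bin_incidents(incidents, bounds, n_rows, n_cols):
    """Count incidents per (row, col, month, category) on a uniform grid.

    incidents: iterable of (lat, lon, month, category) tuples (assumed layout).
    bounds: (min_lat, max_lat, min_lon, max_lon) of the bounding rectangle.
    """
    min_lat, max_lat, min_lon, max_lon = bounds
    counts = Counter()
    for lat, lon, month, category in incidents:
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            continue  # incident falls outside the grid
        # Scale the coordinate to a cell index; clamp the upper edge into the
        # last row/column.
        row = min(int((lat - min_lat) / (max_lat - min_lat) * n_rows), n_rows - 1)
        col = min(int((lon - min_lon) / (max_lon - min_lon) * n_cols), n_cols - 1)
        counts[(row, col, month, category)] += 1
    return counts


def non_empty_cells(counts):
    """Cells with at least one count in any category over the whole period.

    Every other cell of the grid is 'empty' in the paper's sense and would be
    dropped before classification.
    """
    return {(row, col) for (row, col, _month, _category) in counts}


if __name__ == "__main__":
    sample = [
        (0.10, 0.10, "2010-01", "Residential Burglary"),
        (0.10, 0.15, "2010-02", "Arrest"),
        (0.90, 0.90, "2010-01", "Street Robbery"),
    ]
    grid_counts = bin_incidents(sample, bounds=(0.0, 1.0, 0.0, 1.0), n_rows=2, n_cols=2)
    print(sorted(non_empty_cells(grid_counts)))
```

Calling `bin_incidents` with `n_rows=24, n_cols=20` or `n_rows=41, n_cols=40` would correspond to the two resolutions mentioned in the text, under the same assumptions.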
III. DATA MINING IN CRIME

Most law enforcement agencies today are faced with large volumes of data that must be processed and transformed into useful information (Brown, 2003). Data mining can greatly improve crime analysis and aid in reducing and preventing crime. Brown (2003) stated that "no field is in greater need of data mining technology than law enforcement." One potential area of application is spatial data mining tools, which provide law enforcement agencies with significant capabilities to learn crime trends: where, how and why crimes are committed (Veenendaal and Houweling, 2003). Brown (2003) developed a spatial data mining tool known as the Regional Crime Analysis Program (ReCAP), which is designed to aid local police forces (e.g. University of Virginia (UVA), City of Charlottesville, and Albemarle County) in the analysis and prevention of crime. This system gives crime analysts the capability to sift through data to catch criminals. It provides spatial, temporal, and attribute matching techniques for pattern extraction[10]. Data mining is a powerful tool that enables criminal investigators who may lack extensive training as data analysts to explore large databases quickly and efficiently[1]. Table 1 lists some types of crime; some, such as traffic violations and arson, primarily concern police at the city, county, and state levels.

Table 1: Crime data at national and international level

IV. CRIME DATA MINING TECHNIQUES

By increasing efficiency and reducing errors, crime data mining techniques can facilitate police work and allow investigators to devote their time to other valuable tasks. Some of the techniques are traditional and some are currently in use. The flow chart below shows the techniques involved in crime data mining:

FLOW CHART: [crime data mining techniques]

Entity extraction identifies particular patterns from data such as text, images, or audio materials[2]. It has been used to automatically identify persons, addresses, vehicles, and personal characteristics from police narrative reports. In computer forensics, the extraction of software metrics, including the data structure, program flow, organization and quantity of comments, and use of variable names, can facilitate further investigation by, for example, grouping similar programs written by hackers and tracing their behavior. Entity extraction provides basic information for crime analysis, but its performance depends greatly on the availability of extensive amounts of clean input data.

Clustering techniques group data items into classes with similar characteristics to maximize or minimize intraclass similarity, for example to identify suspects who commit crimes in similar ways or to distinguish among groups belonging to different gangs. These techniques do not have a set of predefined classes for assigning items. Some researchers use the [...]-based space approach to automatically associate different objects such as persons, organizations, and vehicles in crime records. Using link analysis techniques to identify similar transactions, the Financial Crimes Enforcement Network AI System exploits Bank Secrecy Act data to support the detection and analysis of money laundering and other financial crimes. Clustering crime incidents can automate a major part of crime analysis but is limited by the high computational intensity typically required.

Association rule mining discovers frequently occurring item sets in a database and presents the patterns as rules. This technique has been applied in network intrusion detection to derive association rules from users' interaction history. Investigators can also apply this technique to network intruders' profiles to help detect potential future network attacks. Similar to association rule mining, sequential pattern mining finds frequently occurring sequences of items over a set of transactions that occurred at different times. In network intrusion detection, this approach can identify intrusion patterns among time-stamped data. Showing hidden patterns benefits crime analysis, but obtaining meaningful results requires rich and highly structured data.

Deviation detection uses specific measures to study data that differs markedly from the rest of the data. Also called outlier detection, this technique can be applied to fraud detection, network intrusion detection, and other crime analyses. However, such activities can sometimes appear to be normal, making it difficult to identify outliers.

Classification finds common properties among different crime entities and organizes them into predefined classes. This technique has been used to identify the source of e-mail spamming based on the sender's linguistic patterns and structural features. Often used to predict crime trends, classification can reduce the time required to identify crime entities. However, the technique requires a predefined classification scheme. Classification also requires reasonably complete training and testing data, because a high degree of missing data would limit prediction accuracy.

String comparator techniques compare the textual fields in pairs of database records and compute the similarity between the records. These techniques can detect deceptive information, such as name, address, and Social Security number, in criminal records. Investigators can use string comparators to analyze textual data, but the techniques often require intensive computation.

Social network analysis describes the roles of and interactions among nodes in a conceptual network. Investigators can use this technique to construct a network that illustrates criminals' roles, the flow of tangible and intangible goods and information, and associations among these entities. Further analysis can reveal critical roles, subgroups and vulnerabilities in the network. This approach enables visualization of criminal networks, but investigators still might not be able to discover the network's true leaders if they keep a low profile.

Similarity Measures: Whether two entities are similar is semantically dependent on the application and is defined by the user[5]. There are different similarity measures for different types of data. For quantitative data, we can use Euclidean distance, Minkowski distance and other measures. For qualitative attributes, a simple and commonly used approach is the binary similarity measure. Suppose $a_i$ and $b_i$ are the values of the $i$-th attribute of $A$ and $B$ respectively, and let $s_i(A, B)$ denote the similarity between $A$ and $B$ on the $i$-th attribute:

$$s_i(A, B) = \begin{cases} 1 & \text{if } a_i = b_i \\ 0 & \text{if } a_i \neq b_i \end{cases}$$

In this way, qualitative data can be converted into quantitative data, and similarity measures for quantitative data can then be used. If the sets have a weighted structure, the similarity is defined by taking into account the values of the weights $w_i$ [equation missing in source].

We now look at the methodologies for crime data mining that are available in the current crime data mining literature. The CRISP-DM methodology (Cross-Industry Standard Process for Data Mining), like the SEMMA methodology (Sample, Explore, Modify, Model, Assess), describes a more general data mining process. The CIA intelligence methodology, also well known, describes the life cycle of converting data into intelligence. Van der Hulst's methodology is specifically developed for criminal networks and includes specific steps for identifying and analysing them. Last but not least, the AMPA (Actionable Mining and Predictive Analytics) methodology was developed by McCue for a better understanding of crime data mining. Table 2 gives details of the available methodologies.
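The similarity measures discussed under Similarity Measures can be sketched as follows. The Minkowski and binary measures follow their standard definitions. The weighted combination is an assumption: the paper's weighted formula is not recoverable from the text, so the sketch uses a normalised weighted sum of the per-attribute binary similarities, and all function names are illustrative.

```python
def minkowski_distance(a, b, p=2):
    """Minkowski distance between two numeric vectors.

    p=2 gives the Euclidean distance, p=1 the Manhattan distance.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def binary_similarity(a, b):
    """Per-attribute binary similarity: 1 where a_i == b_i, else 0."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return [1 if x == y else 0 for x, y in zip(a, b)]


def weighted_similarity(a, b, weights):
    """Assumed weighted form: sum(w_i * s_i) / sum(w_i), a value in [0, 1].

    This is one common choice, not necessarily the formula used in the paper.
    """
    scores = binary_similarity(a, b)
    if len(weights) != len(scores):
        raise ValueError("one weight per attribute is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * s for w, s in zip(weights, scores)) / total


if __name__ == "__main__":
    suspect_a = ("male", "sedan", "red")
    suspect_b = ("male", "truck", "red")
    print(binary_similarity(suspect_a, suspect_b))
    print(weighted_similarity(suspect_a, suspect_b, weights=[2, 1, 1]))
    print(minkowski_distance((0, 0), (3, 4)))
```

Normalising by the sum of the weights keeps the result comparable across records with different weightings, which is convenient when the scores are later fed to a quantitative method such as clustering.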

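As an illustration of the string comparator techniques described in Section IV, the sketch below scores the similarity of two text fields with the Levenshtein edit distance. The paper does not name a specific comparator, so the choice of edit distance, the lower-casing step and the function names are all assumptions made for the example.

```python
def levenshtein(s, t):
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    if len(s) < len(t):
        s, t = t, s  # keep the shorter string in the inner loop
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def string_similarity(s, t):
    """Similarity in [0, 1]: 1 - distance / length of the longer string.

    Fields are stripped and lower-cased first (an assumed normalisation).
    """
    s, t = s.strip().lower(), t.strip().lower()
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(s, t) / longest


if __name__ == "__main__":
    # Two spellings of a name that might refer to the same person.
    print(string_similarity("Jon Smith", "John Smyth"))
```

The nested loop makes the cost proportional to the product of the two string lengths for every pair of records compared, which is consistent with the remark above that these techniques often require intensive computation.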
V. CONCLUSION

In this paper the authors presented a systematic method of crime detection at the national and international level. As the volume of crime data grows, controlling crime becomes a more difficult task. To address this problem, the authors present a systematic way of mining crime data for detection and classification, so that the crime problem becomes easier to tackle throughout the world.

REFERENCES

[1] U.M. Fayyad and R. Uthurusamy, "Evolving Data Mining into Solutions for Insights," Comm. ACM, Aug. 2002, pp. 28-31.
[2] W. Chang et al., "An International Perspective on Fighting Cybercrime," Proc. 1st NSF/NIJ Symp. Intelligence and Security Informatics, LNCS 2665, Springer-Verlag, 2003, pp. 379-384.
[3] C. Morselli, Inside Criminal Networks, New York, USA: Springer Science+Business Media LLC, 2009.
[4] G. Wang, H. Chen, and H. Atabakhsh, "Automatically Detecting Deceptive Criminal Identities," Comm. ACM, Mar. 2004, pp. 70-76.
[5] D.E. Brown and S.C. Hagen, "Data association methods with applications to law enforcement," Decision Support Systems, 34(4), 2003, pp. 369-378.
[6] H. Bao, "Knowledge Discovery And Data Mining Techniques And Practice," 2003. http://www.netnam.vn/unescocourse/knowlegde/3-1.htm
[7] George Kelling and Catherine Coles, Fixing Broken Windows: Restoring Order and Reducing Crime in Our Communities, ISBN: 0-684-83738-2.
[8] P. Chapman, J. Clinton, R. Kerber, T. Khabaza, T. Reinartz, C. Shearer, R. Wirth, "CRISP-DM 1.0 step-by-step data mining guide," Technical report, The CRISP-DM Consortium, http://www.crispdm. orglCRlSPWP-0800.pdf], August 2000.
[9] C. Thakur, "Crime Control," 2003. http:// ncthakur. itgo.com /chand3c.htm
[10] S. Ruggieri, D. Pedreschi and F. Turini, "Data mining for discrimination discovery," ACM Transactions on Knowledge Discovery and Data, 4(2), Article 9, ACM, 2010.



ISSN (Print): 2319-2526, Volume 3, Issue 4, 2014, pp. 37-41