Data Mining with Cloud Computing: An Overview
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) Volume 5 Issue 1, January 2016
ISSN: 2278 – 1323 All Rights Reserved © 2015 IJARCET

Miss. Rohini A. Dhote
Department of Computer Science & Information Technology, HVPM, COET Amravati, Maharashtra, India.

Dr. S. P. Deshpande
HOD at Computer Science Technology Department, HVPM, COET Amravati, Maharashtra, India

Abstract: Data mining is a process of extracting potentially useful information from data, so as to improve the quality of information service. The integration of data mining techniques with cloud computing allows users to extract useful information from a data warehouse, which reduces the costs of infrastructure and storage. Data mining has attracted a great deal of attention in the information industry and in society as a whole in recent years, owing to the wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge. Its applications include market analysis, fraud detection, customer retention, production control and science exploration. Data mining can be viewed as a result of the natural evolution of information technology. Security and privacy of users' data is a big concern when data mining is used with cloud computing. This paper first introduces the basic concepts of cloud computing and data mining, and then sketches out how data mining is used in cloud computing.

Keywords: Cloud computing, data mining, data mining in cloud computing

1. INTRODUCTION
We are in an age often referred to as the information age. In this information age, because we believe that information leads to power and success, and thanks to sophisticated technologies such as computers, satellites, etc., we have been collecting tremendous amounts of information. Initially, with the advent of computers and means for mass digital storage, we started collecting and storing all sorts of data, counting on the power of computers to help sort through this amalgam of information. Unfortunately, these massive collections of data stored on disparate structures very rapidly became overwhelming. This initial chaos has led to the creation of structured databases and database management systems (DBMS). Today, we have far more information than we can handle: from business transactions and scientific data, to satellite pictures, text reports and military intelligence.

Data mining has been an effective tool for analyzing data from different angles and extracting useful information from it. Typical tasks include classifying data, categorizing data, and finding correlations among data patterns in a dataset. Many organizations now use data mining as a tool to deal with the competitive environment for data analysis. The cloud makes it possible for you to access your information from anywhere at any time. The cloud removes the need for you to be in the same physical location as the hardware that stores your data. The use of cloud computing is gaining popularity due to its mobility, huge availability and low cost.

2. DATA MINING
Data mining involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. These tools can include statistical models, mathematical algorithms and machine learning methods (algorithms that improve their performance automatically through experience, such as neural networks or decision trees). Consequently, data mining consists of more than collecting and managing data; it also includes analysis and prediction. Data mining is becoming common in both the private and public sectors. Industries such as banking, insurance, medicine, and retailing commonly use data mining to reduce costs, enhance research, and increase sales. In the public sector, data mining applications were initially used as a means to detect fraud and waste, but have grown to also be used for purposes such as measuring and improving program performance. The main objective of data mining is to find patterns automatically with minimal user input and effort. Data mining is a powerful tool capable of supporting decision making and forecasting future market trends. Data mining tools and techniques can be successfully applied in various fields in various forms.

Data mining, also popularly known as Knowledge Discovery in Databases (KDD), refers to the nontrivial extraction of implicit, previously unknown and potentially useful information from data in databases. While data mining and knowledge discovery in databases (or KDD) are frequently treated as synonyms, data mining is actually part of the knowledge discovery process. Figure 2.1 shows data mining as a step in an iterative knowledge discovery process.

The Knowledge Discovery in Databases process comprises a few steps leading from raw data collections to some form of new knowledge. The iterative process consists of the following steps:
Data cleaning: also known as data cleansing, it is a phase in which noisy data and irrelevant data are removed from the collection.
Data integration: at this stage, multiple data sources, often heterogeneous, may be combined in a common source.
Data selection: at this step, the data relevant to the analysis is decided on and retrieved from the data collection.
Data transformation: also known as data consolidation, it is a phase in which the selected data is transformed into forms appropriate for the mining procedure.
Data mining: it is the crucial step in which clever techniques are applied to extract potentially useful patterns.
Pattern evaluation: in this step, strictly interesting patterns representing knowledge are identified based on given measures.
Knowledge representation: the final phase, in which the discovered knowledge is visually represented to the user. This essential step uses visualization techniques to help users understand and interpret the data mining results.

It is common to combine some of these steps together. For instance, data cleaning and data integration can be performed together as a pre-processing phase to generate a data warehouse. Data selection and data transformation can also be combined, where the consolidation of the data is the result of the selection or, as in the case of data warehouses, the selection is done on transformed data.

The KDD is an iterative process. Once the discovered knowledge is presented to the user, the evaluation measures can be enhanced, the mining can be further refined, new data can be selected or further transformed, or new data sources can be integrated, in order to get different, more appropriate results.

Data mining derives its name from the similarities between searching for valuable information in a large database and mining rocks for a vein of valuable ore. Both imply either sifting through a large amount of material or ingeniously probing the material to exactly pinpoint where the values reside. It is, however, a misnomer, since mining for gold in rocks is usually called "gold mining" and not "rock mining"; thus, by analogy, data mining should have been called "knowledge mining" instead. Nevertheless, data mining became the accepted customary term, and very rapidly a trend that even overshadowed more general terms such as knowledge discovery in databases (KDD) that describe a more complete process. Other similar terms referring to data mining are data dredging, knowledge extraction and pattern discovery.

Figure 2.1 KDD Process

2.1 Data Mining Techniques
Characterization: Data characterization is a summarization of general features of objects in a target class, and produces what are called characteristic rules. The data relevant to a user-specified class are normally retrieved by a database query and run through a summarization module to extract the essence of the data at different levels of abstraction.
Discrimination: Data discrimination produces what are called discriminant rules and is basically the comparison of the general features of objects between two classes referred to as the target class and the contrasting class.
Association analysis: Association analysis is the discovery of what are commonly called association rules. It studies the frequency of items occurring together in transactional databases and, based on a threshold called support, identifies the frequent item sets. Another threshold, confidence, which is the conditional probability that an item appears in a transaction when another item appears, is used to pinpoint association rules.
Classification: Classification analysis is the organization of data in given classes. Also known as supervised classification, it uses given class labels to order the objects in the data collection.
Prediction: Prediction has attracted considerable attention given the potential implications of successful forecasting in a business context. There are two major types of predictions: one can either try to predict some unavailable data values or pending trends, or predict a class label for some data. The latter is tied to classification.
Clustering: Useful for exploring data and finding natural groupings.

Community cloud: the service is shared by several organizations and made available only to those groups. The infrastructure may be owned and
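
Returning to the association analysis technique described in Section 2.1, the support and confidence thresholds can be made concrete with a minimal Python sketch. This is an illustration under stated assumptions, not the paper's method: the toy transactions, the function names (`count`, `support`, `confidence`, `frequent_itemsets`, `association_rules`) and the brute-force enumeration limited to item pairs are all hypothetical choices. A practical system would more likely use an algorithm such as Apriori or FP-growth.

```python
from itertools import combinations

# Hypothetical toy transactions (illustrative data, not from the paper).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter", "eggs"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain the whole itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from transaction counts."""
    base = count(antecedent, transactions)
    if base == 0:
        return 0.0
    joint = count(set(antecedent) | set(consequent), transactions)
    return joint / base


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force search for itemsets whose support meets the threshold."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result


def association_rules(transactions, min_support, min_confidence):
    """Rules A -> B between single items, drawn from the frequent pairs."""
    rules = []
    for itemset in frequent_itemsets(transactions, min_support):
        if len(itemset) != 2:
            continue
        a, b = sorted(itemset)
        for lhs, rhs in ((a, b), (b, a)):
            c = confidence({lhs}, {rhs}, transactions)
            if c >= min_confidence:
                rules.append((lhs, rhs, c))
    return sorted(rules)


if __name__ == "__main__":
    for lhs, rhs, c in association_rules(TRANSACTIONS, 0.5, 0.7):
        print(f"{lhs} -> {rhs} (confidence {c:.2f})")
```

On this toy data, a minimum support of 0.5 and a minimum confidence of 0.7 keep rules such as bread -> milk (support 3/5, confidence 3/4), while no item set containing eggs passes the support threshold. Counting transactions as integers before dividing keeps the confidence values free of avoidable floating-point error.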