Data Mining in Cloud Computing
Database Systems Journal vol. III, no. 3/2012

Ruxandra-Ştefania PETRE
Bucharest Academy of Economic Studies

This paper describes how data mining is used in cloud computing. Data Mining is used for extracting potentially useful information from raw data. The integration of data mining techniques into normal day-to-day activities has become commonplace. Every day people are confronted with targeted advertising, and data mining techniques help businesses to become more efficient by reducing costs. Data mining techniques and applications are very much needed in the cloud computing paradigm. The implementation of data mining techniques through Cloud computing will allow users to retrieve meaningful information from a virtually integrated data warehouse, which reduces the costs of infrastructure and storage.

Keywords: Cloud Computing, Data mining

1 Introduction

The Internet is becoming an increasingly vital tool in our everyday life, both professional and personal, as its users are becoming more numerous. It is not surprising that business is increasingly conducted over the Internet.

Perhaps one of the most revolutionary concepts of recent years is Cloud Computing. The Cloud, as it is often referred to, involves using computing resources, both hardware and software, that are delivered as a service over the Internet (shown as a cloud in most IT diagrams).

Many companies are choosing, as an alternative to building their own IT infrastructure to host databases or software, to have a third party host them on its large servers, so that the company has access to its data and software over the Internet.

The use of Cloud Computing is gaining popularity due to its mobility, high availability and low cost. On the other hand, it brings more threats to the security of the company’s data and information.

To an equally significant extent, data mining techniques have evolved in recent years and become more widely used, with knowledge discovery in databases becoming increasingly vital in various fields: business, medicine, science and engineering, spatial data etc.

The emerging Cloud Computing trend gives its users the unique benefit of unprecedented access to valuable data that can be turned into insight that helps them achieve their business objectives.

2 Some aspects regarding Cloud Computing

Cloud computing represents both the software and the hardware delivered as services over the Internet. Cloud Computing is a new concept that defines the use of computing as a utility and that has recently attracted significant attention.

Figure 1 below illustrates the computing paradigm shift of the last half century through six distinct phases: [1]
• Phase 1: people used terminals to connect to powerful mainframes shared by many users.
• Phase 2: stand-alone personal computers became powerful enough to satisfy users’ daily work.
• Phase 3: computer networks allowed multiple computers to connect to each other.
• Phase 4: local networks could connect to other local networks to establish a more global network.
• Phase 5: the electronic grid facilitated shared computing power and storage resources.
• Phase 6: Cloud Computing allows the exploitation of all available resources on the Internet in a scalable and simple way.

Figure 1. Computing paradigm shift of the last half century [1]

As it is defined by the National Institute of Standards and Technology, “Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model is composed of five essential characteristics, three service models, and four deployment models.” [2]

The essential characteristics of cloud computing are on-demand self-service, broad network access, resource pooling, rapid elasticity and measured service.

The service models that compose cloud computing are Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS).

The deployment models of cloud computing are private cloud, community cloud, public cloud and hybrid cloud.

Table 1 presents details on the top cloud computing companies and the key features of their products:

Table 1 Top Cloud Computing Companies and Key Features [3]
Cloud Name | Key Feature
Sun Microsystems Sun Cloud | More available applications than any other open OS.
IBM Dynamic Infrastructure | Integrated power management to help you plan, predict, monitor and actively manage power consumption of your BladeCenter servers.
Amazon EC2 | Designed to make web-scale computing easier for developers.
Google App Engine | No limit to the free trial period if you do not exceed the quota allotted.
Microsoft Azure | Currently offering a “development accelerator” discount plan. 15-30 % discount off consumption charges for first 6 months.
AT&T Synaptic Hosting | Use fully on-demand infrastructure or combine it with dedicated components to meet specialized requirements.
GoGrid Cloud Computing | Free load balancing and free 24/7 support.
Salesforce | Offers cloud solutions for automation, customer service and platform, respectively. Transparency through real-time information on system performance and security at trust.salesforce.com.

Cloud computing represents all possible resources on the Internet, offering virtually unlimited computing power. As cloud computing becomes a more significant technology trend, it could reshape the IT sector and the IT marketplace.

3 Some aspects regarding Data mining

Data mining means finding useful patterns or trends in large amounts of data.

Data mining is defined as a “type of database analysis that attempts to discover useful patterns or relationships in a group of data. The analysis uses advanced statistical methods, such as cluster analysis, and sometimes employs artificial intelligence or neural network techniques. A major goal of data mining is to discover previously unknown relationships among the data, especially when the data come from different databases.” [4]

The most important data mining techniques and their descriptions are presented in Table 2 below:

Table 2 Data mining techniques [5]
Technique | Description
Clustering | Useful for exploring data and finding natural groupings. Members of a cluster are more like each other than they are like members of a different cluster. Common examples include finding new customer segments and life sciences discovery.
Classification | Most commonly used technique for predicting a specific outcome such as response / no-response, high / medium / low value customer, likely to buy / not buy.
Association | Finds rules associated with frequently co-occurring items, used for market basket analysis, cross-sell, root cause analysis. Useful for product bundling, in-store placement, and defect analysis.
Regression | Technique for predicting a continuous numerical outcome such as customer lifetime value, house value, process yield rates.
Attribute Importance | Ranks attributes according to strength of relationship with target attribute. Use cases include finding factors most associated with customers who respond to an offer, factors most associated with healthy patients.
Anomaly Detection | Identifies unusual or suspicious cases based on deviation from the norm. Common examples include health care fraud, expense report fraud, and tax compliance.
Feature Extraction | Produces new attributes as linear combination of existing attributes. Applicable for text data, latent semantic analysis, data compression, data decomposition and projection, and pattern recognition.

Considering the variety of data mining techniques and the great need to discover patterns and trends in data that lead to knowledge that could not be obtained otherwise, it is no wonder that data mining is used in the most varied fields of activity.

“Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems.” [6]

Businesses can predict how well a product will sell, or develop new advertising campaigns, by using the new relationships revealed by data mining algorithms. The medical sector benefits from data mining techniques, and geographical data can also be analyzed better by using data mining. Governments can discern illegal or embargoed activities done by individuals,

• the customer only pays for the data mining tools that he needs, which reduces his costs since he doesn’t have to pay for complex data mining suites that he does not use fully;
• the customer doesn’t have to maintain a hardware infrastructure, as he can apply data mining through a browser; this means that he pays only the costs generated by using Cloud computing.

Using data mining through Cloud computing reduces the barriers that keep small companies from benefiting from data mining instruments.

“Cloud Computing denotes the new trend in