Predictive Modelling Analytics Through Data Mining
Total Page:16
File Type:pdf, Size:1020Kb
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072 Predictive Modelling Analytics through Data Mining Lakshay Swani1, Prakita Tyagi2 1 Software Engineer, Globallogic Pvt. Ltd, Sector-144, Noida, UP, INDIA 2 Software Engineer, Infosys, Bengaluru, INDIA ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract – With this paper, we try to explore Predictive 1.1 Data Mining Analytics through a combination of various Data Mining techniques over Big Data. The survey provides an indication Data mining is the process of discovering patterns and of acceleration in the area of Predictive Analytics for the anomalies form large sets of data through the application of enhancement of businesses and researchers by applying statistics, machine learning and database systems [17]. The business intelligence for providing forecasting ability to objective of data mining is to extract the useful data from data proceed through the development of business. Also, the set and process them to obtain information that has an paper portrays the incorporation of the characteristics of Big understandable structure. It includes provision for Data as the base of Data Mining through the application of automated and semi-automated analysis of large quantities of Apache Hadoop in achieving the above. Predictive analytics data set to extract recognizable patterns, dependencies uses many techniques from data mining, statistics, modeling, between the data, groups or clusters of records and the machine learning, and artificial intelligence to analyze relation between these records as well as the anomalies current data to make predictions about future. It is occurring within the system. Data mining could be applied to considered as the next frontier for innovation. identify multiple groups and clusters of data and forward them for further analysis through machine learning or KEYWORDS: Predictive Analytics, Data Mining, Big Data, predictive analytics. Hadoop, Machine Learning Data mining commonly involves the following 6 tasks 1. INTRODUCTION 1. Anomaly Detection - In data mining, anomaly With the ever increasing online content generated at an detection is the detection of events, items or accelerated rate, petabytes of data is produced each day. The observations that do not conform to a pattern that is explosion in the user generated data from the social media expected or to an established normal behavior and businesses and organizations, there has always been a 2. Association Rule Learning – This involves search for effective management and storage of the enormous discovering and describing the relationship data created. Examples of the above fact includes an between the various variables present within the approximation of over 7 terabytes of data generated each day data set. It is intended to identify and discover by twitter, over 25 petabytes of data being handled by Google extensive rules within the data in the databases. and over 500 terabytes of data being produced each day at This information could be useful in undertaking Facebook [6]. Such extensive data might mean nothing when decisions before forth going any dilemma that could unorganized but when organized into recognizable structure, be an integral part of future perspective of the it could be extremely useful in analysis and future prediction business of the organization [16]. of growth strategies for an organization [28]. This gave rise to 3. Clustering – Clustering involves identifying the prediction domain through data mining [29]. What is similarities between the various objects of the data required to obtain the information from the raw data, is set and abstracting them into classes of similar highly computational systems that could process the raw data objects with similar properties. It involves to convert it to meaningful data that could be recognized and discovering structures and groups in the data have analyzed. As the enormous data that is obtained online is similar properties or are similar to each other in unstructured and unorganized, the need arise to store, distinction, without using the available structures structure and process this available raw data in a cost present in the data set. effective and efficient way. Traditionally, the IT industry dint 4. Classification – It includes the process of creating had the capabilities to handle such large sum of data. But with generic or generalized known structures and drastic excellence shown in the IT industry, various ways and properties and apply them to the data that has been tools have been devised for effective and efficient storage and clustered. processing of the data available [30][31]. Apache Hadoop is 5. Regression – It involves development of a function one such tool that has been emerging as one of the most that is responsible for predicting various properties successful in handing and managing big unorganized data [4]. and factors using regression techniques. © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 5 International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072 6. Summarization – This task involves and revolves leading to a well thought and effective marketing in around development of a compact representation of turn making the business bloom. data set, including generation of report and visualization. 3. Business Future Perspective – Big Data analysis can 1.2 Need for Data Analytics inevitably predict the future of the business keeping in mind the current scenarios. It helps the business In all areas of business, a well-structured actionable data in taking effective measurements that could has endless benefits. With the high volume, variety and possibly lead to the better future of the velocity of data that is being input to an organization, organization. With proper analysis, decision making analysing the data could prove to be extremely useful for the abilities are provided to the organization and could aspect of the business. We provide the following key reasons lead to a perspective growth of its business. With highlighting the need of Big Data and strategic analysis the market trend changing rapidly, the trend of the within an organization referring to how it could help with changes that would be of impact to the business can the business growth. be predicted efficiently and measures could be taken in handling those impacts. With strategic 1. Smarter Organizations – Through a well thought out analysis, certain crucial decision points could be analytics strategy, an organization work more handled within an organization so as to achieve a efficiently and smartly in achieving its goals. For foreseen goal of the organization. example, through thorough analysis of data patterns within a police department, it could be susceptible 2. Data Mining Techniques for them to identify the crime scenes and hotspots and in turn help the department to work efficiently 1. Association – Association is one of the most familiar in solving and preventing crimes from happening and most commonly known technique of data around the globe. In medical industry, proper mining. In association, the similarities between analysis over the disease pattern over an area or different objects is identified for recurring patterns group of people could help the doctors to effectively and a correlation is established between the objects. devise and predict the possibilities of a disease. In The objects are generally of the same type so as to other case, weather forecasting industry has the define the correlation among them to identify the analytics over the weather patterns as its base that patterns. This could be exhibited through an helps the organization with proper and accurate example of a customer who always bought weather predictions. Thus we see, ranging from chocolates along with milk. The milk in this case gets criminal justice to real estate to health care to associated with chocolates and thus a pattern is weather forecasting, Big Data analytics in being emerged that provides information about the sale of leveraged to provide effective and efficient milk and chocolates and it suggests that the next outcomes. time the person buys the chocolate, he’ll be buying milk as well. 2. Behavioural Marketing – The base of a successful 2. Classification – Classification, as the name suggests business or organization is its reach to the targeted is the process of identifying an object by relating consumers. With effective and efficient marketing with it, multiple attributes that describe that object strategies, a business could bloom to its highest thoroughly. Classification is used to build up the levels. With the marketing for the business reaching reference or idea of an object by describing it using to its desired public, proper strategic analysis needs multiple attributes to define its particular class. Let to be done so as to make the business reach to the us try to understand the same though an example of target audience. This is where the behavioural classification of cars. Given a car as an object, we can marketing comes into play. Consumers are targeted identify or classify it through multiple attributes on the basis of the websites they consume or such as shape, number of seats, transition type etc. searches for a commodity.